Hi all, hope you can help.
I have a multi-FASTA file containing roughly 730 prokaryotic genomes (5-6 Mb each); all contigs/chromosomes are named in the headers, like so:
>CP009335.1 Bacillus thuringiensis strain HD1011, complete genome
TCCTGATGGAACTTTAATTGATGAAAAGAGTCGTGTAAACTTTTTCCATCTTTCAACCCATCAATCATGC
GCTGCAATTGTACTTTCTTTTCTAAAGGTAATTGAAACCGTAAAAATTCTAATGCCTGCAAAAGGGAGTA
TCCTTTTTCTAATAGTTCTCCTAATCGTTTCAGTAATATGACTTGATCACTTAAACTCCATATTTCCTTA
AACATAAACATCTTCTTCTAAAAACCCTAAAGCGTATCCTTTTCGTATCGAAGATTGTAATGTTTCGTGC
TTGTATGTGACACATTCCCCGTTTGCTTCTTTAATCGCTTGTTTTAACTCATATCCATATAACAACTCAT
AAATACTCGCTTGCCTTACTTGCCTCATTGATTT
I have a BED file, originally created in Excel, saved as tab-delimited text and converted with dos2unix. It contains one line for each genome, like so:
CP009335.1 1984592 1992438 CP009335.1_genome.tsv B_thuringesis
CP009720.1 3944559 3952406 CP009720.1_genome.tsv B_thuringesis
ABDL02000007.1 228801 234535 GCA_000171035.2_ASM17103v2_genomic.tsv B_cereus
CP026376.1 1520664 1528500 GCA_002952815.1_ASM295281v1_genomic.tsv B_cereus
NZ_CM000714.1 757305 765101 GCF_000003645.1_ASM364v1_genomic.tsv B_cereus
I ran the following command on the BED file to make sure it was tab-delimited:
awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' "mybedfile"
And when I check it manually, it looks OK.
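A stricter check than looking at the file by eye is to count the tab-separated fields on every line. The sketch below is only an illustration: it builds a small hypothetical sample file (standing in for "mybedfile") in which the first line uses tabs and the second uses spaces, then reports the field count per line.

```shell
# Hypothetical sample standing in for "mybedfile":
# line 1 is tab-separated, line 2 uses spaces instead of tabs.
bed=$(mktemp)
printf 'CP009335.1\t1984592\t1992438\tCP009335.1_genome.tsv\tB_thuringesis\n' > "$bed"
printf 'CP009720.1 3944559 3952406 CP009720.1_genome.tsv B_thuringesis\n' >> "$bed"

# Split on tabs only (-F'\t') and print "line number: field count".
# A clean 5-column file prints 5 for every line.
field_counts=$(awk -F'\t' '{ print NR ": " NF }' "$bed")
echo "$field_counts"
```

Any line that reports 1 instead of 5 is not tab-separated. A file that reports a single line with far more than 5 fields would instead suggest that the line terminators are not being recognised.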
When I run bedtools getfasta, I only get a result for the top row. No error messages or anything, just one result. I tried copying some rows from the long BED file into a new file, and that worked after I manually edited in the tabs. But I am hoping to avoid doing that for the whole file. So, can anybody help me make bedtools read the whole BED file? And if something is wrong with the BED file (which seems to be the issue), how can I make it properly tab-delimited?
THANK YOU
Hello ann-katrin.llarena and welcome to biostars.
That's good evidence that your original BED file isn't completely tab-separated. You can convert all spaces to tabs using
sed
fin swimmer
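The command itself is missing from the post above. As a sketch of the same idea (not necessarily the command that was meant), the example below converts runs of spaces to single tabs. It uses tr rather than sed, on the assumption that the file is being fixed on a Mac, where the stock sed may not interpret \t in the replacement text. The sample line and the temporary file names are hypothetical.

```shell
# Hypothetical one-line sample with spaces (including a double space) between fields.
bed=$(mktemp)
fixed=$(mktemp)
printf 'CP009720.1  3944559 3952406 CP009720.1_genome.tsv B_thuringesis\n' > "$bed"

# Translate every space to a tab, then squeeze (-s) each run of tabs into one.
tr -s ' ' '\t' < "$bed" > "$fixed"

# Confirm the result now has 5 tab-separated fields.
n_fields=$(awk -F'\t' '{ print NF }' "$fixed")
echo "$n_fields"
```

Note that this also collapses consecutive tabs that were already in the file, so it is only suitable when no column is intentionally empty.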
Thank you for helping, but it still only outputs the hit for the first row in the BED file and ignores the remainder of the file. Any other takes on the issue?
Could you please upload an extract of your BED file that doesn't work somewhere, so we can take a closer look at it?
Also, please show the exact bedtools command you were using.
Thanks.
Did you prepare the file in Windows? If so, the line terminators might be wrong, and you can use
dos2unix
to fix your file.
Thank you, I already did. I should add that I made it in Excel on a Mac, and in the file it says "converted from mac format".
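Since the file came from Excel on a Mac, one thing worth ruling out is bare carriage-return (\r) line endings with no newline (\n) at all. This is an assumption about the file, not something confirmed in this thread. If it holds, the whole file would likely be read as one long record, which would match getting a result for the first row only. A minimal sketch on a hypothetical sample, counting both kinds of terminator and then converting:

```shell
# Hypothetical sample mimicking a CR-only export: tabs between fields,
# but a bare carriage return instead of a newline after each record.
bed=$(mktemp)
fixed=$(mktemp)
printf 'CP009335.1\t1984592\t1992438\rCP009720.1\t3944559\t3952406\r' > "$bed"

# Count newline and carriage-return bytes separately.
lf_before=$(tr -cd '\n' < "$bed" | wc -c | tr -d ' ')
cr_before=$(tr -cd '\r' < "$bed" | wc -c | tr -d ' ')

# Replace every carriage return with a newline.
tr '\r' '\n' < "$bed" > "$fixed"

# Count the lines in the converted file.
lines_after=$(wc -l < "$fixed" | tr -d ' ')
echo "before: LF=$lf_before CR=$cr_before; after: lines=$lines_after"
```

If the real file shows both CR and LF (Windows-style endings), replacing \r with \n would create blank lines; deleting the carriage returns with tr -d '\r' would be the safer choice in that case. Running od -c on the first few lines is another way to see the raw terminators.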