Hello,
I'm trying to extract some sequences from a multifasta file (a genome) using the following command:
bedtools getfasta -fi T_aestivum_genomeA.fa -bed urartuAestivum_blocks_sort.bed12 -split -name -fo blocks_aestivumA.fa
The program didn't report any error, but for some sequences the output multifasta file contains only the header. I checked the bed12 file and didn't find any anomaly in the rows corresponding to the missing sequences. I also manually checked the genome coordinates of some of the missing sequences and there was nothing strange there (no runs of Ns or anything like that). I get the correct output if I don't use the -split option, but I don't want the entire sequence, so I think the problem is in the blocks.
Here is what my bed12 file looks like:
7A 25225503 25225944 TCONS_00077526_aestivumA * * * * * 1 441, 25225503,
7A 35229975 35230420 TCONS_00076940_aestivumA * * * * * 1 445, 35229975,
7A 35501306 35501751 TCONS_00170589_aestivumA * * * * * 2 139,306, 35501306,35501445,
7A 131421239 131421684 TCONS_00107436_aestivumA * * * * * 2 281,88, 131421239,131421596,
7A 10711045 10711495 TCONS_00150021_aestivumA * * * * * 1 450, 10711045,
7A 167627488 167627939 TCONS_00024036_aestivumA * * * * * 1 451, 167627488,
7A 48932559 48933013 TCONS_00136773_aestivumA * * * * * 1 454, 48932559,
The fourth line corresponds to one of the sequences I didn't get.
Has anyone experienced a similar problem? Thank you!
Alice
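Yes, it looks like the original poster is using absolute coordinates in the block starts column (the first block start is equal to chromStart). In BED12 the block starts must be relative to chromStart, so the first block always starts at 0. As written, none of the lines are correct.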
Thanks a lot to both of you. I changed the block starts column so that the values are relative to chromStart (the first block always starts at 0) and it worked! I also realized that for the other lines, where BedTools did extract a sequence, that sequence was actually wrong (because, as you said, the block starts were not relative to chromStart), so I don't understand how it managed to extract anything. Anyway, I will change the last column of every line as you said. Thanks again!
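In case it helps anyone who lands here with the same problem, this is a minimal Python sketch of the fix described above. It assumes a whitespace-separated 12-column file like the one in the question, and it treats a line as "absolute" when its first block start equals chromStart. The function name `to_relative_block_starts` is just a hypothetical helper for illustration, not part of bedtools.

```python
def to_relative_block_starts(line: str) -> str:
    """Rewrite column 12 of one BED12 line so block starts are relative to chromStart.

    Assumption: the line was written with absolute block starts if its first
    block start equals chromStart; otherwise the block starts are left as they are.
    """
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")

    chrom_start = int(fields[1])
    # Column 12 is a comma-separated list, usually with a trailing comma.
    starts = [int(s) for s in fields[11].split(",") if s]

    if starts and starts[0] == chrom_start:
        # Absolute coordinates: shift so that the first block starts at 0.
        starts = [s - chrom_start for s in starts]

    fields[11] = ",".join(str(s) for s in starts) + ","
    return "\t".join(fields)


if __name__ == "__main__":
    # The fourth line from the question, used as example input.
    example = ("7A 131421239 131421684 TCONS_00107436_aestivumA "
               "* * * * * 2 281,88, 131421239,131421596,")
    print(to_relative_block_starts(example))
```

Applying this to every line of the bed12 file and writing the result to a new file should give input that works with -split. It is worth spot-checking a few extracted sequences against the genome afterwards.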