11.1 years ago
RossCampbell
I'm trying to extract sequences from several FASTA chromosome files using BioPerl's Bio::DB::Fasta module, which works fine for all but one file. As expected, the module indexes each file first; for example, 'chr1.fa' is indexed as 'chr1.fa.index'. For one file (chr11.fa), however, it creates a file called '_db.chr11.fa.index', which throws off the rest of my pipeline, and my output has no sequence from that chromosome. Does anyone more familiar with this module know why the index for this file is named differently from the others?
Are all the FASTA IDs unique in your files? Did you accidentally put >chr1 in your chr11.fa file? What does "grep '>' chr*.fa" say?
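If the grep output is too long to check by eye, the same uniqueness check can be scripted. The following is only a rough sketch in Python, not part of BioPerl: the function name find_duplicate_ids is made up for illustration, and it assumes the sequence ID is the first whitespace-delimited word after '>' and that the files match chr*.fa in the current directory.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_ids(fasta_paths):
    """Return {sequence ID: [files]} for every ID that occurs more than once.

    Hypothetical helper for illustration. Assumes the ID is the first
    whitespace-delimited token on each '>' header line.
    """
    seen = defaultdict(list)
    for path in fasta_paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    seq_id = fields[0] if fields else ""
                    # Record the file once per header carrying this ID
                    seen[seq_id].append(str(path))
    return {seq_id: files for seq_id, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Assumed layout: chr*.fa files in the current directory
    paths = sorted(Path(".").glob("chr*.fa"))
    for seq_id, files in find_duplicate_ids(paths).items():
        print(f"{seq_id}: {', '.join(files)}")
```

If it prints nothing, no ID was repeated within or across the files it scanned.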
They are all unique. grep returned:
chr10.fa:>chr10
chr11.fa:>chr11
chr12.fa:>chr12
...and so on down the list. Each ID matches its filename.
Update: This issue was resolved when I installed Ubuntu updates this morning. I don't know what fixed it, but something behind the scenes apparently wasn't working right. It's fine now.