Hi,
I have a FASTA file from a species that just had a new reference genome published. I downloaded a version of this new reference genome from UCSC, but a piece of downstream software that I use is giving me issues because there are uncharacterized chromosomes in the FASTA file. I'm not sure why this is all of a sudden a problem, because the old reference genome had uncharacterized chromosomes in its FASTA file as well. I know that removing the uncharacterized chromosomes solves my problem, but my test was a bit labor-intensive and I was hoping to find a more efficient way of solving this in the future.
My solution consisted of using sed to remove each unwanted header and the first line after it. I did this because I recognized a pattern: each unwanted record was a header followed by a single sequence line. So, for example, line 101 would be a header, and I also removed line 102 because the next header would be at line 103. Here are some examples of the uncharacterized chromosome headers and some examples of normal chromosome headers:
>chrM
>chrUn_JAAHUQ010000408v1
>chrUn_JAAHUQ010000409v1
>chrUn_JAAHUQ010000410v1
>chrUn_JAAHUQ010000411v1
....
>chrUn_MU018702v1
>chrUn_MU018703v1
>chrUn_MU018704v1
>chrUn_MU018705v1
>chrUn_MU018706v1
....
>chrX
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
All the uncharacterized chromosomes have the same pattern of chrUn_JAAHUQ or chrUn_MU.
What I was wondering was whether there is some way to use grep or some other tool to remove these unwanted chromosomes in an easier fashion than what I did.
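One possible approach, offered only as a rough sketch: awk can switch printing on and off at each header line, so it does not rely on every record being exactly two lines long. The file names demo.fa and demo.filtered.fa and the sequences below are made up for illustration, and the sketch assumes every unwanted header starts with >chrUn_ as in the examples above.

```shell
# Build a tiny demo FASTA (hypothetical file name; sequences are invented).
printf '%s\n' \
  '>chrM' 'ACGT' 'ACGT' \
  '>chrUn_JAAHUQ010000408v1' 'GGGG' \
  '>chrUn_MU018702v1' 'TTTT' 'TT' \
  '>chr1' 'CCCC' > demo.fa

# On every header line, set keep to 0 for chrUn_ records and 1 otherwise.
# A line is printed only while keep is 1, so multi-line sequences are
# dropped or kept together with their header.
awk '/^>/ { keep = ($0 !~ /^>chrUn_/) } keep' demo.fa > demo.filtered.fa

# List the headers that survived the filter.
grep '^>' demo.filtered.fa
```

On a real genome you would swap demo.fa for your own file. If only some of the chrUn_ records should go, the regular expression inside the awk program would need to be narrowed accordingly.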
That almost worked perfectly. The grep commands worked great, but samtools gave me trouble. samtools was giving me the following error
I looked at the output file and the only line in the file looked like
I just assumed there was an issue with the greater-than sign in front of the chromosome names, so I tried running the following command and it ended up working fine
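The exact commands from this exchange are not reproduced here. Purely as an illustration of the idea (and not the original command), a list of names to keep could be built with the leading > stripped, roughly like this. The file names demo2.fa and keep_names.txt are hypothetical, and the samtools call is left as a comment because it assumes samtools is installed and that the names are passed to samtools faidx as regions.

```shell
# Hypothetical demo input; headers mirror the examples in the question.
printf '%s\n' \
  '>chrM' 'ACGT' \
  '>chrUn_MU018702v1' 'TTTT' \
  '>chrX' 'GGGG' \
  '>chr1' 'CCCC' > demo2.fa

# Take the header lines, drop the chrUn_ ones, then strip the leading ">"
# so that only bare sequence names remain.
grep '^>' demo2.fa | grep -v '^>chrUn_' | sed 's/^>//' > keep_names.txt

cat keep_names.txt

# Assumption: with samtools available, the names could then be used as regions:
#   xargs samtools faidx demo2.fa < keep_names.txt > demo2.filtered.fa
```

The sed step matters because region names are matched against sequence names without the > marker. Headers that carry a description after a space would need extra trimming, which this sketch does not do.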
Ah sorry, stupid me, you do indeed have to remove the leading >. Edited it, but you already sorted it out, +1.
No way! You got me 99% of the way there. I only had to fix the easy part. Thank you for the help!