Hi All,
I have done two viral runs on the same sample, with a total of 490,224 reads. Since these are viral and extracted from cell culture, I know they are contaminated with the host (chicken). So I mapped to the host and pulled all the reads with the 0x4 (unmapped) flag from the SAM file.
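For reference, a minimal sketch of what "pulling the reads with the 0x4 flag" amounts to, assuming a plain-text SAM file with tab-separated fields. The function name `unmapped_read_names` and the demo records are made up for illustration; this is not the exact command used above.

```python
def unmapped_read_names(sam_lines):
    """Yield the names of reads whose FLAG has the 0x4 (unmapped) bit set."""
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])           # FLAG is the second column
        if flag & 0x4:                  # bit 0x4 = segment unmapped
            yield fields[0]             # QNAME is the first column

# Hypothetical records, for illustration only.
demo = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    "read3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tGGGG\tIIII",
]
names = list(unmapped_read_names(demo))
```

With real data you would pass an open file handle instead of the `demo` list.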
At first I had the current version 4 Gallus gallus genome, 18S rRNA, and 28S rRNA FASTA files as separate files, and I mapped to each of them in series, extracting the unmapped reads after each step. That left me with 159,396 reads that didn't match any of those files.
Then I thought, why do that three times? I concatenated the files into "chicken_genome.fasta", indexed it, and re-ran the host removal, but this time I got 172,728 reads left over.
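One thing that may be worth ruling out when concatenating FASTA files is a missing newline at the end of one file, which would glue the next file's header onto the previous sequence. The sketch below shows a concatenation that guards against that. The helper name `concat_fasta` and the in-memory demo records are hypothetical, and nothing here says this is what actually happened in the run above.

```python
import io

def concat_fasta(sources, out):
    """Copy each FASTA handle in `sources` to `out`, ending each file with a newline."""
    for src in sources:
        last = "\n"                     # treat an empty file as already terminated
        for line in src:
            out.write(line)
            last = line
        if not last.endswith("\n"):     # add the newline the file was missing
            out.write("\n")

# Tiny in-memory demo; real use would pass open file handles instead.
genome = io.StringIO(">chr1\nACGT")     # note: no trailing newline
rrna = io.StringIO(">18S_rRNA\nGGCC\n")
combined = io.StringIO()
concat_fasta([genome, rrna], combined)
```

Counting the ">" header lines in the combined file and comparing that with the sum over the input files is a quick sanity check.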
I do not suppose anyone could help me figure out why I got an extra 13,332 reads the second time around?
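A possible first step in tracking this down is to compare the read names left over from each approach and look at the reads that only appear in one of the two sets. The sketch below assumes you already have the two lists of unmapped read names. `compare_unmapped` and the demo names are hypothetical.

```python
def compare_unmapped(serial_names, combined_names):
    """Split read names by which host-removal run left them unmapped."""
    serial = set(serial_names)
    combined = set(combined_names)
    return {
        "only_serial": serial - combined,      # unmapped only in the three-step run
        "only_combined": combined - serial,    # unmapped only against the combined file
        "both": serial & combined,             # unmapped in both runs
    }

# The counts reported above: 172,728 (combined) vs 159,396 (serial).
extra = 172728 - 159396

# Hypothetical read names, for illustration only.
demo = compare_unmapped(["r1", "r2"], ["r1", "r2", "r3"])
```

The reads in `only_combined` are the ones that mapped somewhere in the serial run but nowhere in the combined run, so checking where they mapped in the serial run should narrow down the cause.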
Hi Brian,
I used TMAP version 3.4.1, which I believe is the current version or pretty close to it. It looks like Ion has not really updated TMAP itself, but they moved it into an Analysis package. I'm also using the same parameters for both mappings.
The reason I added those two files to the genome is that previously, when I did the host removal and then the viral mapping, I would still end up using only ~35% of the reads. So I de novo assembled the leftover reads and got avian rRNA (chicken, kiwi, sometimes a salamander), which made me think that the rRNA was not in the draft assembly of the genome. I then took the chicken BLAST hits, mapped to them, and redid the viral mapping, and the fraction of reads mapping to the target increased to 95%.
That is the reason I added them to the draft genome assembly. I will get BBMap and give it a try as well.