Question

Unknown mapping difference from host removal

9.5 years ago

skbrimer ▴ 740

Hi All,

I have done two viral runs on the same sample, for a total of 490224 reads. Since the virus was extracted from cell culture, I know the reads are contaminated with host (chicken) sequence. So I mapped the reads to the host and pulled every read with the 0x4 (unmapped) flag set from the SAM file.
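For readers unfamiliar with the flag, here is a minimal sketch of what "pulling the reads with the 0x4 flag" means. It assumes plain-text SAM input; the function name `unmapped_read_names` and the three example records are made up for illustration and are not from the actual data.

```python
import io

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Yield the names of reads whose FLAG has the 0x4 bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed or truncated records
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & UNMAPPED_FLAG:
            yield fields[0]  # QNAME is the first SAM column


# Tiny hypothetical SAM: read1 and read3 are mapped, read2 is unmapped.
example_sam = io.StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII\n"
    "read3\t16\tchr1\t200\t0\t10M\t*\t0\t0\tGGGGCCCCAA\tIIIIIIIIII\n"
)
unmapped = list(unmapped_read_names(example_sam))
print(unmapped)
```

Note that read3 has mapping quality 0 but is still flagged as mapped, so it is not extracted. On real files, `samtools view -f 4` selects the same records.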

At first I had the current version 4 Gallus gallus genome and the 18S and 28S rRNA sequences as separate FASTA files, and I mapped to each of them in series, extracting the unmapped reads after each step. That left me with 159396 reads that did not match any of those files.

Then I thought, why do that three times? I concatenated the files into "chicken_genome.fasta", indexed it, and re-ran the host removal, but this time I got 172728 reads left over.

I do not suppose anyone could help me figure out why I got an extra 13,332 reads the second time around?

Host-removal Ion-Torrent • 1.9k views

Ram · Answer 1 · 2015-11-07

9.5 years ago

Brian Bushnell 20k

Sounds like an artifact of the way you did the mapping. What program did you use, and was it the latest version? The settings are also potentially important. If the chicken genome already includes its rRNA sequences (which it should), then concatenating extra copies of them with the main genome could reduce the mapping scores of those reads (because they will map ambiguously), which, depending on the tool and parameters you used, could cause them to be reported as unmapped. It would be prudent to see where those 13k extra reads map. You can use filterbyname.sh (from BBMap) to extract the reads in one file that are not in another file.
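To illustrate the comparison, here is a rough sketch that finds the read names present in one leftover file but not in the other. It assumes uncompressed four-line FASTQ records; the helper name `fastq_names` and the tiny example records are hypothetical. This is not how filterbyname.sh works internally, just the same set difference done by hand.

```python
import io


def fastq_names(handle):
    """Collect read names from a stream of four-line FASTQ records."""
    names = set()
    for i, line in enumerate(handle):
        # Every fourth line, starting at the first, is a "@name ..." header.
        if i % 4 == 0 and line.strip():
            names.add(line[1:].split()[0])
    return names


# Hypothetical leftovers: the concatenated-reference run kept one extra read.
serial_leftover = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nGGCC\n+\nIIII\n"
)
combined_leftover = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nGGCC\n+\nIIII\n"
    "@r3\nTTAA\n+\nIIII\n"
)

# Names left over only in the concatenated-reference run.
extra = sorted(fastq_names(combined_leftover) - fastq_names(serial_leftover))
print(extra)
```

The names in `extra` are the reads worth mapping or BLASTing individually to see whether they hit the rRNA sequences.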

If you have references for both chicken (which of course you do) and the virus, I recommend using BBSplit to map to both at once and generate one pile of reads per organism - that's the most accurate method for separation.

Hi Brian,

I used TMAP version 3.4.1, which I believe is the current version or pretty close to it. It looks like Ion has not really updated TMAP itself, but they moved it into an Analysis package. I'm also using the same parameters for both mappings.

The reason I added those two files to the genome is that previously, when I did the host removal and then the viral mapping, I still ended up using only ~35% of the reads. So I de novo assembled the leftover reads and got avian rRNA (chicken, kiwi, sometimes a salamander), which made me think the rRNA was not in the draft assembly of the genome. I then took the chicken BLAST hits, mapped to them, and redid the viral mapping, which increased the share of reads mapping to the target to 95%.

That is the reason I added them to the draft genome assembly. I will get BBMap and give it a try as well.

written 9.5 years ago by skbrimer ▴ 740