Question

How to check if the mapping results from tophat2 are correct?

8.4 years ago

mirza ▴ 180

Hi, I have paired-end RNA-Seq data from a plant-microbe interaction study under different conditions. I mapped the reads to the plant genome using tophat2 and got an overall read alignment rate of 32.3% to 89.2% (concordant pair alignment rate of 30.6% to 86.4%).

1. How can I ensure that the mapping results are good?
2. How can I check that the mapped reads are in fact correct? That is, how do I check that all mapped reads come from the plant and that reads from the bacteria are not contaminating the output file?

RNA-Seq tophat mapping • 2.0k views

ADD COMMENT • link updated 6.6 years ago by Biostar 20 • written 8.4 years ago by mirza ▴ 180

You could use a metagenomics tool, like Kraken, to quickly estimate what contaminants are present in your sample. It won't tell you about issues mapping reads, but you'll at least know the source of your problem.

ADD REPLY • link 8.4 years ago by pld 5.1k

Thanks, joe. I'll try it. Will Kraken tell me the source of contamination if it finds such reads? This means I'll have to provide my reference. Could you please point me to the Kraken manual and some tutorials?

ADD REPLY • link 8.4 years ago by mirza ▴ 180

score 0 · Answer 1 · 2016-08-10

You can always view the alignments with a genome viewer (IGV is easy to use) and confirm that the alignments look reasonable. Using a different aligner (BBMap and STAR are good alternatives) will provide an independent confirmation.

As for the origin of the reads (plant/microbe), you can check a few using BLAST at NCBI. Other than some chloroplast/mitochondrial DNA (which may have some similarities to microbes), the rest of the data should map specifically to plants.

score 0 · Answer 2 · 2016-08-10

8.4 years ago

Devon Ryan 105k

The only way to check for contamination is to map the reads against likely contaminants and see how many align there at least as well as they align to your plant. I'm guessing that you really want to align your reads against both genomes and then assign them according to how they align. I expect the bbsplit tool from BBMap would work for something like this.

ADD COMMENT • link 8.4 years ago by Devon Ryan 105k
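
To illustrate the idea of assigning reads according to how well they align to each genome, here is a minimal sketch. It assumes you have already extracted a best alignment score per read from each alignment (for example, from the SAM `AS:i` tag, with the same aligner and settings for both references) and that higher scores are better. The function name `assign_reads` and the example read names and scores are hypothetical. This is not how bbsplit works internally, just the general logic.

```python
def assign_reads(plant_scores, microbe_scores):
    """Assign each read to 'plant', 'microbe', or 'ambiguous'.

    Both arguments map read name -> best alignment score (higher is better).
    A read missing from a dict did not align to that reference.
    """
    assignment = {}
    for read in set(plant_scores) | set(microbe_scores):
        p = plant_scores.get(read)
        m = microbe_scores.get(read)
        if m is None or (p is not None and p > m):
            assignment[read] = "plant"
        elif p is None or m > p:
            assignment[read] = "microbe"
        else:
            # Equal scores: the read aligns equally well to both genomes.
            assignment[read] = "ambiguous"
    return assignment


# Hypothetical scores for four reads.
example = assign_reads(
    {"r1": -3, "r2": -20, "r3": -5},
    {"r2": -2, "r3": -5, "r4": 0},
)
```

Reads that land in the "ambiguous" group are the ones worth inspecting by hand or discarding, since they align to the contaminant at least as well as to the plant.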

Yes, you are right. It's a mixture of plant and microbe RNA. I want to map the reads to both genomes and separate them out. My concern is that the output of reads mapped to the plant might by mistake contain some from the bacteria, and vice versa. Is there any way to compare the output files generated from the two alignments?

ADD REPLY • link 8.4 years ago by mirza ▴ 180
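
One simple way to compare two alignment outputs is to collect the names of the mapped reads in each and look at the overlap. The sketch below assumes the alignments are available as SAM text (TopHat2 writes BAM, which can be converted with `samtools view`). It relies only on the standard SAM layout: header lines start with `@`, column 1 is the read name, column 2 is the FLAG, and FLAG bit 0x4 marks an unmapped read. The function name `mapped_read_names` and the tiny in-memory records are hypothetical.

```python
def mapped_read_names(sam_lines):
    """Return the set of read names with at least one mapped record."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            names.add(qname)
    return names


# Hypothetical records standing in for the two alignment files.
plant_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr2\t200\t50\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
microbe_sam = [
    "r2\t0\tbact\t10\t50\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tbact\t90\t50\t50M\t*\t0\t0\t*\t*",
]

plant = mapped_read_names(plant_sam)
microbe = mapped_read_names(microbe_sam)
shared = plant & microbe        # mapped to both genomes
plant_only = plant - microbe    # mapped only to the plant
microbe_only = microbe - plant  # mapped only to the microbe
```

With real files you would pass an open file handle instead of a list. Note that both mates of a pair share one read name, so this counts pairs rather than individual mates.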

BBSplit syntax for generating builds for the reference genome and how to call different builds.

Use appropriate references as input. Bacterial sequences should not be aligning to plants (with some exceptions noted above).

ADD REPLY • link 8.4 years ago by GenoMax 148k

Thanks, Devon and genomax2, for your advice. @genomax2 The reads number in the lakhs (hundreds of thousands); do you think using BLAST will be useful?

ADD REPLY • link 8.4 years ago by mirza ▴ 180

If you are sure that there should be only one plant and a limited number of bacteria (do you know how many, or is it general contamination from plant surfaces etc.?), then using the BBSplit method is the fastest way to get this done. Lakhs of reads are no problem for BBMap, which can process a couple of thousand reads per second.

ADD REPLY • link 8.4 years ago by GenoMax 148k
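
As a rough back-of-the-envelope check of that throughput claim, the sketch below takes "a couple of thousand reads per second" to mean 2,000 reads per second and uses a hypothetical data set of five lakh (500,000) reads. Both numbers are assumptions for illustration only; actual speed depends on hardware, read length, and genome size.

```python
READS_PER_LAKH = 100_000  # one lakh = 100,000


def estimated_seconds(n_reads, reads_per_second=2_000):
    """Estimate mapping time from an assumed, constant throughput."""
    return n_reads / reads_per_second


five_lakh = 5 * READS_PER_LAKH           # hypothetical read count
seconds = estimated_seconds(five_lakh)   # 250.0 seconds
minutes = seconds / 60                   # a little over 4 minutes
```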

Yes, I am sure, as it is a planned experiment: one plant and one bacterium. Are the link you sent and this one (https://www.biostarhandbook.com/tools/bbmap/bbmap-help.html) sufficient, or can you direct me to a tutorial? I am new to Linux and transcriptome analysis.