Mapping whole exome sequencing
5.0 years ago
Assa Yeroslaviz ★ 1.9k

I was wondering whether it would make more sense to map a whole exome dataset to the transcriptome of an organism, or whether it would be better to map it to the genome.

Are there any advantages in mapping it to the genome?

thanks

WES exome bwa gatk • 1.5k views

15-40% of reads from WES are actually off-target reads, i.e. they do not come from the designed target regions. If you force them onto the transcriptome, many of them will actually map there (with errors) and cause many false positive results.


This is true, but what if I use stringent mapping parameters? Wouldn't this indirectly clean the data set, as the off-target reads won't be mapped?


Mapping quality is (roughly) determined by the number of places in the reference where your read can be aligned. If you remove 98% of those places, the reported mapping quality will no longer reflect how ambiguous the read really is.
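
To make this concrete, here is a minimal sketch of a toy mapping-quality model. It is not the formula used by BWA or any other real aligner; the function name `toy_mapq`, the 1% per-base error rate, and the cap of 60 are all assumptions chosen for illustration. Each candidate placement is weighted by its mismatch count, and the quality is the Phred-scaled probability that the best placement is wrong.

```python
import math

def toy_mapq(mismatches, per_base_error=0.01, cap=60):
    """Toy mapping quality (hypothetical model, not a real aligner's formula).

    mismatches: mismatch count of the read at every candidate placement
                present in the reference.
    """
    # Likelihood-style weight for each placement: fewer mismatches, more weight.
    weights = [per_base_error ** m for m in mismatches]
    # Posterior probability that the best placement is the true origin.
    posterior = max(weights) / sum(weights)
    p_wrong = 1.0 - posterior
    if p_wrong <= 0.0:
        return cap  # only one candidate: the model sees no ambiguity at all
    return min(cap, round(-10 * math.log10(p_wrong)))

# Read from a duplicated region, both copies present in the reference.
print(toy_mapq([0, 0]))
# Same read, but the second copy was removed from the reference.
print(toy_mapq([0]))
# Off-target read: perfect match at its true locus, 1 mismatch in an exon.
print(toy_mapq([0, 1]))
# Same off-target read when only the exon is in the reference.
print(toy_mapq([1]))
```

In this toy model, deleting candidate placements from the reference can only raise the score, even though nothing about the read itself has changed.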


The mapping accuracy is relative to what I am mapping against. If I map against a transcriptome, I already know I will get fewer results. But if the result is cleaner than having a lot of off-target hits, why is that a bad thing?


The most obvious situation: you have a piece of exome that is similar to many other pieces of your genome. Normally, most of the reads from it will have mapping quality 0 and won't be used for calling. But if you remove the rest of the genome, that piece of exome becomes unique. Bam: all the reads are aligned with the highest mapping quality.

Or you have off-target reads coming from a region similar to some piece of your exome. If you align to the full genome, these reads will have low mapping quality but will still be aligned to the right position. Remove this "true" region and the reads are aligned to the exonic piece instead (with many errors), but now with high mapping quality, since the mapping is "unique".

And so on.

Upd: I just understood that these situations are almost identical. The generalized statement is provided in one of the comments below.


In addition to the reasons given by WouterDeCoster and kuckunniwid, if you map only to the transcriptome you will have less information about the quality of the exome library prep and sequencing. When mapping to the genome, there are in-target mapped reads, off-target mapped reads, and unmapped reads. When mapping only to the transcriptome, there would be only "in-target" mapped reads and unmapped reads.
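
As a rough illustration of the three categories, the sketch below classifies reads by their mapped position against a set of target intervals. Everything here is made up for the example: the function `classify_read`, the interval coordinates, and the read positions are hypothetical, and a real pipeline would take them from a BAM file and the capture kit's BED file.

```python
from collections import Counter

def classify_read(chrom, pos, targets):
    """Return 'unmapped', 'in-target' or 'off-target' for one read.

    chrom is None for an unmapped read; targets maps a chromosome name
    to a list of half-open (start, end) capture intervals.
    """
    if chrom is None:
        return "unmapped"
    for start, end in targets.get(chrom, []):
        if start <= pos < end:
            return "in-target"
    return "off-target"

# Hypothetical capture design and read positions.
targets = {"chr1": [(100, 200), (500, 650)]}
reads = [("chr1", 150), ("chr1", 300), ("chr2", 120), (None, None), ("chr1", 640)]

summary = Counter(classify_read(chrom, pos, targets) for chrom, pos in reads)
print(dict(summary))
```

With a transcriptome-only reference the "off-target" bucket cannot exist: those reads end up either unmapped or counted as in-target, so the off-target fraction is no longer available as a QC metric.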

5.0 years ago

Always align to the full genome. You will create false positives and false negatives by preselecting regions. Additionally, your coordinates will be a nightmare.


Yes, this is true. I didn't think about the gene's (exon's) coordinates.

But I don't understand why the FP and FN should be a problem. If the parameters are correct, wouldn't the reads either map if they belong or be discarded if not? In every NGS data set we have some FN and FP, but as long as they are under control it should be ok, or not?


An aligner is going to find the best fitting position for a read. That is not necessarily the position where the DNA fragment belongs. If there is a perfect match in the genome and a nearly perfect match in the exome you will get differences if you align against one or the other.
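
A toy example of that effect, assuming a deliberately naive "aligner" that only does ungapped matching and picks the placement with the fewest mismatches. The sequences, the names `best_hit`, `genome` and `exome_only`, and the scoring are all invented for illustration; real aligners such as bwa work very differently.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read, reference):
    """Return (name, offset, mismatches) of the lowest-mismatch ungapped placement."""
    best = None
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if best is None or mm < best[2]:
                best = (name, offset, mm)
    return best

exon = "ACGTACGTTAGC"
intergenic_copy = "ACGTACCTTAGC"  # differs from the exon at a single base
read = "GTACCTTA"                 # sequenced from the intergenic copy

genome = {"exon": "TT" + exon + "GG", "intergenic": "CC" + intergenic_copy + "AA"}
exome_only = {"exon": genome["exon"]}

print(best_hit(read, genome))      # placed on its true origin, no mismatches
print(best_hit(read, exome_only))  # forced onto the exon, with one mismatch
```

Against the exome-only reference the read still "maps", and its single mismatch looks exactly like a variant in the exon.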
