Question

NGS: How Many Reads Should Be Expected To Map To The Reference Sequence

15.0 years ago

Allpowerde ★ 1.3k

I was wondering how many reads can usually be made to map to the reference sequence? For next-gen sequencing analysis, choosing the right parameters (for QC and mapping) is quite empirical and the choice hugely influences the outcome.

I am trying to map 1 million paired-end reads from Illumina to a certain region in the genome. I used fastx-toolkit for QC and mapped with SOAP2. After QC I have about 140,000 reads left, of which only 200 pairs map to the reference sequence. This cannot be a normal outcome, can it?

(I know that I'm trying to align to the right reference sequence because a quick de-novo assembly of these reads maps exclusively to the region's locus when doing a blat search of the assembled sequence against the genome)

Any suggestions (e.g. should I play around with the mapping parameters to get more reads to map)?

next-gen-sequencing short-read-aligner • 4.6k views

updated 17 months ago by Ram 45k • written 15.0 years ago by Allpowerde ★ 1.3k

Ram · Answer 1 · 2010-04-25

Seeing so few reads match indicates problems that may not be solvable via parameter tuning.

Possible explanations include:

The reads include extra sequence (for example indices used for multiplexing, or other adapters at the beginning or end).
The reference genome is incorrect. You suggest that you checked that, but you could have just found a homologous region.
The software or its installation is failing. Try a different aligner without filtering for quality first and see what happens. The same applies to the filtering tool.
The control lane was not set up properly, or the sequences do not have a random base distribution over each index. The Illumina basecaller needs this to function properly.
The library preparation or sequencing has failed (reagents etc.). That is also an option, but I would keep it for last.
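One quick way to look into the first explanation is to check whether many reads share the same leading bases, which can hint at an index or adapter that was never trimmed. The snippet below is only a sketch: the function name `common_prefixes`, the prefix length of 6, and the example reads are all made up for illustration.

```python
from collections import Counter

def common_prefixes(reads, k=6, top=3):
    """Return the `top` most frequent k-base read prefixes with their share of reads."""
    counts = Counter(r[:k] for r in reads if len(r) >= k)
    total = sum(counts.values())
    return [(prefix, n / total) for prefix, n in counts.most_common(top)]

# Made-up reads for illustration: three of the four start with the same 6 bases.
reads = ["ACGTACTTGCA", "ACGTACGGGAT", "ACGTACCCATG", "TTGACCAGTAC"]
for prefix, share in common_prefixes(reads):
    print(prefix, f"{share:.0%}")
```

A single prefix that dominates the reads is worth a closer look, although a small targeted region can also produce repeated read starts for legitimate reasons.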

We usually see between 20% and 60% of reads match the genome.
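For comparison with that range, the numbers from the question can be turned into rates. This is just arithmetic on the figures given, with one assumption: the 200 mapped pairs are counted as 400 reads.

```python
total_reads = 1_000_000    # reads before QC (from the question)
reads_after_qc = 140_000   # reads surviving QC (from the question)
mapped_pairs = 200         # pairs that mapped (from the question)
mapped_reads = 2 * mapped_pairs  # assumption: each pair counts as two reads

qc_pass_rate = reads_after_qc / total_reads
map_rate_after_qc = mapped_reads / reads_after_qc
map_rate_overall = mapped_reads / total_reads

print(f"passed QC: {qc_pass_rate:.1%}")
print(f"mapped (of QC-passed reads): {map_rate_after_qc:.2%}")
print(f"mapped (of all reads): {map_rate_overall:.2%}")
```

Under that assumption, well under 1% of the QC-passed reads map, far below the 20% to 60% range.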

score 0 · Answer 2 · 2012-02-02

13.2 years ago

Jeremy Leipzig 23k

I think the most common problem is that you have Phred+64 quality scores and didn't tell your aligner.

If that's not it, please show us one read you think should have mapped but didn't.
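A rough way to check the encoding yourself is to look at the range of characters in the quality lines. The sketch below is a heuristic, not a definitive test: the function name `guess_phred_offset` and the example strings are made up, and the thresholds assume that Phred+33 files normally stay at or below 'J' while +64 files never go below ';'.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset (33 or 64); return None if it is ambiguous."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' (ASCII 59) do not occur in +64 encodings.
        return 33
    if highest > 74:
        # No low characters, and some above 'J' (ASCII 74): assumed to be +64.
        return 64
    return None  # every character falls in the range shared by both encodings

print(guess_phred_offset(["hhhhggfeBBBB"]))  # made-up +64-style string
print(guess_phred_offset(["IIIIHHGF##"]))    # made-up +33-style string
```

With only a handful of reads the answer can easily come back ambiguous, so it helps to feed it the quality lines from a few thousand reads.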

score 0 · Answer 3 · 2012-02-02

There is something wrong if your QC rejects 86% of reads in the first place. For mapping, quality filtering does not make much sense, since you can map 96 bp Illumina reads that have a string of Bs as quality values with 0-2 mismatches. Check how your reads look quality-wise in FastQC.

Also, SOAP2 is not that great at mapping compared to other programs. Check whether BWA gives you similar figures.
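To get a feel for how much of each read is covered by those Bs before opening FastQC, something like the following could be run over the quality lines. It is a sketch only: `trailing_b_fraction` and the example strings are made up, and it assumes Phred+64 style qualities where a trailing run of 'B' marks the low-quality tail of a read.

```python
def trailing_b_fraction(quality):
    """Fraction of a quality string taken up by a trailing run of 'B' characters."""
    if not quality:
        return 0.0
    trimmed = quality.rstrip("B")
    return (len(quality) - len(trimmed)) / len(quality)

# Made-up quality strings for illustration.
quals = ["hhhhhhhhBB", "BBBBBBBBBB", "hhhhhhhhhh"]
for q in quals:
    print(q, trailing_b_fraction(q))
```

If most reads come out close to 1.0, the run itself is the more likely problem than the QC settings.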