What fraction of reads can usually be expected to map to a reference sequence? For next-gen sequencing analysis, choosing the right parameters (for QC and mapping) is largely empirical, and the choice strongly influences the outcome.
I am trying to map 1 million paired-end Illumina reads to a specific region of the genome. I used the FASTX-Toolkit for QC and mapped with SOAP2. After QC I have about 140,000 reads left, of which only 200 pairs map to the reference sequence. This cannot be a normal outcome, can it?
(I know that I'm aligning to the right reference sequence, because a quick de novo assembly of these reads maps exclusively to the region's locus when I run a BLAT search of the assembled sequence against the genome.)
Any suggestions (e.g. should I play around with the mapping parameters to get more reads to map)?
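
To put the numbers above in proportion, here is a rough Python sketch of the arithmetic. It assumes that "1 million" and "140,000" both count individual reads rather than pairs, and that each of the 200 mapped pairs contributes two reads; the variable and function names are only for illustration.

```python
def fraction(part, whole):
    """Return part / whole, refusing a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Counts taken from the question (assumed to be individual reads, not pairs).
raw_reads = 1_000_000
reads_after_qc = 140_000

# 200 pairs mapped; assumed to mean 2 reads per pair.
mapped_pairs = 200
mapped_reads = 2 * mapped_pairs

qc_survival = fraction(reads_after_qc, raw_reads)
mapping_rate = fraction(mapped_reads, reads_after_qc)

print(f"Reads surviving QC: {qc_survival:.1%}")     # 14.0%
print(f"Reads mapped after QC: {mapping_rate:.2%}")  # 0.29%
```

Under those assumptions, about 14% of the reads survive QC and roughly 0.29% of the surviving reads end up in a mapped pair.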