Suppose I have a paired-end data set from Illumina with 100 base pairs on each end. If any fragment is shorter than 200 base pairs, the ends of the two sequences will overlap when mapped to the genome. For example, if a particular fragment is 150 base pairs long, then the last 50 base pairs of read 1 will be the reverse complement of the last 50 base pairs of read 2.
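To make the geometry concrete, here is a minimal sketch that builds a made-up 150 bp fragment, simulates the two 100 bp reads, and checks the overlap. The helper names (`reverse_complement`, `overlap_length`) and the random fragment are illustrative assumptions, not part of any mapper, and the sketch ignores adapter read-through and sequencing errors.

```python
import random

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def overlap_length(fragment_len, read_len):
    """Number of fragment bases covered by both reads of a pair."""
    # Each read can cover at most the whole fragment.
    covered = min(read_len, fragment_len)
    return max(0, 2 * covered - fragment_len)

# Hypothetical 150 bp fragment sequenced with 100 bp reads from each end.
random.seed(0)
fragment = "".join(random.choice("ACGT") for _ in range(150))
read_len = 100

read1 = fragment[:read_len]                      # forward strand, from the 5' end
read2 = reverse_complement(fragment)[:read_len]  # reverse strand, from the other end

n = overlap_length(len(fragment), read_len)
print(n)
# The tail of read 1 is the reverse complement of the tail of read 2.
print(read1[-n:] == reverse_complement(read2[-n:]))
```

With a 200 bp or longer fragment, `overlap_length` returns 0 and the two reads share no bases.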
So, which short-read mapping programs can handle such a case? And for ones that don't, how can I work around this problem?
Bowtie works too.
Just to clarify: these mappers will successfully map the reads, but do they get the coverage right in the overlapping part for subsequent SNP calling? The two overlapping reads come from a single molecule, so the overlap should count as a single observation at each position. If you map a single pair of overlapping reads, does it give you a coverage of two in the overlap and one in the non-overlapping portions of the reads?
No, but that's not really the aligner's job. The subsequent analysis tools for finding SNPs and such would have to explicitly consider the overlap. Many do not.
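As a rough illustration of the difference, the sketch below counts depth two ways for one mapped pair: once per read (what you get if the overlap is ignored) and once per fragment (each molecule counted at most once per position). The function names and the 0-based, half-open coordinates are assumptions made for this example; a real SNP caller would also have to decide what to do with base qualities and with disagreeing bases in the overlap.

```python
from collections import Counter

def per_read_depth(pairs):
    """Depth where every read counts separately, so overlaps count twice."""
    depth = Counter()
    for read1, read2 in pairs:
        for start, end in (read1, read2):
            for pos in range(start, end):
                depth[pos] += 1
    return depth

def per_fragment_depth(pairs):
    """Depth where each fragment counts at most once per position."""
    depth = Counter()
    for read1, read2 in pairs:
        # Union of the two mates' reference positions.
        covered = set(range(*read1)) | set(range(*read2))
        for pos in covered:
            depth[pos] += 1
    return depth

# One hypothetical pair from a 150 bp fragment: read 1 maps to [0, 100),
# read 2 maps to [50, 150), so positions 50..99 are covered by both mates.
pairs = [((0, 100), (50, 150))]

naive = per_read_depth(pairs)
by_fragment = per_fragment_depth(pairs)

print(naive[10], naive[75], naive[120])
print(by_fragment[10], by_fragment[75], by_fragment[120])
```

The per-read count is 2 inside the overlap and 1 outside it, while the per-fragment count is 1 everywhere the fragment is covered.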