6.6 years ago
qwzhang0601
When we align NGS data (e.g., RNA-seq, ChIP-seq) to the genome, we usually allow a certain number of mismatches.
Suppose we allow 2 mismatches (and also accept reads that map to multiple loci), and a read matches locus A of the genome with 0 mismatches, locus B with 1 mismatch, and locus C with 2 mismatches. What should we expect from the aligner (e.g., STAR, Bowtie, TopHat2)? Is only the best-matching locus reported, or are all three loci reported in the SAM file?
Thanks
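The scenario above can be made concrete with a toy model. This is a hypothetical sketch, not the actual logic of STAR, Bowtie, or TopHat2: each candidate locus is a (name, mismatch count) pair, and three reporting policies an aligner might offer are compared: report a single best hit, report every hit within the mismatch limit, or report at most k hits. Real aligners rank hits by alignment score rather than raw mismatch counts and break ties in their own ways (Bowtie2, for instance, picks randomly among equally good best alignments), so the function names and tie-breaking here are illustrative assumptions only.

```python
from typing import List, Tuple

# A candidate hit: (locus name, number of mismatches). Hypothetical toy representation.
Hit = Tuple[str, int]


def within_limit(hits: List[Hit], max_mismatches: int) -> List[Hit]:
    """All hits allowed by the mismatch limit, best (fewest mismatches) first."""
    # sorted() is stable, so equally good hits keep their input order
    return sorted((h for h in hits if h[1] <= max_mismatches), key=lambda h: h[1])


def report_best_only(hits: List[Hit], max_mismatches: int) -> List[Hit]:
    """Report a single best hit (ties broken by input order in this toy)."""
    return within_limit(hits, max_mismatches)[:1]


def report_all(hits: List[Hit], max_mismatches: int) -> List[Hit]:
    """Report every hit within the mismatch limit."""
    return within_limit(hits, max_mismatches)


def report_top_k(hits: List[Hit], max_mismatches: int, k: int) -> List[Hit]:
    """Report at most k hits, best first."""
    return within_limit(hits, max_mismatches)[:k]


if __name__ == "__main__":
    # The scenario from the question: locus A (0 mismatches), B (1), C (2); limit of 2
    hits = [("C", 2), ("A", 0), ("B", 1)]
    print(report_best_only(hits, 2))   # [('A', 0)]
    print(report_all(hits, 2))         # [('A', 0), ('B', 1), ('C', 2)]
    print(report_top_k(hits, 2, 2))    # [('A', 0), ('B', 1)]
```

Which of these an aligner applies, and with what defaults, depends on the tool and its options, which is why the answer below points to the manual.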
If these reads have good mean quality (Phred score above roughly 25-30), they may genuinely come from repetitive locations, which I think is not usually the focus of RNA-seq or ChIP-seq analyses. However, at least in Bowtie2 and HISAT2, you can decide how multi-mapping reads are handled. Read the manual.
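Whatever settings you choose, the SAM output itself shows how many loci were reported for each read. Below is a minimal sketch, assuming single-end reads and plain-text SAM (not BAM): it groups records by read name, marks secondary alignments via FLAG bit 0x100, skips unmapped (0x4) and supplementary (0x800) records, and reads the NM:i (edit distance) and NH:i (number of reported alignments) tags when present. Not every aligner writes NH, and NM counts indels as well as mismatches. The function name `loci_per_read` and the example records are hypothetical.

```python
from collections import defaultdict

SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary (e.g. chimeric) alignment


def loci_per_read(sam_lines):
    """Group SAM alignment records by read name (QNAME).

    Returns {qname: {"loci": [(rname, pos, is_secondary, nm)], "nh": NH value or None}}.
    Assumes single-end reads; mates of a pair would be grouped together.
    """
    reads = defaultdict(lambda: {"loci": [], "nh": None})
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & (UNMAPPED | SUPPLEMENTARY):
            continue  # not a separate reported locus
        # Optional tags start after the 11 mandatory fields, formatted TAG:TYPE:VALUE
        tags = {}
        for tag in fields[11:]:
            name, typ, value = tag.split(":", 2)
            tags[name] = int(value) if typ == "i" else value
        entry = reads[qname]
        entry["loci"].append((rname, pos, bool(flag & SECONDARY), tags.get("NM")))
        if "NH" in tags:
            entry["nh"] = tags["NH"]
    return dict(reads)


if __name__ == "__main__":
    # Hypothetical output in which all three loci from the question were reported
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t1\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:3",
        "r1\t256\tchr2\t500\t1\t4M\t*\t0\t0\t*\t*\tNM:i:1\tNH:i:3",
        "r1\t256\tchr3\t900\t1\t4M\t*\t0\t0\t*\t*\tNM:i:2\tNH:i:3",
    ]
    for qname, info in loci_per_read(sam).items():
        print(qname, info["nh"], info["loci"])
```

If a read appears only once (no secondary records), the aligner reported a single locus; one primary plus secondary records means several loci were kept.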