Hello all,
I'm new to TopHat, so I have a few questions. I'm aligning paired-end RNA-seq FASTQ files against the hg19 reference genome. These reads may contain SNPs that differ from the reference. TopHat gives me both mapped and unmapped reads, and I'm using -N 2 (the default, which allows up to 2 mismatches per read alignment), so I'm hoping it can tolerate SNPs. I just want to make sure that reads carrying SNPs aren't going into the unmapped BAM, and that they are only being given a lower mapping quality score in the mapped BAM ("mapped.bam").
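One way to check this is to look at the mapped BAM directly. The Python sketch below assumes the BAM has been converted to SAM text (for example with `samtools view -h mapped.bam > mapped.sam`) and that each record carries the standard NM tag (edit distance to the reference). It groups the MAPQ values of mapped records by NM, so you can see whether reads with 1 or 2 mismatches are present and what MAPQ they received. The function name `mapq_by_mismatches` is hypothetical.

```python
from collections import defaultdict


def mapq_by_mismatches(sam_lines):
    """Group MAPQ values of mapped records by their NM (edit distance) tag.

    Assumes SAM text lines, e.g. from `samtools view -h mapped.bam`.
    Records without an NM tag are grouped under the key None.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped record: no meaningful MAPQ
        nm = None
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
                break
        groups[nm].append(mapq)
    return dict(groups)
```

Usage: `with open("mapped.sam") as fh: counts = mapq_by_mismatches(fh)`, then compare `len(counts[0])`, `len(counts[1])` and so on, and the MAPQ values in each group.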
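So far, in my unmapped BAM I see a lot of reads with flag 4. I also see flags 0, 16, and 272, none of which has the paired bit set (272 = 256 + 16, a secondary alignment on the reverse strand), so they are single-end rather than paired records and I'm not too worried about those.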
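To double-check what a FLAG value means, it can be decoded bit by bit. This small Python sketch uses the bit meanings from the SAM specification's FLAG field; the names `SAM_FLAGS` and `decode_flag` are just illustrative.

```python
# Bit meanings of the SAM FLAG field (SAM specification).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for f in (0, 4, 16, 272):
    # Flag 0 has no bits set: mapped, forward strand, unpaired.
    print(f, decode_flag(f) or ["(no bits set)"])
```

For 272 this reports `reverse strand` and `secondary alignment`; none of the four values has the `paired` bit.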
My goal is to pick a reasonable mapping quality cut-off that doesn't let in too much junk but still keeps reads with an occasional SNP, so that I can later run those reads through a program that realigns the ones that need it.
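Once a cut-off is chosen, selecting the reads to hand to a realigner could look roughly like the Python sketch below. It assumes SAM text input with the standard NM tag, as above. The function name and the default values `min_mapq=3` and `min_nm=1` are hypothetical placeholders, not TopHat recommendations, so choose them from your own MAPQ distribution.

```python
def realignment_candidates(sam_lines, min_mapq=3, min_nm=1):
    """Yield names of primary mapped reads with MAPQ >= min_mapq and at
    least min_nm mismatches/indels (NM tag), i.e. reads that may carry a SNP.

    min_mapq and min_nm are illustrative defaults only.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        # Skip unmapped (0x4) and secondary (0x100) records, and low MAPQ.
        if flag & 0x4 or flag & 0x100 or mapq < min_mapq:
            continue
        # Records without an NM tag are treated as having no mismatches.
        nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), 0)
        if nm >= min_nm:
            yield fields[0]
```

Raising `min_mapq` trades fewer ambiguous (multi-mapping) reads for fewer candidates overall.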
Does anyone with more insight into TopHat and how it handles SNPs have some input on this?