My basic understanding is that "mapped" = placed on a specific location in the ref genome, whereas "aligned" = placed on a specific location with possibly gaps to maximize the similarity with the ref genome.
Does it mean that all aligned reads are also mapped, but not vice versa?
Also, a typical TopHat output contains alignment information, not read information. So when can I expect to see an unmapped read? If a read is not mapped, it won't make it into the output, will it?
I think the difference is not specific for TopHat, and it is more about the exact meaning of the words.
Mapped is what you say about a sequence relative to a larger sequence, where the larger sequence can be a whole chromosome. Mapped, as you say, means that the sequence is assigned to a specific location. That means it has to align with the sequence present at that location, and that alignment could indeed contain gaps.
Aligned is what you say about two sequences of any length, where the alignment indicates how similar the sequences are. Since an alignment also shows where the two sequences begin and end their overlap, an alignment also gives you a mapping.
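To make the "an alignment also yields a mapping" point concrete, here is a minimal semi-global alignment sketch in Python. It is my own illustration, not what TopHat does internally, and the function name and scoring values are arbitrary assumptions. The whole read must be aligned, gaps allowed, while the reference flanks are free, so the best-scoring cell also tells you where on the reference the read lands.

```python
def semiglobal_map(read, ref, match=1, mismatch=-1, gap=-1):
    """Align the whole read against any substring of ref (gaps allowed).

    Returns (ref_start, ref_end, score), with ref_end exclusive.
    Scoring values are illustrative assumptions, not TopHat's.
    """
    n, m = len(read), len(ref)
    # score[i][j]: best score aligning read[:i] so that it ends at ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: ref index where that best alignment begins
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        start[0][j] = j  # skipping a reference prefix is free
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap  # read bases with no ref bases yet
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # read base against a gap (insertion)
            left = score[i][j - 1] + gap  # ref base against a gap (deletion)
            best = max(diag, up, left)
            score[i][j] = best
            if best == diag:
                start[i][j] = start[i - 1][j - 1]
            elif best == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    # The whole read must be used, but any reference suffix may be skipped.
    end = max(range(m + 1), key=lambda j: score[n][j])
    return start[n][end], end, score[n][end]


ref = "TTTTACGTACGGG"
print(semiglobal_map("ACGTACG", ref))   # (4, 11, 7): exact hit at position 4
print(semiglobal_map("ACGTTACG", ref))  # (4, 11, 6): one inserted base, same place
```

The gapped read still "maps" to position 4; the alignment just explains how it fits there.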
I am still a bit confused. If unmapped reads are not reported by TopHat, why is there a flag specifically to denote unmapped reads?
The reason I am asking is that I have a TopHat output with 30 million reads, all mapped according to flagstat. However, the flags indicate that some of these reads are unmapped.
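The SAM FLAG field is a bit field, so a record can carry an "unmapped"-related bit without the read itself being unmapped. Below is a small Python sketch (the helper names are hypothetical) that decodes a FLAG value using the bit meanings from the SAM specification. One thing worth checking, though the thread does not confirm it, is whether the flags being read as "unmapped" are actually 0x8 (mate unmapped) rather than 0x4 (read unmapped); as far as I know, the "mapped" count in samtools flagstat is based on 0x4.

```python
# Bit meanings from the SAM format specification (FLAG field).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def read_is_unmapped(flag):
    # Only bit 0x4 says the read itself is unmapped; 0x8 is about its mate.
    return bool(flag & 0x4)


print(decode_flag(73))       # ['paired', 'mate unmapped', 'first in pair']
print(read_is_unmapped(73))  # False: the mate is unmapped, not this read
```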
Unless something has changed with TopHat that could be an issue with their software. Do the reads flagged as unmapped have valid chromosomes and positions?
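A minimal Python sketch for answering that question: it scans SAM text lines and lists every record with bit 0x4 set, together with its RNAME and POS. The function names and sample records are hypothetical. Note that the SAM specification recommends giving an unmapped read whose mate is mapped the same RNAME and POS as its mate, so a flagged-unmapped read with a real chromosome and position is not necessarily a software bug.

```python
def unmapped_records(sam_lines):
    """Yield (qname, flag, rname, pos) for every record with bit 0x4 set."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip empty and header lines
            continue
        fields = line.split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            yield qname, flag, rname, pos


def is_placed(rname, pos):
    # An unmapped read can still carry coordinates (usually its mate's).
    return rname != "*" and pos > 0


# Hypothetical records: r1's mate is unmapped but placed; r2 is fully unplaced.
sample = [
    "@HD\tVN:1.6",
    "r1\t73\tchr1\t100\t50\t10M\t=\t100\t0\tACGTACGTAC\t*",
    "r1\t133\tchr1\t100\t0\t*\t=\t100\t0\tTTTTTTTTTT\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\t*",
]
for qname, flag, rname, pos in unmapped_records(sample):
    print(qname, flag, rname, pos, "placed" if is_placed(rname, pos) else "unplaced")
```

To run it on real data, pass `sys.stdin` as `sam_lines` and pipe the output of `samtools view` on your BAM file into the script.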