I am looking at some PAR-CLIP data. Trimming and mapping went perfectly. However, when I look at the data in IGV, on the plus strand I see a vast majority of T to C mutations (as expected), but on the minus strand I see a vast predominance of A to G mutations.
Is there any way to explain this? How is it possible that only the reads mapping to the minus strand are affected? There must be a technical issue somewhere!
Yeah, I thought about this for a long while and realized that in SAM files the read sequence is always given in forward (reference) strand orientation, irrespective of whether the read maps to the reverse strand. Reads that map to the reverse strand are stored reverse complemented, so a T->C on the reverse strand shows up as A->G on the forward strand.
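To make the flipping concrete, here is a minimal sketch in plain Python. The function name `strand_aware_mismatch` is made up for illustration; the only thing taken from the SAM format is the 0x10 FLAG bit, which marks a read whose sequence is stored reverse complemented. It takes a mismatch as displayed in forward-strand coordinates and converts it back to what happened on the strand the read came from.

```python
# Complement table for single DNA bases
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# SAM FLAG bit 0x10: the read sequence is stored reverse complemented
REVERSE_FLAG = 0x10


def strand_aware_mismatch(ref_base, read_base, flag):
    """Return (ref, alt) as seen on the strand the read originated from.

    ref_base and read_base are single bases in forward-strand
    orientation, i.e. as they appear in the SAM record or in IGV.
    (Hypothetical helper, not part of any library.)
    """
    if flag & REVERSE_FLAG:
        # Reverse-strand read: complement both bases to get the
        # change as it occurred on the original molecule.
        return ref_base.translate(COMPLEMENT), read_base.translate(COMPLEMENT)
    return ref_base, read_base


print(strand_aware_mismatch("T", "C", 0))   # forward read -> ('T', 'C')
print(strand_aware_mismatch("A", "G", 16))  # reverse read -> ('T', 'C')
```

So an A->G displayed on a reverse-strand read and a T->C displayed on a forward-strand read are the same PAR-CLIP conversion.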
I am relatively convinced by this, but I would like confirmation. I really thought IGV would be smart and report the correct mismatch depending on which strand the read maps to, but that does not seem to be the case, unless I misunderstood something.
IGV is a visualizer and it shows alignments as they are produced.
Showing a different mismatch depending on the strand would actually be counterproductive: DNA is complementary, so there is nothing to be gained from showing one event as two different ones. It would only add to the information overload.