I am looking at mpileup output from RNA-Seq data, specifically at a location that looks like this.
<<<<<<<<<<<<<<<<<<<<<<<<>>>>>><<<>><>><>>>><<<<<
G>BB#GHJCCD#@#5#E;F##ICFEBIHDBDD;BB?IJGGGHJC##?EC
My understanding is that the ">" and "<" symbols mean that the current location is within an intron and that the reads span different exons in the two directions. (I also could not find any reference for the ">" symbol in the Samtools documentation.)
What I do not understand is the meaning of the quality scores for these reads. I thought that at these locations within introns no reads are mapped, so there should not be any quality scores either.
Am I completely missing something? Thanks in advance for any help.
Thanks for pointing to the "reference skip" definition. I still have not fully understood the "quality score" aspect of some reference skips. Here is my question. I am looking at mpileup output from RNA-Seq data from one sample. And the pileup output is something like
chr1 3203517 T 30 <<<<<<<<>>>>>><><>>>>>><><><<< IIIIIIHIE@HFGHIFIIIGHHIIDIHHGD
I also looked at the location in IGV and found that it is intronic. All the reads that map at the location cover the two adjacent exons. Here is a toy example of the scenario, showing three reads that span two exons.
Basically, no real bases are at the location, but mpileup gives quality scores for the "bases".
Does mpileup come up with random quality scores just to keep the format of mpileup intact?
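One thing that can be checked directly is whether the quality string has exactly one character per read, skips included. Below is a minimal Python sketch of that check. It assumes a plain six-column pileup line and Phred+33 encoded qualities. The function names and the toy line (three reads, all skipping the position) are made up for illustration and are not real data.

```python
import re


def parse_pileup_line(line):
    """Split one six-column pileup line into named, typed fields."""
    chrom, pos, ref, depth, bases, quals = line.split()[:6]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "depth": int(depth),
        "bases": bases,
        "quals": quals,
    }


def summarize_skips(line):
    """Count reference-skip symbols and decode the quality column."""
    rec = parse_pileup_line(line)
    # Drop read-start markers ("^" plus one mapping-quality character),
    # so that a "<" or ">" used as that character is not counted as a skip.
    bases = re.sub(r"\^.", "", rec["bases"])
    n_skips = sum(ch in "<>" for ch in bases)
    # Assumption: qualities are Phred+33 encoded ASCII characters.
    phred = [ord(ch) - 33 for ch in rec["quals"]]
    return {
        "depth": rec["depth"],
        "n_skips": n_skips,
        "n_quals": len(phred),
        "phred": phred,
    }


# Hypothetical toy line: three reads, each with a reference skip here.
toy = "chr1\t100\tT\t3\t<><\tIHG"
print(summarize_skips(toy))
```

If n_skips, n_quals and depth all agree, every skip symbol carries its own quality character. That shows the column is filled per read, but it does not by itself say which value is being reported for a skip.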
The qualities are mapping qualities, which are a property (measurement) of the read, not the base. I guess you're thinking that they're base qualities. It's like -q versus -Q as flags to samtools mpileup.
"consisting of chromosome name, coordinate, reference base, read bases, read qualities and alignment mapping qualities". So the manual mentions read qualities here as well. @matter, if you don't intend to help, why the hell are you answering?
Neither your answer nor the samtools manual paragraph is enough to answer the question.
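One way to test the "mapping quality versus base quality" claim empirically: samtools mpileup has a -s (--output-MQ) option that appends the mapping qualities as an extra column. If the sixth column at a skip position differs from that extra column, the sixth column cannot be the mapping quality. Here is a rough Python sketch of the comparison. It assumes the mapping qualities land in the seventh column (no other extra-column options in use) and that both columns are Phred+33 encoded. The function names and the toy line are hypothetical.

```python
def decode_phred33(encoded):
    """Turn a Phred+33 ASCII string into a list of integer scores."""
    return [ord(ch) - 33 for ch in encoded]


def compare_quality_columns(line):
    """Decode columns 6 and 7 of a pileup line and report whether they match."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected at least 7 columns; was mpileup run with -s?")
    col6 = decode_phred33(fields[5])
    col7 = decode_phred33(fields[6])  # assumed to be the -s mapping qualities
    return {"col6": col6, "col7": col7, "identical": col6 == col7}


# Hypothetical toy line, not real output: three skipping reads.
toy = "chr1\t100\tT\t3\t<><\tIHG\t]]]"
print(compare_quality_columns(toy))
```

Running this over a few intronic positions from the real output should show whether the two columns ever disagree.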