Mpileup Output And Quality Scores
12.2 years ago
NextGenSeek ▴ 290

I am looking at mpileup output from RNA-Seq data, specifically at a location that looks like this.

<<<<<<<<<<<<<<<<<<<<<<<<>>>>>><<<>><>><>>>><<<<<       
G>BB#GHJCCD#@#5#E;F##ICFEBIHDBDD;BB?IJGGGHJC##?EC

My understanding is that the ">" and "<" symbols mean that the current location is within an intron and that the reads span different exons in two directions. (I also could not find any reference for the ">" symbol in the Samtools documentation.)

What I do not understand is the meaning of the quality scores for these reads. I thought that at these locations within introns no reads are mapped, so there should not be any quality scores either.

Am I completely missing something? Thanks in advance for any help.

mpileup • 7.6k views
12.2 years ago
matted 7.8k

The > and < are reference skip symbols and do not (directly) have any particular exon/intron interpretation. They are described in the samtools manual in the paragraph starting "In the pileup format...". The quality score encoding is described there too. The question titled "Some help understanding with mpileup output" also discusses the mpileup format.
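
To make the symbol classes concrete, here is a minimal Python sketch that tallies the per-read symbols in the read-bases column of one pileup line. It is only an illustration based on my reading of the pileup description in the samtools manual, not samtools' own parser; the function name `count_symbols` and the category names are made up for this example, and it assumes a well-formed column.

```python
import re

def count_symbols(read_bases):
    """Tally per-read symbols in a pileup read-bases column (simplified sketch)."""
    counts = {"match": 0, "mismatch": 0, "ref_skip": 0, "deletion": 0}
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read-start marker; the next character encodes that read's mapping quality.
            i += 2
            continue
        if ch == "$":
            # Read-end marker; it does not represent a read at this position.
            i += 1
            continue
        if ch in "+-":
            # Indel annotation: sign, decimal length, then that many bases.
            length = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if ch in ".,":
            counts["match"] += 1
        elif ch in "<>":
            counts["ref_skip"] += 1      # read spans the position via a CIGAR "N"
        elif ch in "*#":
            counts["deletion"] += 1      # placeholder for a deleted base
        else:
            counts["mismatch"] += 1      # A/C/G/T/N in either case
        i += 1
    return counts

# A column made only of reference skips, as in the original question.
print(count_symbols("<<<<<<<<>>>>>><><>>>>>><><><<<"))
```

A column made up entirely of < and > therefore means that every read covering the position does so through a skipped region rather than through an aligned base.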


Thanks for pointing to the "reference skip" definition. I still have not fully understood the "quality score" aspect of some reference skips. Here is my question: I am looking at mpileup output from RNA-Seq data from one sample, and the pileup output looks something like this:

chr1 3203517 T 30 <<<<<<<<>>>>>><><>>>>>><><><<< IIIIIIHIE@HFGHIFIIIGHHIIDIHHGD

I also looked at the location in IGV and found that it is intronic. All the reads that map at the location cover the two adjacent exons. Here is a toy example of the scenario, showing three reads that span two exons.

                       |<- location of interest
               EXON1-------EXON2
  R1              AT-------TAG
  R2            ATAT-------TA
  R3              AT-------TAGA

Basically, no real bases are at the location, but mpileup gives quality scores for the "bases".

Does mpileup come up with random quality scores just to keep the format of mpileup intact?
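
The toy example above can be sketched in terms of CIGAR strings, where a spliced read carries an "N" (reference skip) operation across the intron. The Python snippet below walks a CIGAR string and reports which operation covers a given reference position. The start coordinates, the CIGAR strings, and the helper name `op_at` are all hypothetical values chosen to mirror R1, R2 and R3; they are not taken from real data.

```python
import re

# CIGAR operations that advance along the reference.
CONSUMES_REF = set("MDN=X")

def op_at(ref_start, cigar, ref_pos):
    """Return the CIGAR op covering ref_pos, or None if the read does not reach it."""
    pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in CONSUMES_REF:
            if pos <= ref_pos < pos + length:
                return op
            pos += length
    return None

# Hypothetical coordinates: the exon 1 ends at 104, the intron covers 105-111.
toy_reads = {
    "R1": (103, "2M7N3M"),
    "R2": (101, "4M7N2M"),
    "R3": (103, "2M7N4M"),
}

location_of_interest = 108
for name, (start, cigar) in toy_reads.items():
    print(name, op_at(start, cigar, location_of_interest))
```

In this sketch every read reports "N" at the location of interest, which is the situation where the pileup shows > or < instead of a base.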


The qualities are mapping qualities, which are a property (measurement) of the read, not the base. I guess you're thinking that they're base qualities. It's like -q versus -Q as flags to samtools mpileup.
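
Whichever kind of quality a pileup column holds, the characters are decoded the same way. As a minimal sketch, assuming the usual Phred+33 ASCII encoding, the characters can be turned back into numbers like this (`decode_phred33` is just a name picked for this example):

```python
def decode_phred33(qual_string):
    """Convert a Phred+33 ASCII quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in qual_string]

# Quality column from the chr1:3203517 line quoted earlier in this thread.
quals = decode_phred33("IIIIIIHIE@HFGHIFIIIGHHIIDIHHGD")
print(len(quals), min(quals), max(quals))
```

Decoding the column this way makes it easier to compare the values against the -q and -Q thresholds.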


"consisting of chromosome name, coordinate, reference base, read bases, read qualities and alignment mapping qualities". So the manual mentions read qualities as well. @matted, if you don't intend to help, why the hell are you answering?


Neither your answer nor the samtools manual paragraph is enough to answer the question.
