Samtools Mpileup Output
13.5 years ago
Sam ▴ 40

Hi,

I have looked all over the web and cannot find definitions for the <, >, and ~ symbols in mpileup output. For example:

chr6 31506624 T 78 >>><><,$cCccccccccCCcCCCCCCCccCcCCCcCccccCccCCCcCccccccccccCCCCcCCcCcCcCCcccC^~c^~C

I appreciate your help!

samtools sam mpileup • 15k views

Since this is still unanswered, I would suggest mailing samtools support (samtools-help@lists.sf.net) or the samtools author (Heng Li).

13.4 years ago
Nina ▴ 400

The symbols "<" and ">" were added to column 5 a few releases ago. They mean that this position is "covered" by a large gap (i.e., we're inside the "N" element in the CIGAR of this read). Note that reads that have a gap at this position still contribute to the total coverage reported in column 4.

For completeness, I should also mention that "*" is very similar, but in this case it means the position is covered by a small gap (i.e., a "D" element in the CIGAR).

Also, as described in the link Drio provided, "^" is always followed by another symbol. This indicates that we are at the start of a read. If you subtract 33 from the ASCII value of the symbol that follows "^", it gives you the mapping quality of the read whose first base covers this position.
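
To make the "^" rule concrete, here is a minimal Python sketch (the function name read_start_mapqs is made up for illustration, it is not part of samtools). It just applies the "ASCII value minus 33" rule to every character that follows a "^":

```python
import re

def read_start_mapqs(bases: str) -> list[int]:
    """Return the mapping quality encoded after each '^' in a pileup base string."""
    # Each '^' is followed by exactly one character; its ASCII code minus 33
    # is the mapping quality of the read that starts at this position.
    return [ord(m.group(1)) - 33 for m in re.finditer(r"\^(.)", bases)]

# Tail of the base string from the question: two reads start here.
print(read_start_mapqs("cC^~c^~C"))  # [93, 93]
```

So the "~" in the question is not a base call at all: it is the mapping-quality character of a read that starts at that position (ASCII 126, minus 33, gives 93).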

In a similar vein, "$" means one of the reads covers this position with its last base.

In your example, if you ignore "$" and the two instances of "^~", you will find that you have 78 characters remaining in column 5, which matches the coverage depth reported in column 4.
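
If you want to check that count programmatically, something like the following Python sketch should work (per_read_symbols is a hypothetical helper name). It skips "$" and "^" plus the character after it, as described above. It also skips indel annotations of the form +<n><bases> or -<n><bases>; those are not discussed in this thread, I am assuming them from the samtools pileup format description.

```python
def per_read_symbols(bases: str) -> list[str]:
    """Split a pileup base string (column 5) into one symbol per read."""
    symbols = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # read-start marker plus its mapping-quality character
        elif ch == "$":
            i += 1  # read-end marker
        elif ch in "+-":
            # Assumed indel annotation: sign, length digits, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            symbols.append(ch)  # one base call or gap symbol for one read
            i += 1
    return symbols

# Shortened version of the string from the question.
print(len(per_read_symbols(">>><><,$cC^~c^~C")))  # 11
```

The length of the returned list is what should agree with column 4.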

I learned about this because for the analysis I do, we don't want gaps to contribute to the coverage depth. Here's part of an awk command that we use to adjust the coverage depth to exclude gaps:

{l=$4; if($5~/>/ || $5~/</ || $5~/\*/ ) {gsub(/\^./,"");l-=split($5,a,"<")-1;l-=split($5,a,">")-1;l-=split($5,a,"*")-1}
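
For anyone who finds the awk hard to read, here is a rough Python equivalent of the same idea. It is only a sketch, and depth_without_gaps is a name I made up. Like the awk fragment, it first removes every "^" together with the character after it, so that a mapping-quality character that happens to be "<", ">" or "*" is not counted as a gap, and then subtracts the number of gap symbols from the depth.

```python
import re

def depth_without_gaps(depth: int, bases: str) -> int:
    """Subtract reads that only span this position with a gap ('<', '>' or '*')."""
    # Remove read-start markers and their mapping-quality characters first.
    cleaned = re.sub(r"\^.", "", bases)
    gaps = sum(cleaned.count(sym) for sym in "<>*")
    return depth - gaps

# Shortened version of the line from the question: 11 reads, 6 of them in a gap.
print(depth_without_gaps(11, ">>><><,$cC^~c^~C"))  # 5
```

Here depth is column 4 and bases is column 5 of a pileup line.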
13.5 years ago
Drio ▴ 920

The columns are: chromosome, coordinate, reference base, number of reads covering the position, read bases seen at that position, and base quality for each base.

Details here.

The ASCII value of each character in the last column (minus 33) gives you the base quality.
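
As a small illustration of that rule in Python (decode_qualities is a hypothetical name, and the quality string below is made up rather than taken from the question):

```python
def decode_qualities(quals: str) -> list[int]:
    """Convert a pileup quality string into a list of base qualities."""
    # Each character encodes one quality value as ASCII code minus 33.
    return [ord(c) - 33 for c in quals]

print(decode_qualities("II5!"))  # [40, 40, 20, 0]
```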


Hi Drio, thanks for the response. Could you be more specific about the <, >, and ~ characters?


Edited my answer per your request. Please check the link again. The original link was incorrect. All the details are there.
