Meaning of SGB field in samtools mpileup output
10.4 years ago
dutro2 • 0

Does anyone know what the SGB field in SAMtools' mpileup output means? The header description ("Segregation based metric.") doesn't shed much light on it either. Is there a paper or some other source that details what this value means, the theory behind it, and how it is calculated?

Hello dutro2!

It appears that your post has been cross-posted to another site: the samtools mailing list. Just wait for one of the authors to reply (or look through the source code).

This is typically not recommended as it runs the risk of annoying people in both communities.

Sorry about that. I wasn't sure how much overlap there was between the communities.

No worries, you've actually inspired me to look at the source code for how that's generated, to try to figure out what it means (I have an idea, but I'm not sure yet). For reference, it was added by pd3 in April 2013 without a meaningful explanatory comment.

I'd be interested to know what your idea is. After some Googling, my guess was that it has something to do with Mendelian segregation. I come from a computing background, so if I'm completely off-base on this then please let me know :).

I also found this tool: https://github.com/nc6/segregationBias, but I haven't found anything in there resembling the code in calc_SegBias (plus the Haskell code is practically unreadable IMO).

That's probably not far off. It looks a lot like a log-likelihood calculation, though I can't find which distribution or equation it's derived from. In short, it looks at how the variant reads are distributed across samples and gives higher (more positive) scores when they are all in one or a few samples. Since this is used in the context of multi-sample variant detection, it's presumably meant to help flag false-positive calls that only appear because of low-frequency sequencing errors scattered across multiple samples. That's my guess at least, though it'd be good if Petr Danecek replied to your question on the samtools list, since he would be the person to actually know.
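To make that intuition concrete, here is a toy Poisson log-likelihood ratio in Python. It is a hypothetical illustration of the idea in this reply, not the formula samtools implements in calc_SegBias (pd3's rd-SegBias.pdf notes describe the actual model). The function names and both hypotheses are assumptions: H0 treats the alternate reads as sequencing errors spread evenly over all N samples (each sample Poisson(nr/N)), and H1 treats them as a real variant carried only by the k samples that show any alternate reads (each carrier Poisson(nr/k)).

```python
import math
from typing import Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k events under a Poisson(lam) model."""
    if lam == 0.0:
        # A Poisson with rate 0 puts all of its mass on k == 0.
        return 0.0 if k == 0 else -math.inf
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def segregation_llr(alt_counts: Sequence[int]) -> float:
    """HYPOTHETICAL segregation score: log L(H1: carriers) - log L(H0: errors).

    alt_counts holds the number of alternate-allele reads in each sample.
    This is an illustrative sketch, NOT the samtools/bcftools SGB formula.
    """
    n = len(alt_counts)
    nr = sum(alt_counts)
    if n == 0 or nr == 0:
        return 0.0  # no alternate reads, so there is nothing to compare
    k = sum(1 for c in alt_counts if c > 0)  # samples showing any alt reads
    lam0 = nr / n  # H0: errors spread evenly over all samples
    lam1 = nr / k  # H1: reads concentrated in the k carrier samples
    log_l0 = sum(poisson_logpmf(c, lam0) for c in alt_counts)
    # Under H1, non-carriers have rate 0 and so contribute log(1) = 0.
    log_l1 = sum(poisson_logpmf(c, lam1 if c > 0 else 0.0) for c in alt_counts)
    return log_l1 - log_l0
```

For example, `segregation_llr([1, 1, 1, 1])` returns 0.0 (reads evenly spread, both hypotheses agree), while `segregation_llr([4, 0, 0, 0])` returns about 5.545. In this toy model the factorial terms cancel and the score reduces to nr·ln(N/k), so it grows both with the number of alternate reads and with how few samples carry them.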

10.4 years ago
pd3 ▴ 350

Here are the math notes for the SGB calculation:

http://samtools.github.io/bcftools/
http://samtools.github.io/bcftools/rd-SegBias.pdf

Thanks, this is exactly what I needed.
