How to make confidence calls on SNPs obtained from next-gen sequencing data?
Although I would suggest expanding this question a little, so that it is clear what exactly you want to know about confidence, I'm pretty sure you're asking "how can I be sure that the calls I'm getting are valid?". If that is the case, a couple of things come to mind:
I've attended several talks where speakers claimed that, by following the basic restrictions of an NGS SNP-calling algorithm, one could be sure of obtaining no false positives. However, we did run into one on an NGS project: the data pointed us to a mutation we had been longing to detect, but after Sanger sequencing it we came back to the real world. When I dug deeper into this mutation, I found that it was barely meeting the minimum requirements of the SNP caller we were using, so we decided to slightly modify that threshold. It is true that the false positive rate of NGS looks really tiny, but my feeling is that we are still not as close as we would like to be to using it for clinical purposes (where neither false positives nor false negatives should appear). My group has certainly put a lot of effort into this matter, so we'll see what it looks like in a year's time (at least).
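One way to act on that lesson, sketched here under assumptions: besides adjusting the cutoff, flag the calls that pass the caller's thresholds by only a small margin and send those to Sanger first. The `Call` record, the `borderline_calls` helper, the 20% margin and the example cutoffs are all hypothetical illustrations, not the settings we actually used.

```python
from dataclasses import dataclass


@dataclass
class Call:
    chrom: str
    pos: int
    qual: float  # Phred-scaled variant quality reported by the caller
    depth: int   # read depth at the site


def borderline_calls(calls, min_qual, min_depth, margin=0.2):
    """Return calls that pass both cutoffs, but by less than `margin`
    (a fraction of the cutoff) on at least one of them.

    The cutoffs and the default margin are hypothetical; use whatever
    thresholds your own SNP caller applies.
    """
    flagged = []
    for c in calls:
        passes = c.qual >= min_qual and c.depth >= min_depth
        near_qual = c.qual < min_qual * (1 + margin)
        near_depth = c.depth < min_depth * (1 + margin)
        if passes and (near_qual or near_depth):
            flagged.append(c)
    return flagged
```

With cutoffs of QUAL 20 and depth 8, a call at QUAL 21 would be flagged for validation, while a call at QUAL 80 and depth 40 would not.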
A broader description will definitely be useful. We'll have to come back later to see whether he follows your advice. I guess I sometimes let myself speak freely about my own interests, hoping to answer the question asked while encouraging others to comment on my answer. And it has been very useful, I must say ;)
I worked previously on a project where we wrote our own SNP finder. We didn't use a single score to summarize SNP confidence, but we did have some biological intuition for what we could confidently call a SNP (versus a sequencing error or other artifact).
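To make that kind of intuition concrete, here is a hypothetical check (not the SNP finder described above) built on three common rules of thumb: enough reads supporting the alternate allele, a reasonable alternate allele fraction, and support from both strands, since alternate reads on only one strand are a typical artifact signature. The per-strand read counts as inputs and all default cutoffs are assumptions you would tune for your own data.

```python
def looks_like_real_snp(ref_fwd, ref_rev, alt_fwd, alt_rev,
                        min_alt_reads=4, min_alt_fraction=0.2,
                        min_per_strand=1):
    """Heuristic SNP-vs-artifact check from per-strand read counts.

    All cutoffs are illustrative defaults, not established values.
    """
    alt = alt_fwd + alt_rev
    total = alt + ref_fwd + ref_rev
    if total == 0:
        return False
    # Too few reads carrying the alternate base: likely sequencing error.
    if alt < min_alt_reads:
        return False
    # Alternate allele is a tiny fraction of coverage: likely noise.
    if alt / total < min_alt_fraction:
        return False
    # Alternate allele seen on only one strand: likely a strand-specific artifact.
    return alt_fwd >= min_per_strand and alt_rev >= min_per_strand
```

For example, 11 alternate reads split 5/6 across strands out of 33 total passes, whereas the same 11 reads all on the forward strand would be rejected.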
See my previous post here and let me know if you have any questions about terminology.
The problem arises when dealing with several thousand variants. When working with NGS data, one must assume that this kind of methodology can really only be used on small datasets (we've been using it for years and now know enough to be confident in it), i.e., on a reduced subset of the raw sequencer results. But to get to that subset you will have to work with some kind of score, or filter by a certain annotation of interest, which in the end reflects the "classic" intuitive model mentioned above.
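A minimal sketch of that score-plus-annotation step, assuming each variant is already a dict with a `qual` score and a `consequence` annotation (both field names, the `prioritize` helper, and the Sequence Ontology-style terms in the example are illustrative): keep only variants of interest above a score cutoff, then rank them so manual review starts with the strongest calls.

```python
def prioritize(variants, min_qual, consequences_of_interest, top_n=None):
    """Reduce a large call set to a reviewable subset.

    Keeps variants whose (hypothetical) 'qual' field passes the cutoff and
    whose 'consequence' annotation is of interest, sorted by descending qual.
    """
    kept = [
        v for v in variants
        if v["qual"] >= min_qual and v["consequence"] in consequences_of_interest
    ]
    kept.sort(key=lambda v: v["qual"], reverse=True)
    return kept if top_n is None else kept[:top_n]


variants = [
    {"id": "v1", "qual": 50.0, "consequence": "missense_variant"},
    {"id": "v2", "qual": 90.0, "consequence": "synonymous_variant"},
    {"id": "v3", "qual": 30.0, "consequence": "stop_gained"},
    {"id": "v4", "qual": 10.0, "consequence": "missense_variant"},
]
subset = prioritize(variants, 20.0, {"missense_variant", "stop_gained"})
```

Here `subset` keeps v1 and v3 in that order; v2 is dropped by the annotation filter and v4 by the score cutoff.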
Michael, this is the expansion of my question. I am working on NGS data and have used samtools for variant calling. Now I want to test the validity, or confidence level, of these SNP calls.
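For samtools output specifically, a reasonable first pass is to filter the VCF by variant quality and depth and then look at a summary statistic such as the transition/transversion (Ts/Tv) ratio of the surviving SNPs; a ratio well below what is expected for your organism and target region usually points to many false positives. The sketch below is plain Python over the VCF text, assuming read depth is stored in the INFO `DP` field (check your VCF header); the function names and the default cutoffs (QUAL 20, DP 10) are hypothetical choices, not samtools defaults.

```python
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_info(info):
    """Turn a VCF INFO string like 'DP=30;INDEL' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = True  # flag-type INFO entry
    return fields


def filter_snps(vcf_lines, min_qual=20.0, min_depth=10):
    """Yield (chrom, pos, ref, alt, qual, depth) for biallelic SNPs
    passing both cutoffs. Assumes depth is in INFO/DP."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = cols[:6]
        info = parse_info(cols[7])
        # Skip indels, multiallelic sites and non-variant records.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        if qual == ".":
            continue  # no quality reported
        depth = int(info.get("DP", 0))
        if float(qual) >= min_qual and depth >= min_depth:
            yield chrom, int(pos), ref, alt, float(qual), depth


def ts_tv_ratio(snps):
    """Transition/transversion ratio of an iterable of filtered SNPs."""
    ts = tv = 0
    for _chrom, _pos, ref, alt, _qual, _depth in snps:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

With a real file you could run `with open("calls.vcf") as fh: print(ts_tv_ratio(filter_snps(fh)))` for several cutoffs and see how the ratio and the number of calls change. Calls that really matter should still be confirmed by Sanger sequencing.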
Thanks Jorge, your suggestions have been useful.
Hi vish, welcome to BioStar. If you want to expand your query (that was Michael's suggestion), you should use the "edit" option on the question, and if you want to comment on anything you can do so with the "add comment" option under the answers. If you find an answer useful you can endorse it by giving it a +1, or accept it with the acceptance tick if it has fulfilled your needs. In any case, please do not add an answer unless you are actually answering your own question, as this may confuse future readers (surely that's why you've been downvoted).
Your question could mean a lot of things, but you are not using the right terms to describe your intention. I recommend you first find out what you are really aiming at and then expand your question.