Hi all,
I wanted to validate my pipeline for SNV discovery. To do that, I downloaded the exome FASTQ files and the per-chromosome VCF annotation files from http://www.internationalgenome.org/data-portal/sample/HG00119.
In the end I wanted to compare the SNPs and indels in both results, restricted to exonic sequences downloaded from the UCSC database. To my surprise, there are a lot of SNPs annotated by 1000G that I don't see in IGV at all (and my pipeline does not report them either).
In addition, I downloaded the BAM files for the analysed sample from the 1000G database, from the same page: http://www.internationalgenome.org/data-portal/sample/HG00119. The same happens there: a lot of annotated variants are not seen in the BAM file.
I am very confused... Is there another way I can validate my pipeline? Also, would you recommend emailing 1000G about this problem?
Validating against wrong annotations is useless. Maybe there is a well-checked reference that I can use to validate the pipeline?
PS. I selected the VCF records for only one sample, so I am sure those annotations belong to that sample.
Thanks in advance,
Agata
My whole life changed after reading this paper: https://www.ncbi.nlm.nih.gov/pubmed/27535533
Thanks, I will definitely read that, Best, Agata
I strongly suggest you do so, and try to understand every word (it took me weeks). There's a lot of knowledge in there.
Hi agata88,
Why was this thread deleted?
Not really nice after you have received a helpful suggestion...
Cheers, Wouter
Sorry, I did that by mistake... thanks for bringing it back :)
I see that I have a lot of SNPs and indels with AC=0, which means no alternate allele is present at those sites, yet they are still listed in the VCF file... that is why I got confused. After I filtered them out, everything looks good. Now I'm feeling embarrassed because I missed it during the analysis... still, I hope this post will help somebody with a similar problem in the future :) Best, Agata
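For anyone who lands here with the same problem, the AC=0 filtering mentioned above could look roughly like the sketch below. It is only an illustration, not the exact filter used in this thread: the function names are made up for this example, and it assumes the single-sample VCF carries an AC value in the INFO column that was recomputed for that sample when it was extracted. Records without an AC tag are kept as they are.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=0;AN=2;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        info[key] = value  # flag entries like 'DB' get an empty value
    return info


def has_alt_allele(record_line):
    """Return True unless every AC value on this record is 0."""
    fields = record_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    ac = info.get("AC")
    if ac is None:
        return True  # assumption: keep records that carry no AC tag
    counts = [int(v) for v in ac.split(",") if v != "."]
    # Multi-allelic sites list one count per ALT allele, e.g. AC=0,1
    return not counts or any(c > 0 for c in counts)


def filter_ac_zero(lines):
    """Yield header lines and records with at least one non-zero AC."""
    for line in lines:
        if line.startswith("#") or has_alt_allele(line):
            yield line
```

The generator accepts any iterable of VCF lines, so an open text file handle can be passed in directly and the surviving lines written to a new file. If the AC values in your file were not recomputed for the single sample, checking the sample's genotype column instead would be the safer test.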
Did you filter the output of samtools mpileup without doing bcftools call (just a guess)? When you call, only the positions that actually have something to say are kept.
Yes, I know. I am actually using VarScan for variant detection :) Thanks