Hi,
I have annotated VCF files from TCGA, and I'm interested in looking at germline variants in the TCGA samples. The TCGA VCFs contain variant calls from both the normal and the primary tumor samples, and I'm trying to understand how to distinguish germline variants from somatic ones.
Could I identify germline variants from just these two fields?
##INFO=<ID=SS,Number=1,Type=Integer,Description="Somatic status of sample">
##FORMAT=<ID=SS,Number=1,Type=Integer,Description="Variant status relative to non-adjacent Normal,0=wildtype,1=germline,2=somatic,3=LOH,4=post-transcriptional modification,5=unknown">
This might sound like a basic question, but since TCGA VCF files contain both a normal and a primary tumor sample, how can I tell whether the annotations in the INFO column belong to the normal or to the primary sample?
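Under the VCF specification, the INFO column describes the whole record (the site), while FORMAT fields such as `SS` are repeated once per sample column, in the order the sample names appear on the `#CHROM` header line. The sketch below decodes the per-sample FORMAT/SS codes using the labels from the header definition above. The sample names `NORMAL`/`PRIMARY` and the example record are hypothetical; the column names in real TCGA files may differ, so read them from the file's own header.

```python
# Minimal sketch: decode per-sample FORMAT/SS values from one VCF data line.
# Assumes a standard tab-separated VCF record; the example record is hypothetical.

# Labels copied from the FORMAT/SS header description.
SS_LABELS = {
    "0": "wildtype",
    "1": "germline",
    "2": "somatic",
    "3": "LOH",
    "4": "post-transcriptional modification",
    "5": "unknown",
}


def parse_samples(header_line, record_line):
    """Return {sample_name: {format_key: value}} for one VCF record."""
    header = header_line.rstrip("\n").lstrip("#").split("\t")
    fields = record_line.rstrip("\n").split("\t")
    sample_names = header[9:]          # columns after FORMAT are samples
    format_keys = fields[8].split(":")
    samples = {}
    for name, raw in zip(sample_names, fields[9:]):
        samples[name] = dict(zip(format_keys, raw.split(":")))
    return samples


def parse_info(info_field):
    """Return the site-level INFO column as a dict (flags map to True)."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def ss_status(header_line, record_line):
    """Map each sample's FORMAT/SS code to its label (None if absent/unknown code)."""
    return {
        name: SS_LABELS.get(values.get("SS"))
        for name, values in parse_samples(header_line, record_line).items()
    }


# Hypothetical record: one site, two sample columns.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tPRIMARY"
record = "1\t12345\t.\tA\tG\t.\tPASS\tSS=2\tGT:SS\t0/0:0\t0/1:2"

print(ss_status(header, record))           # per-sample status
print(parse_info(record.split("\t")[7]))   # single site-level INFO dict
```

Because INFO holds only one value per record, it cannot by itself say which sample a status refers to; the per-sample FORMAT/SS values can.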
Where did you get the VCF files? Did you apply for access to the protected germline data?