12.4 years ago
Jirapong
My samtools mpileup output looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT xyzzy.W1g7rI9gs2.bam
X 533 . C G 25 . DP=42;VDB=0.0033;AF1=0.5;AC1=1;DP4=20,0,18,0;MQ=20;FQ=26.8;PV4=1,7.1e-22,1,1 GT:PL:GQ 0/1:55,0,60:57
X 537 . C T 25 . DP=44;VDB=0.0042;AF1=0.5;AC1=1;DP4=23,0,20,0;MQ=20;FQ=26.6;PV4=1,4.3e-20,1,0.28 GT:PL:GQ 0/1:55,0,59:57
Is it possible to get strand information, or does the VCF/BCF provide strand information?
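One kind of strand information is already in the output above. In samtools mpileup VCF output, the four DP4 values are the number of high-quality bases on ref-forward, ref-reverse, alt-forward and alt-reverse reads, and the first PV4 value is a strand-bias p-value. This describes the orientation of the supporting reads, not the strand on which REF/ALT are written. Below is a minimal sketch for pulling DP4 out of a data line. It assumes whitespace-separated columns as pasted above, and `parse_info` and `strand_counts` are illustrative helper names, not part of any tool:

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column (key=value pairs separated by ';') into a dict.

    Flag fields without '=' are stored with an empty string as value.
    """
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def strand_counts(vcf_line: str) -> dict[str, int]:
    """Return per-strand read support taken from the DP4 INFO field.

    Assumes DP4 holds four integers in the order written by samtools mpileup:
    ref-forward, ref-reverse, alt-forward, alt-reverse.
    """
    columns = vcf_line.split()      # works for tab- or space-separated lines
    info = parse_info(columns[7])   # INFO is the 8th VCF column
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    return {"ref_fwd": ref_fwd, "ref_rev": ref_rev,
            "alt_fwd": alt_fwd, "alt_rev": alt_rev}


line = ("X 533 . C G 25 . DP=42;VDB=0.0033;AF1=0.5;AC1=1;DP4=20,0,18,0;"
        "MQ=20;FQ=26.8;PV4=1,7.1e-22,1,1 GT:PL:GQ 0/1:55,0,60:57")
print(strand_counts(line))
# {'ref_fwd': 20, 'ref_rev': 0, 'alt_fwd': 18, 'alt_rev': 0}
```

For both sites in the question, every supporting read is on the forward strand (both reverse counts are 0).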
I don't think strand information is relevant in a variant format. The alleles of a variant should be given on the forward (5'-3') strand, i.e. in the same orientation as the reference sequence, and the sequence on the opposite strand then follows from base pairing. I don't think any variant that results in imperfect base pairing is viable.
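The base-pairing argument can be made concrete: a forward-strand allele pair, read on the opposite strand, is simply its reverse complement. This is a minimal sketch that handles unambiguous A/C/G/T bases only (no IUPAC codes). The function names are illustrative:

```python
# Watson-Crick pairing for unambiguous bases; IUPAC ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def to_opposite_strand(ref: str, alt: str) -> tuple[str, str]:
    """Express a forward-strand REF/ALT pair as it reads on the reverse strand."""
    return reverse_complement(ref), reverse_complement(alt)


print(to_opposite_strand("C", "T"))  # ('G', 'A')
print(to_opposite_strand("C", "G"))  # ('G', 'C')
```

For indels, the position would also have to be recomputed on the other strand. This sketch only converts the bases.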
"Should be given": I am not sure what you are referring to (or at what level of generality), but alleles are sometimes given on both the forward and the reverse strand (e.g. Comadran et al. 2012). The VCF format itself does not seem to specify this explicitly, according to http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
In the case of GATK you are right: "Note that REF and ALT are always given on the forward strand." (from http://gatkforums.broadinstitute.org/discussion/1268/how-should-i-interpret-vcf-files-produced-by-the-gatk)
However, in my opinion, this does not mean that the same holds for all data from all sources.
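Sometimes you need to merge data whose allele orientation is not guaranteed. One common sanity check is to compare the reported alleles with the forward-strand reference base at that position. This is a minimal sketch that assumes single-base biallelic SNPs and a reference held in a plain Python string rather than read from a FASTA file. `infer_snp_strand` is a hypothetical helper, not part of any tool:

```python
# Base pairing for unambiguous bases; anything else (e.g. N) maps to nothing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def infer_snp_strand(reference: str, pos: int, allele1: str, allele2: str) -> str:
    """Guess the strand on which a biallelic SNP's alleles were reported.

    `reference` is the forward-strand contig sequence and `pos` is 1-based,
    as in the VCF POS column. Returns "forward", "reverse", "ambiguous" or
    "mismatch".
    """
    ref_base = reference[pos - 1].upper()
    alleles = {allele1.upper(), allele2.upper()}
    on_forward = ref_base in alleles
    on_reverse = COMPLEMENT.get(ref_base) in alleles
    if on_forward and on_reverse:
        return "ambiguous"  # A/T or C/G SNP: both strands give the same allele pair
    if on_forward:
        return "forward"
    if on_reverse:
        return "reverse"
    return "mismatch"  # neither strand fits; check coordinates or reference build


toy_reference = "ACGTAC"  # hypothetical sequence; position 2 is C
print(infer_snp_strand(toy_reference, 2, "C", "T"))  # forward
print(infer_snp_strand(toy_reference, 2, "G", "A"))  # reverse
print(infer_snp_strand(toy_reference, 2, "C", "G"))  # ambiguous
```

A C/G SNP such as X:533 in the question falls into the "ambiguous" case. Its alleles alone cannot tell you which strand a source used, so for such sites the strand convention has to come from the source's documentation.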