Question

Varscan Output Strandness

12.0 years ago

GPR ▴ 390

Hello, I am carefully performing an SNP analysis, primarily with BWA, SAMtools, Picard, BCFtools and VarScan. The one thing I am not sure I have under control is the strand information in my data. My reads are 75 bp paired-end from a cDNA library that is not strand-specific. My question is: how does VarScan handle this? Will the input carry any strand information? I know that some A-to-G calls will be lost in the absence of strand information. Other than that, will the final VCF file record whether the SNP is on the plus or minus strand? If not, how would I best tackle this problem? Thanks, G.

snp strand • 3.0k views

updated 12.0 years ago by Sean Davis 27k • written 12.0 years ago by GPR ▴ 390

score 1 · Answer 1 · 2012-12-13

12.0 years ago

Sean Davis 27k

SNVs are always on both strands, since an SNV on the forward strand implies the complementary base change on the reverse strand. I believe VarScan reports SNVs on the forward (reference) strand.
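To make the complement relationship concrete, here is a minimal Python sketch. The function name `on_reverse_strand` is made up for illustration and is not part of VarScan or any other tool; it only shows that one variant can be written two ways depending on which strand you read it from.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def on_reverse_strand(ref, alt):
    """Express a forward-strand SNV (ref>alt) as it reads on the reverse strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


# A forward-strand A>G is the same variant as T>C on the reverse strand.
print(on_reverse_strand("A", "G"))  # ('T', 'C')
```

This is also why unstranded cDNA reads cannot tell an A-to-G change in a sense transcript apart from a T-to-C change in an antisense one: both appear as the same call against the reference.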

12.0 years ago by Sean Davis 27k

Yes, this is true.

12.0 years ago by Chris Miller 22k

OK, thanks. I guess my confusion stems from the fact that VarScan.jar filter has an option (--min-strands2) that can be set to 1 or 2. What's the recommended value here? Should the variant be observed on both strands, or is 1 enough?

12.0 years ago by GPR ▴ 390

Only you can answer this question. The "strand" in this case refers to the strand of the READ supporting the variant, not of the variant itself. If the variant is seen on only one strand, that may indicate a false positive. I would recommend running VarScan liberally (let everything through) and then filtering after the fact.
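As a rough illustration of filtering after the fact, the Python sketch below counts how many read strands support a variant in one VCF sample column. It assumes the VCF carries per-strand variant depths in FORMAT tags named `ADF` (forward) and `ADR` (reverse); check your own file's header before relying on that. The function names and the sample values are invented for the example.

```python
def strands_supporting_variant(format_field, sample_field):
    """Count the read strands (0, 1 or 2) with at least one variant-supporting read.

    Assumes the FORMAT column contains ADF and ADR tags (an assumption;
    verify against the header of your own VCF).
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return sum(int(values[tag]) > 0 for tag in ("ADF", "ADR"))


def passes_strand_filter(format_field, sample_field, min_strands=2):
    """Keep a call only if enough read strands support the variant."""
    return strands_supporting_variant(format_field, sample_field) >= min_strands


# Made-up FORMAT and sample columns, for illustration only.
fmt = "GT:DP:AD:ADF:ADR"
print(passes_strand_filter(fmt, "0/1:30:12:7:5"))   # True: both strands support it
print(passes_strand_filter(fmt, "0/1:30:12:12:0"))  # False: forward strand only
```

Keeping the threshold as a parameter lets you compare the call sets at `min_strands=1` and `min_strands=2` on the same liberal VarScan output instead of rerunning the caller.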

12.0 years ago by Sean Davis 27k

Yep, Sean nailed it. SNVs that appear on only one strand are often false positives. That said, if sequence coverage is low at that site, you may see reads from only one strand just by chance. It's up to you to decide what thresholds are acceptable.
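To put a number on the "just by chance" point, here is a back-of-the-envelope Python sketch. It assumes each variant-supporting read is equally and independently likely to come from either strand, which real libraries only approximate. The function name `prob_single_strand` is hypothetical.

```python
def prob_single_strand(n_reads):
    """Probability that all n variant-supporting reads fall on one strand.

    Simplifying assumption: each read independently comes from the forward
    or reverse strand with probability 0.5. Two outcomes (all forward, all
    reverse) each have probability 0.5 ** n, giving 0.5 ** (n - 1) in total.
    """
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    return 0.5 ** (n_reads - 1)


for n in (2, 4, 8, 16):
    print(n, prob_single_strand(n))
```

Under that assumption, four supporting reads land on a single strand 12.5% of the time, so a strict two-strand requirement at low coverage would discard a fair number of real variants, while at eight reads the chance drops below 1%.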