Question

Problem Analyzing Tumor-Normal Pairs With Varscan

1

13.0 years ago

tommivat ▴ 250

I am using VarScan 2.2.11 to analyze a tumor-normal pair of whole-exome NGS data. I used the 1000 Genomes reference genome (from here), and my .bam files are sorted. I use a shell script similar to this to call VarScan and get the following summary after my run:

2 015 987 104 positions in tumor
2 015 519 315 positions shared in normal
   90 913 571 had sufficient coverage for comparison
            0 were called Reference
          496 were mixed SNP-indel calls and filtered
   90 834 408 were called Germline
            0 were called LOH
            0 were called Somatic
        78667 were called Unknown
            0 were called Variant

Obviously something is wrong, since almost all positions with sufficient coverage are called Germline and none are called Reference. Can you point out what I am doing wrong, please?

varscan cancer samtools next-gen • 4.4k views

ADD COMMENT • link updated 4.8 years ago by Biostar 20 • written 13.0 years ago by tommivat ▴ 250

2

I suspect that the reference used in the alignment (which I haven't done myself) and in the paired analysis has to be the same. In this case, I do not (yet) know which reference was used in the alignment. Can you confirm whether this could cause the problem above?

ADD REPLY • link 13.0 years ago by tommivat ▴ 250

2

The header of your bam file should contain information about which reference was used for alignment.
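
As a rough sketch of where to look: in the SAM header, `@SQ` lines may carry the optional `AS` (assembly) and `UR` (reference URI) tags, and `@PG` lines often record the aligner command line in `CL`, which usually names the FASTA file. Not every aligner writes these tags, so an empty result does not prove anything. The function name `reference_hints` is made up for this example.

```shell
# Print header fields that hint at which reference was used for alignment.
# Reads SAM header text on stdin, for example:
#   samtools view -H tumor.bam | reference_hints
reference_hints() {
    awk -F'\t' '
        # @SQ lines: report the optional AS (assembly) and UR (reference URI) tags
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(AS|UR):/) print "SQ", $2, $i
        }
        # @PG lines: report the recorded command line, which often names the FASTA
        $1 == "@PG" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^CL:/) print "PG", $i
        }
    '
}
```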

ADD REPLY • link 13.0 years ago by Chris Miller 22k

score 2 · Answer 1 · 2012-08-14

13.0 years ago

Matt Shirley 10k

You most likely need to realign one or both of your samples to the same reference. To confirm that the BAM files are actually using different references:

samtools view -H normal.bam | grep "^@SQ" > normal
samtools view -H tumor.bam | grep "^@SQ" > tumor
diff -y normal tumor

The above is just a simple way of grabbing the sequence dictionary from the header and comparing. If there are any differences between your files, then you will need to align them both to the same reference.
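
One caveat with a plain `diff`: `@SQ` lines can also carry optional tags such as `UR` (the path to the FASTA), so two files aligned to the same reference on different machines may still show differences. A minimal sketch that compares only the sequence names (`SN`) and lengths (`LN`), in header order, could look like the following. The function names `sq_dict` and `same_dict` are invented for this example, and it assumes the headers were first saved to files with `samtools view -H`.

```shell
# Reduce SAM header text (stdin) to "name<TAB>length" pairs, one per @SQ line,
# keeping the order in which the sequences appear in the header.
sq_dict() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len  = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Compare two saved header files; exit status 0 means the names and lengths
# match in the same order, non-zero means they differ.
#   samtools view -H normal.bam > normal.header
#   samtools view -H tumor.bam  > tumor.header
#   same_dict normal.header tumor.header && echo same || echo different
same_dict() {
    [ "$(sq_dict < "$1")" = "$(sq_dict < "$2")" ]
}
```

Matching names and lengths is a necessary check, not a proof that the underlying sequences are identical.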

ADD COMMENT • link 13.0 years ago by Matt Shirley 10k

0

Thank you for the answer! There are no differences in the headers. Do you still think I need to realign both samples to my current reference? How is such a realignment done? (I realize this must be a simple task, but I haven't worked with samtools much.)

ADD REPLY • link 13.0 years ago by tommivat ▴ 250

1

I think that you are probably using a reference genome sequence that does not match what your two BAM files were aligned to. Try tracking down the reference that your BAM files were aligned to and specifying that as the reference for VarScan2.
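
One quick way to test a candidate reference is to compare the `@SQ` entries of a BAM header against the FASTA index (`.fai`) that `samtools faidx` writes, whose first two tab-separated columns are the sequence name and length. The sketch below is only an illustration under those assumptions; the function name `contigs_not_in_fai` and the file names in the usage comment are hypothetical.

```shell
# Print each @SQ entry of a saved SAM header ($1) whose name or length does not
# match the FASTA index ($2). No output means every header contig was found.
#   samtools faidx reference.fa                  # writes reference.fa.fai
#   samtools view -H tumor.bam > tumor.header
#   contigs_not_in_fai tumor.header reference.fa.fai
contigs_not_in_fai() {
    awk -F'\t' '
        # First file: the .fai index; column 1 is the name, column 2 the length
        NR == FNR { fai[$1] = $2; next }
        # Second file: the SAM header; check every @SQ line
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (!(name in fai))        print name "\tnot in reference"
            else if (fai[name] != len) print name "\tlength " len " vs " fai[name]
        }
    ' "$2" "$1"
}
```

If this prints nothing, the names and lengths agree, which makes a reference mismatch less likely but does not rule it out, since two builds can share names and lengths and still differ in sequence.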