Hello all, I am trying to call somatic variants using VarScan.v2.4.3 from merged bam files. Here is the sript used to run the same.
java -Xmx64g -jar VarScan.v2.4.3.jar somatic <(samtools mpileup -l AmpliSeqExome.hg38.bed --no-BAQ -f "$bwa_ref".fa Normal_merged.sorted.bam Tumor_merged.sorted.bam) varscan_vcf/variant --mpileup 1 --output-vcf
The run is executed without any errors and the out put is pasted below.
Min coverage: 8x for Normal, 6x for Tumor
Min reads2: 2
Min strands2: 1
Min var freq: 0.2
Min freq for hom: 0.75
Normal purity: 1.0
Tumor purity: 1.0
Min avg qual: 15
P-value thresh: 0.99
Somatic p-value: 0.05
Reading input from /dev/fd/63
Reading mpileup input...
23064925 positions in mpileup file
22855806 had sufficient coverage for comparison
22796450 were called Reference
0 were mixed SNP-indel calls and filtered
0 were removed by the strand filter
54812 were called Germline
560 were called LOH
3979 were called Somatic
5 were called Unknown
0 were called Variant
The generated vcf however has only variants till chr7 and no more. The total line counts in vcf files for both the generated snp and indel vcf files is not more than 36000 (~18k + ~18k) and the file size is just around 3 MB.
I have run the same on multiple files and every single run has produced similar output.
I have also checked the presence of other chromosomes in our bam file. Can anyone point to a reason there are no calls after chr7?
Thanks for the reply. I just reconfirmed again by opening Ampliseq file. It contains all chromosomes. Infact, the log files has total number of variants called but the saved vcf files do not have them.
Would you mind to count the lines of the output vcf and see how many variants you got using
Sometimes if you try to open a vcf with with text editor or excel you missed lines since there is a delay in loading.
In your case I would also run separately or in pipeline (|) the mpileup and the varscan commands.
wc -l on the output vcf files shows approximately 18k each. That makes it a total of 36k polymorphisms called. The log file has more than 58k polymorphisms called. I have separately generated mpileup file and run the analysis. Same output. Currently, I am running the same on Varscan v.2.4.2 to check.
That is so strange! Does your mpileup contain positions after chromosome 7?
Can you try to remove 0 coverage from mpileup
or one line
I remember when I was running VarScan for CNV calling I had to deal with issue of O coverage in mpileup despite using -q 1
Seems like I figured out where the problem lies. I just did a grep on the mpileup file.
No output for any chromosome after chr7. The mpileup file is generated from a merged bam file (4 samples in each group). Any idea why is my mpileup file incomplete?
would you mind to have a look at your bam?
https://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile
or
Thanks for the suggestion. I am currently re-sorting the merged bams. Will update this thread once done.
Update: quickcheck executed without errors. I tried resorting the bam and ran mpileup. I also tried running mpileup on another machine. All of the steps have generated only an incomplete pileup file. It is confirmed something is going wrong with mpileup step. I have posted the issue on github, let me wait for a suggestion.
How did you merge bam files?
Maybe something is missing or is wrong in the header, like a read group value
I used samtools merge
My reference file also had non standard chromosomes in them.
i have now cleaned up the bam as well as the bam header for all non standard chromosomes. i am running mpileup now. Keeping my fingers crossed :)
Now this is frustrating. Even after cleaning the headers, I am having the same issue.
I am stumped. Not really sure how to proceed from here.
Indeed!!!
Last think before realignment of fastq.
what is the line of last read aligned in chr 7?
And finally... Can you try to output a bam file containing reads after chromosome 7 and run mpileup?
on two different machines have given me identical output.
Thats a good suggestion. I will try to run the same.
Actually I was talking about the last reads of chr7 in sorted bam file.
Are those the last lines of chr7 in mpileup?
The file stops too early in chr7 (chr7 100894305).
The last cordinate on chr7 in my bam is
I think I figured out the problem. I went back to bed file to inspect it near the region samtools exits. There seems to be a blank line separating two entries.
I also found three more blank lines in the bed file.
Yes, this makes it clear.
It is exactly the coordinate that mpileup stops.
Finally!!!