VarScan.v2.4.3 generating incomplete or partial VCF file
1
0
Entering edit mode
5.4 years ago
Sandeep ▴ 260

Hello all, I am trying to call somatic variants using VarScan.v2.4.3 from merged bam files. Here is the sript used to run the same.

java -Xmx64g -jar VarScan.v2.4.3.jar somatic <(samtools mpileup -l AmpliSeqExome.hg38.bed --no-BAQ -f "$bwa_ref".fa Normal_merged.sorted.bam Tumor_merged.sorted.bam) varscan_vcf/variant --mpileup 1 --output-vcf

The run is executed without any errors and the out put is pasted below.

Min coverage:   8x for Normal, 6x for Tumor
Min reads2: 2
Min strands2:   1
Min var freq:   0.2
Min freq for hom:   0.75
Normal purity:  1.0
Tumor purity:   1.0
Min avg qual:   15
P-value thresh: 0.99
Somatic p-value:    0.05
Reading input from /dev/fd/63
Reading mpileup input...
23064925 positions in mpileup file
22855806 had sufficient coverage for comparison
22796450 were called Reference
0 were mixed SNP-indel calls and filtered
0 were removed by the strand filter
54812 were called Germline
560 were called LOH
3979 were called Somatic
5 were called Unknown
0 were called Variant

The generated vcf however has only variants till chr7 and no more. The total line counts in vcf files for both the generated snp and indel vcf files is not more than 36000 (~18k + ~18k) and the file size is just around 3 MB.

I have run the same on multiple files and every single run has produced similar output.

I have also checked the presence of other chromosomes in our bam file. Can anyone point to a reason there are no calls after chr7?

SNP variant calling exomeSeq varscan • 3.6k views
ADD COMMENT
1
Entering edit mode
5.4 years ago
2nelly ▴ 350

Hi Sandeep,

Can you please check if AmpliSeqExome.hg38.bed contains only regions from chr1 to chr7?

ADD COMMENT
0
Entering edit mode

Thanks for the reply. I just reconfirmed again by opening Ampliseq file. It contains all chromosomes. Infact, the log files has total number of variants called but the saved vcf files do not have them.

ADD REPLY
0
Entering edit mode

Would you mind to count the lines of the output vcf and see how many variants you got using

grep -v "#" file.vcf | wc -l

Sometimes if you try to open a vcf with with text editor or excel you missed lines since there is a delay in loading.

In your case I would also run separately or in pipeline (|) the mpileup and the varscan commands.

ADD REPLY
0
Entering edit mode

wc -l on the output vcf files shows approximately 18k each. That makes it a total of 36k polymorphisms called. The log file has more than 58k polymorphisms called. I have separately generated mpileup file and run the analysis. Same output. Currently, I am running the same on Varscan v.2.4.2 to check.

ADD REPLY
0
Entering edit mode

That is so strange! Does your mpileup contain positions after chromosome 7?

Can you try to remove 0 coverage from mpileup

samtools mpileup -B -q 1 -f ref.fa normal.bam tumor.bam 

##Remove 0 coverage from mpileup file###

awk -F"\t" '$4 > 0 && $7 > 0' normal-tumor.mpileup > normal-tumor_no_0_coverage.mpileup

or one line

samtools mpileup -B -q 1 -f reference.fasta normal.bam tumor.bam | awk -F"\t" '$4 > 0 && $7 > 0' | java -jar VarScan.jar .....

I remember when I was running VarScan for CNV calling I had to deal with issue of O coverage in mpileup despite using -q 1

ADD REPLY
0
Entering edit mode

Seems like I figured out where the problem lies. I just did a grep on the mpileup file.

 grep 'chr1' normal-tumor.mpileup
chr1    68929   A   88  ..............,..............................................,,,,,,,,,,,,,,,,,,,,,,,,,,^!.  ><588=<7>?;948/>===<<;<=@<;=@?>=/<?<@<;<<<9=@</<<;><==9;<;:8<883866684717488
86577853/15/    73  ..,.........,,,,.................................,,,,,,,,,,,,,,,,,,,,,,,^!. 8>.98<48//9;3714:08/<5<8A@</;=8@;8/=<=99<5<<<6=:@88668858874424335073868/
chr1    68930   A   87  .............,,............................................,,,,,,,,,,,,,,,,,,,,,,,,,,,, 95/0187489463=68999778679679999;667816786389771795997825678=A:=8<=5=@=87@;9==:5<;=77
57; 69  ..,........,,,,..............................,,,,,,,,,,,,,,,,,,,,,,,,   664448.7258:=9;714684898877800983158269613848==;;=<:<=;;:=95:=7=9?<>5
chr1    68931   G   90  ..............,,..............................................,,,,,,,,,,,,,,,,,,,,,,,,,,,.  =<575<<:=<:;4;868<<<<<<;;<;;;<;<>/;;<=89<;;8==</<8<=<==<<7:;<<;A;<:;:4=7=97<
69==:5;<=717:/  74  ..,.........,,,,.................................,,,,,,,,,,,,,,,,,,,,,,,.,  :<5;:=9<7/9=3=:9//5/;:=8=<</<<<<77/<;889;<<<58=9;=<69=<:==;<;>65:=7=7===/5

grep 'chr7' normal-tumor.mpileup
chr7    193426  c   0   *   *   0   *   *
chr7    193427  c   0   *   *   0   *   *
chr7    193428  g   0   *   *   0   *   *
chr7    193429  c   0   *   *   0   *   *
chr7    193430  c   0   *   *   1   *   .

grep 'chr8' normal-tumor.mpileup

No output for any chromosome after chr7. The mpileup file is generated from a merged bam file (4 samples in each group). Any idea why is my mpileup file incomplete?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Thanks for the suggestion. I am currently re-sorting the merged bams. Will update this thread once done.

ADD REPLY
0
Entering edit mode

Update: quickcheck executed without errors. I tried resorting the bam and ran mpileup. I also tried running mpileup on another machine. All of the steps have generated only an incomplete pileup file. It is confirmed something is going wrong with mpileup step. I have posted the issue on github, let me wait for a suggestion.

ADD REPLY
0
Entering edit mode

How did you merge bam files?

Maybe something is missing or is wrong in the header, like a read group value

ADD REPLY
0
Entering edit mode

I used samtools merge

samtools merge -@ 32 Normal_merged.bam Normal1.bam Normal2.bam Normal3.bam

My reference file also had non standard chromosomes in them.

SN:chr11_KI270721v1_random
SN:chr14_GL000009v2_random
SN:chr14_GL000225v1_random
SN:chr14_KI270722v1_random
SN:chr14_GL000194v1_random
SN:chr14_KI270723v1_random

i have now cleaned up the bam as well as the bam header for all non standard chromosomes. i am running mpileup now. Keeping my fingers crossed :)

ADD REPLY
0
Entering edit mode

Now this is frustrating. Even after cleaning the headers, I am having the same issue.

$ cut -f 1 sample_merged.sorted.mpileup | uniq
chr1
chr2
chr3
chr4
chr5
chr6
chr7

I am stumped. Not really sure how to proceed from here.

ADD REPLY
0
Entering edit mode

Indeed!!!

Last think before realignment of fastq.

what is the line of last read aligned in chr 7?

And finally... Can you try to output a bam file containing reads after chromosome 7 and run mpileup?

ADD REPLY
0
Entering edit mode
tail sample_merged.sorted.mpileup

on two different machines have given me identical output.

chr7    100894296   A   50  ,................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,  @47362967899964:7<>7>>9<>;?B9<?9?=?9<<=<:=55><4:7<  33  .........,,,,,,,,,,,,,,,,,.,,..,,   086474599;<;<;>=B<5<5<:8959?=89=8
chr7    100894297   G   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,^].  @5<5;5:;<<<4<=8;=;<=8<<9<<;?A9<@8>=<=><?<:<587<:879<    37  ............,,,,,,,,,,,,,.,,,,.,,..,,   7<;./8<:;=.=<><=;>?C?8</9.5887=>=;9=8
chr7    100894298   C   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.    ;5<5;59>=<<8==8<=<997;975789;0794@98>9798571738:454>    36  ...........,,,,,,,,,,,,.,,,,.,,..,,^].  :=;.:<;;=8=7969998;7071./440<:8<9989
chr7    100894299   C   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.    <08568698793994777<@<==<<>=?98==8==<9>><<9<:/9=48:89    37  ..........,,,,,,,,,,,,,.,,,,.,,..,,,.   7884765518<B=><><B:9<7835<:;9==61=9<5
chr7    100894300   A   53  ,...........+1G......,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.^].  <6<<:<;<<<=3<>8<;7=@:A<<<<?;89=>8==<><8<<8;978=;8;8<5   37  ..........,,,,,,,,,,,,,.,,,,.,,..,,,.   :;<:8;;:9<:=8=<>:<;8<7875<:5<=>;8<7</
chr7    100894301   G   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   94;<;>;=<>=3<=8<;79:6977698635994998?5389467;69<455>8   39  ............,,,,,,,,,,,,,.,,,,.,,..,,,. 9=<56:8?<:<<49098949749297487/?99<=748/
chr7    100894302   G   52  ,................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..    A17/5778899784781=@:=<<<>=<>>=><?<@9<9<=:;;6?>69;991    38  ............,,,,,,,,,,,,,.,,,,.,,..,,,  478.16217677:>9<>>:=/;=/=4:=:59>>61=:<
chr7    100894303   A   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   <3;39<<=<<=3;<8<=7<<7<;=<<=;<8<><?<==;9<=9;;6;<<9;7>7   39  ............,,,,,,,,,,,,,.,,,,.,,..,,,. :;<56:55:::;9>8=<=5</;=/B77=:5=<>;8<:;/
chr7    100894304   G   50  ,.........A.......,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,..  73<5;<====96;<;<=<7674787534785888=7887374727=22><  35  ............,,,,,,,,,,.,,,.,,..,,,. <=<38;55;:;<59176891989850<77;7836/
chr7    100894305   G   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   <3==;?<====6=@;<=<>;8<;=>==;:8<@>@=>=<>=<:?=69<@9:7>:   38  ............,,,,,,,,,,,,,.,,,,.,,..,,,  <@<3>;5;;::<<>;<<=7>:;?9C78=<9=<>B;>=;

Can you try to output a bam file containing reads after chromosome 7 and run mpileup?

Thats a good suggestion. I will try to run the same.

ADD REPLY
0
Entering edit mode

Actually I was talking about the last reads of chr7 in sorted bam file.

Are those the last lines of chr7 in mpileup?

The file stops too early in chr7 (chr7 100894305).

ADD REPLY
0
Entering edit mode

The last cordinate on chr7 in my bam is

0IN8L:03162:11340   16  chr7    159335889   0   34M *   0   0
ADD REPLY
0
Entering edit mode

I think I figured out the problem. I went back to bed file to inspect it near the region samtools exits. There seems to be a blank line separating two entries.

chr7    100893945   100894175   ACHE_80133.3.2392   0   +   .   GENE_ID=ACHE;Pool=10;TRIM_LEFT=0;TRIM_RIGHT=0
chr7    100894084   100894305   ACHE_80133.3.7298   0   +   .   GENE_ID=ACHE;Pool=11;TRIM_LEFT=0;TRIM_RIGHT=48

chr7    100990480   100990689   MUC12_80135.1.27552 0   +   .   GENE_ID=MUC12;Pool=1;TRIM_LEFT=1001;TRIM_RIGHT=0
chr7    100990677   100990894   MUC12_80135.1.26417 0   +   .   GENE_ID=MUC12;Pool=2;TRIM_LEFT=1001;TRIM_RIGHT=0
chr7    100990882   100991081   MUC12_80135.1.5617  0   +   .   GENE_ID=MUC12;Pool=3;TRIM_LEFT=1001;TRIM_RIGHT=0

I also found three more blank lines in the bed file.

ADD REPLY
0
Entering edit mode

Yes, this makes it clear.

It is exactly the coordinate that mpileup stops.

Finally!!!

ADD REPLY

Login before adding your answer.

Traffic: 1573 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6