Hello Everyone,
I'm new to the Whole Exome Sequencing analytics and keep running into the following issue -
I'm running Varscan2.4.2 and invoking the copynumber command on an mpileup file and I keep receiving a "parsing exception error":
[mpileup] 2 samples in 2 input files
<mpileup> Set max per-file depth to 4000
Min coverage: 10
Min avg qual: 15
P-value thresh: 0.01
Reading input from Normal.sort.rmdup_Tumor.sort.rmdup.mpileup
Reading mpileup input...
Parsing Exception on line:
chr1 12956 T 0 * * 0
8
A little info on the pre-processing: I have downloaded Bam files from the TCGA, used Samtools to sort the Bam files, used Picard tools to mark duplicates, samtools to build the mpileup and now running varscan2 for the copy number output.
Anyone have any idea on what could be causing the Parsing Exception error and how I might possibly circumvent this? I've done a little googling and saw that in other cases people have tried removing 0 coverage lines by piping the mpileup through the awk command, but I thought this issue was addressed in the later versions of Varscan.
Thank you kindly for reading!
In case more info is needed, Here are the commands and options I am running:
#Sort
samtools sort -l 0 -O bam -T Normal.sort -o Normal.sort.bam Normal.bam
samtools sort -l 0 -O bam -T Tumor.sort -o Normal.sort.bam Tumor.bam
#MarkDuplicates
java -jar Picard.jar MarkDuplicates I=Normal.sort.bam O=Normal.sort.mkdp.bam METRICS_FILE=dup1 TMP_DIR=tmp1
java -jar Picard.jar MarkDuplicates I=Tumor.sort.bam O=Tumor.sort.mkdp.bam METRICS_FILE=dup2 TMP_DIR=tmp2
#Mpileup
samtools mpileup -q 1 -B -f GRCh38.d1.vd1.fa Normal.sort.mkdp.bam Tumor.sort.mkdp.bam > Normal.sort.mkdp_Tumor.sort.mkdp.mpileup
#Varscan
java -jar VarScan.v2.4.2.jar copynumber Normal.sort.mkdp_Tumor.sort.mkdp.mpileup Normal.sort.mkdp_Tumor.sort.out --mpileup 1 --data-ratio 0.8916044227
How does that line look different from the others? For example use the following on your file
grep -w -A 5 -B 5 '12956' yourfile.txt
with -w (only searching for whole occurences of the string, 129568 is not a match), with -A (printing n lines after the match), with -B (printing n lines before the match).Thank you for the fast reply! After running what you stated with the grep command, chr1 gave me this output with lines above and below 12956 showing a value of "1" (Does that mean that chr4 would also give me an issue?):
chr1 12951 A 0 * * 1 . ? chr1 12952 T 0 * * 1 . > chr1 12953 G 0 * * 1 . C chr1 12954 G 0 * * 1 . A chr1 12955 G 0 * * 1 . A chr1 12956 T 0 * * 0
chr1 12957 C 0 * * 1 . B chr1 12958 A 0 * * 1 . @ chr1 12959 T 0 * * 1 . = chr1 12960 C 0 * * 1 . @ chr1 12961 C 0 * * 1 . @
Does that mean chr 4 would give me even more issues or are the two unrelated issues? chr4 12951 C 1 , F 0 * * chr4 12952 T 1 , @ 0 * * chr4 12953 A 1 , @ 0 * * chr4 12954 C 1 , E 0 * * chr4 12955 A 1 , @ 0 * * chr4 12956 C 1 , F 0 * * chr4 12957 T 1 , @ 0 * * chr4 12958 A 1 , @ 0 * * chr4 12959 C 1 , E 0 * * chr4 12960 T 1 , B 0 * * chr4 12961 C 1 , E 0 * *
Thank you very much! (Sorry I don't know how to properly format here just yet so I'll just post as an image)
That image is apparently a good way of sharing your result ;) Another step to assist in troubleshooting would be to remove the offending line using
grep -w -v 12956 yourfile.txt > outfile.tsv
with -v (for inverse, keep everything that doesn't match). Off course that's not desired to remove positions, but it might help to nail down the problem. If's its a larger problem you'll see plenty of other positions with errors, if it's just this... then it's a weird position worth investigating.I'm not familiar with Varscan so I looked for a manual and found this: http://dkoboldt.github.io/varscan/copy-number-calling.html I can't find where it specifies you should use
-B
in the mpileup command. Maybe you are following another resource?Thank you very much for your input, I'll definitely try it out!
The "-B" command was something I tried after looking through posts on seqanswers (Posted by "dkoboldt"):
"Hello, and thanks for posting this issue. Yes, VarScan does not expect to see a line with coverage=0 in a single-sample pileup file. VarScan v2.3.6 addresses this issue and should not crash. In either case, I recommend using two-sample mpileups for normal/tumor comparisons (somatic and copynumber), but doing so with the -B parameter (in samtools mpileup) to disable BAQ computation."
The -B command is a samtools option listed under "mpileup" on the samtools manual (http://www.htslib.org/doc/samtools.html). I figured, why not let's give it a try since I've been at my wit's end trying to get Varscan to work wit these samples =)
So I've tried just about everything and no success and I'm about to give up on ever graduating.
I've even tried cleaning my bam files with java -jar Picard.jar FixMateInformation I=Tumor.sort.mkdup.bam SO=coordinate
and that has still resulted in a "Parsing Exception Error".
I've posted this problem on the Varscan help forums but got no support. I even E-mailed Dan Koboldt, creator of the Varscan program, to no avail. My apologies if I seem to have lost my patience here, but it's a problem I've been struggling with since April to no avail and it's the one thing that's keeping me from graduating.
Would removing "chrMT" and "chrUnknown" from my bam files help solve this issue? If so, is there a proper way to remove those from my bam files? Would I also have to remove them from my reference fasta?
Thanks for your patience.