Hello,
I am sorry for the very basic and newbie question but I am confusing and getting lost in my thoughts. I have WES data of tumor samples with matched ones extracted from 13 patients (paired-end, illumina). I used BWA mem to align them against hg38, and used VarScan2 to call somatic variations. Now I have 6 files (fpfilter_Passed.vcf) for each patient (Somatic, LOH, Germline for each of snp and indel variations). My question is, what I have to do in this step? Should I merge all 6 files together?
Afterwards should I merge the VCF file of all patients together and then use SnpEff for downstream analysis?
This is the all commands that I used till this step:
#Trim using Trimmomatic
java -jar Trimmomatic-0.38/trimmomatic-0.38.jar PE normal_1_1.fastq normal_1_2.fastq -baseout sample_1 LEADING:30 TRAILING:30 MINLEN:50
#Align against ref using bwa
bwa index hg38.fa
bwa mem hg38.fa normal_1_1P normal_1_2P > normal_1.sam
#Convert sam to bam
samtools view -bS normal_1.sam > normal_1.bam
samtools sort normal_1.bam -o normal_1.sorted.bam
samtools index normal_1.sorted.bam
# Get the raw variants:
samtools mpileup -q 20 -Q 25 -B -d 1000 -f hg38.fa normal_1.sorted.bam tumor_1.sorted.bam | java -jar VarScan.v2.4.3.jar somatic /dev/stdin outputName -mpileup --strand-filter 1 --output-vcf
# Classify into Germ, LOH and Somatic:
java -jar VarScan.v2.4.3.jar processSomatic *.snp.vcf --max-normal-freq 0.01
java -jar VarScan.v2.4.3.jar processSomatic *.indel.vcf --max-normal-freq 0.01
#Preparing BED file
egrep -hv "^#" Germline.hc.vcf | awk 'OFS="\t" {print $1, $2-1, $2+1}' | sort -k1,1 -k2,2n | bedtools merge -i - > Germline.hc.bed
#Run bam-readcount:
bam-readcount -f hg38.fa -q 20 -b 25 -d 1000 -l Germline.hc.bed -w 1 normal_1.bam > Germline.hc.bamRC
#Run fpfilter for all somatics, germlines and LOHs for both snp and indel:
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Germline.hc.vcf Germline.hc.bamRC --output-file Germline.hc.fpfilterPassed.vcf --filtered-file Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.LOH.hc.vcf LOH.hc.bamRC --output-file LOH.hc.fpfilterPassed.vcf --filtered-file LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.snp.Somatic.hc.vcf Somatic.hc.bamRC --output-file Somatic.hc.fpfilterPassed.vcf --filtered-file Somatic.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Germline.hc.vcf Germline.hc.bamRC --output-file indel.Germline.hc.fpfilterPassed.vcf --filtered-file indel.Germline.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.LOH.hc.vcf LOH.hc.bamRC --output-file indel.LOH.hc.fpfilterPassed.vcf –filtered-file indel.LOH.hc.fpfilterFailed.vcf
java -jar VarScan.v2.4.3.jar fpfilter outputName.indel.Somatic.hc.vcf Somatic.hc.bamRC --output-file indel.Somatic.hc.fpfilterPassed.vcf --filtered-file indel.Somatic.hc.fpfilterFailed.vcf
Can anyone help me out? Thanks!
Hello,
samtools mpileup
can merge multiple files, then usevarscan
to analyse, I don't know if this fits you.Thank you MatthewP. But my question is should I merge the files of all 13 patients together? I mean at the end for downstream analysis, there must be only one VCF file? Or for every patient, vcf file should be analysed separately?