Is it necessary to mark the duplicates, realign the data and recalibrate base quality score (VarScan2)?
6.0 years ago
Raheleh ▴ 260

Hello,

I have WES data from tumor samples with matched normal samples (paired-end, Illumina). I trimmed them using Trimmomatic (LEADING:30, TRAILING:30, MINLEN:50) and aligned them against hg38 using bwa mem. I want to use VarScan2 to call somatic and germline variants. Is it necessary to mark duplicates, realign the data, and recalibrate base quality scores before using VarScan2?

From what I have read, marking duplicates and realigning around indels are not necessary, right?

I’d really appreciate any help!

VarScan2 WES data duplicate realignment • 3.3k views
6.0 years ago
ATpoint 85k

Since VarScan2 relies on samtools mpileup, which lets you set minimum base-quality and mapping-quality thresholds, no strict filtering is IMHO required. Marking duplicates is a good and accepted option; removing them is unnecessary, as mpileup ignores them if they are flagged appropriately. I prefer samblaster for on-the-fly marking of duplicates. It may or may not save you from some false positives where PCR has over-amplified certain fragments. There is literature showing that the overall effect is minimal. My preferred pipeline is basically:

bwa mem ${BWA_IDX} in_1.fastq.gz in_2.fastq.gz | \
  samtools fixmate -m -O SAM - - | \
  samblaster --ignoreUnmated | \
  sambamba view -f bam -S -l 0 -o /dev/stdout /dev/stdin | \
  sambamba sort --tmpdir=./ -l 5 -o out_sorted /dev/stdin

This gives you a sorted and duplicate-marked BAM file without any intermediate files.
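
For the next step, here is a minimal sketch of how duplicate-marked BAM files could be handed to VarScan2 through samtools mpileup. It is not part of the pipeline above: the file names (`ref.fa`, `normal.bam`, `tumor.bam`), the output prefix, the `VarScan.jar` path, and the thresholds read from `MIN_MAPQ` and `MIN_BASEQ` are all placeholder assumptions. By default the functions only print the commands (`DRY_RUN=1`), so they can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch only: paths, prefix and thresholds are hypothetical placeholders.

# Print the command; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: varscan_somatic <ref.fa> <normal.bam> <tumor.bam> <out_prefix>
varscan_somatic() {
  # Joint pileup with the normal sample first and the tumor second.
  # mpileup skips reads flagged as duplicates by default, so marked
  # duplicates do not have to be removed beforehand.
  run samtools mpileup -f "$1" -q "${MIN_MAPQ:-1}" -Q "${MIN_BASEQ:-20}" \
      -o "$4.mpileup" "$2" "$3"
  # --mpileup 1 tells VarScan that the input is a single normal/tumor mpileup.
  run java -jar "${VARSCAN_JAR:-VarScan.jar}" somatic "$4.mpileup" "$4" \
      --mpileup 1 --output-vcf 1
}
```

Calling `varscan_somatic ref.fa normal.bam tumor.bam sample1` prints the two commands; setting `DRY_RUN=0` runs them, provided samtools and the VarScan jar are available.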

As for VarScan2, also see this VarScan2 publication (but mind that it is quite old and some options might be deprecated). Base recalibration and realignment are not explicitly recommended for VarScan and have not (to my knowledge) been shown to be truly beneficial, especially considering the computational expense. There is quite a bit of literature available on this. I do not personally use them.


Note that I edited the post and removed the /dev/stdin from the line with the BWA command; sorry, it was there by mistake.


Thanks ATpoint. I would really appreciate it if you could answer my question here as well.

6.0 years ago

Hello R.A.,

You will find all types of opinions about these steps. Here are mine ;):

  • Trimming your data isn't necessary as long as your overall base qualities are fine. You will throw away too much usable data, especially if the paired reads overlap.
  • Marking duplicates is fine, even if the impact on the whole dataset might be small. There are always regions that are preferentially amplified.
  • Realigning the data and recalibrating base qualities are methods recommended in the GATK best practice guidelines. The programs used for them are optimized to work together with the other programs provided by GATK. That does not mean they are useful with other variant callers. In particular, the impact of base quality recalibration is very low unless you have low-complexity data.

fin swimmer


Many thanks, fin swimmer, for your suggestion and quick reply. I used this command to remove the duplicates:

samtools rmdup sample.sorted.bam sample_rmdup.bam

Is that fine, or should I only mark them rather than remove them?


You're welcome.

Whether to remove or mark duplicates is a matter of personal preference. I'm not a fan of removing anything from my alignment file, so I just mark them.
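
To illustrate what "marking" means in practice: a marked duplicate stays in the file and only has bit 0x400 (decimal 1024) set in its SAM FLAG field. The sketch below counts such reads from SAM text on standard input. The function name `count_marked_duplicates` is made up for this example, and it assumes tab-separated SAM records with the FLAG in the second column.

```shell
#!/bin/sh
# Sketch: count reads whose SAM FLAG has the duplicate bit (0x400 = 1024) set.
# Reads SAM text from stdin; header lines (starting with '@') are skipped.

# The function body runs in a subshell so its variables stay local.
count_marked_duplicates() (
  tab="$(printf '\t')"
  total=0
  dups=0
  while IFS="$tab" read -r qname flag rest; do
    # Skip empty lines and header lines.
    case "$qname" in
      ''|@*) continue ;;
    esac
    total=$((total + 1))
    # A bitwise AND with 1024 is non-zero only for marked duplicates.
    if [ $((flag & 1024)) -ne 0 ]; then
      dups=$((dups + 1))
    fi
  done
  printf 'total=%d duplicates=%d\n' "$total" "$dups"
)
```

With samtools installed, it could be used as `samtools view sample.sorted.bam | count_marked_duplicates`; `samtools flagstat` reports a duplicates count as well.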

fin swimmer


Got it. Thanks!

Dear fin swimmer, could you please answer my question here? I really need help. Many thanks in advance!
