I have just received targeted deep sequencing FASTQ and BAM files generated on the Ion Torrent platform. I am familiar with the GATK Best Practices pipeline and have processed numerous whole exomes and genomes. I would like to know whether there are any key factors or pipeline differences I need to be aware of, given that this new batch comes from a different platform.
An exploratory look at one of my samples shows a very high rate of read duplication (> 90%). I expected a high duplicate rate given the PCR-based nature of the experiment, but I was still surprised by this figure.
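To check how a figure like "> 90%" is being counted, the sketch below computes the fraction of primary mapped reads carrying the SAM duplicate flag (0x400). It assumes duplicates have already been *marked* (not removed) and that you feed it SAM text lines, for example from `samtools view`. Secondary, supplementary and unmapped records are skipped so each mapped read is counted once; other tools may use a different denominator, so numbers can differ slightly.

```python
from typing import Iterable

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def duplicate_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary mapped reads flagged as duplicates (0x400)."""
    total = 0
    dups = 0
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each mapped read once
        total += 1
        if flag & DUPLICATE:
            dups += 1
    return dups / total if total else 0.0
```

For example, three reads with FLAG 1024 (duplicate) or 1040 (duplicate, reverse strand) plus one with FLAG 0 give 0.75; a record with FLAG 256 (secondary) would not change that.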
Currently, my pipeline performs the following steps in sequence:
- indexing the bam file
- reordering and sorting
- fixing read mate information
- de-duplication of reads [I wonder if this step should still be there]
- generating realignment intervals, and realigning reads around indels
- base quality score recalibration [I have doubts about this step, as Ion Torrent is not explicitly listed as a supported platform on the GATK page: http://goo.gl/DI93Ao]
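The step list above can be written as an ordered plan with switches for the steps whose relevance depends on the data. This is a hypothetical sketch: the tool names are the Picard/GATK3-era tools usually used for each step, but no command-line arguments are generated, and the `paired_end` and `amplicon` switches are illustrative assumptions to be decided per dataset, not GATK recommendations. BQSR is left in because whether it is valid for Ion Torrent data is exactly the open question.

```python
from typing import List

# (step key, tool commonly used for that step); order matches the pipeline above
BASE_STEPS = [
    ("index", "samtools index"),
    ("reorder_sort", "Picard ReorderSam / SortSam"),
    ("fix_mates", "Picard FixMateInformation"),
    ("dedup", "Picard MarkDuplicates"),
    ("indel_realign", "GATK RealignerTargetCreator + IndelRealigner"),
    ("bqsr", "GATK BaseRecalibrator + PrintReads"),
]


def plan_steps(paired_end: bool, amplicon: bool) -> List[str]:
    """Return the step keys to run, in order.

    paired_end: mate fixing only has work to do for paired reads.
    amplicon:   position-based de-duplication collapses reads that share
                primer-defined start sites, so it is skipped here
                (an assumption to verify on your own data).
    """
    steps = []
    for key, _tool in BASE_STEPS:
        if key == "fix_mates" and not paired_end:
            continue
        if key == "dedup" and amplicon:
            continue
        steps.append(key)
    return steps


if __name__ == "__main__":
    tools = dict(BASE_STEPS)
    for key in plan_steps(paired_end=False, amplicon=True):
        print(key, "->", tools[key])
```

Keeping the decisions as explicit switches makes it easy to run the same batch both ways and compare the resulting calls.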
The purpose of my analysis is to validate an initial set of mutation calls and to look at the variant allele frequency distribution.
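For the variant allele frequency (VAF) part, a minimal sketch: VAF at a site is alt reads / (ref reads + alt reads), and the distribution can be summarised by binning. The per-site (ref, alt) counts are assumed to come from whatever caller or pileup tool you use; the function names and the 10-bin default are illustrative choices.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vaf(ref_count: int, alt_count: int) -> float:
    """Variant allele frequency: alt / (ref + alt)."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("no reads at this site")
    return alt_count / depth


def vaf_histogram(sites: List[Tuple[int, int]], n_bins: int = 10) -> Dict[int, int]:
    """Count sites per VAF bin; bin i covers [i/n_bins, (i+1)/n_bins)."""
    counts: Counter = Counter()
    for ref_count, alt_count in sites:
        f = vaf(ref_count, alt_count)
        # VAF == 1.0 goes into the last bin rather than a bin of its own
        counts[min(int(f * n_bins), n_bins - 1)] += 1
    return dict(sorted(counts.items()))
```

For example, `vaf_histogram([(75, 25), (95, 5), (0, 10)])` returns `{0: 1, 2: 1, 9: 1}`. Because VAF depends directly on read counts, whether duplicates are removed changes these values.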
I apologize if this post is a duplicate of an existing thread that I could not locate.
Thank you!
What sort of targeted sequencing was performed? Whole exome, disease biomarker, cancer panel? Any of these methods is expected to produce large numbers of exact duplicates, so I wouldn't worry too much about that.
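A simulated illustration of why amplicon data looks almost entirely duplicated: for single-end reads, duplicate marking groups reads by alignment position and strand (a simplification; tools such as Picard MarkDuplicates use the unclipped 5' position). In an amplicon panel, all forward reads from one amplicon start at the forward primer and all reverse reads end at the reverse primer, so they share a key even when they come from different template molecules. The amplicon coordinates below are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, int, str]  # (chrom, 5' position, strand)


def position_dedup(reads: List[Read]) -> List[Read]:
    """Keep the first read for each (chrom, 5' position, strand) key."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


# Hypothetical panel: three amplicons given as (start, end) on chr1.
AMPLICONS = [(1000, 1150), (5000, 5180), (9000, 9120)]

reads: List[Read] = []
for start, end in AMPLICONS:
    for i in range(500):
        # Forward reads begin at the forward primer (amplicon start);
        # reverse reads' 5' end sits at the reverse primer (amplicon end).
        reads.append(("chr1", start, "+") if i % 2 == 0 else ("chr1", end, "-"))

kept = position_dedup(reads)
print(f"{len(reads)} reads -> {len(kept)} after position-based dedup")
print(f"duplicate rate: {1 - len(kept) / len(reads):.1%}")
```

Here 1500 simulated reads collapse to 6 (one per amplicon and strand), so a duplicate rate far above 90% is what this kind of grouping produces on amplicon data regardless of library complexity.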
You can also remove the "fix read mate" step, since Ion Torrent sequencing won't be paired-end.
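Before dropping the mate-fixing step, it is cheap to confirm that no reads are flagged as paired. A minimal sketch, assuming SAM text input (for example `samtools view` output for a sample of reads); FLAG bit 0x1 marks a read that is part of a pair. `samtools flagstat` reports the same information on its "paired in sequencing" line.

```python
from typing import Iterable

PAIRED = 0x1  # SAM FLAG bit: template has multiple segments (paired read)


def paired_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of alignment records whose FLAG has the paired bit set."""
    total = 0
    paired = 0
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        total += 1
        if flag & PAIRED:
            paired += 1
    return paired / total if total else 0.0
```

A result of 0.0 means FixMateInformation would have nothing to fix.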
Thank you. The data were generated on an Ion Torrent PGM. Do you know whether base quality score recalibration is applicable/valid here?
This is an old post, but I would be curious to know how the analysis was carried out in the end. I have the same doubts about duplicates: I also have Ion Torrent data generated with an AmpliSeq panel, and I have seen that removing duplicates greatly reduces the number of variants found, but I'm not sure that removing duplicates is a step I should take.
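One plausible explanation for the drop in variant calls, shown on simulated data: 1000 reads from one amplicon, 10% carrying an alt allele on both strands. After grouping by (5' position, strand), only one read per strand survives, so depth falls to 2 and the alt evidence can vanish. Which read is kept here is simply the first one seen; real tools choose a representative by other criteria (such as base quality), and the `MIN_ALT_READS` caller threshold is a hypothetical example.

```python
from typing import List, Tuple

Read = Tuple[int, str, bool]  # (5' position, strand, carries_alt)

MIN_ALT_READS = 5  # hypothetical minimum alt-read support for a call


def dedup_by_position(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (5' position, strand) key."""
    seen = set()
    kept = []
    for pos, strand, alt in reads:
        if (pos, strand) not in seen:
            seen.add((pos, strand))
            kept.append((pos, strand, alt))
    return kept


def alt_support(reads: List[Read]) -> Tuple[int, int]:
    """Return (alt read count, depth)."""
    alt = sum(1 for _, _, is_alt in reads if is_alt)
    return alt, len(reads)


# Forward reads start at the forward primer (2000), reverse reads at the
# reverse primer (2150); 100 of the 1000 reads carry the alt allele.
reads: List[Read] = []
for i in range(1000):
    carries_alt = (i // 2) % 10 == 9
    if i % 2 == 0:
        reads.append((2000, "+", carries_alt))
    else:
        reads.append((2150, "-", carries_alt))

for label, subset in (("before dedup", reads), ("after dedup", dedup_by_position(reads))):
    alt, depth = alt_support(subset)
    called = alt >= MIN_ALT_READS
    print(f"{label}: alt={alt} depth={depth} VAF={alt / depth:.2f} called={called}")
```

Before de-duplication the site has 100 alt reads out of 1000 (VAF 0.10) and passes the threshold; afterwards it has 0 alt reads out of 2 and is not called.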