Question

NGS preprocessing pipeline for Ion Torrent data

8.8 years ago

Noushin N ▴ 600

I have just received targeted deep sequencing FASTQ and BAM files generated on the Ion Torrent platform. I am familiar with the GATK best practices pipeline and have processed numerous whole exomes and genomes. I would like to know whether there are any key factors or pipeline differences that I need to be aware of, given the different platform used for this new batch.

An exploratory look at one of my samples indicates a very high rate of duplicate reads (> 90%). I did expect a high rate of duplicates given the PCR-based nature of the experiment, but was still surprised by that figure.
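For reference, a duplicate rate like the one above can be measured from the SAM flag field. The following is a minimal sketch, assuming duplicates have already been marked (SAM flag bit 0x400) by a duplicate-marking tool and that alignments are supplied as SAM text lines; the function name `duplicate_fraction` is hypothetical.

```python
DUPLICATE_FLAG = 0x400      # SAM flag bit: PCR or optical duplicate
SECONDARY_FLAG = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY_FLAG = 0x800  # SAM flag bit: supplementary alignment


def duplicate_fraction(sam_lines):
    """Return the fraction of primary alignments flagged as duplicates.

    Assumes duplicates were already marked; unmarked data yields 0.0.
    """
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        # Count each read once: ignore secondary/supplementary records.
        if flag & (SECONDARY_FLAG | SUPPLEMENTARY_FLAG):
            continue
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return duplicates / total if total else 0.0
```

The lines could come, for example, from the output of `samtools view` on a duplicate-marked BAM file, read from standard input.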

Currently, my pipeline performs the following steps in sequence:

indexing the bam file
reordering and sorting
fixing read mate information
de-duplication of reads [I wonder if this step should still be there]
generating realignment intervals, and realigning reads around indels
base quality score recalibration [I have doubts about this step, as Ion Torrent is not explicitly listed as a supported platform on the GATK page: http://goo.gl/DI93Ao]

The purpose of my analysis is to validate an initial set of mutation calls and look at variant allele frequency distribution.
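To make the quantity of interest concrete: variant allele frequency (VAF) at a site is the number of reads supporting the alternate allele divided by the total read depth at that site. A minimal sketch, with a hypothetical function name and the counts assumed to come from whatever variant caller or pileup tool is used:

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Return the fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads
```

For example, 25 alternate-allele reads out of 100 total gives a VAF of 0.25. Because both counts change when duplicates are removed, the de-duplication decision directly affects the VAF distribution.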

I apologize if this post is a duplicate of an existing thread that I could not locate.

Thank you!

targeted-sequencing ion-torrent • 3.9k views

updated 2.3 years ago by Ram 44k • written 8.8 years ago by Noushin N ▴ 600

What sort of targeted sequencing was performed? Whole exome, disease biomarker, cancer panel? Any of these methods is expected to produce large numbers of exact duplicates, so I wouldn't worry too much about that.

You can also remove the "fix read mate" step, since Ion Torrent reads are typically not paired-end.
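One way to confirm this on the actual files is to count reads carrying the paired flag (SAM flag bit 0x1). This is a minimal sketch, assuming alignments are supplied as SAM text lines; the function name `count_paired_reads` is hypothetical.

```python
PAIRED_FLAG = 0x1  # SAM flag bit: template has multiple segments (paired)


def count_paired_reads(sam_lines):
    """Return the number of alignment records flagged as paired."""
    paired = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & PAIRED_FLAG:
            paired += 1
    return paired
```

If the count is zero, there is no mate information to fix, and the step can be dropped safely.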

updated 4.9 years ago by Ram 44k • written 8.8 years ago by ciclistadan ▴ 30

Thank you. The data were generated on an Ion Torrent PGM. Do you know whether base quality score recalibration is applicable/valid here?

8.8 years ago by Noushin N ▴ 600

This is an old post, but I would be curious to know how the analysis was carried out in the end. I have the same doubts about duplicates: I also have Ion Torrent data, generated with an AmpliSeq panel, and I have seen that removing duplicates sharply reduces the number of variants found. I am not sure that removing duplicates is the right step to take.

4.0 years ago by sarastrafella.ss ▴ 20