NGS preprocessing pipeline for Ion Torrent data
8.8 years ago
Noushin N ▴ 600

I have just received targeted deep sequencing FASTQ and BAM files that were generated on the Ion Torrent platform. I am familiar with the GATK Best Practices pipeline and have processed numerous whole exomes and genomes. I would like to know whether there are any key factors or pipeline differences that I need to be aware of, given the different platform of the new batch.

An exploratory look at one of my samples indicates a very high rate of read duplicates (> 90%). I expected a high rate of duplicates given the PCR-based nature of the experiment, but was still surprised by this figure.
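For anyone wanting to re-derive a figure like this, here is a minimal sketch that counts reads carrying the SAM duplicate flag (0x400). It assumes duplicates have already been marked by a duplicate-marking tool and that the input is SAM text, for example piped from `samtools view -h sample.bam`. The function name and the file name are hypothetical, not part of my actual pipeline.

```python
# SAM flag bits, as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def duplicate_rate(sam_lines):
    """Return the fraction of mapped primary reads flagged as duplicates.

    `sam_lines` is any iterable of SAM text lines. Header lines (starting
    with '@') and blank lines are skipped. Assumes duplicates were already
    marked upstream; this function only reads the flag field.
    """
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each mapped read once, via its primary record
        total += 1
        if flag & DUPLICATE:
            dups += 1
    return dups / total if total else 0.0
```

It could be fed from standard input with `duplicate_rate(sys.stdin)` after `import sys`. Unmapped, secondary and supplementary records are excluded here as a design choice, so the result may differ slightly from what other tools report.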

Currently, my pipeline performs the following steps in sequence:

  1. indexing the BAM file
  2. reordering and sorting
  3. fixing read mate information
  4. de-duplication of reads [I wonder if this step should still be there]
  5. generating realignment intervals, and realigning reads around indels
  6. base quality score recalibration [I have doubts about this step, as Ion Torrent is not explicitly listed as a supported platform on the GATK page http://goo.gl/DI93Ao]

The purpose of my analysis is to validate an initial set of mutation calls and look at variant allele frequency distribution.
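To be concrete about what I mean by variant allele frequency: at each site, the number of reads supporting the alternate allele divided by the total number of reads supporting either allele. A minimal sketch follows; the function name is hypothetical, and the read counts are assumed to come from whatever pileup or variant caller output is available.

```python
def variant_allele_frequency(ref_count, alt_count):
    """Return alt / (ref + alt), or None when no reads cover the site.

    Counts are per-site read counts supporting the reference and the
    alternate allele. How they are obtained (and whether duplicates
    were removed first) is outside the scope of this sketch.
    """
    if ref_count < 0 or alt_count < 0:
        raise ValueError("read counts must be non-negative")
    depth = ref_count + alt_count
    if depth == 0:
        return None  # undefined at zero depth
    return alt_count / depth
```

Since the counts feed directly into the ratio, any step that removes reads, de-duplication included, can shift this value, which is part of why I am unsure about step 4.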

I apologize if this post is a duplicate of an existing thread that I could not locate.

Thank you!

targeted-sequencing ion-torrent • 3.9k views

What sort of targeted sequencing was performed? Whole exome, disease biomarker, cancer panel? Any of these methods is expected to produce large numbers of exact duplicates, so I wouldn't worry too much about that.

You can also remove the "fix read mate" step, since Ion Torrent sequencing won't be paired-end.
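If you want to confirm that before dropping the step, a quick check is whether any read carries the SAM "paired" flag (0x1). This is only a sketch: the function name is made up, and it assumes SAM text input such as the output of `samtools view` on your BAM.

```python
PAIRED = 0x1  # SAM flag bit: template has multiple segments (paired read)


def has_paired_reads(sam_lines):
    """Return True as soon as any read in the SAM text has the paired flag set.

    Header lines (starting with '@') and blank lines are ignored.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & PAIRED:
            return True
    return False
```

If this returns False for your files, there is no mate information to fix.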


Thank you. The data were generated on an Ion Torrent PGM. Do you know whether base quality score recalibration is applicable/valid here?


This is an old post, but I would be curious to know how the analysis was carried out in the end. I have the same doubts about duplicates: I also have Ion Torrent data, generated with an AmpliSeq panel, and I have seen that removing duplicates greatly reduces the number of variants found. I am not sure that removing duplicates is a step worth taking.
