Question

Ion Torrent Mapping

13.1 years ago

Ian 6.1k

I am new to Ion Torrent mapping, but I have come to the conclusion that TMAP is currently the mapper of choice. Would anyone disagree with this statement?

I have been looking at my Ion Torrent reads with FastQC and have noticed an odd nucleotide distribution in the first nine bases. It almost looks like a primer/linker, but it is different for each sample. Has anyone else experienced this? Should the first N bases be removed from Ion Torrent reads?

UPDATE: Someone suggested using the --nogroup flag so that FastQC does not group values for neighbouring positions when reads are longer than 50 bp. However, this did not change the "odd" profile I see. I have now included a snapshot (truncated by me at 54 bp).

[Image: FastQC per base sequence content plot, truncated at 54 bp]

Written 13.1 years ago by Ian; updated 11.2 years ago by Biostar.

Just to double-check, try running FastQC with the --nogroup option and see if the problem is still confined to the first 9 bases. By default, FastQC shows the first 9 positions individually, while the remaining positions are grouped into bins and only their average nucleotide content is shown.

Reply by John St. John, 13.1 years ago
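To see what an ungrouped plot reports, here is a minimal Python sketch (not FastQC's own code) that computes the percentage of A, C, G and T at every read position with no binning, which is the idea behind --nogroup. The function name `per_position_composition` is hypothetical, the reads are assumed to be plain sequence strings already pulled out of the FASTQ file, and, as a simplification, N calls are left out of the denominator.

```python
from collections import Counter
from typing import Dict, List


def per_position_composition(seqs: List[str], max_len: int) -> List[Dict[str, float]]:
    """Percentage of A/C/G/T at each position (index 0 = first base), no binning.

    Every position gets its own value instead of an average over a bin of
    neighbouring positions. N calls are excluded from the denominator.
    """
    table: List[Dict[str, float]] = []
    for pos in range(max_len):
        # Only reads long enough to have a base at this position contribute.
        counts = Counter(s[pos].upper() for s in seqs if len(s) > pos)
        called = sum(counts[b] for b in "ACGT")
        table.append(
            {b: (100.0 * counts[b] / called if called else 0.0) for b in "ACGT"}
        )
    return table
```

Calling `per_position_composition(seqs, 54)` would cover the same 54 bp window as the snapshot; a position where one base sits near 100% while the others sit near 0% is the kind of signal being discussed here.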

Thanks John, I tried your suggestion, but I still see the same odd distribution. I will edit my question to include a snapshot.

Reply by Ian, 13.1 years ago

Wow, when you say it looks odd, you really mean it.

Reply by John St. John, 13.1 years ago

Did you not get a mapping file with barcodes and primers along with your data? What files did they provide you with?

Reply by caseyr547, 13.1 years ago

I would not overthink it and would just remove the first 23 bases. You cannot go wrong with this approach: the worst that can happen is that you lose a small piece of data, and you can easily live with that :-)

Reply by Biomonika (Noolean), 12.2 years ago

score 4 · Answer 1 · 2012-04-18

Given your nucleotide distribution, I do not see how the beginnings of these reads could be genomic. Perhaps your samples were multiplexed and that is the barcode you are seeing? That would explain why the sequence differs between your samples. At the very least, I highly doubt that this sequence is genomic, unless the reads all start at a very specific N-mer in the genome that is different for each sample, which seems very improbable. Although I have never worked with Ion Torrent data before, I would definitely recommend removing that part of the reads. It is just too weird.

Even the first 22 or 23 bases look suspicious, with biases away from certain nucleotide calls. Quite a few programs assume that the beginnings of reads are of the highest quality. Perhaps the Ion Torrent software is built knowing that these kinds of oddities can happen? I would probably just strip off the first 23 bases and then use a read mapper that can handle indels, such as bowtie2 (not bowtie) or bwa. Since these reads are on the longer side, I might lean toward bowtie2 or bwa bwasw rather than bwa aln followed by bwa sampe. I really don't know anything about TMAP; is there any literature showing, with a performance comparison, that it is better for Ion Torrent reads than something like bwa or bowtie2?
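Stripping a fixed number of bases from the 5' end is simple, but the quality string must be trimmed by exactly the same amount or the record becomes invalid. Below is a minimal Python sketch, assuming uncompressed 4-line FASTQ records with no wrapped sequence lines. The name `trim_fastq_5prime` and its `min_len` cutoff are hypothetical choices for illustration; `n=23` follows the number suggested in this answer.

```python
from typing import Iterable, Iterator, List


def trim_fastq_5prime(lines: Iterable[str], n: int = 23, min_len: int = 20) -> Iterator[str]:
    """Yield FASTQ records with the first n bases and quality values removed.

    Assumes 4-line records. Trimmed reads shorter than min_len are dropped
    (min_len is an illustrative default, not a recommendation).
    """
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not record and not line.strip():
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Slice sequence and quality together so they stay aligned.
        seq, qual = seq[n:], qual[n:]
        if len(seq) >= min_len:
            yield f"{header}\n{seq}\n{plus}\n{qual}\n"
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

For example, `sys.stdout.writelines(trim_fastq_5prime(open("reads.fastq"), n=23))` (file name hypothetical) would write the trimmed reads to standard output. Established trimmers such as cutadapt offer the same operation (`cutadapt -u 23`) and are probably the better choice for real runs.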

Also, what do you want to do with this data? If you are doing variant calling, you want a really clean dataset, so err on the side of caution. Strong position-specific read biases like this can bias your variant calls, which is always embarrassing if you think you have found something exciting and it turns out to be noise. After mapping your reads, I would feed the alignments through a processing pipeline like the one that precedes the UnifiedGenotyper in the Broad's variant calling pipeline (in the Genome Analysis Toolkit). That pipeline has stages that attempt to identify these kinds of position-specific biases in reads and then recalibrate the base quality scores accordingly.

Anyway, good luck with this dataset!

score 4 · Answer 2 · 2012-04-19

13.1 years ago

Nick Loman ▴ 610

These are almost certainly barcode sequences; check with whoever ran the instrument for you to confirm.
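A quick way to test the barcode explanation before talking to the sequencing facility is to tabulate the most frequent read prefixes in each sample. If one prefix accounts for most reads and it differs between samples, a barcode or adapter is the likely source. Here is a Python sketch with a hypothetical helper `top_read_prefixes`; the prefix length `k=9` (the length of the odd region in the question) and the number of prefixes reported are assumptions to adjust, and the result still has to be compared against the barcode list for the run.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_read_prefixes(seqs: Iterable[str], k: int = 9, top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` most common k-base read prefixes with the % of reads carrying each.

    Reads shorter than k are skipped. A dominant prefix that differs between
    samples is consistent with a barcode; it does not prove one.
    """
    counts = Counter(s[:k].upper() for s in seqs if len(s) >= k)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(prefix, 100.0 * c / total) for prefix, c in counts.most_common(top)]
```

Running it once per sample and lining the outputs up side by side makes a per-sample barcode stand out: the top prefix changes between samples while its share of reads stays high.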

score 2 · Answer 3 · 2013-02-21

12.2 years ago

pablo.riesgo ▴ 150

Sorry for reviving this discussion, but looking at the per base sequence content plot, I read it as showing that the first 13 bases are almost exactly the same in every read: at each of these 13 positions, a single base reaches almost 100% frequency. This is not specific to Ion Torrent; I have seen the same thing before with primers in 454 data. I would say these are the primers, and removing them would be the solution.

Cheers, Pablo.
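Pablo's reading of the plot (one base close to 100% at each of the first 13 positions) can be turned into a rough, automatic estimate of how many bases to remove. This Python sketch counts the leading positions at which the most common base reaches a threshold fraction of reads. The function name `dominated_prefix_length` and the 0.95 default threshold are assumptions, not values taken from any tool, and the result is only a candidate trim length to check against the known primer or barcode sequences.

```python
from collections import Counter
from typing import List


def dominated_prefix_length(seqs: List[str], threshold: float = 0.95) -> int:
    """Count leading positions where one base makes up >= threshold of the reads.

    Scanning stops at the first position that fails, so the result is a
    candidate number of 5' bases to trim, not proof that they are primer.
    """
    pos = 0
    while True:
        column = [s[pos].upper() for s in seqs if len(s) > pos]
        if not column:
            return pos  # no read is long enough to continue
        _, top_count = Counter(column).most_common(1)[0]
        if top_count / len(column) < threshold:
            return pos
        pos += 1
```

Stopping at the first position that fails the test is deliberate: it finds a constant block shared at the start of the reads, which is what a primer or barcode would produce, and it ignores any later low-complexity stretch.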

Hi pablo.riesgo,

I have the same issue as you.

I don't know what to do.

Bernardo

Reply by biotech, 12.1 years ago