Question

Do Any Next-Generation Sequencing Platforms Generate Reads With Ambiguous Bases (Other Than N)?

14.5 years ago

Bio_X2Y ★ 4.4k

I've never seen them in our own Illumina data, but I don't know if that's because the platform (including software) doesn't generate them, or if the users of the GA Pipeline in our core facilities selected certain parameters to suppress them.

Tags: next-gen sequencing
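
One quick way to check your own data is to scan the sequence lines of a FASTQ file for IUPAC ambiguity codes other than N. The sketch below is a minimal illustration. It assumes plain four-line FASTQ records with no wrapped sequence lines, and the function name `count_ambiguous_bases` is hypothetical.

```python
import io
from collections import Counter
from typing import TextIO

# IUPAC nucleotide ambiguity codes, excluding N
IUPAC_AMBIGUOUS = set("RYSWKMBDHV")

def count_ambiguous_bases(fastq: TextIO) -> Counter:
    """Count non-N IUPAC ambiguity codes in the sequence lines of a FASTQ stream.

    Assumes plain 4-line records (header, sequence, '+', qualities).
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq):
        if i % 4 == 1:  # the second line of each record holds the sequence
            for base in line.strip().upper():
                if base in IUPAC_AMBIGUOUS:
                    counts[base] += 1
    return counts

example = io.StringIO(
    "@read1\nACGTNACGT\n+\nIIIIIIIII\n"
    "@read2\nACGRTYACN\n+\nIIIIIIIII\n"
)
print(count_ambiguous_bases(example))  # Counter({'R': 1, 'Y': 1})
```

An empty Counter on real data would be consistent with what you describe, though it cannot tell you whether the platform or the pipeline settings are responsible.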

score 1 · Answer 1 · 2010-11-11

14.5 years ago

Rm 8.3k

I don't think so. I have never seen them in our Illumina reads either. The sequencer's base-calling algorithm is tuned so that, if the base signal is poor or ambiguous, it reports either N or only the base with the highest signal.
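
The behaviour described above (report N when the signal is poor, otherwise the single strongest base) can be sketched as follows. This is not Illumina's actual base caller; the A, C, G, T channel order, the function name `call_base`, and the `min_signal` threshold of 100 are all illustrative assumptions.

```python
from typing import Sequence

BASES = "ACGT"  # assumed channel order

def call_base(intensities: Sequence[float], min_signal: float = 100.0) -> str:
    """Call one base from four channel intensities.

    Hypothetical rule: return 'N' if even the strongest channel is below
    min_signal, otherwise return the single base with the highest signal.
    """
    if len(intensities) != 4:
        raise ValueError("expected four intensities (A, C, G, T)")
    best = max(range(4), key=lambda i: intensities[i])
    if intensities[best] < min_signal:
        return "N"
    return BASES[best]

print(call_base([850.0, 120.0, 40.0, 30.0]))  # A
print(call_base([60.0, 55.0, 20.0, 10.0]))    # N (weak signal)
```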

http://genepool.bio.ed.ac.uk/nextgenbug/_media/meeting/20081007/illumina_workflow.ppt

Reply by Rm 8.3k, 14.5 years ago

For Illumina, the intensity text files (_int.txt) contain four scores per base call, in this format: [?]t{%5.1f %5.1f %5.1f %5.1f}+, with values in the range [-16384.0, 16383.0]. You can use them to decide whether or not a base call is ambiguous.

Reply by Rm 8.3k, 14.5 years ago
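
A minimal sketch of using the four per-cycle intensities to flag ambiguous calls. It assumes any leading lane/tile/coordinate fields have already been stripped, that the values arrive as consecutive groups of four separated by whitespace, and that "ambiguous" means the second-highest intensity is at least 80% of the highest. The function names and the 0.8 ratio are hypothetical, not part of any Illumina tool.

```python
from typing import List, Tuple

def parse_cycles(fields: str) -> List[Tuple[float, ...]]:
    """Split whitespace-separated intensities into per-cycle groups of four.

    Assumes leading lane/tile/coordinate fields were already removed.
    """
    values = [float(v) for v in fields.split()]
    if len(values) % 4 != 0:
        raise ValueError("number of values is not a multiple of 4")
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]

def ambiguous_cycles(cycles: List[Tuple[float, ...]], ratio: float = 0.8) -> List[int]:
    """Return 0-based indices of cycles whose second-highest intensity is at
    least `ratio` times the highest (hypothetical ambiguity criterion)."""
    flagged = []
    for idx, cycle in enumerate(cycles):
        top, second = sorted(cycle, reverse=True)[:2]
        if top > 0 and second >= ratio * top:  # intensities can be negative
            flagged.append(idx)
    return flagged

line = "812.3 45.1 -12.0 30.2\t410.0 398.5 22.1 15.0\t5.0 7.5 640.2 33.3"
print(ambiguous_cycles(parse_cycles(line)))  # [1]
```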

Many algorithms will only use the nucleotide with the highest probability and ‘call’ that location in the read, ignoring the other three probabilities.

Reply by Rm 8.3k, 14.5 years ago
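
To illustrate what the highest-probability rule throws away, the sketch below compares plain argmax calling with a hypothetical alternative that emits a two-base IUPAC code (R, Y, S, W, K, M) when the top two probabilities are within a chosen margin. The `margin` of 0.1 and both function names are assumptions for illustration; nothing in this thread says any platform actually does this.

```python
from typing import Dict

# Standard IUPAC codes for two-base ambiguities
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def argmax_call(probs: Dict[str, float]) -> str:
    """Keep only the most probable base, ignoring the other three."""
    return max(probs, key=lambda b: probs[b])

def iupac_call(probs: Dict[str, float], margin: float = 0.1) -> str:
    """Hypothetical alternative: emit a two-base IUPAC code when the top two
    probabilities are within `margin` of each other."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (b1, p1), (b2, p2) = ranked[0], ranked[1]
    if p1 - p2 <= margin:
        return IUPAC_PAIRS[frozenset((b1, b2))]
    return b1

probs = {"A": 0.46, "C": 0.03, "G": 0.44, "T": 0.07}
print(argmax_call(probs))  # A
print(iupac_call(probs))   # R
```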

Thanks for the link; it looks useful.

Reply by Bio_X2Y ★ 4.4k, 14.5 years ago

Cheers for that; it ties in with Daniel's answer.

Reply by Bio_X2Y ★ 4.4k, 14.5 years ago

score 1 · Answer 2 · 2010-11-11

14.5 years ago

Pierre Lindenbaum 166k

I don't think it would be useful, as sufficient coverage at a given position will quickly resolve the ambiguity.
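
The point about coverage can be illustrated with a simple majority vote over the bases that overlapping reads report at one position. This is a deliberately naive sketch (the function `consensus_base` is hypothetical); it ignores base qualities and would mishandle heterozygous sites, which real variant callers model explicitly.

```python
from collections import Counter
from typing import Iterable, Optional

def consensus_base(bases_at_position: Iterable[str]) -> Optional[str]:
    """Majority-vote consensus of the bases reads report at one position.

    N calls are ignored. Returns None if no informative base remains or if
    the top two bases are tied.
    """
    counts = Counter(b for b in bases_at_position if b != "N")
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

pileup = ["A", "A", "G", "A", "N", "A", "A"]
print(consensus_base(pileup))  # A
```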

score 1 · Answer 3 · 2010-11-11

14.5 years ago

Daniel Standage 4.1k

I'm not 100% sure, but I do know that Illumina does, at some point in the processing pipeline, create raw files that have four probabilities at each position: one for each nucleotide. Later in the pipeline, however, one of the four nucleotides is chosen as the most probable, and its corresponding quality value is the one you see downstream (in, say, a FASTQ file). I'm not sure how ambiguities are resolved if two nucleotides have nearly identical probabilities.
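
The step from "most probable base" to the quality value seen in a FASTQ file can be sketched with the standard Phred relation Q = -10 log10(P_error), taking P_error = 1 - p_best. Assumptions: Sanger-style Phred+33 encoding (older Illumina pipelines used Phred+64 or Solexa-scaled scores instead), a cap at Q41, and the hypothetical function name `call_with_quality`.

```python
import math
from typing import Dict, Tuple

def call_with_quality(probs: Dict[str, float], offset: int = 33, max_q: int = 41) -> Tuple[str, str]:
    """Pick the most probable base and encode its Phred quality as one FASTQ character.

    Q = -10 * log10(1 - p_best); the offset of 33 and the cap of 41 are assumptions.
    """
    base = max(probs, key=lambda b: probs[b])
    p_error = 1.0 - probs[base]
    if p_error <= 0:
        q = max_q  # avoid log10(0) when the call is certain
    else:
        q = min(max_q, round(-10 * math.log10(p_error)))
    return base, chr(q + offset)

print(call_with_quality({"A": 0.001, "C": 0.998, "G": 0.0005, "T": 0.0005}))  # ('C', '<')
```

The other three probabilities are simply discarded at this point, which is why they are not recoverable from a FASTQ file.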

Thanks for the background info.

Reply by Bio_X2Y ★ 4.4k, 14.5 years ago

score 1 · Answer 4 · 2010-11-12

14.5 years ago

Aaron Statham ★ 1.1k

The Illumina pipeline did at some point create .prb files, which held the probability for each of the four nucleotides. Some aligners take .prb files as input (e.g. Slider), using the extra information to try to make better SNP calls. I never really saw the point, though!
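
A sketch of how the extra .prb information could be used: convert each of the four per-cycle scores to a probability and look at the runner-up base, not just the winner. It assumes the values are Solexa-style log-odds scores, q = 10 log10(p / (1 - p)), in A, C, G, T column order. Both the inversion and the renormalisation across the four bases are illustrative choices, not taken from Slider or any particular aligner, and the function names are hypothetical.

```python
from typing import List, Tuple

BASES = "ACGT"  # assumed column order

def solexa_to_prob(q: float) -> float:
    """Invert a Solexa-style log-odds score q = 10 * log10(p / (1 - p))."""
    return 1.0 / (1.0 + 10 ** (-q / 10.0))

def best_two(prb_cycle: List[int]) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Return the two most likely bases for one cycle, with probabilities
    renormalised to sum to 1 across A, C, G, T."""
    probs = [solexa_to_prob(q) for q in prb_cycle]
    total = sum(probs)
    ranked = sorted(zip(BASES, (p / total for p in probs)),
                    key=lambda bp: bp[1], reverse=True)
    return ranked[0], ranked[1]

first, second = best_two([-5, -40, 3, -30])
print(first[0], second[0])  # G A
```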