Do Any Next-Generation Sequencing Platforms Generate Reads With Ambiguous Bases (Other Than N)?
14.1 years ago
Bio_X2Y ★ 4.4k

I've never seen them in our own Illumina data, but I don't know if that's because the platform (including software) doesn't generate them, or if the users of the GA Pipeline in our core facilities selected certain parameters to suppress them.

next-gen sequencing • 3.3k views
14.1 years ago
Rm 8.3k

I don't think so. I have never seen them in our Illumina reads either. The sequencer's base-calling algorithms are tuned so that, if the signal for a base is poor or ambiguous, the base is reported either as N or as the single nucleotide with the highest signal.

For Illumina, the intensity text files (_int.txt) have four scores per base call, in this format: [?]t{%5.1f %5.1f %5.1f %5.1f}+ with values in the range [-16384.0, 16383.0]. You can use them to decide whether or not a base is ambiguous.
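As a rough illustration of that idea, here is a minimal sketch that turns one group of four intensities into either a plain base or a two-base IUPAC ambiguity code. The channel order (A, C, G, T), the `min_ratio` threshold, and the function names are assumptions made for this example; they are not taken from the Illumina pipeline.

```python
# IUPAC codes for two-base ambiguities, keyed by the unordered pair of bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def parse_group(group):
    """Parse one '%5.1f %5.1f %5.1f %5.1f' group into four floats."""
    values = [float(x) for x in group.split()]
    if len(values) != 4:
        raise ValueError("expected four intensities per cycle")
    return values


def call_base(intensities, channels="ACGT", min_ratio=1.5):
    """Call one cycle from four channel intensities.

    channels and min_ratio are hypothetical choices for this sketch.
    """
    ranked = sorted(zip(intensities, channels), reverse=True)
    (top, top_base), (second, second_base) = ranked[0], ranked[1]
    if top <= 0:
        return "N"  # no usable signal in any channel
    if second > 0 and top / second < min_ratio:
        # The two strongest channels are too close to separate.
        return IUPAC_PAIRS[frozenset(top_base + second_base)]
    return top_base
```

For example, `call_base(parse_group("800.0 20.0 700.0 10.0"))` reports an A/G ambiguity rather than forcing a call, whereas a cycle with one dominant channel is called as that base.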

Many algorithms will only use the nucleotide with the highest probability and ‘call’ that location in the read, ignoring the other three probabilities.

Thanks for the link, it looks useful

Cheers for that, it ties in with Daniel's answer.

14.1 years ago

I don't think it would be useful, since adequate coverage at a given position will quickly resolve the ambiguity.
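To make the coverage argument concrete, here is a small sketch of a majority vote over the bases that overlapping reads report at one position. The function name and the `min_fraction` threshold are arbitrary choices for illustration, not part of any particular tool.

```python
from collections import Counter


def consensus_base(pileup, min_fraction=0.8):
    """Return the majority base at one position, or 'N' if there is no clear winner.

    pileup is a string (or list) of the bases seen in the reads covering
    the position; min_fraction is a hypothetical threshold.
    """
    counts = Counter(b for b in pileup if b in ("A", "C", "G", "T"))
    if not counts:
        return "N"
    base, n = counts.most_common(1)[0]
    return base if n / sum(counts.values()) >= min_fraction else "N"
```

With ten reads of which nine say A and one says G, the stray G is simply outvoted, so an ambiguity code in that one read would have added little.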

14.1 years ago

I'm not 100% sure, but I do know that Illumina does create raw files at some point in the processing pipeline that have four probabilities at each position: one for each nucleotide. At some point in the pipeline, however, one of the four nucleotides is chosen as the most probable, and its corresponding quality value is the one you see downstream (in, say, a FASTQ file). I'm not sure how ambiguities are resolved if two nucleotides have nearly identical probabilities.
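The step described above can be sketched as follows: pick the most probable of the four nucleotides and report a Phred-style quality, Q = -10 * log10(1 - p), for it. This is only an illustration under stated assumptions (probabilities in A, C, G, T order, a quality cap of 40, ties broken arbitrarily); it is not the actual pipeline code.

```python
import math


def call_with_quality(probs, bases="ACGT", max_q=40):
    """Return (base, phred_quality) for the most probable of four nucleotides.

    bases and max_q are assumptions made for this sketch. Near-ties are not
    treated specially: the larger probability simply wins.
    """
    p_best, base = max(zip(probs, bases))
    p_error = 1.0 - p_best
    if p_error <= 0:
        return base, max_q  # avoid log10(0) for a certain call
    q = min(max_q, round(-10 * math.log10(p_error)))
    return base, q
```

Note that a position with probabilities of 0.51 and 0.49 still yields a single base, just with a very low quality, which is consistent with not seeing ambiguity codes in the final reads.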

thanks for the background info

14.1 years ago
Aaron Statham ★ 1.1k

The Illumina pipeline did at some point create .prb files, which had the probability for each of the four nucleotides. Some aligners take .prb files as input (e.g. Slider), using the extra information to try to make better SNP calls. I never really saw the point though!
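If you want to work with such per-nucleotide scores directly, here is a hedged sketch of converting them to probabilities. It assumes the four values per cycle are Solexa-style log-odds scores, Q = 10 * log10(p / (1 - p)), which is my understanding of the format; check the documentation for your pipeline version before relying on it. The function names are made up for this example.

```python
def solexa_to_prob(q):
    """Convert a log-odds score Q = 10*log10(p / (1 - p)) to a probability p."""
    odds = 10 ** (q / 10.0)
    return odds / (1.0 + odds)


def prb_cycle_to_probs(scores):
    """Convert four per-base scores to probabilities, renormalised to sum to 1.

    Assumes scores holds one log-odds value per nucleotide for a single cycle.
    """
    raw = [solexa_to_prob(q) for q in scores]
    total = sum(raw)
    return [p / total for p in raw]
```

A cycle scored `[40, -40, -40, -40]` then comes out as almost all of the probability mass on the first nucleotide, while `[0, 0, -40, -40]` splits it evenly between the first two.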

The GNUMAP algorithm does something similar.
