I've never seen them in our own Illumina data, but I don't know if that's because the platform (including software) doesn't generate them, or if the users of the GA Pipeline in our core facilities selected certain parameters to suppress them.
I don't think so. I have never seen them in our Illumina reads either. The sequencer's algorithms are tuned so that if the base signal is poor or ambiguous, they report either an N or only the base with the higher signal.
I don't think it would be useful, as adequate coverage at a given position will quickly resolve the ambiguity.
I'm not 100% sure, but I do know that Illumina does create raw files at some point in the processing pipeline that have four probabilities at each position: 1 for each nucleotide. At some point in the pipeline, however, one of the 4 nucleotides is chosen as the most probable and its corresponding quality value is the one you see downstream (in, say, a Fastq file). I'm not sure how ambiguities are resolved if two nucleotides have nearly identical probabilities.
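To make the idea above concrete, here is a minimal sketch of picking the most probable base from four per-position probabilities, and of what reporting an ambiguity code could look like when the top two are close. This is not what the Illumina pipeline does. The A, C, G, T ordering, the call_base name and the min_ratio cutoff are all assumptions made for illustration; only the Phred formula and the IUPAC two-base codes are standard.

```python
import math

BASES = "ACGT"  # assumed order of the four probabilities

# Standard IUPAC codes for two-base ambiguities
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(probs, min_ratio=2.0):
    """Return (call, phred) for one position.

    probs: four probabilities in A, C, G, T order (an assumption).
    min_ratio: hypothetical cutoff; the top base must beat the runner-up
    by at least this factor, otherwise an IUPAC pair code is returned.
    """
    ranked = sorted(zip(probs, BASES), reverse=True)
    (p1, b1), (p2, b2) = ranked[0], ranked[1]

    # Phred quality of the top base: Q = -10 * log10(P(error)).
    # The floor avoids log10(0) when p1 is exactly 1.
    p_err = max(1.0 - p1, 1e-10)
    phred = round(-10 * math.log10(p_err))

    if p2 > 0 and p1 / p2 < min_ratio:
        return IUPAC_PAIRS[frozenset(b1 + b2)], phred
    return b1, phred
```

With a clear winner, e.g. call_base([0.97, 0.01, 0.01, 0.01]), you get a plain base back; with something like [0.48, 0.02, 0.46, 0.04] the A and G probabilities are too close under the default cutoff, so the sketch reports R instead.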
The Illumina pipeline did at some point create .prb files, which had the probability for each of the 4 nucleotides. Some aligners take .prb files as input (e.g. slider), using the extra information to try to make better SNP calls. I never really saw the point though!
http://genepool.bio.ed.ac.uk/nextgenbug/_media/meeting/20081007/illumina_workflow.ppt
For Illumina, the intensity text files (_int.txt) have four scores per base call in this format: [?]t{%5.1f %5.1f %5.1f %5.1f}+ with value range [-16384.0, 16383.0]. You can use these scores to decide whether or not a base is ambiguous.
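As a rough sketch of how those four scores could be used, the snippet below reads one line of such a file and flags cycles where the two strongest intensities are close. Several things here are assumptions, not taken from the file specification: that fields are tab-separated, that each cycle is a field of four space-separated numbers, that any other fields are leading metadata, and that a ratio cutoff (the hypothetical min_ratio) is a sensible ambiguity test. The function names are made up for this example.

```python
def parse_int_line(line):
    """Split one intensity line into (prefix_fields, cycles).

    Assumes tab-separated fields; any field holding exactly four
    numbers is treated as one cycle, everything else as metadata.
    """
    prefix, cycles = [], []
    for field in line.rstrip("\n").split("\t"):
        parts = field.split()
        if len(parts) == 4:
            try:
                cycles.append(tuple(float(x) for x in parts))
                continue
            except ValueError:
                pass  # four tokens, but not all numeric: keep as metadata
        prefix.append(field)
    return prefix, cycles


def ambiguous_cycles(cycles, min_ratio=1.5):
    """Return 0-based indices of cycles whose top two intensities are close.

    min_ratio is a hypothetical cutoff: a cycle is flagged when the
    strongest channel is less than min_ratio times the runner-up.
    """
    flagged = []
    for i, quad in enumerate(cycles):
        top, second = sorted(quad, reverse=True)[:2]
        # A non-positive top signal carries no usable call; flag it too.
        if top <= 0 or second * min_ratio > top:
            flagged.append(i)
    return flagged
```

You would call parse_int_line on each line of the file and pass the resulting cycles to ambiguous_cycles. The cutoff would need tuning against real data, since raw intensities differ between channels and runs.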
Many algorithms will only use the nucleotide with the highest probability and ‘call’ that location in the read, ignoring the other three probabilities.
Thanks for the link, it looks useful
Cheers for that, it ties in with Daniel's answer.