Regarding the SNP detection part of my question I just read through the Maq-FAQ.
If there are poly-allelic SNPs in the databases, they must have been detected somehow, and
yes, Maq can be used to detect poly-allelic SNPs, at least according to the user manual the call is:
maq cns2snp consensus.cns >cns.snp
Extract list of SNPs
Then following the FAQ:
Consensus Calling
- What do those "S", "M" and so on mean in the cns2snp output?
They are IUB codes for heterozygotes. Briefly:
M=A/C, K=G/T, Y=C/T, R=A/G, W=A/T, S=G/C,
D=A/G/T, B=C/G/T, H=A/C/T, V=A/C/G,
N=A/C/G/T
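As a minimal sketch (not part of Maq itself), the code table from the FAQ can be turned into a small Python lookup. The names `IUPAC_CODES` and `decode` are made up here for illustration only.

```python
# Ambiguity codes as listed in the Maq FAQ, plus the four plain bases.
# The dictionary and function names are illustrative, not a Maq API.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "Y": "CT", "R": "AG", "W": "AT", "S": "CG",
    "D": "AGT", "B": "CGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def decode(code):
    """Return the set of bases represented by a single consensus code."""
    return set(IUPAC_CODES[code.upper()])


print(sorted(decode("M")))  # ['A', 'C']
print(sorted(decode("D")))  # ['A', 'G', 'T']
```

A two-base code would then correspond to a bi-allelic heterozygote call, and a three-base code to a call with three alleles.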
I still have to test this on real data but at least in theory this should work and all types of SNPs can be predicted. This does not tell you anything about the validity of the called SNPs though. Should have a look at the papers mentioned by Jarretinha.
Edit: SNP-detection
Did some more reading/thinking to get this right.
In principle, polymorphisms are studied by taking samples from members of (possibly multiple) populations (see e.g.: NCBI SNP primer, HapMap, Nature (2005)). If a second allele for a genomic position is prevalent in a significant part of the population (however this is defined; e.g. there was a >=1% criterion), then it becomes a SNP. If a third or fourth allele is discovered at the same locus, meets the detection criteria, and is submitted, then this becomes what we find in the databases given the searches above.
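To make the population criterion concrete, here is a rough Python sketch. It assumes the >=1% threshold mentioned above and uses invented allele counts; the function name `alleles_above_threshold` is hypothetical.

```python
from collections import Counter


def alleles_above_threshold(observed_alleles, min_freq=0.01):
    """Return {allele: frequency} for alleles at or above min_freq.

    observed_alleles: one base per sampled chromosome at a single position.
    min_freq: assumed population criterion (1% here).
    """
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items() if n / total >= min_freq}


# Invented example: 1000 sampled chromosomes at one position.
sample = ["A"] * 900 + ["C"] * 80 + ["T"] * 19 + ["G"] * 1
common = alleles_above_threshold(sample)
print(sorted(common))  # ['A', 'C', 'T']
```

In this made-up sample the position would count as tri-allelic, because three alleles pass the threshold, while the single G observation does not.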
To detect point mutations from a single sample by high-throughput sequencing is rather new and something very different.
Can this be called a SNP? Not immediately, because the prevalence in a population is not assessed. As Jarretinha stated, the number of point mutations that can be found for a single position depends on the ploidy of the organism. For human somatic cells (diploid) there are at most two different alleles possible in the consensus sequence (found by Maq) if the marker is heterozygous (e.g. A/C). If the reference is different at that position, that might suggest (e.g. T/A/C) that three or more alleles exist at that locus.
If the sample has higher ploidy, or of course due to sequencing/alignment errors, more than two alleles can appear in the consensus, and that is why aligners support this.
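The reasoning about reference base, consensus code and ploidy can be sketched in Python as follows. This is only an illustration under my own assumptions: the function names are invented, and the inputs are a reference base and a consensus code as they might be taken from a SNP list.

```python
# Ambiguity codes from the Maq FAQ; names below are illustrative only.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "Y": "CT", "R": "AG", "W": "AT", "S": "CG",
    "D": "AGT", "B": "CGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def alleles_at_site(ref_base, cns_code):
    """Return (alleles in the sample, alleles in sample plus reference)."""
    sample = set(IUPAC[cns_code.upper()])
    return sample, sample | {ref_base.upper()}


def exceeds_ploidy(cns_code, ploidy=2):
    """True if the consensus code implies more alleles than the ploidy allows.

    Note: "N" is flagged too, although it may simply mean "no confident call".
    """
    return len(IUPAC[cns_code.upper()]) > ploidy


sample, with_ref = alleles_at_site("T", "M")
print(sorted(sample))       # ['A', 'C']
print(sorted(with_ref))     # ['A', 'C', 'T']
print(exceeds_ploidy("M"))  # False
print(exceeds_ploidy("D"))  # True
```

For a diploid sample, a flagged site would point to an error or an unexpected copy number rather than to a third true allele in that individual.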
That depends on your ploidy level. Cancer cell lines can be highly polyploid for certain regions/chromosomes. So, it's possible that a given locus carries 3 or more alleles. That's common in cultivated plants, too. Bread wheat cultivars are hexaploid, for example. So, your question is relevant, but not easy.
When you say de novo, does it mean no reference sequence/population? Or something like genotyping an intra-patient HIV population?
I meant using a reference sequence; maybe the use of de-novo is not very good here, I admit. MAQ can, for example, call SNPs from sequencing reads as far as I know, but would it also detect tri-allelic SNPs?
To correct myself, the 'de-novo' part of my question was stupid: of course, from a single individual at most two different alleles can be discovered. See my answer below.