Suppose I sequence tens of individuals and want to summarize the SNP information in a matrix table (x: samples, y: SNPs).
As far as I know, most tools report only the variant alleles for each individual.
My question is: when an allele is missing for a particular individual, how can I tell whether it is a reference allele or simply missing because no reads mapped to that region?
Looking at the read coverage is one possible option, but I expect that other researchers have already developed methods for this problem.
I would be very grateful for any ideas, methods, or reference papers.
Are you asking how people represent null calls or no-calls in data output formats, or for a method to determine whether an allele is a no-call or reference? For the former, VCF uses '.' for a no-call and '0' for the reference allele. The latter is a matter of debate; heuristics still seem to be the order of the day.
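To make the '.' vs. '0' convention concrete, here is a minimal Python sketch that classifies a VCF GT string. The function name `classify_gt` and the category labels are my own, not part of any tool, and treating a partially missing genotype such as `./1` as a no-call is a simplifying assumption.

```python
import re

def classify_gt(gt: str) -> str:
    """Classify a VCF GT value such as '0/0', '0|1', '1/1' or './.'."""
    # Alleles are separated by '/' (unphased) or '|' (phased).
    alleles = re.split(r"[/|]", gt)
    # '.' marks a missing allele; assumption: any missing allele means no-call.
    if any(a == "." for a in alleles):
        return "no-call"
    # '0' is the reference allele.
    if all(a == "0" for a in alleles):
        return "hom-ref"
    # Same non-reference allele on every copy.
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"

for gt in ["0/0", "0|1", "1/1", "./."]:
    print(gt, classify_gt(gt))
```

This only helps when the caller emits a record for the site in every sample; if a sample is simply absent from the variant list, the file itself cannot tell you whether it was reference or uncovered.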
My question was how to distinguish a null call (missing due to low or no coverage) from a reference allele. For example, one can have A/A, A/B, or B/B, where A is the reference and B is the variant. In NGS data, can A/A and N/N be distinguished, where N is missing?
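One possible heuristic, along the lines of the coverage idea in the question, can be sketched in Python as follows. Everything here is hypothetical: the function `build_matrix`, the input dictionaries, and the `min_depth=8` cutoff are illustrative assumptions, not an established method. Per-site depths would have to come from the alignments (for example from a tool such as `samtools depth`).

```python
def build_matrix(samples, snps, variant_calls, depths, min_depth=8):
    """Return {snp: [genotype per sample]} using A/A, A/B, B/B or N/N.

    variant_calls: {(sample, snp): genotype} for sites where a variant was called.
    depths:        {(sample, snp): read depth at that site}.
    min_depth:     arbitrary cutoff (assumption) below which we refuse to call A/A.
    """
    matrix = {}
    for snp in snps:
        row = []
        for sample in samples:
            call = variant_calls.get((sample, snp))
            if call is not None:
                # The caller reported a variant genotype: keep it.
                row.append(call)
            elif depths.get((sample, snp), 0) >= min_depth:
                # No variant reported but enough reads: assume homozygous reference.
                row.append("A/A")
            else:
                # No variant reported and too few reads: missing.
                row.append("N/N")
        matrix[snp] = row
    return matrix

# Toy data, invented for illustration only.
samples = ["s1", "s2", "s3"]
snps = ["chr1:100", "chr1:250"]
variant_calls = {("s1", "chr1:100"): "A/B", ("s3", "chr1:250"): "B/B"}
depths = {
    ("s1", "chr1:100"): 30, ("s2", "chr1:100"): 25, ("s3", "chr1:100"): 2,
    ("s1", "chr1:250"): 0,  ("s2", "chr1:250"): 12, ("s3", "chr1:250"): 40,
}
matrix = build_matrix(samples, snps, variant_calls, depths)
for snp in snps:
    print(snp, matrix[snp])
```

A fixed depth cutoff is crude: it ignores base and mapping quality, and a site with moderate depth can still hide an undetected heterozygous allele, so the threshold should be chosen with that in mind.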
I wrote something like that for my (new-old-beta-private-public-I-don't-know) package "variation toolkit". This package contains a program called groupbysnp. Here is an example (the output has been 'verticalized'):