Question

How to distinguish a reference allele from a null (not sequenced) call in NGS SNP calling?

13.7 years ago

H. Won

Hello all,

Suppose I have sequenced tens of individuals and want to summarize the SNP information in a matrix (columns: samples, rows: SNPs).

I assume that most tools report only the variant alleles for each individual.

My question is: when a site is missing for a particular individual, how can I tell whether that individual carries the reference allele or whether the site is simply missing because no reads mapped to that region?

Checking the coverage of mapped reads is one possible option, but I expect that other researchers have already developed methods for this problem.
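The coverage idea can be sketched as follows. This is a minimal illustration, not an established method: the function name `build_genotype_matrix`, the input dictionaries (per-sample variant calls and per-sample read depth at each site) and the `min_depth=8` cutoff are all hypothetical assumptions. A site absent from a sample's calls is reported as reference only when that sample has enough reads there; otherwise it becomes a no-call.

```python
from typing import Dict, List, Tuple

# Hypothetical site key: (chromosome, 1-based position).
Site = Tuple[str, int]


def build_genotype_matrix(
    calls: Dict[str, Dict[Site, str]],
    depth: Dict[str, Dict[Site, int]],
    min_depth: int = 8,  # assumed cutoff, not a recommended value
) -> Tuple[List[Site], Dict[str, List[str]]]:
    """Return the sorted union of called sites and one genotype row per sample."""
    sites = sorted({site for per_sample in calls.values() for site in per_sample})
    matrix: Dict[str, List[str]] = {}
    for sample, sample_calls in calls.items():
        row = []
        for site in sites:
            if site in sample_calls:
                row.append(sample_calls[site])  # variant genotype reported by the caller
            elif depth.get(sample, {}).get(site, 0) >= min_depth:
                row.append("0/0")  # covered, yet no variant called: assume reference
            else:
                row.append("./.")  # too few reads: genotype unknown
        matrix[sample] = row
    return sites, matrix
```

One caveat of this heuristic: a covered site whose variant call was filtered out would be reported as `0/0`, so the depth lookup should come from the same reads the caller used.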

I would be grateful for any ideas, methods, or references.

Thank you very much.

next-gen snp calling reference coverage

Updated 13.7 years ago by Pierre Lindenbaum • written 13.7 years ago by H. Won

Are you asking how people represent null calls (no-calls) in output formats, or for a method to decide whether a genotype is a no-call or reference? The former is handled in VCF by the GT field: '.' marks a missing allele (no-call) and '0' marks the reference allele. The latter is still a matter of debate, and heuristics still seem to be the order of the day.
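To make the VCF convention concrete, here is a small sketch that classifies a GT value. The function name `classify_gt` and its category labels are illustrative assumptions; it handles both unphased (`/`) and phased (`|`) separators and haploid calls.

```python
import re


def classify_gt(gt: str) -> str:
    """Classify a VCF GT value such as '0/0', '0|1', '1/1' or './.'."""
    alleles = re.split(r"[/|]", gt)  # split on unphased or phased separator
    if all(a == "." for a in alleles):
        return "no-call"  # '.' means the allele is unknown
    if any(a == "." for a in alleles):
        return "partial"  # e.g. './1': only one allele known
    if all(a == "0" for a in alleles):
        return "hom-ref"  # '0' is the REF allele
    if len(set(alleles)) == 1:
        return "hom-alt"  # the same ALT allele on every copy
    return "het"
```

Note that this only reads what the caller wrote: a `./.` is explicit only if the caller emitted a record for that sample at that site.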

Reply 13.7 years ago by Greg Tyrelle

My question was how to distinguish a null call (missing due to low or no coverage) from a reference allele. For example, a genotype can be A/A, A/B, or B/B, where A is the reference allele and B the variant allele. In NGS data, can A/A be distinguished from N/N, where N means missing?
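One common way to frame this is through genotype likelihoods. The sketch below uses a deliberately simplified binomial model that is an assumption of this example, not what any particular caller does: a uniform per-base error rate of 0.01, no base or mapping qualities, no priors, and a call emitted only when the best genotype is at least 100 times (log10 ratio of 2) more likely than the runner-up. With zero reads all three genotypes have the same likelihood, so the site is N/N. With 3 reference reads, A/A is only about 7.8 times more likely than A/B (0.99^3 / 0.5^3), which is still N/N; with 10 reference reads the ratio is about 926 and the call becomes A/A.

```python
import math
from typing import Dict


def genotype_log10_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> Dict[str, float]:
    """log10 P(reads | genotype) under a simple binomial read model."""
    # Probability that a single read shows the reference base, per genotype.
    p_ref = {"A/A": 1 - error, "A/B": 0.5, "B/B": error}
    return {
        g: n_ref * math.log10(p) + n_alt * math.log10(1 - p)
        for g, p in p_ref.items()
    }


def call_genotype(n_ref: int, n_alt: int, error: float = 0.01, min_log10_ratio: float = 2.0) -> str:
    """Return the most likely genotype, or 'N/N' if the evidence is too weak."""
    lls = genotype_log10_likelihoods(n_ref, n_alt, error)
    ranked = sorted(lls.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (_, second_ll) = ranked[0], ranked[1]
    if best_ll - second_ll < min_log10_ratio:
        return "N/N"  # data cannot separate the top two genotypes
    return best
```

The point of the model is that "reference" is a claim that needs reads, just like a variant call; absence of reads gives no evidence either way.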

Reply 13.7 years ago by H. Won

Answer · 2011-10-24

I wrote something like that for my (new-old-beta-private-public-i-don't-know) package "variation toolkit". This package contains a program called groupbysnp. Here is an example (the output has been 'verticalized'):

$ cat sample2vcf.tsv | scanvcf | grep -v "##" |    # concatenate all the VCFs, one per sample
   sed 's/^#CHROM/#/' |                            # hack: I want the header at the top after sorting
   sort -t $'\t' -k1,1 -k2,2n -k4,4 -k5,5 -k11,11 |  # sort on CHROM/POS/REF/ALT/SAMPLE
   sed 's/^#/#CHROM/' |                            # restore the header name
   groupbysnp -L 1,2,3,4,5 -T 6,7,8,9,10 --sample 11 -n Sample1,Sample2,Sample3,Sample4 |  # create the pivot table
   verticalize                                     # as the name says: one field per line

>>> 2
$1  #CHROM          1
$2  POS             753405
$3  ID              rs61770173
$4  REF             C
$5  ALT             A
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             Sample3
$19 Sample3:QUAL        99
$20 Sample3:FILTER      0
$21 Sample3:INFO        AC=2;DB=3;ST=0:0,3:32;DP=35;NC=-0.76;UM=3;CQ=...
$22 Sample3:FORMAT      GT:GQ:DP:FLT
$23 Sample3:CALL        1/1:99:35:0
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 2

>>> 3
$1  #CHROM          1
$2  POS             876499
$3  ID              rs4372192
$4  REF             A
$5  ALT             G
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        45
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,6:0;DP=6;NC=-3.05;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:45:6:0
$30 count.samples   1
<<< 3

>>> 4
$1  #CHROM          1
$2  POS             877831
$3  ID              rs6672356
$4  REF             T
$5  ALT             C
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        39
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,2:2;DP=4;NC=0.40;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:39:4:0
$30 count.samples   1
<<< 4

>>> 5
$1  #CHROM          1
$2  POS             879317
$3  ID              rs7523549
$4  REF             C
$5  ALT             T
$6  Sample1             CALL
$7  Sample1:QUAL        71
$8  Sample1:FILTER      0
$9  Sample1:INFO        AC=1;DB=1;ST=2:1,3:2;DP=8;NC=2.16;UM=3;CQ=...
$10 Sample1:FORMAT      GT:GQ:DP:FLT
$11 Sample1:CALL        0/1:34:8:0
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 5
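The pivoting step can be sketched in plain Python. This is not the variation toolkit's code; `group_by_snp` is a hypothetical name, and the input layout (10 VCF columns with the sample name appended as column 11) is inferred from the `-L 1,2,3,4,5` and `--sample 11` options in the pipeline above. For brevity it keeps only the genotype column per sample rather than QUAL/FILTER/INFO/FORMAT as well.

```python
from typing import Dict, List, Tuple

VCF_KEY = ["#CHROM", "POS", "ID", "REF", "ALT"]


def group_by_snp(lines: List[str], samples: List[str]) -> List[Dict[str, str]]:
    """Pivot sample-tagged VCF data lines into one record per SNP."""
    groups: Dict[Tuple[str, ...], Dict[str, str]] = {}  # dicts keep insertion order
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        key = tuple(cols[0:5])  # CHROM, POS, ID, REF, ALT
        groups.setdefault(key, {})[cols[10]] = cols[9]  # sample name -> genotype column
    records = []
    for key, calls in groups.items():
        record = dict(zip(VCF_KEY, key))
        for sample in samples:
            # '.' only means "no variant reported here"; it cannot tell
            # a reference genotype apart from a site with no reads.
            record[sample + ":CALL"] = calls.get(sample, ".")
        record["count.samples"] = str(len(calls))
        records.append(record)
    return records
```

As the comment notes, the '.' cells in such a table still need a coverage check (or a multi-sample call) before they can be read as reference.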