How To Distinguish Ref Allele And Null (Not Sequenced) In NGS SNP Calling?
13.1 years ago
H. Won ▴ 30

Dear all, Hello.

After I sequence tens of individuals, for example, I would like to summarize the SNP information in a matrix table (x: samples, y: SNPs).

I guess that most tools call only the variant alleles for each individual.

My question is: when the allele for one particular individual is missing, how can I tell whether it is the reference allele or simply missing because no reads mapped to that region?

I think looking at the coverage of mapped reads is one possible option, but I expect that other researchers have already developed methods for this issue.

I would be very grateful if you could suggest any ideas, methods, or reference papers.

Thank you very much.

next-gen snp calling reference coverage • 3.4k views

Are you asking how people represent null calls or no-calls in data output formats, or for a method to determine whether an allele is a no-call or the reference? The former is represented in VCF as '.' for a no-call vs. '0' for ref. The latter is a matter of debate; heuristics still seem to be the order of the day.
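To make the '.' vs. '0' convention concrete, here is a minimal Python sketch that classifies a VCF genotype (GT) value. The function names `classify_gt` and `classify_sample_field` are made up for illustration, and treating a partially missing genotype such as `./0` as a no-call is an assumption, not something the VCF format dictates.

```python
import re


def classify_gt(gt: str) -> str:
    """Classify a VCF GT value as 'ref', 'variant', or 'no-call'.

    Alleles are separated by '/' (unphased) or '|' (phased).
    '0' is the reference allele and '.' is a missing allele.
    """
    alleles = [a for a in re.split(r"[/|]", gt.strip()) if a]
    if not alleles:
        return "no-call"
    # Any allele other than '0' or '.' is an ALT allele index.
    if any(a not in (".", "0") for a in alleles):
        return "variant"
    # Assumption: a genotype with any missing allele counts as a no-call.
    if any(a == "." for a in alleles):
        return "no-call"
    return "ref"


def classify_sample_field(fmt: str, sample: str) -> str:
    """Classify a sample column using its FORMAT string, e.g. 'GT:GQ:DP:FLT'."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    if "GT" not in fields:
        return "no-call"
    return classify_gt(fields["GT"])


print(classify_gt("0/0"))   # ref
print(classify_gt("./."))   # no-call
print(classify_sample_field("GT:GQ:DP:FLT", "1/1:99:35:0"))  # variant
```

This only reads what the caller wrote into the VCF. It cannot tell ref from missing when the site is absent from a sample's file altogether.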


My question was how to distinguish null (missing due to low or no coverage) from the reference allele. For example, one can have A/A, A/B, or B/B, where A is ref and B is var. In NGS data, can A/A and N/N be distinguished, where N is missing?

13.1 years ago

I wrote something like that for my (new-old-beta-private-public-I-don't-know) package "variation toolkit". This package contains a program called groupbysnp. Here is an example (the output has been 'verticalized'):

$  cat sample2vcf.tsv | scanvcf | grep -v "##" |  # concatenate all the VCF/sample
   sed 's/^#CHROM/#/' |  # hack: I want the HEADER at the top after sorting
   sort -t $'\t' -k1,1 -k2,2n -k4,4 -k5,5 -k11,11 |  # sort on CHROM/POS/REF/ALT/SAMPLE (tab-delimited)
   sed 's/^#/#CHROM/' |  # restore the header name after sorting
   groupbysnp -L 1,2,3,4,5 -T 6,7,8,9,10 --sample 11 -n Sample1,Sample2,Sample3,Sample4 |  # create the pivot table
   verticalize  # print one field per line, as the name says

>>> 2
$1  #CHROM          1
$2  POS             753405
$3  ID              rs61770173
$4  REF             C
$5  ALT             A
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             Sample3
$19 Sample3:QUAL        99
$20 Sample3:FILTER      0
$21 Sample3:INFO        AC=2;DB=3;ST=0:0,3:32;DP=35;NC=-0.76;UM=3;CQ=...
$22 Sample3:FORMAT      GT:GQ:DP:FLT
$23 Sample3:CALL        1/1:99:35:0
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 2

>>> 3
$1  #CHROM          1
$2  POS             876499
$3  ID              rs4372192
$4  REF             A
$5  ALT             G
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        45
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,6:0;DP=6;NC=-3.05;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:45:6:0
$30 count.samples   1
<<< 3

>>> 4
$1  #CHROM          1
$2  POS             877831
$3  ID              rs6672356
$4  REF             T
$5  ALT             C
$6  Sample1             .
$7  Sample1:QUAL        .
$8  Sample1:FILTER      .
$9  Sample1:INFO        .
$10 Sample1:FORMAT      .
$11 Sample1:CALL        .
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             Sample4
$25 Sample4:QUAL        39
$26 Sample4:FILTER      0
$27 Sample4:INFO        AC=2;DB=1;ST=0:0,2:2;DP=4;NC=0.40;UM=3;CQ=...
$28 Sample4:FORMAT      GT:GQ:DP:FLT
$29 Sample4:CALL        1/1:39:4:0
$30 count.samples   1
<<< 4

>>> 5
$1  #CHROM          1
$2  POS             879317
$3  ID              rs7523549
$4  REF             C
$5  ALT             T
$6  Sample1             CALL
$7  Sample1:QUAL        71
$8  Sample1:FILTER      0
$9  Sample1:INFO        AC=1;DB=1;ST=2:1,3:2;DP=8;NC=2.16;UM=3;CQ=...
$10 Sample1:FORMAT      GT:GQ:DP:FLT
$11 Sample1:CALL        0/1:34:8:0
$12 Sample2             .
$13 Sample2:QUAL        .
$14 Sample2:FILTER      .
$15 Sample2:INFO        .
$16 Sample2:FORMAT      .
$17 Sample2:CALL        .
$18 Sample3             .
$19 Sample3:QUAL        .
$20 Sample3:FILTER      .
$21 Sample3:INFO        .
$22 Sample3:FORMAT      .
$23 Sample3:CALL        .
$24 Sample4             .
$25 Sample4:QUAL        .
$26 Sample4:FILTER      .
$27 Sample4:INFO        .
$28 Sample4:FORMAT      .
$29 Sample4:CALL        .
$30 count.samples   1
<<< 5

Thank you for the answer. In your output, does a dot (.) mean the reference allele or not sequenced (null or missing)?


It means that no mutation was called at this variant position for the given sample.


OK. Then we still do not know whether a dot (.) is R/R or N/N, where R is the ref allele and N is missing due to low coverage.
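One way to resolve the dots is the coverage idea from the question: look up the read depth for each sample at each uncalled site and recode the dot as R/R only when the depth is high enough. The Python sketch below is a rough illustration under assumptions, not an established method. The names `resolve_call` and `build_matrix`, the threshold `MIN_DEPTH = 8`, and the depth values in the example are all invented. The depth table is assumed to come from some per-base depth tool run on each sample's alignments. Depth alone does not prove a homozygous reference genotype; it only says the site was covered and no variant was called.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]   # (chrom, pos)
MIN_DEPTH = 8            # hypothetical threshold; tune for your data


def resolve_call(call: str, depth: int, min_depth: int = MIN_DEPTH) -> str:
    """Return a genotype for one sample at one site.

    call  : the sample's CALL field from the pivot table, or '.' if no
            variant was called for that sample.
    depth : read depth for that sample at that site (0 if unknown).
    """
    if call != ".":
        return call.split(":")[0]          # keep the called genotype (GT)
    # Heuristic: covered but not called -> assume ref, else missing.
    return "0/0" if depth >= min_depth else "./."


def build_matrix(
    calls: Dict[Site, Dict[str, str]],
    depths: Dict[Tuple[str, Site], int],
    samples: List[str],
    min_depth: int = MIN_DEPTH,
) -> Dict[Site, List[str]]:
    """Build a SNPs x samples genotype matrix with dots resolved."""
    matrix = {}
    for site, per_sample in calls.items():
        matrix[site] = [
            resolve_call(per_sample.get(s, "."), depths.get((s, site), 0), min_depth)
            for s in samples
        ]
    return matrix


# The Sample1 call is taken from the output above; the depths are made up.
site = ("1", 879317)
calls = {site: {"Sample1": "0/1:34:8:0"}}
depths = {("Sample2", site): 20, ("Sample3", site): 2}
samples = ["Sample1", "Sample2", "Sample3", "Sample4"]
print(build_matrix(calls, depths, samples)[site])
# ['0/1', '0/0', './.', './.']
```

Here Sample2 is recoded as R/R because it is well covered, while Sample3 (low depth) and Sample4 (no depth record) stay missing.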
