69 Genomes Data Interpretation
12.1 years ago
Pappu ★ 2.1k

I am trying to analyze the CompletePublicGenomes69genomesall_testvariants.tsv file, but I cannot work out what the codes NN, 00, 1N, 01, 11, 10, N1, etc. mean. Could someone explain them? Thank you.

variantId  chromosome  begin  end  varType  reference  alleleSeq  xRef
HG00731-200-37-ASM  HG00732-200-37-ASM  HG00733-200-37-ASM  NA06985-200-37-ASM
NA06994-200-37-ASM  NA07357-200-37-ASM  NA10851-200-37-ASM  NA12004-200-37-ASM
NA12877-200-37-ASM  NA12878-200-37-ASM  NA12879-200-37-ASM  NA12880-200-37-ASM
NA12881-200-37-ASM  NA12882-200-37-ASM  NA12883-200-37-ASM  NA12884-200-37-ASM
NA12885-L2-200-37-ASM  NA12886-L2-200-37-ASM  NA12887-L2-200-37-ASM  NA12888-200-37-ASM
NA12889-L2-200-37-ASM  NA12890-200-37-ASM  NA12891-200-37-ASM  NA12892-L2-200-37-ASM
NA12893-200-37-ASM  NA18501-200-37-ASM  NA18502-200-37-ASM  NA18504-200-37-ASM
NA18505-200-37-ASM  NA18508-200-37-ASM  NA18517-200-37-ASM  NA18526-200-37-ASM
NA18537-200-37-ASM  NA18555-200-37-ASM  NA18558-200-37-ASM  NA18940-200-37-ASM
NA18942-200-37-ASM  NA18947-200-37-ASM  NA18956-200-37-ASM  NA19017-200-37-ASM
NA19020-200-37-ASM  NA19025-200-37-ASM  NA19026-200-37-ASM  NA19129-200-37-ASM
NA19238-L2-200-37-ASM  NA19239-L2-200-37-ASM  NA19240-L2-200-37-ASM  NA19648-200-37-ASM
NA19649-200-37-ASM  NA19669-200-37-ASM  NA19670-200-37-ASM  NA19700-200-37-ASM
NA19701-200-37-ASM  NA19703-200-37-ASM  NA19704-200-37-ASM  NA19735-200-37-ASM
NA19834-200-37-ASM  NA20502-200-37-ASM  NA20509-200-37-ASM  NA20510-200-37-ASM
NA20511-200-37-ASM  NA20845-200-37-ASM  NA20846-200-37-ASM  NA20847-200-37-ASM
NA20850-200-37-ASM  NA21732-200-37-ASM  NA21733-200-37-ASM  NA21737-200-37-ASM
NA21767-200-37-ASM

1       chr1    11013   11014   snp     G       A       dbsnp.125:    NN      NN      NN      NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      1N
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN

2       chr1    11021   11022   snp     G       A       dbsnp.125:;dbsnp.129:       NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      00      NN      NN      NN
  NN      NN      1N      00      00      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN
  NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN      NN
  NN
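A minimal sketch of how one might pull each sample's two-character code out of rows like these, using only Python's standard `csv` module. It assumes the file is tab-separated, that the header row starts with `variantId`, and that every column after `xRef` is one genome; the function name `read_testvariants` and the toy sample names `sampleA`/`sampleB` are hypothetical, and the genotype calls in the toy input are made up for illustration.

```python
import csv
import io

# Assumption: these are the fixed leading columns of the header row shown
# above; every column after "xRef" is treated as one genome (sample).
FIXED_COLUMNS = ["variantId", "chromosome", "begin", "end",
                 "varType", "reference", "alleleSeq", "xRef"]


def read_testvariants(handle):
    """Yield (fixed_fields, {sample_name: genotype_code}) for each variant row.

    Any lines before the 'variantId' header row are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    samples = None
    for row in reader:
        if samples is None:
            # Still looking for the header row.
            if row and row[0] == "variantId":
                samples = row[len(FIXED_COLUMNS):]
            continue
        if not row:
            continue  # ignore blank lines
        fixed = dict(zip(FIXED_COLUMNS, row[:len(FIXED_COLUMNS)]))
        calls = dict(zip(samples, row[len(FIXED_COLUMNS):]))
        yield fixed, calls


# Toy two-sample input: "sampleA"/"sampleB" and their calls are hypothetical,
# not columns or values from the real 69-genome file.
example = (
    "variantId\tchromosome\tbegin\tend\tvarType\treference\talleleSeq\txRef"
    "\tsampleA\tsampleB\n"
    "2\tchr1\t11021\t11022\tsnp\tG\tA\tdbsnp.125:;dbsnp.129:\tNN\t00\n"
)

for fixed, calls in read_testvariants(io.StringIO(example)):
    # Keep only the samples that have at least some call (i.e. not "NN").
    called = {s: g for s, g in calls.items() if g != "NN"}
    print(fixed["chromosome"], fixed["begin"], called)
# prints: chr1 11021 {'sampleB': '00'}
```

For the real file, the same generator could be fed an open handle, e.g. `open("CompletePublicGenomes69genomesall_testvariants.tsv", newline="")`; skipping everything before the `variantId` line is a precaution in case the file carries extra lines above the header.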
12.1 years ago
JC 13k

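Those fields are the genotypes of the individual genomes, one two-character code per sample column: N = no-call, 0 = reference allele, 1 = alternative allele. Therefore:

- NN = no-call on both alleles
- 00 = homozygous reference
- 1N = alternative allele + no-call (N1 is the same combination: no-call + alternative allele)
- 01 = heterozygous (reference + alternative)
- 10 = heterozygous (alternative + reference)
- 11 = homozygous alternative
- and so on for the other combinations.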
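A small sketch that turns the rules above into a lookup, so each code can be translated mechanically. It assumes only the characters N, 0 and 1 occur (anything else is rejected), and the function name `describe_genotype` and the wording "partial call" for codes with a single N are illustrative choices, not Complete Genomics terminology.

```python
# Per-character meaning, as given in the answer above.
ALLELE_MEANING = {"N": "no-call", "0": "reference", "1": "alternative"}


def describe_genotype(code):
    """Translate a two-character genotype code such as '01' or '1N'."""
    if len(code) != 2 or any(c not in ALLELE_MEANING for c in code):
        # Assumption: only N, 0 and 1 occur; anything else is rejected.
        raise ValueError(f"unexpected genotype code: {code!r}")
    # Describe both alleles in the order they appear in the code.
    desc = " + ".join(ALLELE_MEANING[c] for c in code)
    if code == "NN":
        return "no-call on both alleles"
    if code[0] == code[1]:
        return f"homozygous {ALLELE_MEANING[code[0]]}"
    if "N" in code:
        return f"partial call ({desc})"
    return f"heterozygous ({desc})"


for code in ["NN", "00", "1N", "N1", "01", "10", "11"]:
    print(code, "->", describe_genotype(code))
# NN -> no-call on both alleles
# 00 -> homozygous reference
# 1N -> partial call (alternative + no-call)
# N1 -> partial call (no-call + alternative)
# 01 -> heterozygous (reference + alternative)
# 10 -> heterozygous (alternative + reference)
# 11 -> homozygous alternative
```

Keeping the original character order in the description preserves the 01 vs 10 distinction from the answer, while still classifying both as heterozygous.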

You can learn more about the Complete Genomics data formats on their website.
