Looking Up Structural Variants Of Genes
12.9 years ago
Abaldwin ▴ 10

I am trying to find known variants of a set of genes. For example, take TRBV10-3. The link below lists four known alleles, whose sequences I can see in GenBank.

http://www.ncbi.nlm.nih.gov/gene/?term=TRBV10-3&SITE=NcbiHome&submit=Go

However, these alleles only differ by SNPs. Is this because the NCBI Gene website does not contain structural variant information?

I tried using the Database of Genomic Variants: http://dgvbeta.tcag.ca/dgv/app/home

But searching for the gene name doesn't work. I tried to work out the chromosomal coordinates of the gene from the NCBI website, but I have no idea what the right numbers are. Neither the 'Range' information in GenBank nor the 'Accession Number' works as a search term. I am very confused about what the universal notation for genome locations is. The best I can do is 7q34, which is too large a region.

As you can tell, I have no clue what I'm doing. I'd appreciate any help or links to simple guides.

Thanks!

structural genome • 2.4k views

Can you explain where on the page from the first link you see 4 alleles and data showing that they "differ by SNPs"? All I can see is a contig and a putative transcript.

12.9 years ago
Neilfws 49k

I think your confusion is due to the fact that this particular gene is not placed on the reference genome assembly.

If you look at the gene page for another gene, e.g. GUCA2B, you'll see in the Genomic context section:

Location : 1p34-p33
Sequence : Chromosome: 1; NC_000001.10 (42619092..42621495)

Whereas for TRBV10-3 you see:

Location : 7q34

and in the Summary section:

Annotation category: not annotated on reference assembly

In addition, the accession for the DNA sequence containing TRBV10-3 is NW_003571040.1. The NW_ prefix tells you that this is a contig, not a chromosome, and that it may contain incomplete data. There are also clues in the gene description: "T cell receptor beta variable 10-3". Genes of this kind are located in highly variable regions of the chromosome, which are difficult to sequence and map accurately. It is therefore unlikely that there is enough accurate sequence information to determine structural variants.
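As a rough illustration of reading the accession prefix, here is a minimal Python sketch. The prefix table follows the general RefSeq naming convention as I understand it, and the function names `describe_accession` and `is_chromosome_level` are made up for this example; they are not part of any NCBI tool.

```python
# Hypothetical helper: interpret the prefix of a RefSeq accession.
# The descriptions are a simplified summary of the RefSeq convention.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (e.g. a chromosome)",
    "NW_": "contig or scaffold, mainly from whole-genome shotgun data",
    "NT_": "contig or scaffold, clone-based or whole-genome shotgun",
    "NG_": "incomplete genomic region",
    "NM_": "protein-coding transcript (mRNA)",
    "NR_": "non-coding RNA",
    "NP_": "protein",
}


def describe_accession(accession):
    """Return a short description for the accession's prefix."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")


def is_chromosome_level(accession):
    """True only for NC_ accessions, i.e. sequences with chromosome coordinates."""
    return accession.strip().upper().startswith("NC_")


if __name__ == "__main__":
    for acc in ("NC_000001.10", "NW_003571040.1"):
        print(acc, "->", describe_accession(acc))
```

With the two accessions from this thread, GUCA2B's NC_000001.10 is chromosome-level, while TRBV10-3's NW_003571040.1 is not, which is why there are no chromosome coordinates to search with.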

Note that searches using other HGNC symbols at the DGV database do return results - as an example, KIAA1199.
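For genes that are placed on the reference assembly, the "Sequence" line of the gene page can be turned into a `chr:start-end` region string, which is the form that genome browsers and variant databases commonly accept for region searches. The sketch below is only an illustration: the function name `refseq_to_region` is hypothetical, it handles human chromosome accessions only (NC_000001 to NC_000024, with 23 and 24 being X and Y), and it assumes the field looks like the GUCA2B example above. Keep in mind that coordinates are specific to an assembly version, so they should be used with a database built on the same assembly.

```python
import re

# Matches e.g. "NC_000001.10 (42619092..42621495)" inside a longer field.
SEQUENCE_RE = re.compile(r"NC_(\d{6})\.\d+\s*\((\d+)\.\.(\d+)")


def refseq_to_region(sequence_field):
    """Convert a gene page 'Sequence' field to 'chrN:start-end', or None."""
    match = SEQUENCE_RE.search(sequence_field)
    if match is None:
        # No NC_ accession with coordinates, e.g. a gene only on a contig.
        return None
    number = int(match.group(1))
    start, end = int(match.group(2)), int(match.group(3))
    if 1 <= number <= 22:
        chrom = str(number)
    elif number == 23:
        chrom = "X"
    elif number == 24:
        chrom = "Y"
    else:
        # Not a human nuclear chromosome accession; out of scope here.
        return None
    return f"chr{chrom}:{start}-{end}"


if __name__ == "__main__":
    print(refseq_to_region("Chromosome: 1; NC_000001.10 (42619092..42621495)"))
```

For a gene such as TRBV10-3 that only has an NW_ accession, the function returns `None`, which reflects the situation described above: there are no chromosome coordinates to search with.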


Ah, so it's an unfortunate first introduction to these databases. The 4 alleles I mentioned are the sequences linked to on NCBI Gene as: Reference GRCh37.p5 PATCHES, Alternate HuRef, and Alternate CRA_TCAGchr7v2. Some of these sequences match the 4 alleles on IMGT: http://www.imgt.org/IMGT_GENE-DB/GENElect?query=5.2+TRBV10-3&species=Homo+sapiens

In IMGT, however, 2 of the alleles are shorter and are missing nucleotides at the end, while the sequences above show the SNPs but not the deletions. I wanted to see the flanking nucleotides of the shorter alleles. I guess these regions are just not well mapped. Thanks!
