Annotation of STRs Using UCSC Tables
12.5 years ago

Hello Folks!

I have a file with some STRs generated from exome sequencing:

chr1 900683 900726 AAAT 4 10.5 -4,-4 1 1 0 -4:1 1
chr1 926889 926923 AAAAG 5 6.6 0,0 1 1 0 0:1 1
chr1 1112424 1112491 AC 2 33.5 0,0 1 1 0 0:1 1
chr1 1202260 1202285 AAAAT 5 5 0,0 1 1 0 0:1 1
chr1 1437233 1437278 AAAAC 5 9.2 -3,0 2 2 0 -3:1/0:1 1
chr1 1585271 1585316 AC 2 22.5 -2,-2 1 1 0 -2:1 1
chr1 1684347 1684375 AGG 3 9.3 0,3 5 5 0 0:4/3:1 1
chr1 1701408 1701454 AAAAAC 6 7.7 -6,-6 2 2 0 -6:2 1
chr1 1948272 1948311 AAAG 4 10.2 0,0 1 1 0 0:1 1
chr1 2189157 2189192 AAAT 4 8.8 0,0 1 1 0 0:1 1
chr1 2302649 2302680 AAC 3 10.3 0,0 1 1 0 0:1 1
chr1 2380938 2380975 AACC 4 9.2 0,0 1 1 0 0:1 1

I was asked to annotate each line with the genes and exons it overlaps, but I don't know which UCSC track and table I should use.

My options are:

-> Track: RefSeq Genes, Table: refGene

-> Track: CCDS, Table: ccdsGene

-> Track: UCSC Genes, Table: knownGene

Which one should I use, and why? I wrote a simple Python script for this, but I'm wondering whether there is a better way of doing it... :)

12.5 years ago
Mary 11k

If the data is human, I'd consider a GENCODE track. It might contain more than the other sets, because the "biotypes" GENCODE annotates may be broader. http://www.gencodegenes.org/gencode_biotypes.html

But if you have to pick one of your listed options, I would use UCSC Genes (knownGene), because it contains both RefSeq and CCDS. See the description page here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=276948273&c=chr5&g=knownGene
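
Whichever table is chosen, the per-line annotation comes down to an interval overlap against each transcript and its exons. The following is only a minimal sketch, not the asker's script: it assumes rows in the knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), and it assumes the STR coordinates use the same 0-based, half-open convention as the UCSC tables. `DEMO_ROW` and the helper names are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    tx_start: int
    tx_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_genepred(line: str) -> Transcript:
    """Parse one tab-separated row, assuming knownGene column order."""
    f = line.rstrip("\n").split("\t")
    # exonStarts / exonEnds are comma-terminated lists, so drop empty items.
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    return Transcript(f[0], f[1], int(f[3]), int(f[4]), starts, ends)


def annotate(chrom: str, start: int, end: int,
             transcripts: List[Transcript]) -> List[Tuple[str, List[int]]]:
    """Return (transcript name, overlapped exon numbers) per overlapping transcript.

    Exons are numbered in genomic order (1-based), not in transcript order,
    so the numbering is reversed for minus-strand transcripts.
    An empty exon list means the interval is intronic for that transcript.
    """
    hits = []
    for t in transcripts:
        # Skip transcripts on another chromosome or without any overlap.
        if t.chrom != chrom or end <= t.tx_start or start >= t.tx_end:
            continue
        exons = [i + 1
                 for i, (s, e) in enumerate(zip(t.exon_starts, t.exon_ends))
                 if start < e and end > s]
        hits.append((t.name, exons))
    return hits


# Hypothetical row, invented for this example (not a real UCSC record).
DEMO_ROW = ("uc_demo.1\tchr1\t+\t900000\t901000\t900100\t900900\t2\t"
            "900000,900600,\t900200,901000,")

if __name__ == "__main__":
    demo = [parse_genepred(DEMO_ROW)]
    print(annotate("chr1", 900683, 900726, demo))
```

This scans every transcript for every STR, which is fine for a small file. For a whole exome, an interval index or an existing tool such as `bedtools intersect` would probably be a better fit.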

12.5 years ago
ff.cc.cc ★ 1.3k

You should use the knownGene and knownToLocusLink tables, with an SQL query like this:

SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
FROM knownGene h
LEFT JOIN knownToLocusLink hl ON h.name = hl.name
WHERE h.chrom = 'chr_xyz' AND (
    (h.txStart < bp AND h.txEnd > bp)
    OR (h.txEnd > (bp - range) AND h.txEnd < bp)
    OR (h.txStart < (bp + range) AND h.txStart > bp))
GROUP BY hl.value;

Here bp is a base position on chromosome chr_xyz, range is how far around bp you want to look for genes, and hl.value is the gene ID. The three conditions match transcripts that contain bp, that end within range before it, or that start within range after it. GROUP BY collapses the transcripts of each gene into a single span.
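
To try the query without a MySQL connection, here is a small sketch that runs the same logic against an in-memory SQLite database. SQLite stands in for the UCSC MySQL server only to keep the example self-contained. The table layouts are reduced to the columns the query uses, the rows are invented, and `flank` replaces the `range` placeholder. `genes_near` and `demo_db` are hypothetical helper names.

```python
import sqlite3


def genes_near(conn, chrom, bp, flank):
    """Return (gene ID, min txStart, max txEnd) for genes at or near bp."""
    sql = """
        SELECT hl.value, MIN(h.txStart), MAX(h.txEnd)
        FROM knownGene h
        LEFT JOIN knownToLocusLink hl ON h.name = hl.name
        WHERE h.chrom = :chrom AND (
              (h.txStart < :bp AND h.txEnd > :bp)
           OR (h.txEnd > :bp - :flank AND h.txEnd < :bp)
           OR (h.txStart < :bp + :flank AND h.txStart > :bp))
        GROUP BY hl.value
        ORDER BY MIN(h.txStart)
    """
    params = {"chrom": chrom, "bp": bp, "flank": flank}
    return conn.execute(sql, params).fetchall()


def demo_db():
    """Build a toy database; every row here is made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INT, txEnd INT);
        CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
        INSERT INTO knownGene VALUES
            ('tx1', 'chr1', 1000, 2000),
            ('tx2', 'chr1', 1200, 2500),
            ('tx3', 'chr1', 5000, 6000),
            ('tx4', 'chr2', 1000, 2000);
        INSERT INTO knownToLocusLink VALUES
            ('tx1', '101'), ('tx2', '101'), ('tx3', '202'), ('tx4', '303');
    """)
    return conn


if __name__ == "__main__":
    conn = demo_db()
    # Position inside two transcripts of the same gene.
    print(genes_near(conn, "chr1", 1500, 100))
    # Position 200 bp before the start of another gene.
    print(genes_near(conn, "chr1", 4800, 500))
```

Passing the position and range as bound parameters rather than pasting them into the SQL string makes it easy to loop over every STR line with the same statement.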
