annotate gene position using hg19
8.9 years ago
aydzhouyuan ▴ 40

Is there an online annotation tool that I can use to obtain gene positions, or any other method? I have 1000 genes; a few examples are below:

Genes I have:

PIK3CD
MPL

Output I need:

chr1 9711789 9789171 PIK3CD
chr1 43803474 43820134 MPL

Thanks

gene genome • 3.2k views

Can't you get that from a GTF file?


Gencode GTF

A: Converting gtf format to bed format

Ensembl GTF
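
For anyone going the GTF route, here is a minimal sketch of pulling gene-level coordinates for a list of gene names. It assumes the GTF has rows whose feature column is `gene` and a `gene_name` attribute (Gencode files do). The function name and the example lines are made up for illustration: the coordinates are derived from the Table Browser output in this thread and the `gene_id` values are placeholders. Note that Ensembl GTFs name chromosomes without the `chr` prefix.

```python
import re

def gene_positions_from_gtf(gtf_lines, wanted):
    """Return BED-style (chrom, start, end, name) tuples for the wanted gene names.

    Hypothetical helper: only rows with feature type 'gene' are used.
    """
    wanted = set(wanted)
    rows = []
    for line in gtf_lines:
        # Skip blank lines and GTF header/comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match and match.group(1) in wanted:
            # GTF is 1-based inclusive; BED is 0-based half-open,
            # so only the start coordinate shifts by one.
            rows.append((fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)))
    return rows

# Synthetic example lines (placeholder gene_id values, not copied from a real file)
example_gtf = [
    'chr1\tHAVANA\tgene\t9711790\t9789172\t.\t+\t.\tgene_id "PLACEHOLDER_1"; gene_name "PIK3CD";',
    'chr1\tHAVANA\ttranscript\t9711790\t9775827\t.\t+\t.\tgene_id "PLACEHOLDER_1"; gene_name "PIK3CD";',
    'chr1\tHAVANA\tgene\t43803475\t43820135\t.\t+\t.\tgene_id "PLACEHOLDER_2"; gene_name "MPL";',
]

for row in gene_positions_from_gtf(example_gtf, ["PIK3CD", "MPL"]):
    print(*row, sep="\t")
```

With a real file you would pass an open file handle instead of `example_gtf`.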

8.9 years ago
Tej Sowpati ▴ 250

It won't be as simple as that, because there can be multiple transcripts from the same gene. In those cases, which transcript would you consider? Nevertheless, the UCSC Table Browser is a good place to start. It lets you upload or paste a list of IDs and retrieve a variety of information for those IDs.

In your case, select:

clade -> Mammal
genome -> Human
version -> hg19
group -> Genes and Gene Predictions
track -> UCSC Genes
table -> knownGene
identifiers -> paste/upload your list of gene names
output format -> Selected fields from primary and secondary tables

Give an output file name if you want to save the data in a file, then click 'get output'.

On the second page, select chrom, txStart and txEnd from hg19.knownGene, and geneSymbol from the hg19.kgXref table. I would also recommend selecting kgID from hg19.kgXref (also known as the UCSC ID), which is unique for each transcript variant.

Output for your gene names:

#hg19.knownGene.chrom	hg19.knownGene.txStart	hg19.knownGene.txEnd	hg19.kgXref.kgID	hg19.kgXref.geneSymbol
chr1	9711789	9775827	uc001aqa.2	PIK3CD
chr1	9711789	9789172	uc001aqb.4	PIK3CD
chr1	9770162	9789172	uc001aqe.4	PIK3CD
chr1	43803474	43815208	uc001civ.3	MPL
chr1	43803474	43820135	uc001ciw.3	MPL
chr1	43803474	43820135	uc009vwr.3	MPL
chr1	9751524	9789172	uc010oaf.2	PIK3CD
chr1	9751524	9789172	uc021ogb.1	PIK3CD
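
If you want one line per gene rather than one per transcript, one option (an assumption on my part, not the only sensible choice) is to take the smallest txStart and the largest txEnd for each gene symbol. A minimal sketch, assuming the tab-separated column order shown above; `collapse_to_genes` is a hypothetical helper name:

```python
def collapse_to_genes(table_text):
    """Merge transcript rows into one (chrom, start, end, symbol) span per gene.

    Assumes columns: chrom, txStart, txEnd, kgID, geneSymbol (tab-separated).
    Rows are grouped by (chrom, symbol), so a symbol that maps to two distant
    loci on the same chromosome would be merged into one long span.
    """
    spans = {}
    for line in table_text.splitlines():
        # Skip blank lines and the '#'-prefixed header
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, _kg_id, symbol = line.split("\t")
        start, end = int(start), int(end)
        key = (chrom, symbol)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return [(chrom, s, e, symbol) for (chrom, symbol), (s, e) in spans.items()]

# The transcript rows from the output above
table_text = "\n".join([
    "#hg19.knownGene.chrom\thg19.knownGene.txStart\thg19.knownGene.txEnd\thg19.kgXref.kgID\thg19.kgXref.geneSymbol",
    "chr1\t9711789\t9775827\tuc001aqa.2\tPIK3CD",
    "chr1\t9711789\t9789172\tuc001aqb.4\tPIK3CD",
    "chr1\t9770162\t9789172\tuc001aqe.4\tPIK3CD",
    "chr1\t43803474\t43815208\tuc001civ.3\tMPL",
    "chr1\t43803474\t43820135\tuc001ciw.3\tMPL",
    "chr1\t43803474\t43820135\tuc009vwr.3\tMPL",
    "chr1\t9751524\t9789172\tuc010oaf.2\tPIK3CD",
    "chr1\t9751524\t9789172\tuc021ogb.1\tPIK3CD",
])

for row in collapse_to_genes(table_text):
    print(*row, sep="\t")
# chr1    9711789     9789172     PIK3CD
# chr1    43803474    43820135    MPL
```

If you would rather keep a single representative transcript per gene (for example the longest), you would need a different rule; this sketch only gives the outer boundaries.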

Cheers,

TEJ


Thanks, Tej.

I tried what you suggested and pasted the gene names into the identifiers box for knownGene:

PIK3CD
MPL

But there was no output; only the header line was shown:

#hg19.knownGene.chrom   hg19.knownGene.txStart  hg19.knownGene.txEnd    hg19.kgXref.kgID    hg19.kgXref.geneSymbol

I then checked the description of "Paste in identifiers" for UCSC Genes, which says:

Please paste in the identifiers you want to include. The items must be values of the name field of the currently selected table, knownGene, or the alias field of the alias table kgAlias. (The "describe table schema" button shows more information about the table fields.) Some example values:

uc001yuc.1
uc002fpj.1
uc010nxl.1
NP_055440
NP_001611
Q9H7G3

It looks like I can't paste the gene names here directly?

Any comments?


That's very strange. I'm getting the output at my end. Can you change the table to kgXref and see if your IDs are getting recognized?
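
Once you do get output, a quick way to see which of your 1000 names were not recognized is to compare your list against the geneSymbol column of the result. A rough sketch, assuming the gene symbol is the last tab-separated column and that names must match exactly apart from surrounding whitespace; `unmatched_symbols` is a hypothetical helper name:

```python
def unmatched_symbols(requested, table_text, symbol_column=-1):
    """Return the requested gene names that never appear in the result table.

    Assumes a tab-separated table with the gene symbol in `symbol_column`
    (last column by default) and exact matching after stripping whitespace.
    """
    found = set()
    for line in table_text.splitlines():
        # Skip blank lines and the '#'-prefixed header
        if not line.strip() or line.startswith("#"):
            continue
        found.add(line.split("\t")[symbol_column].strip())
    return [name for name in requested if name.strip() not in found]

# A header-only result, like the one reported above, leaves every name unmatched
header_only = "#hg19.knownGene.chrom\thg19.knownGene.txStart\thg19.knownGene.txEnd\thg19.kgXref.kgID\thg19.kgXref.geneSymbol"
print(unmatched_symbols(["PIK3CD", "MPL"], header_only))
# ['PIK3CD', 'MPL']
```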
