I used Galaxy to look up the nearest gene for a given set of variants by comparing each variant's start and end location with the genes retrieved from the knownGene table in the UCSC browser. However, I only get names like uc001aaa.3. How can I convert this UCSC ID into the ordinary gene symbol?
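For reference, the nearest-gene comparison described in the question comes down to an interval-distance check. Below is a minimal Python sketch of that idea, not what Galaxy does internally. It assumes gene records as hypothetical `(kgID, chrom, start, end)` tuples in UCSC's 0-based, half-open coordinates (as in knownGene's txStart/txEnd). Note that it returns the kgID, which still has to be mapped to a symbol.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (kgID, chrom, start, end), half-open [start, end)
Gene = Tuple[str, str, int, int]


def interval_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def nearest_gene(genes: List[Gene], chrom: str, start: int,
                 end: int) -> Optional[Tuple[str, int]]:
    """Return (kgID, distance) of the closest gene on the same chromosome, or None."""
    best: Optional[Tuple[str, int]] = None
    for kg_id, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        d = interval_distance(start, end, g_start, g_end)
        if best is None or d < best[1]:
            best = (kg_id, d)  # on ties, the first gene seen is kept
    return best
```

A linear scan is fine for a modest number of variants. For genome-wide sets, sorting genes per chromosome and using `bisect`, or a dedicated tool such as `bedtools closest`, would scale better.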
You probably need the known gene cross-reference table, aka kgXref.
But what do you mean by "ordinary gene name"? Is that a symbol, full name, or description? Official from HGNC, or some other source? Might need another linked kg table. But I'd bet money the one you want is in there. The same Galaxy query of UCSC ought to be able to give you that.
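If you export kgXref as a tab-separated file (from the Table Browser, or a `kgXref.txt.gz` dump for the same assembly as your knownGene query), the lookup is a simple dictionary join. This is a minimal Python sketch; it assumes kgID is the first column and geneSymbol the fifth (kgID, mRNA, spID, spDisplayID, geneSymbol, ...), and that a Table Browser header line starts with `#`. Check the column order against the table schema for your assembly.

```python
import csv
import gzip
from typing import Dict, Iterable, Optional

GENE_SYMBOL_COL = 4  # assumed 0-based index of geneSymbol in kgXref rows


def load_kgxref(lines: Iterable[str]) -> Dict[str, str]:
    """Build a kgID -> geneSymbol dict from tab-separated kgXref rows."""
    mapping: Dict[str, str] = {}
    # QUOTE_NONE: the description column may contain stray quote characters
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and a Table Browser header line
        mapping[row[0]] = row[GENE_SYMBOL_COL]
    return mapping


def load_kgxref_file(path: str) -> Dict[str, str]:
    """Read a plain or gzip-compressed kgXref dump."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        return load_kgxref(fh)


def to_symbols(ucsc_ids: Iterable[str], mapping: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each UCSC ID to its gene symbol, or None if it is not in the table."""
    return {uid: mapping.get(uid) for uid in ucsc_ids}
```

IDs that come back as `None` usually mean the kgXref table is from a different assembly or UCSC Genes release than the knownGene IDs.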
You can use BioMart for this conversion. Select 'ID List Limit' under Filters and pick UCSC ID from the drop-down; then you can paste your identifiers into the box or upload a file containing them. Pick the outputs you want from the 'Attributes' section.
I exported a query for the ID you give in your question as Perl code, which lets you script the retrieval if you like:
# An example script demonstrating the use of the BioMart Perl API.
# This Perl API representation is only available for configuration versions >= 0.5
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;
# Path to your registry file under biomart-perl/conf/. For the BioMart Central Registry, navigate to
# http://www.biomart.org/biomart/martservice?type=registry
my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/";
#
# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#
my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;
my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("ucsc", ["uc001aaa.3"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("external_gene_id");  # named external_gene_name in newer Ensembl releases
$query->formatter("TSV");
my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################
############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################
This is a C&P direct from the BioMart website, YMMV.
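If you would rather not install biomart-perl, the same kind of query can be sent as XML to BioMart's REST `martservice` endpoint. Below is a minimal Python sketch that only builds the query and URL. It reuses the dataset and `ucsc` filter from the Perl script; the Ensembl endpoint URL and the `external_gene_name` attribute are assumptions on my part, so check them against the current BioMart before relying on this.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Iterable

MARTSERVICE = "https://www.ensembl.org/biomart/martservice"  # assumed endpoint


def build_biomart_query(ucsc_ids: Iterable[str], attributes: Iterable[str],
                        dataset: str = "hsapiens_gene_ensembl") -> str:
    """Build a BioMart XML query that filters a dataset by UCSC IDs."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Same "ucsc" filter as the Perl script; several IDs are comma-separated
    ET.SubElement(ds, "Filter", name="ucsc", value=",".join(ucsc_ids))
    for attr in attributes:
        ET.SubElement(ds, "Attribute", name=attr)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(xml_query: str, base: str = MARTSERVICE) -> str:
    """Encode the XML query as the ?query= parameter of a GET request."""
    return base + "?" + urllib.parse.urlencode({"query": xml_query})
```

For example, `martservice_url(build_biomart_query(["uc001aaa.3"], ["ensembl_gene_id", "external_gene_name"]))` gives a URL you can fetch with `urllib.request.urlopen` to get TSV rows back.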
An example would help, but you're relying on a lot of people's data being in sync here. Mapping is always going to be a thorny issue, and imperfect in most ordinary scenarios.
OpenGene (https://github.com/OpenGene/OpenGene.jl) can do this very easily. The gencode_locate function queries the GENCODE database to find which gene, and which exon or intron, a position is in.
using OpenGene, OpenGene.Reference
# load the GENCODE dataset; it will download a file from the GENCODE website if it has not been downloaded before
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh37")
# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
# Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")
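To show what the "which exon/intron" part of such a lookup involves, here is a minimal Python sketch for a single transcript. It is not OpenGene's implementation. It assumes you already have the transcript's exons as inclusive `(start, end)` genomic coordinates (same convention as `pos`) plus its strand, and it numbers exons and introns in transcript order, so on the minus strand exon 1 is the rightmost one.

```python
from typing import List, Optional, Tuple


def locate_in_transcript(exons: List[Tuple[int, int]], pos: int,
                         strand: str = "+") -> Optional[Tuple[str, int]]:
    """Return ("exon", n) or ("intron", n) for pos, or None if outside the transcript.

    exons: inclusive (start, end) genomic coordinates, in any order.
    Numbering follows transcript order, so it is reversed on the "-" strand.
    """
    ordered = sorted(exons)
    n = len(ordered)

    def exon_number(i: int) -> int:  # i = 0-based index in genomic order
        return i + 1 if strand == "+" else n - i

    for i, (start, end) in enumerate(ordered):
        if start <= pos <= end:
            return ("exon", exon_number(i))
    for i in range(n - 1):
        if ordered[i][1] < pos < ordered[i + 1][0]:
            # Intron k lies between exons k and k+1 in transcript order
            return ("intron", min(exon_number(i), exon_number(i + 1)))
    return None
```

When several transcripts overlap a position, you would run this once per transcript and then decide which result(s) to report.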
Hi Mary, thanks a lot! I mean a gene symbol such as BRCA1; you are right on that. I will try querying the kgXref table.