Hello everybody.
This is my first post, so don't hesitate to tell me if my explanations are not sufficiently clear.
I would like to annotate an Illumina SNP file, and to do so I need to compare it against an annotated human genome file on the GRCh37 build (I don't care about the patch; only the build matters).
To make the comparison efficiently, I need several pieces of information from the human genome file.
I need at least:
- HGNC symbol
- GeneID
- start gene position (bp)
- end gene position (bp)
- chromosomeID
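To make the intended comparison concrete, here is a minimal Python sketch of how these five fields could be used to find the gene(s) a SNP falls in. The `Gene` record, the `genes_overlapping` helper, and all symbols, IDs, and coordinates in the example are made up for illustration; they are not real GRCh37 annotations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """One row of the gene table, holding the five fields listed above."""
    symbol: str    # HGNC symbol
    gene_id: int   # GeneID
    chrom: str     # chromosomeID
    start: int     # gene start position (bp)
    end: int       # gene end position (bp)


def genes_overlapping(chrom: str, pos: int, genes: List[Gene]) -> List[Gene]:
    """Return every gene on `chrom` whose [start, end] interval contains `pos`."""
    return [g for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


# Hypothetical genes with placeholder coordinates (not real annotations).
GENES = [
    Gene("GENE_A", 1, "1", 100, 200),
    Gene("GENE_B", 2, "1", 150, 300),
    Gene("GENE_C", 3, "2", 100, 200),
]

# A SNP at chromosome 1, position 180 falls inside both GENE_A and GENE_B.
hits = genes_overlapping("1", 180, GENES)
print([g.symbol for g in hits])
```

A linear scan like this is fine for a small table; for a full SNP array an interval index or a sorted list per chromosome would likely be needed.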
Getting this information is not really a problem; I found it in UCSC and Biomart.
But I have a problem with NCBI symbols starting with LOC (e.g. LOC100287633, LOC100128613, etc.).
I compared the NCBI and UCSC information: I can find every LOC symbol in NCBI, but not in UCSC or Biomart.
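The comparison I did can be sketched roughly as follows, assuming the gene symbols from each source have already been loaded into plain Python lists. The function name `missing_loc_symbols` and the contents of the two example lists are only illustrative, not actual database exports.

```python
import re
from typing import Iterable, List

# NCBI placeholder symbols look like "LOC" followed by the numeric GeneID.
LOC_PATTERN = re.compile(r"^LOC\d+$")


def missing_loc_symbols(ncbi_symbols: Iterable[str],
                        other_symbols: Iterable[str]) -> List[str]:
    """Return the LOC symbols present in the NCBI list but absent from the other source."""
    other = set(other_symbols)
    return sorted(
        s for s in set(ncbi_symbols)
        if LOC_PATTERN.match(s) and s not in other
    )


# Illustrative input only: the two LOC symbols are the ones mentioned above,
# and "GENE_A" is a made-up symbol standing in for a regular gene.
ncbi = ["GENE_A", "LOC100287633", "LOC100128613"]
other_source = ["GENE_A"]

print(missing_loc_symbols(ncbi, other_source))
```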
I know that many LOC symbols are "discontinued" or not updated; however, plenty of them are still reviewed in NCBI but cannot be found in Biomart, UCSC, or other databases.
I could download them from NCBI, but their start and end positions (bp) have been updated to GRCh38, and I absolutely need the GRCh37 positions.
So my question is: do you know of a web or FTP link where I can download all this information in a single file, or at least the LOC information on the GRCh37 build?
Thanks for your answers!
Guillaume
Hi Guillaume,
Could you let me know how you output the HGNC symbol from UCSC? I tried to do the same task as you, but I only need the genes known to HGNC. For example, I used track=UCSC Genes and selected "geneSymbol", but the output listed some genes not known to HGNC in the hg19.kgXref.geneSymbol column.
As a result, I have trouble annotating intergenic SNPs. For example, SNP rs188746275 should be located between PABPC4L and PCDH18, but the UCSC tables listed cDNA genes, so the SNP ended up between BC032916 and BC031238 when I annotated it. Neither BC032916 nor BC031238 is known to HGNC or NCBI.
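What I am trying to achieve can be sketched like this in Python, assuming I had a set of HGNC-approved symbols to filter the UCSC output with. The `flanking_genes` helper is hypothetical, and the coordinates below are placeholders chosen only to reproduce the ordering described above; they are not real hg19 positions.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Each gene is (symbol, chrom, start, end).
GeneRow = Tuple[str, str, int, int]


def flanking_genes(chrom: str, pos: int, genes: Iterable[GeneRow],
                   approved: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return the nearest approved gene on each side of an intergenic position."""
    candidates: List[GeneRow] = [
        g for g in genes if g[1] == chrom and g[0] in approved
    ]
    # Nearest gene ending before the SNP, and nearest gene starting after it.
    left = max((g for g in candidates if g[3] < pos),
               key=lambda g: g[3], default=None)
    right = min((g for g in candidates if g[2] > pos),
                key=lambda g: g[2], default=None)
    return (left[0] if left else None, right[0] if right else None)


# Placeholder coordinates (NOT real hg19 positions), ordered as in the post.
GENES: List[GeneRow] = [
    ("PABPC4L", "chr4", 100, 200),
    ("BC032916", "chr4", 400, 500),
    ("BC031238", "chr4", 600, 700),
    ("PCDH18", "chr4", 900, 1000),
]
ALL_SYMBOLS = {g[0] for g in GENES}
HGNC_APPROVED = {"PABPC4L", "PCDH18"}  # assumed to come from an HGNC symbol list

snp_pos = 550  # placeholder position standing in for rs188746275
print(flanking_genes("chr4", snp_pos, GENES, ALL_SYMBOLS))    # unfiltered table
print(flanking_genes("chr4", snp_pos, GENES, HGNC_APPROVED))  # HGNC-only table
```

Without the filter the nearest neighbours are the two BC entries; restricting to the approved set gives the pair I expected.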
Many thanks if you could guide me on how to output the HGNC symbol.
Thanks!
Ake