6.4 years ago
salamandra
In Ensembl it's easy to identify the annotation file with the genes for a genome: we select the .gtf file that is NOT 'abinitio' or 'chromosome'. But as I have a UCSC genome version, I heard I should use the corresponding UCSC gene annotation file.
However, there are many options of annotation files in UCSC. This page has all of them.
Which one should I use?
I just want the regular gene annotation, the equivalent of the Ensembl gene file that is NOT abinitio or chromosome.
I tried the Table Browser and it seems that for recent assemblies, e.g. human GRCh38/hg38, there's only 'Old UCSC genes'. Then I found this page that says the default is now to use GENCODE for the GRCh38 assembly. So it is better to use GENCODE, as it is kept updated.
Be careful about jumping around between sequence and annotation sources. GENCODE provides the official annotation for the human and mouse genomes. Other providers (Ensembl/HAVANA, NCBI and UCSC) may add their own on top. While the underlying sequence should be identical for a particular genome build, you would still want to use a sequence/annotation combination from a single provider, since mixing and matching can cause problems downstream (with read counting, visualization, etc.). This mainly happens because providers use different naming schemes for chromosomes (for example, UCSC uses chr1 where Ensembl uses 1).
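If you want to check this before running anything downstream, a quick comparison of the sequence names in your FASTA and your GTF will show a mismatch right away. The snippet below is only a minimal sketch: the function names and the toy records are made up for illustration, and in practice you would pass open file handles for your own genome and annotation files instead of the in-memory examples.

```python
import io


def fasta_seq_names(handle):
    """Collect sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seq_names(handle):
    """Collect the values of the first (seqname) column of a GTF, skipping comments and blank lines."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_naming(fasta_names, gtf_names):
    """Report which sequence names are shared and which appear in only one of the two files."""
    return {
        "shared": fasta_names & gtf_names,
        "only_in_fasta": fasta_names - gtf_names,
        "only_in_gtf": gtf_names - fasta_names,
    }


# Toy data (hypothetical): a UCSC-style FASTA and an Ensembl-style GTF.
toy_fasta = io.StringIO(">chr1 some description\nACGT\n>chrM\nACGT\n")
toy_gtf = io.StringIO(
    "#!genome-build GRCh38\n"
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n'
    'MT\tensembl\tgene\t10\t50\t.\t+\t.\tgene_id "G2";\n'
)

report = compare_naming(fasta_seq_names(toy_fasta), gtf_seq_names(toy_gtf))
for key in ("shared", "only_in_fasta", "only_in_gtf"):
    print(key, sorted(report[key]))
```

An empty "shared" set, as with the toy data here, means read counters and genome browsers will not be able to match features to sequences, which is the usual symptom of mixing providers.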
Ok. And should I use the 'Old UCSC genes' with the human GRCh38/hg38 assembly, or use them with the older hg19 version, or does it not matter?