How to download the Ensembl gene annotation track from NCBI Sequence Viewer 3.49.0
6 months ago
rahu • 0

I'm interested in a newer assembly of an organism that is available from NCBI but not from Ensembl. However, I still want to use the Ensembl gene definitions. The annotation file on Ensembl has coordinates based on the previous assembly. NCBI Sequence Viewer 3.49.0 shows an Ensembl gene annotation track for the newer assembly I'm interested in, so I assume the coordinates of that annotation have been adjusted to match the newer assembly. However, NCBI Sequence Viewer 3.49.0 only allows downloading a range of a single chromosome. Is there a way to download the entire Ensembl annotation file with coordinates matching the newer assembly? It is not available on the Ensembl website.

Sequence-Viewer Ensembl NCBI • 1.1k views

Can you post a screenshot (and provide some information about what organism this is)? Point out the track you are referring to. If it is precomputed then it may be available.


Thank you for your response!

The organism is bovine (Bos taurus). Contamination was discovered in the two latest assemblies, ARS-UCD1.2 and ARS-UCD1.3, both referred to as bosTau9.

Now there is a newer assembly, ARS-UCD2.0 (bosTau9), a screenshot of which I have attached. The second track, named "Genes, Ensembl release 112", is the one I would like to retrieve. (Attached: screenshot of the NCBI genome viewer for ARS-UCD2.0.)

The bovine annotation files on the Ensembl website are based on the ARS-UCD1.3 assembly. I would like to download an Ensembl gene annotation file compatible with ARS-UCD2.0.


Hello, may I ask which annotation file you finally chose? I am using ARS-UCD2.0 from RefSeq, but the RefSeq annotation file is difficult for me to work with because many transcript_id values are missing. Can I use the Ensembl annotation file instead if I change the chromosome names?


It is generally safer to use the sequence and annotation from the same provider. So if you want to use the Ensembl annotation, use the corresponding genome: http://ftp.ensembl.org/pub/rapid-release/species/Bos_taurus/GCA_002263795.3/ensembl/genome/Bos_taurus-GCA_002263795.3-unmasked.fa.gz
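One way to see whether a genome and an annotation file actually belong together is to compare their sequence names. The following is only a rough sketch: it checks that every name in column 1 of a GTF/GFF file also appears as a FASTA header. The function names and the file names in the usage comment are made up for illustration, and matching names do not by themselves prove that the coordinates agree.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_seqids(lines):
    """Collect sequence names: the first word after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def annotation_seqids(lines):
    """Collect column 1 of every non-comment, non-empty GTF/GFF line."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_seqids(fasta_lines, annotation_lines):
    """Return annotation sequence names that have no matching FASTA header."""
    return sorted(annotation_seqids(annotation_lines) - fasta_seqids(fasta_lines))


# Usage (hypothetical file names):
# with open_text("genome.fa.gz") as fa, open_text("genes.gtf.gz") as gtf:
#     print(missing_seqids(fa, gtf))
```

An empty list means every annotated sequence name exists in the FASTA. A long list usually points to a naming mismatch between providers (for example accession-style names versus plain chromosome numbers).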


Thanks. I have actually already used the genome from RefSeq, so I have to use the RefSeq annotation file. I need to quantify gene expression, but the RefSeq annotation file contains some entries without a transcript_id. It also contains some pseudogenes and tRNAs, and I don't know how to deal with them. I am trying to find a solution. Could you give me some suggestions?


Are you using "transcript_ID" as key for counting with featureCounts? Then you should only get counts for those rows that have that key. Summarize at the gene level unless you have a specific need to do transcript level counts.


I'm using a salmon pipeline, which outputs transcript-level quantification values, but I need gene-level values. So I have to use tximport, which needs a file that maps each transcript_id to a gene_id. The GFF file contains some entries without a transcript_id, and these are dropped when I generate the transcript_id-to-gene_id mapping file.
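The transcript-to-gene table described above could be built along these lines. This is a minimal sketch, not a tested recipe for the RefSeq file: it assumes a GFF3 layout in which gene rows (type `gene` or `pseudogene`) carry an `ID` and a `gene` or `Name` attribute, and transcript rows carry `Parent` plus `transcript_id`. The function names are made up. Rows with no `transcript_id` (such as some tRNA entries) are skipped, which only matters if those features are present in the transcript FASTA used for the salmon index.

```python
def parse_attributes(field):
    """Turn GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def tx2gene_from_gff3(lines, gene_types=("gene", "pseudogene")):
    """Return a list of (transcript_id, gene) pairs from GFF3 lines."""
    genes = {}    # gene feature ID -> gene label
    records = []  # (feature type, attributes) for non-gene rows
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        attrs = parse_attributes(cols[8])
        if cols[2] in gene_types and "ID" in attrs:
            # Assumed label preference: gene symbol, then Name, then the raw ID.
            genes[attrs["ID"]] = attrs.get("gene") or attrs.get("Name") or attrs["ID"]
        else:
            records.append(attrs)

    mapping = {}
    for attrs in records:
        tx = attrs.get("transcript_id")
        parent = attrs.get("Parent")
        # Keep only rows whose direct parent is a gene; exon/CDS rows point at
        # a transcript instead and are ignored here.
        if tx and parent in genes:
            mapping[tx] = genes[parent]
    return list(mapping.items())


# Usage (hypothetical file names): write a two-column TSV for tximport's tx2gene.
# with open("genomic.gff") as gff, open("tx2gene.tsv", "w") as out:
#     for tx, gene in tx2gene_from_gff3(gff):
#         out.write(f"{tx}\t{gene}\n")
```

The transcript names in the table have to match the names salmon reports exactly, including any version suffix, so it is worth comparing a few entries against the quantification output before running tximport.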

6 months ago
GenoMax 148k

It looks like the ARS-UCD2.0 genome build is available via Ensembl rapid release: http://ftp.ensembl.org/pub/rapid-release/species/Bos_taurus/GCA_002263795.3/ensembl/geneset/2023_06/
