Question

How to download Gene annotations (by Ensembl) track on NCBI Sequence Viewer 3.49.0

0

6 months ago

rahu • 0

I'm interested in a newer assembly of an organism that is available from NCBI but not from Ensembl. However, I still want to use the Ensembl gene definitions. The annotation file on Ensembl has coordinates based on the previous assembly. NCBI Sequence Viewer 3.49.0 shows an Ensembl gene annotations track for the newer assembly I'm interested in, so I assume the coordinates of that annotation have been adjusted to match the newer assembly. However, NCBI Sequence Viewer 3.49.0 only allows downloading a range of a single chromosome. Is there a way to download the entire Ensembl annotation file with coordinates matching the newer assembly, which is not available on the Ensembl website?

Sequence-Viewer Ensembl NCBI • 1.1k views

ADD COMMENT • link updated 3 months ago by kkun • 0 • written 6 months ago by rahu • 0

1

Can you post a screenshot (and provide some information about what organism this is)? Point out the track you are referring to. If it is precomputed then it may be available.

ADD REPLY • link 6 months ago by GenoMax 148k

0

Thank you for your response!

The organism is bovine (Bos taurus). Contamination was discovered in the two most recent assemblies, ARS-UCD1.2 and ARS-UCD1.3, both referred to as bosTau9.

Now there is a newer assembly, ARS-UCD2.0 (bosTau9), for which I have attached a screenshot. The second track, named "Genes, Ensembl release 112", is the one I would like to retrieve. [Screenshot of the NCBI genome viewer for ARS-UCD2.0]

The annotation files on the Ensembl website for bovine are based on the ARS-UCD1.3 assembly. I would like to download an Ensembl gene annotation file compatible with ARS-UCD2.0.

ADD REPLY • link 6 months ago by rahu • 0

0

Hello, may I ask which annotation file you finally chose? I am using ARS-UCD2.0 from RefSeq, but the RefSeq annotation file is too difficult for me to deal with, as many transcript_id values are missing. Can I use the Ensembl annotation file instead if I change the chromosome names?

ADD REPLY • link 3 months ago by kkun • 0

0

It is generally safer to use the sequence and annotations from the same provider. So if you want to use the Ensembl annotation, use the corresponding genome: http://ftp.ensembl.org/pub/rapid-release/species/Bos_taurus/GCA_002263795.3/ensembl/genome/Bos_taurus-GCA_002263795.3-unmasked.fa.gz

ADD REPLY • link 3 months ago by GenoMax 148k

0

Thanks. I have actually used the genome from RefSeq, so I have to use the annotation file from RefSeq. I need to quantify gene expression, but the RefSeq annotation file contains some entries without a transcript_id. It also contains some pseudogenes and tRNAs, and I don't know how to deal with them. I am trying to find a solution. Could you give me some suggestions?

ADD REPLY • link 3 months ago by kkun • 0

0

Are you using "transcript_ID" as key for counting with featureCounts? Then you should only get counts for those rows that have that key. Summarize at the gene level unless you have a specific need to do transcript level counts.

ADD REPLY • link 3 months ago by GenoMax 148k

0

I'm using the salmon pipeline, which outputs transcript-level quantification values, but I need gene-level values. So I have to use tximport, which needs a file that maps each transcript_id to a gene_id. The GFF file contains some entries without a transcript_id, and these are dropped when generating the transcript_id-to-gene_id mapping file.
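The mapping step described above could be sketched roughly as follows. This is only an illustration in Python, not a tested recipe for the ARS-UCD2.0 RefSeq file: it assumes a GFF3 whose ninth column uses `ID`, `Parent`, `gene` and `transcript_id` attributes, and the function names `parse_attributes` and `build_tx2gene` are made up for this example. Instead of silently dropping gene children that lack a `transcript_id`, it returns them in a separate list so they can be inspected.

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def build_tx2gene(gff_lines):
    """Build a transcript-to-gene table from an iterable of GFF3 lines.

    Returns (mapping, skipped):
      mapping: list of (transcript_id, gene) pairs, one per transcript
      skipped: IDs of direct children of a gene that have no transcript_id
    Assumption: gene rows have type 'gene' or 'pseudogene', and transcript
    rows point at them through the 'Parent' attribute.
    """
    genes = {}    # gene feature ID -> gene label
    children = []  # attribute dicts of every feature that has a Parent
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] in ("gene", "pseudogene"):
            # Fall back to the feature ID when no 'gene' attribute exists.
            genes[attrs.get("ID")] = attrs.get("gene", attrs.get("ID"))
        elif "Parent" in attrs:
            children.append(attrs)

    mapping, skipped, seen = [], [], set()
    for attrs in children:
        gene = genes.get(attrs["Parent"])
        if gene is None:
            continue  # parent is not a gene (e.g. an exon under a transcript)
        tx = attrs.get("transcript_id")
        if tx is None:
            skipped.append(attrs.get("ID"))
        elif tx not in seen:
            seen.add(tx)
            mapping.append((tx, gene))
    return mapping, skipped
```

Writing `mapping` out as a two-column file would give something tximport can read as `tx2gene`, provided the transcript names match those in the FASTA that salmon indexed. The `skipped` list shows which features (for example tRNAs or pseudogene children) have no `transcript_id`, so you can decide whether they matter for your analysis.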

ADD REPLY • link 3 months ago by kkun • 0

score 2 · Answer 1 · 2024-06-10

2

6 months ago

GenoMax 148k

It looks like the ARS-UCD2.0 genome build is available via Ensembl rapid release: http://ftp.ensembl.org/pub/rapid-release/species/Bos_taurus/GCA_002263795.3/ensembl/geneset/2023_06/

ADD COMMENT • link 6 months ago by GenoMax 148k