How to download an mm10 GTF file with gene IDs and gene names using the UCSC Table Browser?
4.9 years ago
John

Hi, what parameters should I use to download an mm10 GTF file in the same format as the GTF line below?

chr1    unknown exon    3214482 3216968 .   -   .   gene_id "Xkr4"; gene_name "Xkr4"; p_id "P14345"; transcript_id "NM_001011874"; tss_id "TSS25485";
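For reference, the gene_id and gene_name in a line like this live in the ninth (attribute) column as `key "value";` pairs. Below is a minimal Python sketch that pulls them out. It assumes the columns are tab-separated, as GTF requires (the forum rendering shows spaces), and that every attribute value is double-quoted; `parse_gtf_line` and `ATTR_RE` are illustrative names, not part of any UCSC tool.

```python
import re

# Matches one `key "value";` pair in GTF column 9 (assumes quoted values).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;?')


def parse_gtf_line(line):
    """Split a GTF line into its 9 tab-separated columns and parse column 9."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTR_RE.findall(attrs)),
    }


line = ('chr1\tunknown\texon\t3214482\t3216968\t.\t-\t.\t'
        'gene_id "Xkr4"; gene_name "Xkr4"; p_id "P14345"; '
        'transcript_id "NM_001011874"; tss_id "TSS25485";')
rec = parse_gtf_line(line)
print(rec["attributes"]["gene_name"], rec["attributes"]["transcript_id"])
# prints: Xkr4 NM_001011874
```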

I can download this format for mm9 using the following parameters, but not for mm10:

Assembly: mm9
Group: Gene and Gene prediction tracks
Track: RefSeq genes
Table: refFlat
Output format: GTF

Thanks

4.9 years ago
Luis Nassar

Hello,

Short answer: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz

Long answer:

Because of the way the Table Browser forms queries, its GTF output fills the gene_id and transcript_id fields with the same value, for example:

chr1    mm9_refFlat stop_codon  3206103 3206105 0.000000    -   .   gene_id "Xkr4"; transcript_id "Xkr4"; 

This is why we label that output "GTF (limited)". Our wiki page (http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format) explains how to produce a proper GTF, which comes down to using a separate utility, genePredToGtf, for the conversion.

Another possible source of confusion is that you did not see the same refFlat table available in the Table Browser for mm10. This is because, for mm10, hg19, and hg38, NCBI started releasing genome coordinates along with their annotation sequences. To get the equivalent of your mm9 selection for mm10, use the following:

Assembly: mm10
Group: Gene and Gene prediction tracks
Track: NCBI RefSeq
Table: UCSC RefSeq (refGene)
Output format: GTF (limited)

Like refFlat, these are our own alignments of the NCBI sequences. However, because the output is limited, you will not get the gene name (which refFlat includes) unless you do the conversion described on the wiki.
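The conversion on the wiki relies on UCSC's genePredToGtf utility. As a swapped-in illustration (not the wiki's method), here is a minimal Python sketch that turns one refFlat row into GTF exon lines carrying both gene_id and gene_name. It assumes the standard 11-column refFlat layout (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based, half-open coordinates, uses the gene symbol as gene_id as in the example at the top of the thread, and emits exons only (no CDS, start_codon, or stop_codon features). `refflat_to_gtf` and the toy row are hypothetical.

```python
def refflat_to_gtf(line, source="refFlat"):
    """Convert one refFlat row into GTF exon lines (sketch: exons only)."""
    f = line.rstrip("\n").split("\t")
    gene_name, tx_id, chrom, strand = f[0], f[1], f[2], f[3]
    # exonStarts/exonEnds are comma-separated, 0-based half-open, with a trailing comma
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    attrs = f'gene_id "{gene_name}"; gene_name "{gene_name}"; transcript_id "{tx_id}";'
    out = []
    for s, e in zip(starts, ends):
        # GTF is 1-based and inclusive: shift the start by one, keep the end as-is
        out.append("\t".join(
            [chrom, source, "exon", str(s + 1), str(e), ".", strand, ".", attrs]))
    return out


# Hypothetical toy row with two exons, not real annotation data
toy = "GeneA\tNM_000001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
for row in refflat_to_gtf(toy):
    print(row)
```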

We have also begun to offer these proper GTF files in our downloads directory. Here it is for mm10: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/

The file equivalent to your mm9 selection is http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz
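Once mm10.refGene.gtf.gz is downloaded, a short Python sketch like the following can build a transcript_id to gene_id lookup from it. It assumes the standard tab-separated GTF layout with double-quoted attribute values; this thread does not say whether the file also carries a gene_name attribute, so the sketch relies only on gene_id and transcript_id. Both function names are hypothetical.

```python
import gzip
import re

# One `key "value"` pair in GTF column 9 (assumes quoted values)
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_to_gene(lines):
    """Map transcript_id -> gene_id from an iterable of GTF lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def load_mapping(path):
    """Open a plain or gzipped GTF in text mode and build the mapping."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return transcript_to_gene(fh)


# Usage (assumes the file is already in the working directory):
# tx2gene = load_mapping("mm10.refGene.gtf.gz")
```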

If you have further questions, you can reach us at genome@soe.ucsc.edu. It may take us a little longer to answer questions on Biostars.


Hi Luis, what about human? Can you share the GTF links for hg19 and hg38?


Yes, we are still in the process of making them available for all of our assemblies.

hg38 GTFs: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/

hg19 GTFs: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/
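A small Python sketch for fetching these files programmatically. It assumes other files in these directories follow the same `<db>.<table>.gtf.gz` naming as the mm10.refGene.gtf.gz link above; since the files are still being rolled out, a given assembly or table may not exist yet, in which case `urlretrieve` raises an HTTP error. `gtf_url` and `download_gtf` are hypothetical helper names.

```python
from urllib.request import urlretrieve

BASE = "http://hgdownload.soe.ucsc.edu/goldenPath"


def gtf_url(db, table="refGene"):
    """Build a bigZips/genes GTF URL (assumes the <db>.<table>.gtf.gz pattern)."""
    return f"{BASE}/{db}/bigZips/genes/{db}.{table}.gtf.gz"


def download_gtf(db, table="refGene", dest=None):
    """Download the GTF; network call, raises HTTPError if the file is absent."""
    url = gtf_url(db, table)
    dest = dest or url.rsplit("/", 1)[1]  # default to the remote file name
    urlretrieve(url, dest)
    return dest


# Usage (performs a download):
# download_gtf("hg38", "ncbiRefSeq")
```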


And what is the difference between the refGene and ncbiRefSeq GTFs?


The difference is the dataset they were sourced from. You can read about these different tracks in the description page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite).

ncbiRefSeq - RefSeq All – all curated and predicted annotations provided by RefSeq.
refGene - UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track.

Essentially, ncbiRefSeq contains all transcripts, including predicted ones. For refGene, we pull out only the NM_* and NR_* sequences (curated mRNAs and non-coding RNAs) and align them to the genome ourselves using BLAT. See NCBI's table of RefSeq accession prefixes (https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly). Removing the computationally predicted transcripts cuts the table nearly in half: hg38 refGene has 82,864 items, while ncbiRefSeq has 166,923. You may also find this similar question helpful: A: RefGene: how to find the starts and ends of genes?
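The prefix rule above can be sketched in Python as a simple filter. It covers only the four transcript prefixes NM_ (curated mRNA), NR_ (curated non-coding RNA), XM_ (predicted mRNA), and XR_ (predicted non-coding RNA); see NCBI's table for the full list. Note that this only selects which accessions refGene draws from: refGene coordinates come from UCSC's own BLAT realignment, so filtering ncbiRefSeq records this way does not reproduce refGene. The names `PREFIXES`, `classify`, and `curated_only` are hypothetical.

```python
# Transcript accession prefixes (subset of NCBI's RefSeq prefix table)
PREFIXES = {
    "NM_": ("mRNA", "curated"),
    "NR_": ("non-coding RNA", "curated"),
    "XM_": ("mRNA", "predicted"),
    "XR_": ("non-coding RNA", "predicted"),
}


def classify(accession):
    """Return (molecule type, status) for a RefSeq transcript accession."""
    return PREFIXES.get(accession[:3], ("other", "unknown"))


def curated_only(accessions):
    """Keep only NM_/NR_ accessions, the subset UCSC realigns for refGene."""
    return [a for a in accessions if classify(a)[1] == "curated"]


print(classify("NM_001011874"))
# prints: ('mRNA', 'curated')
```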

ADD REPLY
4.9 years ago
badribio

Like this?


I can't see anything! thanks
