8.8 years ago
umn_bist
So I found that the annotated GTF file for hg19 from the UCSC Table Browser does not adhere to the standard GTF format. As a result, I've been getting a fatal error in STAR:
Fatal INPUT FILE error, no valid exon lines in the GTF file: /work/cellbiology/s167125/Documents/ucsc_hg19/ucsc.hg19.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.
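One quick way to narrow down an error like this is to tally what the GTF actually contains in its third (feature) column. The snippet below is only a minimal sketch: the function name `count_gtf_features` is made up for illustration, and it assumes a tab-separated file with the usual nine GTF columns. If no `exon` entry shows up in the tally, or many lines are reported as malformed, the file format itself is the likely problem rather than the chromosome names.

```python
from collections import Counter


def count_gtf_features(lines):
    """Tally column 3 (feature type) of GTF lines.

    Lines with fewer than nine tab-separated fields are counted
    under the key "<malformed>". Comments and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            counts["<malformed>"] += 1
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python count_features.py annotation.gtf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for feature, n in count_gtf_features(handle).most_common():
                print(feature, n, sep="\t")
```

Run against the GTF that STAR rejects, this prints one line per feature type with its count.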
I know that I can generate a good GTF file with the genePredToGtf utility, but it is only compatible with 64-bit Linux, and I only have access to a Mac. Is there an alternative way to get a GTF for UCSC's hg19 reference genome?
Thank you for the help
Is there a reason you want to use the UCSC annotation? The one from Ensembl/GENCODE is almost always better (there's a reason that UCSC now uses the copy from GENCODE).
Yes, so I checked the headers of my reference genome (ucsc_hg19.fa) as well as its annotation GTF file (ucsc_hg19.gtf), and both use 'chr' notation.
Digging further, I realized UCSC does not keep a GTF file of its gene structures; they are all in genePred format.
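For anyone who cannot run genePredToGtf, here is a rough sketch of what turning a genePred record into GTF exon lines involves. It is not a replacement for the UCSC tool. It assumes the basic ten-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based starts; some table dumps carry a leading bin column that would have to be dropped first. The function name and the `source` value are invented for this example, and because basic genePred has no gene name, `gene_id` is simply set to the transcript name, so every transcript ends up as its own gene.

```python
def genepred_to_gtf_exons(line, source="genePred"):
    """Convert one basic (10-column) genePred record to GTF exon lines.

    Assumptions: tab-separated input, no leading bin column,
    0-based half-open exon coordinates as in genePred.
    gene_id is set to the transcript name because the basic
    format carries no gene-level identifier.
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    gtf_lines = []
    for start, end in zip(starts, ends):
        # GTF is 1-based and inclusive, so only the start shifts by one
        gtf_lines.append("\t".join(
            [chrom, source, "exon", str(start + 1), str(end),
             ".", strand, ".", attrs]
        ))
    return gtf_lines


# Made-up record, for illustration only
record = "tx1\tchr1\t+\t100\t500\t100\t100\t2\t100,300,\t200,500,"
for gtf_line in genepred_to_gtf_exons(record):
    print(gtf_line)
```

This emits exon lines only, with no CDS, start_codon or stop_codon features, so it is enough to see the coordinate conversion but not a full annotation.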
You can export the UCSC gene predictions in GTF from the table browser.
That is what I thought as well, but see this wiki page
To be honest, no. It's just something I had on hand, and I had already generated the index with STAR. I found that Alex Dobin of STAR recommends using GENCODE.
Yup, Gencode/Ensembl (they're more or less identical) are what you'll find most people (myself included) recommending.
@Devon Ryan, could you say a bit more about why the Ensembl annotation is better than UCSC's? Thanks!
It's more likely to represent the transcripts you see in your experiments.
@Devon Ryan, because Ensembl people curate the annotation better?
Ensembl and UCSC use completely different methods to arrive at the annotations (historically, at least; for recent mouse and human annotations they should be the same).
It says the most likely issue is the chromosome naming convention. So it could be as simple as adding or removing a "chr" from the GTF or reference file.
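To check this directly, you can compare the sequence names in the FASTA headers with the names in the first column of the GTF. The following is a small sketch with hypothetical helper names. It assumes the sequence name is the text after `>` up to the first whitespace, and it only tests for a missing or extra "chr" prefix, so it will not catch other renamings such as chrM versus MT.

```python
def fasta_names(lines):
    """Sequence names from FASTA headers: text after '>' up to first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_names(lines):
    """Chromosome names from column 1 of a GTF, skipping comments and blanks."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def toggle_chr(name):
    """Strip a leading 'chr' if present, otherwise add one."""
    return name[3:] if name.startswith("chr") else "chr" + name


def diagnose(fasta, gtf):
    """Classify how two sets of chromosome names relate."""
    if fasta & gtf:
        return "ok"
    if fasta & {toggle_chr(name) for name in gtf}:
        return "prefix mismatch"
    return "no overlap"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_names.py genome.fa annotation.gtf
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
            print(diagnose(fasta_names(fa), gtf_names(gtf)))
```

A result of "ok" means at least some names are shared, in which case the naming is probably not the cause and the GTF formatting is the next thing to inspect.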