I downloaded the reference for aligning RNA-Seq reads to the human transcriptome from this link; specifically, I downloaded the RefSeq transcripts to use as the reference. I was not sure how to get a GTF file for this reference. I posted that question on Biostars a few days ago and was told to download it from the UCSC Table Browser, which I did.
However, the GTF from the Table Browser has the same value for gene_id and transcript_id, which makes it unsuitable for analysis with HTSeq.
So, I have a couple of questions here.
What should I do in this case? I feel unsafe editing the GTF file.
Is there any other way to get a GTF for this specific reference that will be compatible with HTSeq?
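To confirm the problem described above before deciding what to do, a quick check like the following may help. It is only a sketch: the function name `count_same_ids` is made up for illustration, and it assumes a standard tab-separated GTF whose ninth column holds `gene_id "..."` and `transcript_id "..."` attributes.

```shell
# Hypothetical helper (not part of any tool): report how many GTF records
# carry identical gene_id and transcript_id values.
count_same_ids() {
  awk -F'\t' '
    !/^#/ {
      gene = ""; tx = ""
      # Pull the quoted values out of the attribute column (field 9).
      if (match($9, /gene_id "[^"]+"/))       gene = substr($9, RSTART + 9,  RLENGTH - 10)
      if (match($9, /transcript_id "[^"]+"/)) tx   = substr($9, RSTART + 15, RLENGTH - 16)
      total++
      if (gene != "" && gene == tx) same++
    }
    END { printf "%d of %d records have gene_id == transcript_id\n", same, total }
  ' "$1"
}

# Example call (the file name is a placeholder for your own download):
#   count_same_ids refGene_from_table_browser.gtf
```

If nearly every record is reported as identical, the file has no usable gene-level grouping, which matches the symptom described in the question.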
I would highly recommend the GENCODE GTF, whose information fields contain the gene symbols that you want. I am almost certain that it is compatible with HTSeq.
The basic gist is to download your table of interest, chop off some columns (may or may not be necessary depending on the specific table), then run the genePredToGtf utility:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select * from refGene" hg19 | \
cut -f2- | genePredToGtf -source=hg19.refGene.ucsc file stdin stdout
Change stdout in the last command to the output filename you want to get an hg19 refGene GTF file. For GRCh38, query the hg38 database instead of hg19.
Thank you. I am using GRCh38. I followed the link you provided. Can I use "Comprehensive gene annotation", the very first file on that page, when the reference is the human transcriptome (NCBI's RefSeq transcripts)?
Yes, precisely.
Here is the direct link: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz
Here is the first record (DDX11L1 is 'always' the first gene, right at the beginning of the short arm of chr1)
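To look at that first record yourself after downloading the file, something along these lines should work. This is a sketch, not an official recipe: `first_gtf_record` is a made-up helper name, and it assumes the file is gzip-compressed and that header lines start with `#`.

```shell
# Hypothetical helper: print the first non-comment line of a gzipped GTF.
first_gtf_record() {
  # Read via stdin so the file name does not need a .gz suffix.
  gzip -dc < "$1" | awk '!/^#/ { print; exit }'
}

# Example call, assuming the file from the link above is in the current directory:
#   first_gtf_record gencode.v27.annotation.gtf.gz
```

The ninth column of the printed line shows which attributes (gene_id, transcript_id, gene_name and so on) the annotation provides.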
Thank you very much. I was under the wrong impression that the GTF files for the human genome and the human transcriptome are different.
If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.
Thanks for the information!