I have paired-end RNA-Seq data from rats with a 2 conditions x 4 replicates experimental design. My primary goal is to find differentially expressed genes (DEGs) between the two conditions.
After QC and trimming, I have the clean_fastq files.
I want to follow the Nature Protocols paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". So first I need a HISAT index and a GTF annotation file.
I first downloaded the prebuilt index that HISAT provides on its website, named "R. norvegicus, UCSC rn6", so now I need a matching UCSC GTF file. But UCSC does not seem to provide that file.
So I tried to get a GTF file for rat from Ensembl instead: Rattus_norvegicus.Rnor_6.0.92.gtf.gz. I assumed the next step was to build a HISAT index from the Ensembl genome, but I got lost in the ftp://ftp.ensembl.org/pub/release-92/fasta/rattus_norvegicus/ folder and did not know which file to use for building the index. I tried /pub/release-92/fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz and built the index from it, which produced a huge file, much bigger than the UCSC index HISAT provides, and my PC complained about running out of memory several times. So I suppose I got the wrong file again.
Could someone tell me what I am doing wrong? What should I do now? Thanks!
Try the UCSC Table Browser to get the rat GTF.
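Whichever GTF you end up using, its chromosome names have to match the ones in the index: UCSC uses names like chr1 and chrM, whereas Ensembl uses 1 and MT. Below is a minimal sketch in Python for checking this before running the pipeline. The function names are my own (hypothetical, not part of HISAT or StringTie), and it assumes you have the reference sequence names available as FASTA header lines.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def gtf_seqnames(lines):
    """Collect the sequence names (first tab-separated column) from GTF lines."""
    names = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(lines):
    """Collect sequence names from FASTA header lines (text after '>' up to whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_names(gtf_names, ref_names):
    """Return (names found in both, names found only in the GTF)."""
    return gtf_names & ref_names, gtf_names - ref_names


if __name__ == "__main__":
    # Tiny in-memory example: Ensembl-style GTF names against UCSC-style reference names.
    gtf_lines = ['1\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g1";\n']
    ref_lines = [">chr1\n", "ACGT\n"]
    shared, only_gtf = compare_names(gtf_seqnames(gtf_lines), fasta_seqnames(ref_lines))
    print("shared:", sorted(shared))
    print("only in GTF:", sorted(only_gtf))
```

For real files you would pass `open_text("your_annotation.gtf.gz")` and `open_text("your_genome.fa.gz")` (placeholder paths) instead of the in-memory lists. If you only have the prebuilt index and no FASTA, `hisat2-inspect -n` on the index basename should list its sequence names, which you can compare against the GTF names in the same way. If nothing is shared, the annotation and index come from different naming schemes and the counts will come out empty.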
Thanks for your help. I'll try this.
Great! I got count data after using this GTF file. Thank you very much!
Dear colleague, I can't help you with this issue and I hope others can help you. Nevertheless, I would like to kindly ask you to try our DEWE tool (http://www.sing-group.org/dewe) to run your analysis. DEWE offers an easy way to run that workflow (HISAT2, StringTie and Ballgown) and we would be pleased to have your feedback on the usability of the tool. Thank you in advance.
OK. I may use it once my problems are solved.