Hi
I'm very new to lncRNA analysis. I'm using the HISAT2, StringTie and Ballgown pipeline for differential expression analysis.
If I'm only looking for lncRNAs, I will use the GENCODE lncRNA annotation [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]
First question:
For example, from this paper [https://www.nature.com/articles/nprot.2016.095]:
HISAT2 command: map the reads for each sample to the reference genome
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam
StringTie command: assemble transcripts for each sample
stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam
In which step do I need to use the above GTF file (the GENCODE lncRNA annotation) [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]?
In the HISAT2 step or the StringTie step?
Second question:
If I want to get both protein-coding RNAs and lncRNAs, which GTF file should I use from GENCODE [http://www.gencodegenes.org/releases/current.html]?
That way, after differential expression analysis I will have both differentially expressed protein-coding RNAs and lncRNAs. How can I filter only the lncRNAs from them?
It would be very helpful if you could clear my doubts. Thank you.
No, I am also looking for novel lncRNAs. Just so I know, how should the command be given if I need to use the GTF file with hisat2?
If you're looking for novel lncRNAs then you'll need the GTF for both (unless you built the hisat index with it, in which case you only need it with stringTie).
You mean that for hisat2 I need to create index files for GRCh38, and the GTF file will be used for StringTie. Am I right? How can I create the index files?
I'll try this one last time. Your options are below:
--splice-sites option, providing the aforementioned splice sites from the GTF file (again, see the hisat2 documentation). Regardless, you'll need to use the GTF file with stringTie.
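For anyone reading later, here is a minimal sketch of the two usual ways to give HISAT2 the annotation, as I understand the HISAT2 manual: either build the index with splice sites and exons extracted from the GTF, or build a plain index and pass the splice sites at alignment time. The function names and all file names are hypothetical placeholders. The alignment-time flag in the manual is, as far as I know, `--known-splicesite-infile`, so that is what the sketch uses. Thread count (`-p 8`) just mirrors the commands in the question.

```shell
# Sketch only: function names and file names are placeholders, not from the thread.

# Option 1: build the index with splice sites and exons taken from the GTF.
# Afterwards no GTF is needed at the hisat2 alignment step.
build_index_with_annotation() {
    genome_fa=$1      # reference genome FASTA
    gtf=$2            # annotation GTF
    index_prefix=$3   # prefix for the index files
    hisat2_extract_splice_sites.py "$gtf" > "${index_prefix}.ss" &&
    hisat2_extract_exons.py "$gtf" > "${index_prefix}.exon" &&
    hisat2-build -p 8 --ss "${index_prefix}.ss" --exon "${index_prefix}.exon" \
        "$genome_fa" "$index_prefix"
}

# Option 2a: build a plain index without any annotation.
build_plain_index() {
    hisat2-build -p 8 "$1" "$2"   # $1 = genome FASTA, $2 = index prefix
}

# Option 2b: align against the plain index, passing known splice sites
# (the output of hisat2_extract_splice_sites.py) at alignment time.
align_with_known_splice_sites() {
    index_prefix=$1; splice_sites=$2; reads_1=$3; reads_2=$4; out_sam=$5
    hisat2 -p 8 --dta -x "$index_prefix" \
        --known-splicesite-infile "$splice_sites" \
        -1 "$reads_1" -2 "$reads_2" -S "$out_sam"
}
```

A call would look like `build_index_with_annotation genome.fa annotation.gtf grch38_tran`. Either way, the GTF still goes to stringTie with `-G`.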
Got it. Thank you! Could you also answer my second question, mentioned above, please?
Use one of the "comprehensive gene annotation" files (either PRI or CHR, depending on which genome you downloaded). Don't use the ALL regions GTF or fasta file (I prefer PRI).
OK, but how can I detect differentially expressed lncRNAs if the GTF file contains both protein-coding RNAs and lncRNAs?
For that analysis only do the quantification with the lncRNAs. Only use stringTie to find novel genes, not to quantify them.
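One way to get a lncRNA-only annotation for the quantification step is to subset the comprehensive GTF by its `gene_type` attribute. This is a rough sketch, not a vetted recipe. The function name is hypothetical, and the biotype names in the usage note are my assumption for older GENCODE releases. Newer releases, as far as I know, use a single `lncRNA` gene_type, so check which values your file actually contains first.

```shell
# Sketch only: keeps header comments plus every record whose gene_type
# is in a comma-separated list of wanted biotypes.
filter_gtf_by_gene_type() {
    gtf=$1        # input GTF (uncompressed)
    biotypes=$2   # e.g. "lincRNA,antisense" (assumed biotype names)
    awk -F '\t' -v types="$biotypes" '
        BEGIN {
            # turn the comma-separated list into a lookup table
            n = split(types, list, ",")
            for (i = 1; i <= n; i++) wanted[list[i]] = 1
        }
        /^#/ { print; next }                      # keep header lines
        match($9, /gene_type "[^"]+"/) {
            # strip the leading key and the surrounding quotes
            biotype = substr($9, RSTART + 11, RLENGTH - 12)
            if (biotype in wanted) print
        }
    ' "$gtf"
}
```

Usage would be something like `filter_gtf_by_gene_type gencode.annotation.gtf "lincRNA,antisense" > lncRNA_only.gtf`. The same `gene_type` lookup could also be used the other way round, to pick lncRNA gene IDs out of a differential expression table produced with the full annotation.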
Hi Devon,
In this paper [https://www.nature.com/articles/ncomms14421], please check Figure A: after transcript assembly and merging, they examined how the transcripts compare with the reference annotation. So for this I should use the GENCODE annotation GTF file, the same one I will be using for StringTie. From this we can assign protein-coding RNAs and lncRNAs. Unannotated genes will be used for detecting novel lncRNAs. Am I right? In my case, as I'm using the HISAT2 and StringTie pipeline, I will use gffcompare [which is mentioned in the paper https://www.nature.com/articles/nprot.2016.095].
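To make the gffcompare step concrete, here is a small sketch of pulling out the unannotated transcripts afterwards. It assumes gffcompare was run on the merged StringTie assembly against the reference GTF and wrote `merged.annotated.gtf`, in which each transcript line carries a `class_code` attribute, with `u` meaning unknown/intergenic. The function name and file names are hypothetical.

```shell
# Sketch only. Assumed upstream step (file names are placeholders):
#   gffcompare -r gencode.annotation.gtf -G -o merged stringtie_merged.gtf
# which should produce merged.annotated.gtf.

# Print the IDs of transcripts with class code "u" (no overlap with the
# reference annotation), one per line, sorted and de-duplicated.
novel_intergenic_ids() {
    annotated_gtf=$1
    awk -F '\t' '
        $3 == "transcript" && $9 ~ /class_code "u"/ {
            if (match($9, /transcript_id "[^"]+"/))
                # strip the leading key and the surrounding quotes
                print substr($9, RSTART + 15, RLENGTH - 16)
        }
    ' "$annotated_gtf" | sort -u
}
```

For example, `novel_intergenic_ids merged.annotated.gtf > novel_candidates.txt`. These are only candidates: being unannotated does not by itself make a transcript a lncRNA.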
Sounds correct, I don't have time at the moment to thoroughly go through that paper.