Identifying differentially expressed lncRNAs from RNA-Seq data
7.0 years ago
Biologist

Hi

I'm very new to lncRNA analysis. I'm using the HISAT2, StringTie and Ballgown pipeline for differential expression analysis.

If I'm only looking for lncRNAs, I will use the GENCODE lncRNA annotation [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]

First question:

For example, from this paper [https://www.nature.com/articles/nprot.2016.095]:

HISAT2 command: map the reads for each sample to the reference genome

hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam

StringTie command: assemble transcripts for each sample (in the protocol, the SAM output is first sorted and converted to BAM with samtools sort)

stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam
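For reference, the per-sample steps can be sketched as one bash function. This is a minimal sketch, not the protocol's own script: the function name and arguments are hypothetical placeholders, and the samtools sort step fills in the SAM-to-BAM conversion that sits between the two commands above. Whichever GTF you choose is passed to StringTie with -G.

```bash
#!/usr/bin/env bash
# Sketch: map one paired-end sample with HISAT2, sort to BAM, assemble with StringTie.
# Arguments (all placeholders): index prefix, read 1, read 2, guide GTF, sample name.
map_and_assemble() {
  local index=$1 r1=$2 r2=$3 gtf=$4 sample=$5

  # --dta tailors the alignments for transcript assemblers such as StringTie
  hisat2 -p 8 --dta -x "$index" -1 "$r1" -2 "$r2" -S "${sample}.sam"

  # StringTie expects a coordinate-sorted BAM, so sort and convert the SAM output first
  samtools sort -@ 8 -o "${sample}.bam" "${sample}.sam"

  # -G supplies the reference annotation that guides assembly; -l sets the transcript name prefix
  stringtie -p 8 -G "$gtf" -o "${sample}.gtf" -l "$sample" "${sample}.bam"
}
```

For the protocol's example sample this would be called as `map_and_assemble chrX_data/indexes/chrX_tran chrX_data/samples/ERR188044_chrX_1.fastq.gz chrX_data/samples/ERR188044_chrX_2.fastq.gz chrX_data/genes/chrX.gtf ERR188044_chrX`.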

At which step do I need to use the GTF file above (the GENCODE lncRNA annotation) [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]?

In the HISAT2 step or the StringTie step?

Second question:

If I want to get both protein-coding RNAs and lncRNAs, which GTF file from GENCODE [http://www.gencodegenes.org/releases/current.html] should I use?

That way, after differential expression analysis I will have both differentially expressed protein-coding RNAs and lncRNAs. How can I filter only the lncRNAs from them?
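One way to do the filtering, sketched under assumptions: the DE results are saved as a tab-separated table with a header line and the gene ID in the first column (a hypothetical layout), and the IDs use the same GENCODE gene_id format, including the version suffix, as the GTF. The idea is to collect every gene_id in the GENCODE lncRNA GTF and keep only the matching rows. Function and file names are placeholders.

```bash
#!/usr/bin/env bash

# Print the unique gene_id values found in a GTF file (e.g. the GENCODE
# long_noncoding_RNAs GTF), one per line.
gtf_gene_ids() {
  awk -F'\t' '!/^#/ && match($9, /gene_id "[^"]+"/) {
    # skip the 9 characters of `gene_id "` and drop the closing quote
    print substr($9, RSTART + 9, RLENGTH - 10)
  }' "$1" | sort -u
}

# Keep the header plus every row whose first column is one of the listed IDs.
# $1: file with one gene ID per line; $2: tab-separated results table.
filter_by_ids() {
  awk -F'\t' 'NR == FNR { keep[$1]; next } FNR == 1 || ($1 in keep)' "$1" "$2"
}
```

In bash, for example: `gtf_gene_ids <(zcat gencode.v24.long_noncoding_RNAs.gtf.gz) > lnc_ids.txt` followed by `filter_by_ids lnc_ids.txt de_results.tsv > de_lncRNAs.tsv`.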

It would be very helpful if you could clear up my doubts. Thank you.

7.0 years ago

Use the GTF file with hisat2 (unless you built the index with those splice sites already) and skip stringtie entirely. You're not looking for novel lncRNAs, so StringTie doesn't benefit you. Instead, use featureCounts and then DESeq2, edgeR, or limma/voom.
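A minimal sketch of the counting step, assuming paired-end BAM files and the featureCounts program from the Subread package; the function name, thread count and file names are placeholders. By default featureCounts counts reads over exon features and sums them per gene_id. In recent Subread releases, -p only marks the data as paired-end, so you would add --countReadPairs to count fragments rather than reads.

```bash
#!/usr/bin/env bash
# Sketch: gene-level counts with featureCounts (Subread) for paired-end BAM files.
# Usage: count_genes annotation.gtf counts.txt sample1.bam [sample2.bam ...]
count_genes() {
  local gtf=$1 out=$2
  shift 2
  # -a: GTF annotation (exon features summarised per gene_id by default)
  # -p: reads are paired-end; -T: number of threads
  featureCounts -T 8 -p -a "$gtf" -o "$out" "$@"
}
```

The resulting count table (the -o file) is what you would load into DESeq2, edgeR or limma/voom.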

Note that you have to convert the GTF file into a splice-site file for hisat2. This is done with hisat2_extract_splice_sites.py, which comes with hisat2.
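The extraction can be wrapped as follows. This sketch assumes both helper scripts shipped with HISAT2 (hisat2_extract_splice_sites.py and hisat2_extract_exons.py) are on your PATH; the function name and output prefix are hypothetical. The exon file is only needed if you build an index with annotated exons.

```bash
#!/usr/bin/env bash
# Sketch: derive HISAT2 annotation files from a GTF.
# $1: GTF file, $2: output prefix (writes <prefix>.ss and <prefix>.exon)
extract_annotation_sites() {
  local gtf=$1 prefix=$2
  # Splice sites (intron boundaries) from the annotated transcripts
  hisat2_extract_splice_sites.py "$gtf" > "${prefix}.ss"
  # Exon coordinates; only needed when building an index with --exon
  hisat2_extract_exons.py "$gtf" > "${prefix}.exon"
}
```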

No, I am also looking for novel lncRNAs. How should the command be written if I need to use the GTF file with hisat2?

If you're looking for novel lncRNAs, then you'll need the GTF for both steps (unless you built the hisat2 index with it, in which case you only need it with StringTie).

You mean that for hisat2 I need to create GRCh38 index files, and the GTF file will be used for StringTie. Am I right? How can I create the index files?

I'll try this one last time. Your options are below:

  1. Build a hisat2 index with the splice sites from the GTF file (see the hisat2 documentation for details).
  2. Build or download a genomic hisat2 index and then use the --known-splicesite-infile option, providing the aforementioned splice sites from the GTF file (again, see the hisat2 documentation).
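The two options might look like this in bash. This is only a sketch: it assumes the .ss and .exon files were produced by hisat2_extract_splice_sites.py and hisat2_extract_exons.py, and all function, file and index names are placeholders. Building an index with --ss and --exon for a human genome needs much more memory than a plain genome index, so check the HISAT2 manual before choosing option 1.

```bash
#!/usr/bin/env bash

# Option 1: build an index that already contains the annotated splice sites and exons.
# $1: genome FASTA, $2: splice-site file, $3: exon file, $4: index prefix
build_annotated_index() {
  local fasta=$1 ss=$2 exon=$3 prefix=$4
  hisat2-build -p 8 --ss "$ss" --exon "$exon" "$fasta" "$prefix"
}

# Option 2: reuse a plain genome index and supply known splice sites at alignment time.
# $1: index prefix, $2: splice-site file, $3/$4: read files, $5: sample name
align_with_known_sites() {
  local index=$1 ss=$2 r1=$3 r2=$4 sample=$5
  hisat2 -p 8 --dta -x "$index" --known-splicesite-infile "$ss" \
    -1 "$r1" -2 "$r2" -S "${sample}.sam"
}
```

Option 2 keeps the index generic, so you can switch annotations later without rebuilding it.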

Regardless, you'll need to use the GTF file with stringTie.
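For the novel-lncRNA route, the GTF is also passed to the StringTie merge step, as in the protocol linked above. A minimal sketch; the function name and paths are placeholders, and the merge list is assumed to be a plain text file with one per-sample GTF path per line.

```bash
#!/usr/bin/env bash
# Sketch: merge per-sample StringTie assemblies, guided by the reference annotation.
# $1: reference GTF, $2: merge list (one per-sample GTF path per line), $3: output GTF
merge_assemblies() {
  local gtf=$1 mergelist=$2 out=$3
  stringtie --merge -p 8 -G "$gtf" -o "$out" "$mergelist"
}
```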

Got it. Thank you! Could you also answer my second question above, please?

Use one of the "comprehensive gene annotation" files (either PRI or CHR, depending on which genome you downloaded). Don't use the ALL regions GTF or fasta file (I prefer PRI).

OK, but how can I detect differentially expressed lncRNAs if the GTF file contains both protein-coding RNAs and lncRNAs?

For that analysis, do the quantification with the lncRNAs only. Use StringTie only to find novel genes, not to quantify them.

Hi Devon,

In this paper [https://www.nature.com/articles/ncomms14421], please check Figure A: after transcript assembly and merging, they examined how the transcripts compare with the reference annotation. So for this I should use the GENCODE annotation GTF file that I will be using for StringTie. From this comparison we can assign protein-coding genes and lncRNAs, and the unannotated genes will be used for detecting novel lncRNAs. Am I right? In my case, as I'm using the HISAT2 and StringTie pipeline, I will use gffcompare [which is mentioned in the paper https://www.nature.com/articles/nprot.2016.095].
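A sketch of the selection described here, assuming gffcompare was run on the merged assembly (for example gffcompare -r annotation.gtf -o merged stringtie_merged.gtf) so that it wrote a tab-separated .tmap file with a header line. Transcripts with class code u (no overlap with any reference transcript) are kept if they are at least 200 nt long, the usual length cutoff for lncRNAs. Columns are located by header name; the function name and defaults are hypothetical, and the resulting IDs are candidates rather than confirmed lncRNAs.

```bash
#!/usr/bin/env bash
# Sketch: list candidate novel transcripts from a gffcompare .tmap file.
# $1: .tmap file, $2: class code to keep (default u), $3: minimum length in nt (default 200)
novel_candidates() {
  local tmap=$1 code=${2:-u} min_len=${3:-200}
  awk -F'\t' -v code="$code" -v min_len="$min_len" '
    NR == 1 {
      # Map header names to column numbers instead of hard-coding positions
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("class_code" in col) || !("qry_id" in col) || !("len" in col)) {
        print "unexpected .tmap header" > "/dev/stderr"
        exit 1
      }
      next
    }
    $(col["class_code"]) == code && ($(col["len"]) + 0) >= (min_len + 0) {
      print $(col["qry_id"])
    }
  ' "$tmap"
}
```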

Sounds correct. I don't have time at the moment to go through that paper thoroughly.
