Expression analysis of LncRNA from RNA-seq data
9 months ago
analyst ▴ 50

I have RNA-seq data and need to identify novel lncRNAs and perform DGE analysis. I used the following approach to find novel lncRNAs:

Alignment through hisat2, assembly through stringtie, merging all assemblies through stringtie merge, classification of merged transcripts through gffcompare (assigning class codes), identifying novel transcripts (class code u).

But I don't understand how to get lncRNA expression for each sample, because I identified the novel transcripts from the merged file. I have also performed DGE analysis for mRNA genes with DESeq2. I am confused about which file I need to prepare first to get lncRNA counts for each sample before moving to DESeq2. Please guide.

Thanks in advance!

expression lncrna • 1.7k views

How is mRNA DE analysis any different from lncRNA DE analysis if you're comparing isoform-level metrics?


My aim is to identify lncRNA differential expression from RNA-seq data. Up to the stringtie step, I have individual assembly files. If I use these assembly files for DESeq2, it is the same as analysing mRNA expression. After stringtie, do I need to identify ncRNA for each sample individually and then move to DESeq2? I am stuck at this step.


I'm not familiar with these tools and I don't understand the premise. Are you not identifying features per sample already? How does it matter if a feature is mRNA or lncRNA if you can compare their levels among samples? Or is the feature annotation process dependent on the merge process (which wouldn't make sense)?


What organism is this for? If you are working with a model or otherwise well-studied organism, you could simply use the locations of known lncRNAs.


Kindly give your valuable suggestions on the following strategy:

  1. make a gene count file through featureCounts
  2. make a text file containing the IDs of known lncRNAs
  3. filter the gene count file using the text file made in step 2, which gives a count file for lncRNAs only
  4. perform DGE analysis on the lncRNA count file

Is this an appropriate approach to perform lncRNA expression analysis?


It is not at all clear to me that filtering that early is advisable. Removing most of your RNA counts is going to alter normalization and dispersion estimates, and probably not in a good way. I'd keep all of the gene counts for all the genes, process data for all the genes, and then at the end, if you only care about lncRNA, filter away the results you don't care about.
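A minimal sketch of the "filter at the end" idea, assuming the DESeq2 results for all genes were exported to a tab-separated file whose first column is the gene ID, and that a plain-text list of lncRNA gene IDs exists. The file names `deseq2_all_genes.tsv`, `lncrna_ids.txt`, and `deseq2_lncrna_only.tsv`, the column names, and every value in the demo input are made up for illustration.

```shell
# Demo inputs (hypothetical values, only to make the sketch runnable).
printf 'gene_id\tlog2FoldChange\tpadj\n' > deseq2_all_genes.tsv
printf 'AT1G01010\t1.20\t0.030\n' >> deseq2_all_genes.tsv
printf 'AT1G04137\t-2.10\t0.001\n' >> deseq2_all_genes.tsv
printf 'AT1G04140\t0.10\t0.900\n' >> deseq2_all_genes.tsv
printf 'AT1G04137\n' > lncrna_ids.txt

# First file (NR==FNR): remember every lncRNA ID.
# Second file: keep the header line and rows whose gene ID was remembered.
awk -F '\t' 'NR==FNR { keep[$1]; next } FNR==1 || ($1 in keep)' \
    lncrna_ids.txt deseq2_all_genes.tsv > deseq2_lncrna_only.tsv
```

Because the subsetting happens on the results table, normalization and dispersion estimation still see every gene; only the reported rows are reduced.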


Thanks swbarnes2, I will follow your valuable suggestion.


Listen to swbarnes2 and rethink your entire approach.


Thanks GenoMax!

It's Arabidopsis thaliana.


By locations, do you mean chromosomal coordinates?


lncRNAs for Arabidopsis are annotated: https://rnacentral.org/search?q=Arabidopsis%20thaliana%20AND%20so_rna_type_name:%22LncRNA%22

They should be included in the GTF file you probably have. They are in the Ensembl GTF.


Yes, GenoMax, I used the annotated GTF while constructing the assembly through stringtie.


Arabidopsis is a thoroughly studied organism; I don't think you can make a better assembly with stringtie. Just use the genomic coords that are already documented in the GTF.


I have modified the approach as follows:

  1. Alignment through hisat2
  2. Quantification through featureCounts (it outputs a count file)

Now the question is: how can I find the expression of lncRNAs in my RNA-seq samples?


What is your featureCounts command? You must be using a GTF file somewhere, dig into it.


Thanks Ram, I am using the following command:

featureCounts -T 64 -p -s 0 -a AT10.58.gtf -o counts.txt *.bam -t exon -g gene_id
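
featureCounts writes a tab-separated table: a leading comment line recording the command, then a header with Geneid, Chr, Start, End, Strand, Length, and one count column per BAM file. A minimal sketch of trimming that to a gene-by-sample matrix for DESeq2 follows. The `counts.txt` content below is an invented stand-in (only the column layout is assumed to match real output), and `count_matrix.tsv` is a hypothetical output name.

```shell
# Invented stand-in for featureCounts output (layout only; values are made up).
{
  printf '# Program:featureCounts; Command: ...\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n'
  printf 'AT1G01010\t1\t3631\t5899\t+\t1688\t120\t98\n'
  printf 'AT1G04137\t1\t559735\t560251\t+\t517\t7\t15\n'
} > counts.txt

# Drop the leading comment line, then keep Geneid (column 1) and the
# per-sample count columns (column 7 onward). All genes are retained.
grep -v '^#' counts.txt | cut -f 1,7- > count_matrix.tsv
```

The matrix keeps every gene, mRNA and lncRNA alike, so it can go into DESeq2 as a whole.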

Look into AT10.58.gtf for references to lncRNA.


Do you mean that I should grep lncRNA from the reference GTF and use that?

grep "lncRNA" AT10.58.gtf > lncrna.gtf


There should be gene_ID/ID counts that are labeled as lncRNA.

An example from Ensembl's annotation (lncRNA is under biotype; this line is in GFF3 layout, and the GTF carries the same label as gene_biotype).

1       araport11       ncRNA_gene      559735  560251  .       +       .       ID=gene:AT1G04137;Name=AT1G04137;biotype=lncRNA;description=ncRNA [Source:NCBI gene (formerly Entrezgene)%3BAcc:28716059];gene_id=AT1G04137;logic_name=araport1
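
As a sketch of how that biotype label can be turned into a list of lncRNA gene IDs: a plain grep returns every line that mentions lncRNA (gene, transcript, and exon records), whereas the awk filter below keeps only gene records and prints one ID per gene. It assumes GTF-style attributes of the form `gene_id "..."; gene_biotype "lncRNA";`, which is worth checking against your own file. `demo.gtf` and its two lines are invented for the demonstration, and `lncrna_ids.txt` is a hypothetical output name.

```shell
# Two invented GTF-style gene lines (tab-separated, 9 columns) for the demo.
{
  printf '1\taraport11\tgene\t3631\t5899\t.\t+\t.\tgene_id "AT1G01010"; gene_biotype "protein_coding";\n'
  printf '1\taraport11\tgene\t559735\t560251\t.\t+\t.\tgene_id "AT1G04137"; gene_biotype "lncRNA";\n'
} > demo.gtf

# Keep only "gene" features whose attribute column says gene_biotype "lncRNA",
# then pull the quoted gene_id value out of that column.
awk -F '\t' '$3 == "gene" && $9 ~ /gene_biotype "lncRNA"/ {
    if (match($9, /gene_id "[^"]+"/)) {
        # gene_id plus space plus opening quote is 9 characters;
        # the closing quote is dropped as well.
        print substr($9, RSTART + 9, RLENGTH - 10)
    }
}' demo.gtf > lncrna_ids.txt
```

The resulting ID list can then be used to pick out the lncRNA rows from results computed on all genes.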

Thanks GenoMax! The above command filters exactly what you presented.

Thanks a lot!
