lncRNA and mRNA analysis on Total RNA Sequencing data
5 weeks ago
kanika.151 ▴ 160

Question: Best Practices for Analyzing mRNA and lncRNA from Total RNA Sequencing Data

We performed total RNA sequencing on our samples and conducted two separate analyses:

  1. mRNA analysis: Using the Gencode v46 GTF file and its corresponding index file for mapping.
  2. lncRNA analysis: Using a custom mapping index created with the lncBook 2.1 FASTA file, while still using the Gencode v46 lncRNA GTF merged with the Lncipedia file for feature counting. For featureCounts, we applied the flags -d 200 and -D 3000000 (these set the minimum and maximum fragment length checked for read pairs), so that only fragments within 200–3,000,000 nucleotides are counted.

Observations

  • Some lncRNA transcripts located in the intergenic region next to a gene are found to be differentially expressed at the transcript level in the lncRNA analysis.
  • The corresponding gene is also found to be differentially expressed at the gene level in the mRNA analysis.

Questions

  1. Is this behavior expected, or could it indicate an issue with the mapping or feature counting process?
  2. Are the lncRNAs identified in the lncRNA analysis likely to be genuine lncRNAs, or could they be artifacts of the mapping/analysis process?
  3. Should I consider merging the Gencode v46 GTF file with the lncRNA GTF file and re-analyzing the data?

Concerns

  • Are there any potential limitations or pitfalls in performing these analyses separately?
  • Would merging the annotations improve the results, or could it introduce additional biases or errors?

I would appreciate insights or suggestions on how others would approach this type of analysis. Thank you!

mRNA RNASeq lncRNA • 293 views

lncRNAs are annotated in the standard GENCODE GTF file, so why not use that and then subset the count matrix to what you need?
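To make the suggestion concrete, here is a minimal sketch of subsetting a count matrix by gene type. It assumes the GTF carries a gene_type attribute on its gene records, as GENCODE files do, and that the counts are keyed by gene_id. The function names, gene IDs, and counts are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_TYPE = re.compile(r'gene_type "([^"]+)"')


def gene_types_from_gtf(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id -> gene_type using only the 'gene' records of a GTF."""
    types: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        gid = GENE_ID.search(fields[8])
        gtype = GENE_TYPE.search(fields[8])
        if gid and gtype:
            types[gid.group(1)] = gtype.group(1)
    return types


def subset_counts(counts: Dict[str, List[int]],
                  types: Dict[str, str],
                  wanted: str) -> Dict[str, List[int]]:
    """Keep only the rows of the count matrix whose gene has the wanted type."""
    return {g: c for g, c in counts.items() if types.get(g) == wanted}


# Toy data (hypothetical IDs and counts, not from any real annotation).
toy_gtf = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1.1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\tgene\t2000\t2600\t.\t-\t.\tgene_id "G2.1"; gene_type "lncRNA";\n',
    'chr1\tHAVANA\texon\t2000\t2600\t.\t-\t.\tgene_id "G2.1"; gene_type "lncRNA";\n',
]
toy_counts = {"G1.1": [10, 12], "G2.1": [3, 0]}

types = gene_types_from_gtf(toy_gtf)
lnc = subset_counts(toy_counts, types, "lncRNA")
print(lnc)
```

Counting once against the full annotation and splitting afterwards keeps mRNA and lncRNA reads competing for assignment in the same run, which is the point of the suggestion.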


I saw a separate GTF file for lncRNAs on the GENCODE website. I think only some of them are integrated into the comprehensive GTF file, rather than all of the lncRNAs. Do you know if all of them are integrated? I just edited my text. We used a merged GTF of GENCODE, NONCODE and LNCipedia, as not all the lncRNA transcripts are found in the GENCODE GTF. As lncRNA expression is lower than mRNA expression, we thought of doing it separately. Do you think this will skew the results?

5 weeks ago

My advice is always: don't overthink it. People tend to get into overly pessimistic mindsets. I remember a student who spent months trimming data. She was always worried that she did not trim enough or trimmed too much. When we looked, the trimming had no effect whatsoever on the final results.

Use the single most complete reference, one that contains all the valid sources for the sequences you observe.

Then, there are always problems with mapping.

Notably, with short reads it is impossible to tell where a read originated if it matches exactly at several different locations. That's just a limitation we need to live with.
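If you want a rough sense of how much this affects a given run, one option is to tally alignments by the NH tag. This is only a sketch: it assumes your aligner writes the optional NH:i: tag (number of reported hits) into its SAM output, which not every aligner does, and the function name and the SAM records below are made up.

```python
from collections import Counter
from typing import Iterable


def nh_distribution(sam_lines: Iterable[str]) -> Counter:
    """Tally primary alignments as unique or multi-mapped via the NH tag."""
    dist: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so that each mapped read is tallied once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        nh = next((int(f[5:]) for f in fields[11:] if f.startswith("NH:i:")), None)
        if nh is None:
            dist["no NH tag"] += 1
        elif nh == 1:
            dist["unique"] += 1
        else:
            dist["multi"] += 1
    return dist


# Toy SAM records (hypothetical reads and positions).
toy_sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\n",
    "r2\t0\tchr1\t500\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:2\n",
    "r2\t256\tchr2\t700\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:2\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]

dist = nh_distribution(toy_sam)
print(dict(dist))
```

For paired-end data each mate is tallied separately, so read the numbers as proportions rather than fragment counts.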

I would create a reusable and repeatable pipeline where you can automatically plug in various GTF and FASTA files and that produces some outcomes. I predict that almost nothing will change when you merge one GTF with another.
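A minimal sketch of what I mean, under some assumptions: the counting step is featureCounts with its -a, -o, -t, -g and -T options, alignment happens in a separate step against an index built from each FASTA (aligner left unspecified), and every path, name, and count below is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReferenceSet:
    """One pluggable combination of genome/transcript FASTA and annotation."""
    name: str
    fasta: str  # consumed by the index-building step of whatever aligner you use
    gtf: str    # consumed by the counting step


def count_command(ref: ReferenceSet, sample: str, threads: int = 4) -> List[str]:
    """Build the featureCounts argv for one sample under one reference set."""
    bam = f"aligned/{ref.name}/{sample}.bam"  # hypothetical layout
    out = f"counts/{ref.name}/{sample}.txt"   # hypothetical layout
    return ["featureCounts", "-T", str(threads), "-t", "exon", "-g", "gene_id",
            "-a", ref.gtf, "-o", out, bam]


def changed_genes(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Genes present in both runs whose counts differ between the runs."""
    return sorted(g for g in a.keys() & b.keys() if a[g] != b[g])


# Hypothetical reference sets; swap in your own files.
refs = [
    ReferenceSet("gencode_only", "refs/genome.fa", "refs/gencode.gtf"),
    ReferenceSet("gencode_plus_lnc", "refs/genome.fa", "refs/merged.gtf"),
]
cmds = [count_command(r, "sample1") for r in refs]
for cmd in cmds:
    print(" ".join(cmd))  # hand each list to subprocess.run(cmd, check=True)

# Toy counts standing in for the parsed output of the two runs.
run_a = {"G1": 10, "G2": 3}
run_b = {"G1": 10, "G2": 4, "G3": 7}
changed = changed_genes(run_a, run_b)
print(changed)
```

Running the same samples through both reference sets and listing the genes whose counts differ tells you directly whether merging the annotations matters for your data.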
