Question: Best Practices for Analyzing mRNA and lncRNA from Total RNA Sequencing Data
We performed total RNA sequencing on our samples and conducted two separate analyses:
- mRNA analysis: Using the Gencode v46 GTF file and its corresponding index file for mapping.
- lncRNA analysis: Using a custom mapping index created with the lncBook 2.1 FASTA file, while still using the Gencode v46 lncRNA GTF merged with Lncipedia file for feature counting. For featureCounts, we applied the flags `-d 200` and `-D 3000000`, ensuring only transcripts within 200–3,000,000 nucleotides are counted.
Observations
- Some lncRNA transcripts intergenic to a gene are found to be differentially expressed at the transcript level in the lncRNA analysis.
- The corresponding gene is also found to be differentially expressed at the gene level in the mRNA analysis.
Questions
- Is this behavior expected, or could it indicate an issue with the mapping or feature counting process?
- Are the lncRNAs identified in the lncRNA analysis likely to be genuine lncRNAs, or could they be artifacts of the mapping/analysis process?
- Should I consider merging the Gencode v46 GTF file with the lncRNA GTF file and re-analyze the data?
Concerns
- Are there any potential limitations or pitfalls in performing these analyses separately?
- Would merging the annotations improve the results, or could it introduce additional biases or errors?
I would appreciate insights or suggestions on how others would approach this type of analysis. Thank you!
lncRNAs are annotated in the standard GENCODE GTF file, so why not use this and then subset the count matrix for what you need?
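A minimal sketch of that subsetting step, in case it helps: it reads the `gene_type` attribute from the `gene` records of a GENCODE-style GTF and splits a count matrix into lncRNA and protein-coding rows. The function names, the toy gene IDs and the counts are hypothetical, and it assumes the count matrix is keyed by the same `gene_id` values as the GTF.

```python
import re

# Matches quoted GTF attribute pairs such as: gene_id "G1"; gene_type "lncRNA";
_ATTR = re.compile(r'(\S+) "([^"]*)"')

def gene_types_from_gtf(lines):
    """Map gene_id -> gene_type using the 'gene' records of a GENCODE-style GTF."""
    types = {}
    for line in lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue                      # only gene-level records are needed
        attrs = dict(_ATTR.findall(fields[8]))
        if "gene_id" in attrs and "gene_type" in attrs:
            types[attrs["gene_id"]] = attrs["gene_type"]
    return types

def subset_counts(counts, types, wanted):
    """Keep rows of a {gene_id: [counts...]} matrix whose gene_type is in `wanted`."""
    return {g: row for g, row in counts.items() if types.get(g) in wanted}

# Toy input (hypothetical IDs and counts, for illustration only)
gtf = [
    '##description: toy example\n',
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "A";\n',
    'chr1\tHAVANA\tgene\t1500\t2500\t.\t-\t.\tgene_id "G2"; gene_type "lncRNA"; gene_name "B";\n',
    'chr1\tHAVANA\ttranscript\t1500\t2500\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; gene_type "lncRNA";\n',
]
counts = {"G1": [120, 98], "G2": [7, 15]}

types = gene_types_from_gtf(gtf)
lncrna_counts = subset_counts(counts, types, {"lncRNA"})
mrna_counts = subset_counts(counts, types, {"protein_coding"})
print(lncrna_counts)
print(mrna_counts)
```

With a real file you would pass an open file handle instead of the toy list, and the same lookup can be used to filter a featureCounts output table after counting against the full annotation once.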
I saw a separate GTF file for lncRNAs on the GENCODE website. I think only some of them, rather than all of the lncRNAs, are integrated in the comprehensive GTF file. Do you know if all of them are integrated? I just edited my text. We used a merged GTF of GENCODE, NONCODE and LNCipedia, as not all the lncRNA transcripts are found in the GENCODE GTF. As lncRNA expression is lower than mRNA expression, we thought of doing it separately. Do you think this will skew the results?