I have aligned raw RNA-seq reads to the Ensembl reference genome. I intend to use featureCounts to quantify the expression of only, say, lincRNAs. Which would be the better approach: using the full GTF file containing all RNA types, or creating a GTF containing only lincRNAs and using that as input to featureCounts?
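For reference, the second option (building a biotype-specific GTF) could look like the minimal Python sketch below. It is not necessarily how the GTFs in this thread were made. It assumes the Ensembl GTF stores the biotype in a `gene_biotype "..."` attribute on each line. The file names are hypothetical, and depending on the Ensembl release the biotype may be `lincRNA` or `lncRNA`, so check your GTF first.

```python
import gzip
import re

# Assumed Ensembl attribute format, e.g.:  gene_biotype "lincRNA";
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_lines(lines, biotypes):
    """Yield header lines plus every feature line whose gene_biotype is in `biotypes`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep '#!genome-build ...' style header lines
            continue
        match = BIOTYPE_RE.search(line)
        if match and match.group(1) in biotypes:
            yield line


def filter_gtf_file(in_path, out_path, biotypes):
    """Write a biotype-specific GTF; accepts plain or gzip-compressed input."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        fout.writelines(filter_gtf_lines(fin, biotypes))


# Hypothetical file names; newer Ensembl releases may label these genes "lncRNA".
# filter_gtf_file("Homo_sapiens.GRCh38.gtf.gz", "lincRNA_only.gtf", {"lincRNA", "lncRNA"})
```

Filtering on `gene_biotype` keeps every exon line of the selected genes, so the subset GTF still works with featureCounts' defaults of counting `exon` features and summarizing them by `gene_id`.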
I tried both approaches. For protein-coding genes and lncRNAs the results were similar, but for miRNAs there was a huge difference.
What do you mean by that? miRNAs, being small, are likely to multi-map. If you have miRNA data, you should use a pipeline designed specifically for miRNAs. Standard mRNA library protocols will generally not capture miRNAs.
Actually, I tried this with miRNA-seq data: I aligned the reads to the reference genome and then ran featureCounts once with the full GTF (containing protein-coding genes, lncRNAs, etc.) and once with a miRNA-only GTF.
The miRNA GTF was created using:
I also suspect the difference might be due to multi-mapping.
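Besides multi-mapping, overlapping annotations may also contribute. By default featureCounts does not count multi-mapping reads (identified via the NH tag) unless `-M` is given, and does not count reads overlapping more than one gene unless `-O` is given. These reads show up as `Unassigned_MultiMapping` and `Unassigned_Ambiguity` in the `.summary` file. With the full GTF, a miRNA read that also falls in an exon of a host gene would be ambiguous, whereas with a miRNA-only GTF the same read would be assigned. The toy Python model below is a simplified sketch of just those two default rules, not featureCounts itself; gene names and coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive, as in GTF
    end: int


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int
    end: int
    nh: int  # value of the NH tag (number of reported alignments for this read)


def assign(read, features, count_multimappers=False, allow_multi_overlap=False):
    """Toy version of two featureCounts default rules.

    Returns a tuple of gene_ids the read is counted for, or an 'Unassigned_*'
    status string. count_multimappers mirrors -M, allow_multi_overlap mirrors -O.
    """
    if read.nh > 1 and not count_multimappers:
        return "Unassigned_MultiMapping"
    # Genes with at least one feature overlapping the read by >= 1 base.
    genes = sorted({
        f.gene_id
        for f in features
        if f.chrom == read.chrom and f.start <= read.end and read.start <= f.end
    })
    if not genes:
        return "Unassigned_NoFeatures"
    if len(genes) > 1 and not allow_multi_overlap:
        return "Unassigned_Ambiguity"
    return tuple(genes)


# Made-up example: a miRNA lying inside an exon of a host lncRNA gene.
mirna = Feature("MIR_X", "1", 1000, 1021)
host_exon = Feature("HOST_LNC", "1", 900, 1500)
read = Read("1", 1001, 1020, nh=1)

print(assign(read, [mirna, host_exon]))  # full GTF: Unassigned_Ambiguity
print(assign(read, [mirna]))             # miRNA-only GTF: ('MIR_X',)
```

Comparing the `.summary` files of the two featureCounts runs should indicate which of the two categories accounts for the difference.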
Most papers I came across use the miRBase reference and annotation for miRNA-seq analysis. However, I preferred the Ensembl GTF file because it includes the miRBase annotations for miRNAs.
Which program did you use? miRNAs need ungapped alignments.
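One way to check whether the alignments are actually ungapped is to inspect the CIGAR strings in the SAM output. Below is a rough Python sketch that assumes uncompressed SAM text lines as input. An alignment counts as gapped if its CIGAR contains an insertion (`I`), deletion (`D`) or skipped region (`N`). Soft clipping (`S`), as produced by local-mode alignment, is not treated as a gap here. Unmapped records (FLAG bit 0x4) are skipped.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
GAP_OPS = set("IDN")  # insertion, deletion, skipped region


def is_ungapped(cigar: str) -> bool:
    """True if the alignment has no I, D or N operations; clipping is allowed."""
    ops = CIGAR_OP_RE.findall(cigar)
    if cigar == "*" or not ops:
        return False  # '*' means no CIGAR is available (e.g. unmapped read)
    return all(op not in GAP_OPS for _, op in ops)


def count_gapped(sam_lines):
    """Return (ungapped, gapped) counts over mapped alignment records."""
    ungapped = gapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]  # SAM columns 2 (FLAG) and 6 (CIGAR)
        if flag & 0x4:
            continue  # segment unmapped
        if is_ungapped(cigar):
            ungapped += 1
        else:
            gapped += 1
    return ungapped, gapped
```

For example, `count_gapped(open("aln.sam"))` on a hypothetical `aln.sam` file; secondary alignments, if present, are counted as separate records.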
Bowtie2 with vsl (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4931105/)