Hello, I am new to NGS analysis. I have some small RNA-seq data and I would like to quantify the expression levels of small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). I intend to use STAR for the alignment and featureCounts to get the read counts, with hg38 as the reference genome. The problem is that the GTF file I downloaded from Ensembl (ftp://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/) contains all genes, and I don't know how to turn it into the file featureCounts needs, i.e. one containing only the snRNA and snoRNA entries.
I'd appreciate any help.
You should be aware that the Ensembl gene annotation is not necessarily complete when it comes to small RNAs. Resources such as RNAcentral definitely contain snoRNAs that the Ensembl gene set doesn't.
You can also use awk to extract the snRNA and snoRNA lines (Ensembl GTFs record this in the gene_biotype attribute).
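If you would rather not use awk, the same filtering can be sketched in Python. This assumes an Ensembl-style GTF in which the biotype is stored as `gene_biotype "..."` in the ninth (attributes) column; GENCODE files use `gene_type` instead, so the pattern would need changing there. The function names are hypothetical, and gzipped input is detected only by a `.gz` suffix.

```python
import gzip
import re
from collections.abc import Container, Iterable, Iterator

# Hypothetical default: the two biotypes asked about in the question.
KEEP_BIOTYPES = frozenset({"snRNA", "snoRNA"})

# Ensembl GTFs store the biotype as e.g.  gene_biotype "snoRNA";
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_lines(lines: Iterable[str],
                     keep: Container[str] = KEEP_BIOTYPES) -> Iterator[str]:
    """Yield comment lines plus feature lines whose gene_biotype is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the #!genome-build style header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        match = BIOTYPE_RE.search(fields[8])
        if match and match.group(1) in keep:
            yield line  # gene, transcript and exon lines all carry gene_biotype


def filter_gtf_file(in_path: str, out_path: str,
                    keep: Container[str] = KEEP_BIOTYPES) -> None:
    """Write the filtered GTF; reads .gz input transparently."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, keep))
```

For example, `filter_gtf_file("Homo_sapiens.GRCh38.101.gtf.gz", "small_rna.gtf")` would write a reduced GTF that can then be given to featureCounts with `-a small_rna.gtf`.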
Alternatively, you could run the whole analysis with the full GTF, i.e. run featureCounts and then import the count table into R (I assume you will be doing differential gene expression analysis). After that, you could import the full GTF file into R with the rtracklayer::import() function and write a simple loop to keep only the rows of the count table whose row names are the gene IDs of snRNA and snoRNA genes in the GTF.
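The answer above suggests R; as a swapped-in alternative, here is a minimal standard-library Python sketch of the same idea. It assumes Ensembl `gene_id` and `gene_biotype` attributes and a featureCounts table produced with the default `-g gene_id`, so that the first column is `Geneid`. A set lookup replaces the loop, and the function names are hypothetical.

```python
import csv
import re
from collections.abc import Container, Iterable

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def small_rna_gene_ids(gtf_lines: Iterable[str],
                       keep: Container[str] = frozenset({"snRNA", "snoRNA"})) -> set[str]:
    """Collect gene_ids of 'gene' lines whose gene_biotype is in `keep`."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # one 'gene' line per gene is enough to get its ID
        gid = GENE_ID_RE.search(fields[8])
        biotype = BIOTYPE_RE.search(fields[8])
        if gid and biotype and biotype.group(1) in keep:
            ids.add(gid.group(1))
    return ids


def subset_counts(count_lines: Iterable[str], gene_ids: set[str]):
    """Return (header, rows) of a featureCounts table, keeping rows whose Geneid is in gene_ids."""
    rows = csv.reader((l for l in count_lines if not l.startswith("#")), delimiter="\t")
    header = next(rows)  # Geneid, Chr, Start, End, Strand, Length, <one column per BAM>
    kept = [row for row in rows if row and row[0] in gene_ids]
    return header, kept
```

Because Ensembl writes `gene_id` without the version suffix, the IDs from the GTF should match the featureCounts `Geneid` column directly.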
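The first two methods should give identical counts, but I am not sure whether the third method gives slightly different ones. I would guess they are very similar, but it would be worth trying and comparing. One possible source of difference: by default featureCounts does not count reads that overlap more than one gene, so with the full GTF a read on a snoRNA that overlaps another annotated gene is left unassigned, whereas with the filtered GTF it is counted.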
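To try that comparison, a small sketch like the following could read two featureCounts tables (one from the filtered GTF, one from the full GTF) and list the genes whose counts differ. It assumes the standard featureCounts layout, where the per-sample count columns follow Geneid, Chr, Start, End, Strand and Length, and that both runs used the same BAM files in the same order. The function names are hypothetical.

```python
import csv
from collections.abc import Iterable


def read_featurecounts(lines: Iterable[str]) -> dict[str, list[int]]:
    """Map Geneid -> per-sample counts (count columns start at index 6)."""
    rows = csv.reader((l for l in lines if not l.startswith("#")), delimiter="\t")
    next(rows)  # skip the Geneid/Chr/Start/End/Strand/Length header
    return {row[0]: [int(x) for x in row[6:]] for row in rows if row}


def count_differences(a: dict[str, list[int]],
                      b: dict[str, list[int]]) -> dict[str, tuple[list[int], list[int]]]:
    """Return {gene_id: (counts_a, counts_b)} for genes present in both tables whose counts differ."""
    shared = sorted(a.keys() & b.keys())  # the full-GTF table also contains unrelated genes
    return {g: (a[g], b[g]) for g in shared if a[g] != b[g]}
```

An empty result would mean the filtered-GTF and full-GTF approaches agree for every shared gene.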