I have miRNA-seq data (Single end) which I map to the whole UCSC hg19 genome. Now given the SAM output of this mapping, is there a way I can summarize the alignment over several genomic features:
- Unaligned
- Mature miRNA
- precursor miRNA
- piRNA
- lincRNA
- human Ribosomal RNA
- snoRNA
- human5S rDNA
- snRNA
Namely for each of the above features how many of my reads (or percentage) are aligned?
I know CLC-BIO or Illumina inbox software possibly already have that. But I'm looking for noncommercial and tweakable way to do it.
a correction on this: if you're interested in transcripts that have different identical copies on the genome (like I realized snRNAs have, for example), you cannot use the default options of HTSeq which discard multi-mapped reads. You should lower the value of the -a option and be also aware that HTSeq would still not count reads with the NH field indicating a multiple mapping. Even better and more easily, you could use RSEM.