Hello,
I am new to sequencing analysis. I have a SAM file that appears to be sorted by coordinate, and its reference sequence names are 1-22, X, Y, and MT, plus 59 more named in the format GL000207.1. I am ultimately interested in extracting miRNA read counts from this data. Any suggestions on how to go about this?
I think I need to annotate the file with miRNA coordinates from the appropriate reference genome. Is it possible to annotate a SAM file?
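No, there isn't a way to annotate a SAM file like this. At the very least, you need to find or create a miRNA annotation in GTF format, using the same chromosome names as your SAM file, and then use featureCounts to count the reads mapping to each miRNA feature. Note also that this SAM file was aligned to an old and superseded genome reference, namely GRCh37 (UCSC's version of it is hg19). The names 1, MT and GL000207.1 are GRCh37-style; hg19 would use chr1, chrM and so on. Your miRNA annotation therefore has to use GRCh37 coordinates, not GRCh38.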
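One way to create that GTF is sketched below. It assumes the miRNA annotation comes from a miRBase-style GFF3 file (records of type miRNA carrying ID= and Name= attributes, sequence names like chr1 and chrM) whose coordinates are on GRCh37. Check the genome build in the file header first. The script reads the reference names from the SAM header's @SQ lines, renames chr1 to 1 and chrM to MT so they match, and writes GTF lines that featureCounts can read. The file names and the choice of the mature miRNA Name as gene_id are illustrative assumptions, not a standard tool.

```python
import sys
from typing import Dict, Iterable, List, Optional, Set, Tuple


def sam_reference_names(sam_lines: Iterable[str]) -> List[str]:
    """Collect reference names (SN: tag) from the @SQ lines of a SAM header."""
    names: List[str] = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header always precedes the alignment records
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def match_chrom(name: str, sam_names: Set[str]) -> Optional[str]:
    """Map a UCSC-style name (chr1, chrM) onto the SAM's naming (1, MT), if present."""
    candidates = [name]
    if name.startswith("chr"):
        bare = name[3:]
        candidates.append("MT" if bare == "M" else bare)
    for candidate in candidates:
        if candidate in sam_names:
            return candidate
    return None


def gff3_to_gtf(gff3_lines: Iterable[str], sam_names: Set[str],
                feature: str = "miRNA") -> Tuple[List[str], Set[str]]:
    """Convert GFF3 records of one feature type into GTF lines for featureCounts.

    Returns the GTF lines and the set of GFF3 sequence names with no match in the SAM.
    """
    gtf: List[str] = []
    unmatched: Set[str] = set()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue  # e.g. skips miRNA_primary_transcript records
        chrom = match_chrom(cols[0], sam_names)
        if chrom is None:
            unmatched.add(cols[0])
            continue
        attrs: Dict[str, str] = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        record_id = attrs.get("ID", "unknown")
        name = attrs.get("Name", record_id)
        # gene_id = mature miRNA name, so featureCounts sums all loci sharing that name
        gtf_attrs = f'gene_id "{name}"; transcript_id "{record_id}";'
        gtf.append("\t".join([chrom] + cols[1:8] + [gtf_attrs]))
    return gtf, unmatched


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python mirbase_to_gtf.py sample.sam hsa.gff3 > hsa_mirna.gtf
    with open(sys.argv[1]) as sam_file:
        sam_names = set(sam_reference_names(sam_file))
    with open(sys.argv[2]) as gff_file:
        gtf_lines, unmatched = gff3_to_gtf(gff_file, sam_names)
    for gtf_line in gtf_lines:
        print(gtf_line)
    if unmatched:
        print(f"no match in SAM header for: {sorted(unmatched)}", file=sys.stderr)
```

The resulting file could then be passed to featureCounts, for example `featureCounts -a hsa_mirna.gtf -t miRNA -g gene_id -o mirna_counts.txt sample.sam`. The `-t miRNA` is needed because the default feature type is exon. By default featureCounts does not count multi-mapping reads or reads overlapping more than one feature. Short miRNA reads often fall into those categories, so it is worth reading about the `-M` and `-O` options before deciding.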
There are also miRNA-specific pipelines, such as miRDeep2 and others, so you should take a look at the literature on miRNA analysis.
Lastly, a question: are the original reads from a miRNA-specific (small RNA) library, or is this a regular RNA-seq library?