I tried looking in miRBase for human sequences, but as the file header explicitly states:
# Note, these sequences do not represent the full primary transcript,
# rather a predicted stem-loop portion that includes the precursor
# miRNA.
Looking in GENCODE did not help either. Grepping for miRNA and keeping transcripts longer than 150 nt yielded only 19 results:
grep miRNA gencode.v21.annotation.gff3 | sort -V -k1,1 -k4,4n -k5,5n - | awk '{if ($5-$4>150 && b != $4 && c != $5) {a += 1; b = $4; c = $5; print $0}} END {print a}'
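If the aim is to count distinct long miRNA transcripts, the one-liner above can be made a bit stricter. This is a minimal sketch, assuming GENCODE GFF3 attributes carry `transcript_type=miRNA` in column 9. It keeps only `transcript` features, measures length as `end - start + 1` (GFF3 is 1-based and inclusive), and de-duplicates on chromosome, interval and strand. The original comparison `b != $4 && c != $5` instead drops any line whose start *or* end matches the last printed line, which can hide distinct intervals. The function name `count_long_mirna_transcripts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original post): print distinct miRNA
# transcripts longer than min_len from a GENCODE GFF3, then their count.
# Assumes GENCODE-style attributes, i.e. "transcript_type=miRNA" in column 9.
count_long_mirna_transcripts() {
  local gff3="$1"
  local min_len="${2:-150}"
  awk -F'\t' -v min_len="$min_len" '
    # Keep only transcript features annotated as miRNA
    $3 == "transcript" && $9 ~ /transcript_type=miRNA(;|$)/ {
      len = $5 - $4 + 1                # GFF3 coordinates are 1-based, inclusive
      key = $1 ":" $4 "-" $5 ":" $7    # chromosome, interval and strand
      if (len > min_len && !(key in seen)) {
        seen[key] = 1
        n++
        print
      }
    }
    END { print n + 0 }                # "n + 0" prints 0 instead of an empty line
  ' "$gff3"
}
```

Usage: `count_long_mirna_transcripts gencode.v21.annotation.gff3 150 | tail -n 1` prints only the count. No `sort` is needed because duplicates are tracked in an awk associative array rather than by comparing adjacent lines.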
Where can I find primary transcripts (pri-miRNAs) for miRNAs, preferably with genomic coordinates so that I do not need to run Bowtie myself?
Hi,
Why do you want the primary transcript? I don't know whether they really exist as annotated transcripts. The positions in miRBase are those of the direct precursors of the miRNAs, and normally this is enough for most analyses. If you tell us your main goal, maybe we can help more. The full primary transcript is normally understood as the fragment from the TSS of the transcript (which will generate the short precursors) to its end, and I have never heard of any approximation for it. It seems that the primary transcript can be quite long and can generate multiple miRNA precursors; some papers have described the promoters of these full transcripts, but there is no consensus here.
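Since one primary transcript can carry several precursors, a rough first pass is to group neighbouring hairpins into clusters. This sketch assumes a miRBase-style GFF3 (e.g. the human `hsa.gff3` download) in which stem-loops are typed `miRNA_primary_transcript` and carry a `Name=` attribute. In miRBase these are the hairpin records, not full pri-miRNAs. The 10 kb default gap is an assumed cut-off, and `cluster_hairpins` is a hypothetical name. It only groups hairpins on the same chromosome and strand; it does not find a TSS or a transcript end, so it is no substitute for a real pri-miRNA annotation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group miRNA hairpins (same chromosome and strand,
# gap <= max_gap bp) as a crude proxy for loci that might share a primary
# transcript. It does NOT locate the TSS or the end of any pri-miRNA.
# Assumes miRBase-style GFF3: hairpins typed "miRNA_primary_transcript"
# with a "Name=" attribute in column 9.
cluster_hairpins() {
  local gff3="$1"
  local max_gap="${2:-10000}"
  awk -F'\t' '$3 == "miRNA_primary_transcript"' "$gff3" |
    LC_ALL=C sort -t $'\t' -k1,1 -k7,7 -k4,4n |
    awk -F'\t' -v OFS='\t' -v max_gap="$max_gap" '
      # Print the cluster collected so far (if any)
      function flush() { if (n) print chr, s, e, str, n, names }
      {
        name = $9; sub(/.*Name=/, "", name); sub(/;.*/, "", name)
        if (n && $1 == chr && $7 == str && $4 - e <= max_gap) {
          if ($5 + 0 > e) e = $5 + 0   # extend the current cluster
          n++; names = names "," name
        } else {
          flush()                       # close the previous cluster
          chr = $1; str = $7; s = $4 + 0; e = $5 + 0; n = 1; names = name
        }
      }
      END { flush() }
    '
}
```

Each output line is chromosome, start, end (1-based, as in the GFF3), strand, number of hairpins and their comma-separated names, e.g. `cluster_hairpins hsa.gff3 10000`. A cluster's span is at best a lower bound on any shared primary transcript.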