Question

miRNA alignment and count generation

12 weeks ago

omicon ▴ 40

Hi everyone,

I'm having some doubts and confusion regarding the alignment of miRNAs.

I'm currently working with miRNA-seq data. I've already completed the trimming and quality control steps, and now I want to identify all the miRNAs present in my sequences. My first idea was to use Bowtie1 to align against the miRNA_mature.fa file from miRBase.

However, when I try to generate the count table with featureCounts using the hsa.gff3 file, I can't get any matches. It seems the issue is that the naming and structure in the miRNA_mature.fa file differ from the annotations in the hsa.gff3 file.

So now I'm wondering: was it incorrect to align directly to the miRNA_mature.fa file in the first place? Should I have aligned to something else?

I'm a bit confused because I've read that aligning against the miRNA_mature.fa reference can lead directly to count generation.

If anyone could clarify this or share how they handle this step, I'd really appreciate it!

aligment Mapping Bowtie miRNAs • 723 views

ADD COMMENT • link updated 12 weeks ago by i.sudbery 21k • written 12 weeks ago by omicon ▴ 40

I'm in the same boat: I got zero counts from featureCounts after aligning. After troubleshooting, I found that the problem was in the SAM file. My sample had not aligned to the reference. I'm still figuring out why the alignment failed. So check your SAM file, and look at the tail or the middle of it, not just the top.

ADD REPLY • link 12 weeks ago by Jasim • 0
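A quick way to do the SAM check suggested above is to tally the unmapped flag. This is only a minimal sketch: the function name `sam_mapped_summary` and the file name `sample.sam` are made up for illustration, and `samtools flagstat` reports the same information if you have samtools installed.

```shell
# Count aligned vs. unaligned records in SAM text read from stdin.
# In SAM, column 2 is the bitwise FLAG; bit 0x4 (value 4) means "read unmapped".
sam_mapped_summary() {
    awk -F'\t' '
        /^@/ { next }                                # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }    # bit 0x4 is set
        { mapped++ }
        END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }
    '
}

# Typical use (sample.sam is a placeholder file name):
#   sam_mapped_summary < sample.sam
```

If the mapped count is zero, the problem is in the alignment step, not in the counting step.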

score 0 · Answer 1 · 2025-04-22

Align to the hairpins, not the mature miRNA file. The commands below should get you started preparing a human (hsa) hairpin file.

If your species differs, change the pattern in the seqkit grep command accordingly and update the file names.

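# Format miRBase hairpin file
# Download hairpin.fa only if it is not already present
if [ ! -f hairpin.fa ]; then
    wget --no-check-certificate https://www.mirbase.org/download/hairpin.fa
fi
# On sequence lines (not headers), replace any non-AUGC character with N
sed '/^[^>]/ s/[^AUGCaugc]/N/g' hairpin.fa > hairpin_parse.fa
# Trim each header to its first whitespace-delimited field (the miRNA ID)
sed -i 's#\s.*##' hairpin_parse.fa
# Keep only the human (hsa) entries
seqkit grep -r --pattern ".*hsa-.*" hairpin_parse.fa > hairpin_hsa.fa
# Convert RNA (U) to DNA (T)
seqkit seq --rna2dna hairpin_hsa.fa > tmp.fa
# Write each sequence on a single line
fasta_formatter -w 0 -i tmp.fa -o tmp1.fa
# Clean up intermediates and keep the formatted file as hairpin.fa
rm hairpin.fa hairpin_hsa.fa hairpin_parse.fa tmp.fa
mv tmp1.fa hairpin.fa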

# Index miRBase hairpin file
bowtie-build hairpin.fa hairpin

fasta_formatter - FASTX toolkit - https://github.com/agordon/fastx_toolkit

Seqkit - https://github.com/annalam/seqkit

It sounds like you already have bowtie installed.

score 0 · Answer 2 · 2025-04-22

If you are aligning to a transcriptome-based index (like the mature or hairpin fasta files), then you don't need a gff. The gff tells read counting software, such as featureCounts, where the genes are within the sequences. However, in the case of the fasta file from miRBase, the whole sequence is the gene: there is no location within the sequence that represents the gene.

Indeed, the coordinates in the gff3 file will be genome coordinates. They will say things like miR-123 is on chr1 between bases 1000000 and 1000100. The counter will then go into your alignment and look for things aligned to chr1 at those coordinates. But it won't find chr1 in your alignment file. The "chromosome" names there will be things like miR-1, miR-2 etc., and they will be 20-100nt long (depending on whether you've aligned to mature or hairpin), so none of them will have a base 1000000.
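You can see this mismatch directly by comparing the reference names in the alignment header with the sequence names in column 1 of the gff3. The sketch below is illustrative only: `shared_seqids` is a hypothetical helper, and `header.txt` and `aligned.bam` are placeholder file names.

```shell
# Print the sequence names that appear both in a SAM header and in a GFF3 file.
# $1: SAM header text (e.g. saved from: samtools view -H aligned.bam > header.txt)
# $2: GFF3 annotation (e.g. hsa.gff3)
shared_seqids() {
    awk -F'\t' '
        NR == FNR {                        # first file: the SAM header
            if ($1 == "@SQ") {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^SN:/) { refs[substr($i, 4)] = 1 }
                }
            }
            next
        }
        /^#/ { next }                      # second file: skip GFF3 comments
        ($1 in refs) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}

# Typical use:
#   shared_seqids header.txt hsa.gff3
```

If this prints nothing, the counter has no shared sequence names to work with, which is what you would expect after aligning to a miRBase fasta.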

Instead, you just need to count the number of reads that align to the sequence - you don't care about where in the sequence they align.

If we align to a small RNA fasta like this, we just use samtools idxstats to retrieve the alignment count for each "contig" (miRNA in this case).
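As a rough sketch of that last step: samtools idxstats prints one tab-separated line per reference sequence, with the columns name, length, mapped reads, and unmapped reads. The helper name `idxstats_to_counts` and the `sample.*` file names below are placeholders, not anything from the thread.

```shell
# Turn `samtools idxstats` output (read from stdin) into a two-column count table.
# idxstats columns: reference name, sequence length, mapped reads, unmapped reads.
idxstats_to_counts() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        $1 != "*" { print $1, $3 }    # drop the "*" line (reads with no reference)
    '
}

# Typical use (idxstats expects a coordinate-sorted, indexed BAM):
#   samtools sort -o sample.sorted.bam sample.sam
#   samtools index sample.sorted.bam
#   samtools idxstats sample.sorted.bam | idxstats_to_counts > sample.counts.tsv
```

Each row of the resulting table is one miRNA (or hairpin) and its read count for that sample.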