Hi all! I know there are some similar topics, and I read them, but they didn’t solve my issue…
I’m working with smRNA-seq data (plants). My aim is to find novel and known miRNAs, then perform differential expression (DE) analysis and look for correlations between miRNA and mRNA expression profiles.
I preprocessed the smRNA reads as follows: 1) Trimmed adapters. Going by the Illumina documentation, I found only one adapter sequence and removed it with cutadapt. Command:
~/.local/bin/cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed.fq.gz in.fq.gz
98.5% of reads with adapters, 41.2% total written
2) Filtered out tRNA and rRNA contamination using bbduk.sh
3) Kept reads from 19 to 26 nt using cutadapt
Original reads: 17 483 425 sequences
Clean reads: 6 592 972 sequences
And here is the issue: I aligned my clean reads against the precursor miRNA (hairpin) sequences downloaded from miRBase to identify known miRNAs. I tried both bowtie and bbduk.sh, but the alignment rate is very low, about 5%. What could cause this?
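One sanity check that may help at this point is the length distribution of the clean reads. The sketch below is only an illustration: the function names and the file name in the usage comment are hypothetical, and it assumes a standard 4-line FASTQ. In many plant libraries a large share of the 24-nt reads are siRNAs rather than miRNAs, and those would not be expected to align to hairpins. Whether that applies here is something to check, not a diagnosis.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_histogram(fastq_lines):
    """Return {read_length: count} for FASTQ content given as an iterable of lines."""
    # In a 4-line FASTQ record the sequence is the 2nd line of each record.
    sequences = islice(fastq_lines, 1, None, 4)
    counts = Counter(len(seq.strip()) for seq in sequences)
    # Sort by length so the output reads like a histogram table.
    return dict(sorted(counts.items()))


def histogram_from_gzip(path):
    """Convenience wrapper for a gzipped FASTQ file (path is supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return read_length_histogram(handle)


# Hypothetical usage, assuming the clean reads are in "clean.fq.gz":
# for length, n in histogram_from_gzip("clean.fq.gz").items():
#     print(length, n, sep="\t")
```

If most reads sit at 24 nt and only a small fraction at 20 to 22 nt, a low alignment rate against hairpins would be less surprising.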
Did you replace the U bases with T (e.g. sed 's/U/T/g') in the miRBase sequences that you downloaded before doing the alignments?
My hairpin file doesn't contain U bases at all.
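For anyone who wants to verify this on their own file, here is a minimal sketch that counts U bases in the sequence lines of a FASTA file and converts them to T if needed. The function names and the file name in the usage comment are made up for illustration. Unlike a plain sed 's/U/T/g', it leaves header lines untouched, so a U in a sequence name is not rewritten.

```python
def count_u_in_sequences(fasta_lines):
    """Count U/u characters in sequence lines only (header lines start with '>')."""
    return sum(
        line.upper().count("U")
        for line in fasta_lines
        if not line.startswith(">")
    )


def rna_to_dna(fasta_lines):
    """Yield FASTA lines with U->T (and u->t) applied to sequence lines only."""
    table = str.maketrans("Uu", "Tt")
    for line in fasta_lines:
        # Headers are passed through unchanged so record names stay intact.
        yield line if line.startswith(">") else line.translate(table)


# Hypothetical usage, assuming the downloaded file is "hairpin.fa":
# with open("hairpin.fa") as src:
#     lines = src.read().splitlines()
# print("U bases in sequences:", count_u_in_sequences(lines))
# with open("hairpin.dna.fa", "w") as dst:
#     dst.write("\n".join(rna_to_dna(lines)) + "\n")
```

A count of zero confirms that the reference is already in DNA alphabet and that the U/T mismatch is not the cause of the low alignment rate.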
Thank you
Great. I am not sure what kit was used for your dataset but some kits directly ligate a special adapter to miRNA. So unless that adapter is present the read likely does not represent a miRNA. Checking to see if this applies in your case.
Can you show us some of the putative miRNAs that did not map?
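One way to pull out such examples is to collapse the unmapped reads and list the most abundant sequences. This is a rough sketch, assuming the unaligned reads were saved as a 4-line FASTQ (for example the file bowtie writes with its --un option). The function name and file name are hypothetical.

```python
from collections import Counter
from itertools import islice


def most_common_reads(fastq_lines, n=10):
    """Return the n most frequent read sequences as (sequence, count) pairs."""
    # The sequence is the 2nd line of every 4-line FASTQ record.
    sequences = (seq.strip() for seq in islice(fastq_lines, 1, None, 4))
    return Counter(sequences).most_common(n)


# Hypothetical usage, assuming unaligned reads are in "unmapped.fq":
# with open("unmapped.fq") as handle:
#     for seq, count in most_common_reads(handle, n=20):
#         print(count, len(seq), seq, sep="\t")
```

The top few sequences can then be posted here or searched against other references to see what they are.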