I am new to microRNA analysis. I have got sequences Homo_sapiens.GRCh38.miRNA.fasta representing the latest build of genome index few of whose first lines look like this.
>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p MIMAT0004482 Homo sapiens let-7b-3p CUAUACAACCUACUGCCUUCCC
I converted all the Us to Ts using sed and awk scripts
The first few lines of the converted file look like this.
hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p TGAGGTAGTAGGTTGTATAGTT hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CTATACAATCTACTGTCTTTC.
I have built the reference genome based on the bowtie script
/gpfs/ycga/scratch60/nicoli/ag2646/Bowtie
2.4.2/bowtie2-2.4.2-linux-x86_64/bowtie2-build Homo_sappeins.GRCH38_DNA.fasta Homo_sapiens_microRNA_sub_ref
I now want to align my trimmed FASTQ file of human microRNAs to this indexed genome . The first few lines of my trimmed FASTQ file looks like this
@SRR8248787.1601 HWI-D00306:1090:HKVGMBCX2:1:1101:8925:2469/1 TCACAGTGAACCGGTCTCTTTAGATCGGAAGATCACACGTCT
+ DDADDHHHEDF<E?CHHIEHIIIIFEE@ECCG@GHIHHD<<G @SRR8248787.1602 HWI-D00306:1090:HKVGMBCX2:1:1101:9104:2269/1 TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCT
+ DDDDDIIIIIIIHIIIHIIIIIHIHIIIIIIIHIIIIIIHHII @SRR8248787.1603 HWI-D00306:1090:HKVGMBCX2:1:1101:9222:2274/1
I am using Bowtie2 default scripts to align the reads that have been trimmed using BBDuk
The bowtie scripts look like this
module load Bowtie2; bowtie2 -x Homo_sapiens_microRNA_sub_ref -U /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1.bbdukfastq
-S /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1_Gr38trimmed.sam
The log file that I get after alignment is
29980128 reads; of these:
29980128 (100.00%) were unpaired; of these:
29979332 (100.00%) aligned 0 times
546 (0.00%) aligned exactly 1 time
250 (0.00%) aligned >1 times
0.00% overall alignment rate
The scripts used to do adapter trimming was
bbduk.sh in=3kPa_HDF_miRNA_1.fastq out=3kPa_HDF_miRNA_1bbduk.fastq ref=polyA.fa.gz,truseq_rna.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=r trimq=10 minlength
Any help will be useful.
I've cleaned up your post this time, but please put some effort into formatting your posts better. I am sure this has been requested of you multiple times already.
Simply using
bbduk.sh
may not be enough since this data may have a specific adapter on 3'-end that you will need to remove.You should use
bowtie v.1.x
since with miRNA you need to do ungapped alignments.Please use a pipeline such as https://scilifelab.github.io/courses/rnaseq/labs/smallRNA-lab
i would align to the mirbase hairpins fasta. then do some manual alignment just using your text editor as a positive control, or sneak a sequence you know is a perfect match into your query. then do your bowtie alignment and get back to us.