Question

Very low number of micro RNA reads

0

Entering edit mode

3.9 years ago

aranyak111 • 0

I am new to microRNA analysis. I have got sequences Homo_sapiens.GRCh38.miRNA.fasta representing the latest build of genome index few of whose first lines look like this.

 >hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p UGAGGUAGUAGGUUGUAUAGUU
 >hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CUAUACAAUCUACUGUCUUUC
 >hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p CUGUACAGCCUCCUAGCUUUCC
 >hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p UGAGGUAGUAGGUUGUGUGGUU
 >hsa-let-7b-3p MIMAT0004482 Homo sapiens let-7b-3p CUAUACAACCUACUGCCUUCCC

I converted all the Us to Ts using sed and awk scripts

The first few lines of the converted file look like this.

hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p TGAGGTAGTAGGTTGTATAGTT hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p CTATACAATCTACTGTCTTTC.

I have built the reference genome based on the bowtie script

/gpfs/ycga/scratch60/nicoli/ag2646/Bowtie
2.4.2/bowtie2-2.4.2-linux-x86_64/bowtie2-build Homo_sappeins.GRCH38_DNA.fasta Homo_sapiens_microRNA_sub_ref

I now want to align my trimmed FASTQ file of human microRNAs to this indexed genome . The first few lines of my trimmed FASTQ file looks like this

@SRR8248787.1601 HWI-D00306:1090:HKVGMBCX2:1:1101:8925:2469/1 TCACAGTGAACCGGTCTCTTTAGATCGGAAGATCACACGTCT
+ DDADDHHHEDF<E?CHHIEHIIIIFEE@ECCG@GHIHHD<<G @SRR8248787.1602 HWI-D00306:1090:HKVGMBCX2:1:1101:9104:2269/1 TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCT
+ DDDDDIIIIIIIHIIIHIIIIIHIHIIIIIIIHIIIIIIHHII @SRR8248787.1603 HWI-D00306:1090:HKVGMBCX2:1:1101:9222:2274/1

I am using Bowtie2 default scripts to align the reads that have been trimmed using BBDuk

The bowtie scripts look like this

module load  Bowtie2; bowtie2  -x Homo_sapiens_microRNA_sub_ref -U /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1.bbdukfastq
-S /gpfs/ycga/scratch60/nicoli/ag2646/HumanmicroRNAbowtie_verysensitive_1.06.2021./3kPa_HDF_miRNA_1_Gr38trimmed.sam

The log file that I get after alignment is

29980128 reads; of these:
  29980128 (100.00%) were unpaired; of these:
    29979332 (100.00%) aligned 0 times
    546 (0.00%) aligned exactly 1 time
    250 (0.00%) aligned >1 times
0.00% overall alignment rate

The scripts used to do adapter trimming was

bbduk.sh in=3kPa_HDF_miRNA_1.fastq out=3kPa_HDF_miRNA_1bbduk.fastq ref=polyA.fa.gz,truseq_rna.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=r trimq=10 minlength

Any help will be useful.

Genomics • 1.0k views

ADD COMMENT • link updated 3.9 years ago by Jeremy Leipzig 23k • written 3.9 years ago by aranyak111 • 0

0

Entering edit mode

I've cleaned up your post this time, but please put some effort into formatting your posts better. I am sure this has been requested of you multiple times already.

ADD REPLY • link 3.9 years ago by Ram 45k

0

Entering edit mode

Simply using bbduk.sh may not be enough since this data may have a specific adapter on 3'-end that you will need to remove.

You should use bowtie v.1.x since with miRNA you need to do ungapped alignments.

Please use a pipeline such as https://scilifelab.github.io/courses/rnaseq/labs/smallRNA-lab

ADD REPLY • link 3.9 years ago by GenoMax 151k

0

Entering edit mode

i would align to the mirbase hairpins fasta. then do some manual alignment just using your text editor as a positive control, or sneak a sequence you know is a perfect match into your query. then do your bowtie alignment and get back to us.

ADD REPLY • link 3.9 years ago by Jeremy Leipzig 23k