Question

Which alignment tool and which reference should I choose for miRNA-seq data analysis?

8.5 years ago

jimmy_zeng ▴ 90

Hi all, I am new to miRNA-seq data analysis, and I want to follow a paper and reproduce its results.

I chose this paper: Aggarwal P, Turner A, Matter A, Kattman SJ et al. RNA expression profiling of human iPSC-derived cardiomyocytes in a cardiac hypertrophy model. PLoS One 2014;9(9):e108051. PMID: 25255322

The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292 ; the adapter sequences had already been removed.

I first used the fastx-toolkit to filter out low-quality reads.

I downloaded the miRBase database with the commands below:

wget ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz    ## 28645 sequences
wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.zip    ## 35828 sequences
wget ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.zip
wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3
wget ftp://mirbase.org/pub/mirbase/CURRENT/miFam.dat.zip

grep sapiens mature.fa |wc  # 2588 

grep sapiens hairpin.fa |wc   #1881 

## Homo sapiens 
perl -alne '{if(/^>/){if(/Homo/){$tmp=1}else{$tmp=0}};print if $tmp==1;}' hairpin.fa  >hairpin.human.fa

perl -alne '{if(/^>/){if(/Homo/){$tmp=1}else{$tmp=0}};print if $tmp==1;}' mature.fa  > mature.human.fa  


## step5 : alignment to miRBase v21 by bowtie2 (hairpin.human.fa/mature.human.fa )
## 
mkdir  bowtie2_index &&  cd bowtie2_index
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../hairpin.human.fa hairpin_human
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../mature.human.fa  mature_human

ls *_clean.fq.gz | while read id ; do  ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id   -S ${id%%.*}.hairpin.sam ; done 
## overall alignment rate:  10.20% / 5.71%/ 10.18%/ 4.36% / 10.02% / 4.95%
ls *_clean.fq.gz | while read id ; do  ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human  -U $id   -S ${id%%.*}.mature.sam ; done 
## overall alignment rate:  6.67% / 3.78% / 6.70% / 2.80%/ 6.55% / 3.23%

Am I right to use bowtie2? Am I right to use miRBase as the reference?

RNA-Seq miRNA-seq alignment reference • 4.6k views

Following reference alignment and read filtering of the miRNA-Seq data, between 863,000 and 1.9 million reads aligned to the custom reference set.

Below are the results from the paper (the processed count files on GEO), with the total count summed for each file:

http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE60292&format=file

ls *gz |while read id ; do (echo $id;zcat $id | cut -f 2 |perl -alne '{$t+=$_;}END{print $t}');done

GSM1470353_iPS_010313_Unstim_known_miRNA_counts.txt.gz 686560
GSM1470354_iPS_010313_ET1_known_miRNA_counts.txt.gz 1109891
GSM1470355_iPS_011013_Unstim_known_miRNA_counts.txt.gz 956918
GSM1470356_iPS_011013_ET1_known_miRNA_counts.txt.gz 679366
GSM1470357_iPS_012513_Unstim_known_miRNA_counts.txt.gz 1164426
GSM1470358_iPS_012513_ET1_known_miRNA_counts.txt.gz 1063609

score 0 · Answer 1 · 2016-06-23

I also tried the alignment tool that the paper mentions:

step5.2 using SHRiMP to do alignment

http://compbio.cs.toronto.edu/shrimp/README

3.5 Mapping cDNA reads against a miRNA database

cd ~/biosoft/SHRiMP/SHRiMP_2_2_3
export SHRIMP_FOLDER=$PWD
cd -

We project the database with:

$SHRIMP_FOLDER/utils/project-db.py --seed 00111111001111111100,00111111110011111100,00111111111100111100,00111111111111001100,00111111111111110000 \
    --h-flag --shrimp-mode ls miRBase/hairpin.human.fa

$SHRIMP_FOLDER/bin/gmapper-ls -L hairpin.human-ls SRR1542716.fastq --qv-offset 33 \
    -o 1 -H -E -a -1 -q -30 -g -30 --qv-offset 33 --strata -N 8 >map.out 2>map.log

I still can't get the same results as the paper: only about 20K reads mapped successfully.

score 0 · Answer 2 · 2016-06-23

I just looked at another post and found this: miRNA mapping rate is very low.. (less than 0.03%)

Oops, I forgot to convert U to T in hairpin.fa file.. (so stupid about it..)
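miRBase FASTA files store sequences as RNA (with U), while sequencing reads are DNA (with T), so the reference needs converting before indexing. The following is a minimal sketch of that conversion, not the poster's exact command: the function name `rna_to_dna` and the output file name are illustrative assumptions, and the toy record is there only to make the snippet runnable.

```shell
#!/usr/bin/env bash
# Convert U/u to T/t on sequence lines only. Header lines (starting
# with ">") are left untouched so that miRNA names are not altered.
rna_to_dna() {
    sed '/^>/!y/Uu/Tt/' "$1"
}

# Toy input standing in for mature.human.fa (assumption: one record).
demo=$(mktemp -d)
printf '>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n' \
    > "$demo/mature.human.fa"

rna_to_dna "$demo/mature.human.fa" > "$demo/mature.human.dna.fa"

# Sanity check: number of sequence lines that still contain U (prints 0).
grep -v '^>' "$demo/mature.human.dna.fa" | grep -c 'U' || true
```

After converting hairpin.human.fa and mature.human.fa in this way, the bowtie2 indexes have to be rebuilt with bowtie2-build before realigning.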

score 0 · Answer 3 · 2016-06-23

ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id -S ${id%%.*}.hairpin.sam ; done

overall alignment rate: 10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95% (before converting U to T)

overall alignment rate: 51.77% / 70.38% / 51.45% / 61.14% / 52.20% / 65.85% (after converting U to T)

ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human -U $id -S ${id%%.*}.mature.sam ; done

overall alignment rate: 6.67% / 3.78% / 6.70% / 2.80% / 6.55% / 3.23% (before converting U to T)

overall alignment rate: 34.94% / 46.16% / 35.00% / 38.50% / 35.46% / 42.41% (after converting U to T)
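To compare the alignments with the per-miRNA totals in the GEO known_miRNA_counts files, one option is to tally mapped reads per reference name directly from the SAM output. This is a rough sketch under assumptions: the function name `count_per_reference` is hypothetical, each read is assumed to have at most one reported alignment, and the paper's own counting and filtering procedure is not known to be the same.

```shell
#!/usr/bin/env bash
# Count mapped reads per reference sequence in a SAM file.
# Skips header lines (starting with "@") and unaligned reads
# (FLAG bit 0x4 set, or RNAME equal to "*").
count_per_reference() {
    awk -F'\t' '
        /^@/ { next }
        $3 != "*" && int($2 / 4) % 2 == 0 { n[$3]++ }
        END { for (r in n) print r "\t" n[r] }
    ' "$1" | sort -k1,1
}

# Toy SAM standing in for a *.mature.sam file (assumption: made-up reads).
sam=$(mktemp)
{
    printf '@SQ\tSN:hsa-let-7a-5p\tLN:22\n'
    printf 'r1\t0\thsa-let-7a-5p\t1\t42\t22M\t*\t0\t0\tTGAGGTAGTAGGTTGTATAGTT\tIIIIIIIIIIIIIIIIIIIIII\n'
    printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
} > "$sam"

count_per_reference "$sam"
```

Summing the second column of this output gives a per-sample total that can be set against the totals computed from the GEO files.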