Hi, All: I am a newbie for the miRNA-seq data analysis, and I just want to follow a paper to get the same results.
I choose the paper: Aggarwal P, Turner A, Matter A, Kattman SJ et al. RNA expression profiling of human iPSC-derived cardiomyocytes in a cardiac hypertrophy model. PLoS One 2014;9(9):e108051. PMID: 25255322
The raw human miRNA sequencing data was downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292 , which was clean of adapter sequences.
I first use the fastx-toolkit to filter the bad quality reads .
I download the miRBase databae by the code below :
wget ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz ## 28645 reads
wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.zip ## 35828 reads
wget ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.zip ##
wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3 ##
wget ftp://mirbase.org/pub/mirbase/CURRENT/miFam.dat.zip ##
grep sapiens mature.fa |wc # 2588
grep sapiens hairpin.fa |wc #1881
## Homo sapiens
perl -alne '{if(/^>/){if(/Homo/){$tmp=1}else{$tmp=0}};print if $tmp==1;}' hairpin.fa >hairpin.human.fa
perl -alne '{if(/^>/){if(/Homo/){$tmp=1}else{$tmp=0}};print if $tmp==1;}' mature.fa > mature.human.fa
## step5 : alignment to miRBase v21 by bowtie2 (hairpin.human.fa/mature.human.fa )
##
mkdir bowtie2_index && cd bowtie2_index
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../hairpin.human.fa hairpin_human
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2-build ../mature.human.fa mature_human
ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id -S ${id%%.*}.hairpin.sam ; done
## overall alignment rate: 10.20% / 5.71%/ 10.18%/ 4.36% / 10.02% / 4.95%
ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/mature_human -U $id -S ${id%%.*}.mature.sam ; done
## overall alignment rate: 6.67% / 3.78% / 6.70% / 2.80%/ 6.55% / 3.23%
Am I right to use bowtie2 ? Am I right to use the miRBase ???
Following reference alignment and read filtering of the miRNA-Seq data, between 863,000 and 1.9 million reads aligned to the custom reference set.
and below is the results from paper, as you can see ,
http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE60292&format=file
ls *gz |while read id ; do (echo $id;zcat $id | cut -f 2 |perl -alne '{$t+=$_;}END{print $t}');done
GSM1470353_iPS_010313_Unstim_known_miRNA_counts.txt.gz 686560 GSM1470354_iPS_010313_ET1_known_miRNA_counts.txt.gz 1109891 GSM1470355_iPS_011013_Unstim_known_miRNA_counts.txt.gz 956918 GSM1470356_iPS_011013_ET1_known_miRNA_counts.txt.gz 679366 GSM1470357_iPS_012513_Unstim_known_miRNA_counts.txt.gz 1164426 GSM1470358_iPS_012513_ET1_known_miRNA_counts.txt.gz 1063609