I am analyzing miRNA-Seq data for differential expression of miRNAs. As the first step, I am filtering out poor-quality raw reads with the FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html) using the following settings:
- the minimum quality score for each base = 20;
- the percentage of bases that must have the minimum quality score ≥ 95%.
I used the following command to perform the quality filtering:
fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
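To make the meaning of -q 20 -p 95 concrete, here is a minimal Python sketch of the rule as the FASTX-Toolkit help text describes it: a read is kept when at least 95% of its bases have a Phred score of 20 or more, with -Q 33 as the ASCII offset. The function name and the floating-point percentage are my own assumptions; this is not the tool's source code, and its rounding may differ.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=95, offset=33):
    """Return True if the read would be kept under the -q/-p rule (sketch only).

    quality_string: the fourth line of a FASTQ record.
    offset: ASCII offset of the quality encoding (33, as in -Q 33).
    """
    if not quality_string:
        return False  # treating an empty read as rejected is an assumption
    scores = [ord(ch) - offset for ch in quality_string]
    good = sum(1 for s in scores if s >= min_quality)
    # Percentage of bases at or above the minimum quality score.
    return 100.0 * good / len(scores) >= min_percent


# 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
print(passes_quality_filter("I" * 20))        # all bases are Q40
print(passes_quality_filter("I" * 18 + "!"))  # 18 of 19 bases pass
```

Under this sketch, a 19-base read with a single base below Q20 is at 94.7% and is dropped, while a 20-base read with one such base sits at exactly 95% and is kept. For reads as short as miRNAs, one low-quality base can therefore be enough to reject the whole read.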
The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292 and were already clean of adapter sequences.
SampleId TotalReads TrimmedReads %OfGoodQualityReadsWithinTotalReads
SRR1542714 1866654 962422 51.56 %
SRR1542715 1842228 955859 51.89 %
SRR1542716 2777542 1976509 71.16 %
SRR1542717 1324705 318259 24.02 %
SRR1542718 3085962 1830745 59.32 %
SRR1542719 1937831 619794 31.98 %
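The last column can be re-derived from the two read counts. This is a small Python check using only the numbers in the table above; the variable and function names are mine.

```python
# (sample, total reads, reads kept after filtering), copied from the table above
counts = [
    ("SRR1542714", 1866654, 962422),
    ("SRR1542715", 1842228, 955859),
    ("SRR1542716", 2777542, 1976509),
    ("SRR1542717", 1324705, 318259),
    ("SRR1542718", 3085962, 1830745),
    ("SRR1542719", 1937831, 619794),
]


def percent_kept(total, kept):
    """Percentage of reads that survived quality filtering."""
    return 100.0 * kept / total


for sample, total, kept in counts:
    print(f"{sample}\t{total}\t{kept}\t{percent_kept(total, kept):.2f} %")
```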
Usually all these samples should produce >95% of good quality reads after quality filtering. The variation between samples is huge, and it seems like I am doing something wrong.
So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason that I am not able to reproduce the result?
I would really appreciate it if somebody could guide me.
Usually all these samples should produce >95% of good quality reads after quality filtering?
Are you sure about that? For miRNA expression profiles? Most importantly, I see the sequencer is an Ion Torrent PGM (Homo sapiens). I haven't used data from this platform before, and I am very interested in it. Give me a week; I will process these data and then answer your question.
Thanks Jimmy, please post your answer.
We got the same results.
But I am more concerned about the mapping rate:
~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U SRR1542714_clean.fq.gz -S tmp.sam
Is it right to align the reads to miRBase?
There are two things I am not sure about: which alignment tool and which reference should I choose?
Below is my code:
Sorry for my mistake:
In fact, the overall mapping rate is OK with bowtie2; the only problem was that I forgot to change the U to T in the sequences downloaded from miRBase.
ls *_clean.fq.gz | while read id ; do ~/biosoft/bowtie/bowtie2-2.2.9/bowtie2 -x miRBase/bowtie2_index/hairpin_human -U $id -S ${id%%.*}.hairpin.sam ; done
Overall alignment rate: 10.20% / 5.71% / 10.18% / 4.36% / 10.02% / 4.95% (before converting U to T)
Overall alignment rate: 51.77% / 70.38% / 51.45% / 61.14% / 52.20% / 65.85% (after converting U to T)
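For anyone hitting the same problem, this is a minimal Python sketch of the U to T conversion. miRBase distributes its sequences as RNA, while the reads use the DNA alphabet, so the FASTA needs converting before the bowtie2 index is built. The function names and the file names in the comment are hypothetical, and the example record is made up.

```python
def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U replaced by T in sequence lines only.

    Header lines (starting with '>') are passed through unchanged so that
    identifiers containing the letter U are not altered.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("U", "T").replace("u", "t")


def convert_file(src_path, dst_path):
    """Convert a whole FASTA file, e.g. convert_file("hairpin.fa", "hairpin_dna.fa")."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in rna_to_dna_fasta(src):
            dst.write(out_line + "\n")


# Made-up record, for illustration only.
example = [">example_hairpin_U1\n", "UGAGGUAGUAGGUUGU\n"]
print("\n".join(rna_to_dna_fasta(example)))
```

The index would then need to be rebuilt from the converted file with bowtie2-build before repeating the alignment.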