Hello,
Previously, I asked questions about mapping miRNAs (my miRNA mapping rate was very low, less than 0.03%).
Thank you David! :) I was finally able to map my miRNA reads successfully.
But this time I have another set of samples with the same design: 3 controls vs. 3 treated.
I followed exactly the same steps, since the samples were generated on the same machine:
- Remove adapter sequence
- Remove index sequence
This time, however, I noticed that after removing the adapter sequence (TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC), the file sizes were reduced dramatically, which means most of the reads were removed.
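To make the adapter-removal step concrete, here is a minimal Python sketch of 3' adapter clipping. It is only an illustration under simplifying assumptions: it uses exact matching and a hypothetical `min_overlap` parameter, whereas real trimming tools also tolerate mismatches. The function name `clip_adapter` and the example insert are my own choices, not part of the original pipeline.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC"  # adapter sequence quoted in the post


def clip_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Remove a 3' adapter (complete, or truncated by the read end) from a read.

    Exact matching only; min_overlap is a hypothetical threshold that avoids
    clipping on very short chance matches at the end of the read.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        n = min(len(tail), len(adapter))
        # The rest of the read must agree with the beginning of the adapter.
        if tail[:n] == adapter[:n]:
            return read[:start]
    return read  # no adapter found, read is left untouched


# A 36 bp read holding a 22 nt insert followed by the first 14 adapter bases.
insert = "TGAGGTAGTAGGTTGTATAGTT"
print(clip_adapter(insert + ADAPTER[:14]))

# A read that starts directly with the adapter has no insert left after clipping.
print(repr(clip_adapter(ADAPTER + "AAA")))
```

Reads that begin with the adapter come out empty and are usually discarded, so a library dominated by such reads shrinks sharply after clipping.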
For example, here are the FASTQ file sizes of the original files:
c1 (265M), c2 (428M), c3 (248M), a1 (268M), a2 (344M), a3 (443M)
And after removing the adapter sequence:
c1 (132M, okay), c2 (15M, weird), c3 (208M, okay), a1 (153M, okay), a2 (15M, weird), a3 (18M, weird)
When I looked at the FASTQ files of the affected samples (c2, a2, a3), I saw that many of the reads consist mostly of adapter sequence. I am not sure why; maybe the experiments went badly? I have no information about the experiments. I guess this is the reason those files are mostly chopped away.
I then continued the analysis (removing the index sequence, i.e. the first 4 and last 4 bases in my case) and ran bowtie2.
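For reference, the index-trimming step described above (dropping the first 4 and last 4 bases) could look roughly like this in Python. This is just a sketch: the function name `trim_index` and its parameters are hypothetical, and it assumes the sequence and quality strings of a FASTQ record have the same length.

```python
def trim_index(seq, qual, n5=4, n3=4):
    """Trim n5 bases from the 5' end and n3 bases from the 3' end of a read.

    Returns the trimmed (sequence, quality) pair, or None when the read is
    too short to leave any insert after trimming.
    """
    if len(seq) <= n5 + n3:
        return None
    end = len(seq) - n3
    return seq[n5:end], qual[n5:end]


# Hypothetical 30 nt clipped read: 4 index bases, a 22 nt insert, 4 index bases.
seq = "AAAA" + "TGAGGTAGTAGGTTGTATAGTT" + "CCCC"
qual = "I" * len(seq)
print(trim_index(seq, qual))
```

Reads that return None here would be dropped before mapping, which reduces the read count further.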
Here is the bowtie2 result:
                       c1        c2       c3        a1        a2
Mapping rate           55.88%    7.75%    70.06%    68.14%    27.48%
Total number of reads  1196717   135524   1841729   1367558   134217
I am wondering whether I can proceed with this analysis. I heard that the mapping rate should usually be around 50-80%. For some of my samples (e.g. c2 and a2) it is much lower than that. Also, the total number of reads in those samples is very low.
I would appreciate some comments on this. Is it a problem with the experiments, or something else? Can I analyze this data further?
Can you please provide two more things: 1) the number of reads before and after clipping and 2) the read length distribution?
It should be enough if you just send them for one sample, e.g. c2.
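In case it helps, both numbers can be obtained with a few lines of Python. This is a rough sketch that assumes a plain, uncompressed FASTQ file with exactly four lines per record; the function names are my own and the file name in the usage comment is only a placeholder.

```python
from collections import Counter
from itertools import islice


def length_distribution(fastq_lines):
    """Count reads per read length from an iterable of FASTQ lines.

    Assumes four lines per record, so every 4th line starting at the 2nd
    one is a sequence line.
    """
    return Counter(len(line.strip()) for line in islice(fastq_lines, 1, None, 4))


def summarize(path):
    """Print a length/count table and the total read count for one FASTQ file."""
    with open(path) as handle:
        dist = length_distribution(handle)
    for length in sorted(dist):
        print(length, dist[length], sep="\t")
    print("total", sum(dist.values()), sep="\t")
    return dist


# Usage (placeholder file name): summarize("c2_clipped.fastq")
```

Running it on the file before and after clipping gives the two read counts and shows where the remaining reads sit in terms of length.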
Thank you, David.
Yes, I checked the read length distribution for c2.
[Figure: c2 read length distribution before clipping the adapter sequence]
[Figure: c2 read length distribution after clipping the adapter sequence]
=================================
It seems that I originally had 3 million reads, each 36 bp long.
After clipping, I only have 135524 reads left (summed over all read lengths).
====================================
FYI: since I also need to clip the first 4 and last 4 index bases from these reads, my read length distribution will be shifted down by a further 8 bp (4 from each end).
I concluded that something went wrong with this experiment, although I don't know what, since I cannot find anything wrong, or anything more to do, on the analysis side. ;(
We decided to do the experiments again!
As a sanity check, you can run FastQC and look at the adapter content plot; it should match the reduction in the number of sequences.
I will check with FastQC too. Thanks!