Hello! Recently I got a dataset to practice RNA-seq analysis. After running a quality check with FastQC, I found that some of my data have poor quality at the tail of the reads, so I want to trim the tails using Trimmomatic. However, I am having trouble with these parameters:
- "MINLEN": All of my 52 samples have read lengths of about 43 to 57 bases, so I don't know how to choose a proper minimum length (maybe 1/3 of the original length, I guess) to keep my downstream mapping rate at a reasonable level. I would appreciate any advice!
- "SLIDINGWINDOW:4:15": I have actually tried threshold values from 15 to 30. What is your strategy for choosing this parameter?
- "LEADING": Is it necessary to trim the first few leading bases every time?
I am a beginner in RNA-seq, so thanks a ton for your suggestions!
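To make the SLIDINGWINDOW question more concrete, here is a minimal Python sketch of the general idea: scan from the 5' end and cut the read once the average quality inside the window drops below the threshold. This is a simplified illustration, not Trimmomatic's actual code, so the exact cut position and edge cases may differ. The function name `sliding_window_trim` and the quality values are made up for the example.

```python
def sliding_window_trim(quals, window=4, min_avg=15):
    """Return how many bases to keep from the 5' end of a read.

    quals: per-base Phred quality scores, as a list of integers.
    Simplified illustration only; not Trimmomatic's actual code.
    """
    for start in range(len(quals) - window + 1):
        window_avg = sum(quals[start:start + window]) / window
        if window_avg < min_avg:
            # Cut at the start of the first window that fails the threshold.
            return start
    # No window failed (or the read is shorter than one window): keep everything.
    return len(quals)


# Hypothetical read: 40 good bases followed by a decaying tail.
example_quals = [38] * 40 + [20, 12, 10, 8, 5, 2, 2]

for threshold in (15, 30):
    kept = sliding_window_trim(example_quals, window=4, min_avg=threshold)
    print(f"SLIDINGWINDOW:4:{threshold} keeps {kept} of {len(example_quals)} bases")
```

With these made-up scores, raising the threshold from 15 to 30 cuts the read two bases earlier (38 bases kept instead of 40). A stricter threshold therefore shortens reads, which interacts directly with whatever MINLEN you pick.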
Ian, thank you very much! But I'm afraid that if I set MINLEN:35, quite a lot of reads will be lost. Below are the read-length counts collected from one of my trimmed .fastq files:
Read_length  Read_num
10           2026
11           2334
12           1709
13           1421
14           1251
15           1362
16           1622
17           1812
18           2106
19           2318
20           2909
21           3959
22           5490
23           6103
24           7793
25           8843
26           9223
27           9572
28           9729
29           9917
30           10000
31           10046
32           10357
33           10222
34           10395
35           10451
36           10573
37           10762
38           11541
39           13131
40           26277
41           94748
42           156007
43           334605
44           330133
45           295795
46           337757
47           245022
48           132765
49           1915847
You can see that the maximum length is 49 and that many reads are longer than 35. So, will it be OK if I choose 10 as the threshold value? Thanks again for your reply!
I would not go down to 10 bp. Remember, the shorter the read, the greater the chance of a false-positive match. You could go down to 25 bp if you are desperate. However, looking at the number of reads per length, I think the majority of reads are >35 bp anyway. Below 35 bp, the counts at each length are only in the thousands (roughly 10,000 at most).
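To put numbers on that, here is a small Python sketch that sums the length histogram posted above and reports how many reads each MINLEN cutoff would drop. It assumes MINLEN discards reads strictly shorter than the given length; the helper names `parse_histogram` and `reads_dropped` are just for this example. Reads shorter than 10 bases are not in the posted table, so the percentages are relative to the reads listed there.

```python
# Length histogram copied from the post above: alternating read length and read count.
HISTOGRAM = """
10 2026 11 2334 12 1709 13 1421 14 1251 15 1362 16 1622 17 1812 18 2106 19 2318
20 2909 21 3959 22 5490 23 6103 24 7793 25 8843 26 9223 27 9572 28 9729 29 9917
30 10000 31 10046 32 10357 33 10222 34 10395 35 10451 36 10573 37 10762 38 11541
39 13131 40 26277 41 94748 42 156007 43 334605 44 330133 45 295795 46 337757
47 245022 48 132765 49 1915847
"""


def parse_histogram(text):
    """Turn 'length count length count ...' into a {length: count} dict."""
    values = [int(token) for token in text.split()]
    return dict(zip(values[0::2], values[1::2]))


def reads_dropped(counts, minlen):
    """Count reads strictly shorter than minlen (the ones a cutoff would discard)."""
    return sum(n for length, n in counts.items() if length < minlen)


counts = parse_histogram(HISTOGRAM)
total = sum(counts.values())

for minlen in (10, 25, 35):
    dropped = reads_dropped(counts, minlen)
    print(f"MINLEN:{minlen} drops {dropped} of {total} reads "
          f"({100 * dropped / total:.2f}%)")
```

For this file, MINLEN:35 drops about 3.5% of the listed reads and MINLEN:25 about 1.1%, so the stricter cutoff costs relatively little.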
Thanks, Ian! I'm gonna try MINLEN:35 for trimming.