I have a problem trimming my RNA-seq reads. All reads are the same length (101 nucleotides), but the FastQC quality check shows that some reads contain adapters (the adapter content module is red). For the reads containing adapters, I trimmed them with Trimmomatic using the ILLUMINACLIP step, which removed the adapters. My problem is with some other reads, where I cannot tell whether they contain adapters or not: the adapter content module is yellow and, according to the adapter content graph, something seems wrong from the 12th or 13th nucleotide onward, because before the 12th nucleotide the line is completely flat. I trimmed the reads according to the Lexogen mRNA-Seq Library Prep Kit V2 protocol, which says "The first nine nucleotides need to be removed from Read 1 (starter side), while on the stopper side it is only six nucleotides (Read 2)." But FastQC shows the same result as before and the problem still exists, so I am completely confused. Would you please help me trim these reads?
I think I should explain my problem in more detail; maybe it was not clear enough.
I received some RNA-seq reads prepared with the SENSE mRNA-Seq Library Prep Kit V2. After checking them with FastQC, I found that some reads showed red and others yellow adapter content. What the red and yellow cases have in common is that the FastQC adapter content graph is flat up to the 13th nucleotide and rises after it, more steeply for the reads with red adapter content. For the reads with red adapter content, I removed the Illumina universal adapter with Trimmomatic using the ILLUMINACLIP step. FastQC then showed that the adapter was removed and the adapter content turned green. For the reads with yellow adapter content, I cannot tell whether they contain adapters or not. I ran the same Trimmomatic adapter removal on them. The adapter was removed for some reads, but for others the problem persists and FastQC still shows yellow adapter content. So I am completely confused about what this problem is and why these graphs still rise. I need your help to find the cause.
My other question is about the trimming instruction in the SENSE mRNA-Seq Library Prep Kit V2 protocol. I tried to trim the reads again, this time following the protocol, which calls for removing 9 nucleotides from R1 and 6 nucleotides from R2. So I removed 9 nucleotides from the 5' side of R1 and 6 nucleotides from the 3' side of R2. But it did not help and the problem with the adapter content graph persists. Would you please tell me how I can solve this problem, and how I can trim these reads?
Can you post images (or full FastQC report) for a representative sample so we can see what is going on?
Also keep in mind that FastQC is not indicating "absolute failures". The FastQC author had to set limits for the various test parameters (and they are set for normal genomic sequencing), so having a test "fail" (red X) does not automatically mean that your data is bad. You need to consider the context of your experiment when looking at FastQC results.
That shows that your data still has some Illumina universal adapters and needs to be trimmed. Try using bbduk.sh from the BBMap suite. The adapters.fa file is included in the resources directory of the BBMap suite. Use the correct path to it in the command below.
You appear to be confusing the trimming of Illumina adapters with the specific trimming that Lexogen recommends. The instructions from Lexogen appear to remove specific adapters/nucleotides that their kit must be adding to the fragments. You must not have trimmed your reads correctly with Trimmomatic to remove the Illumina adapter. Based on that graph, your reads will have a range of lengths after proper trimming (i.e. they won't all remain 101 bp).
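To make the point about read lengths concrete, here is a minimal Python sketch of 3' adapter clipping. It is not what Trimmomatic or BBDuk do internally: it assumes exact matching only (real trimmers tolerate mismatches), and `clip_adapter`, `ADAPTER` and `min_overlap` are names made up for this illustration. The adapter string is the commonly cited start of the Illumina TruSeq adapter.

```python
# Hypothetical helper for illustration; exact matching only (an assumption).
ADAPTER = "AGATCGGAAGAGC"  # commonly cited start of the Illumina TruSeq adapter


def clip_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Return the read with a full or partial 3' adapter removed."""
    # Case 1: the whole adapter occurs somewhere inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the beginning of the adapter fits at the very end of the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: the read keeps its full length.
    return read


reads = [
    "ACGT" * 5,                       # no adapter
    "ACGTACGTAC" + ADAPTER[:10],      # partial adapter at the 3' end
    "ACGTA" + ADAPTER + "TT",         # full adapter inside the read
]
print([len(clip_adapter(r)) for r in reads])
```

Reads that all started at the same length come out with different lengths, depending on where the adapter begins in each one.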
Lexogen's mRNA-Seq protocol uses random primers. AFAIK, they suggest removing them if you want to use TopHat2 or a similarly sensitive aligner. If you use STAR or BBMap, you can keep the complete read, but you should increase the allowed number of mismatches.
FastQC needs a certain length to identify adapter sequences. This is apparently 12 or 13 nt. Regardless of how you trim the start of the reads, the detection of adapter sequences will start at this position. Note that the adapter content graph also ends before the last cycle is reached.
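A rough Python sketch of why a probe-based adapter curve stops short of the read length. This is not FastQC's code: it assumes the detector searches each read for a fixed 12-nt probe and plots a cumulative percentage per start position, and `adapter_curve` is a name invented here. It only illustrates the last point (the curve ending before the final cycle).

```python
def adapter_curve(reads, probe="AGATCGGAAGAG"):
    """Cumulative % of reads in which the probe has been seen, per start position.

    Assumes all reads have the same length and exact matching of the probe.
    """
    k = len(probe)
    read_len = len(reads[0])
    n_positions = read_len - k + 1   # a k-mer cannot start in the last k-1 cycles
    counts = [0] * n_positions
    for read in reads:
        first = read.find(probe)
        if first != -1:
            # Once the probe is seen, count the read at every later position too.
            for i in range(first, n_positions):
                counts[i] += 1
    return [100.0 * c / len(reads) for c in counts]


# Two 30-nt toy reads: one carries the probe starting at index 14, one has none.
toy = ["A" * 14 + "AGATCGGAAGAG" + "A" * 4, "A" * 30]
curve = adapter_curve(toy)
print(len(curve), curve[13], curve[14])
```

With 30-nt reads and a 12-nt probe the curve has only 19 points, so the x-axis ends before cycle 30.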
Thanks a lot for your comment. Following your recommendation, I decided to keep the Lexogen primers but use BBMap instead of TopHat2 to map the reads. I assembled the paired-end reads with Trinity and now have a trinity.fasta file. As the next step, I need to map the paired-end reads to the de novo assembled trinity.fasta file and count the reads. But I do not know which BBMap command maps these paired-end reads while allowing more mismatches, as you recommended. Would you please guide me?
I mentioned the trimming only in regard to the aligner. For de novo assembly, I would remove the primer sequences, since they have a higher chance of containing mismatches. I don't know how Trinity copes with that.
I do not know how Trinity handles the primers either. So, given the comments, maybe it is better to remove the primers from the reads. My question is: which software is most efficient for primer removal? According to the Lexogen protocol, I must remove 9 nucleotides from the 5' side of R1 and 6 nucleotides from the 3' side of R2. How can I trim these primers? Would you please guide me?
Many thanks for the commands. Are these commands for removing a specific number of nucleotides from the end of the read? Is there another command to remove a specific number of nucleotides from the start of the read, since I need to remove 9 nucleotides from the 5' side of R1?
Yes. The two commands above remove 9 bases from the front of Read 1 and 6 bases from the end of Read 2. BBMap uses 0-based counting, so verify that the correct number of bases is being removed by the commands above.
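As a sanity check on the counting, here is a small Python sketch of the same fixed-length trimming using plain string slicing. It is not a BBMap command; `trim_r1` and `trim_r2` are hypothetical helper names, and the 101-nt read is synthetic.

```python
def trim_r1(seq: str, qual: str, n_front: int = 9):
    """Drop the first n_front bases (and their qualities) from a Read 1 record."""
    return seq[n_front:], qual[n_front:]


def trim_r2(seq: str, qual: str, n_end: int = 6):
    """Drop the last n_end bases (and their qualities) from a Read 2 record."""
    keep = len(seq) - n_end
    return seq[:keep], qual[:keep]


# Synthetic 101-nt read with a dummy quality string of the same length.
seq = "".join("ACGT"[i % 4] for i in range(101))
qual = "I" * 101

r1_seq, r1_qual = trim_r1(seq, qual)
r2_seq, r2_qual = trim_r2(seq, qual)
print(len(r1_seq), len(r2_seq))
```

In 0-based terms, the first base kept in Read 1 is index 9 (the 10th base), and the last base kept in a 101-nt Read 2 is index 94. After trimming, a 101-nt Read 1 should be 92 nt long and a 101-nt Read 2 should be 95 nt long, which is an easy thing to check in the trimmed files.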
You can use fastp to preprocess your Illumina sequencing data (whether RNA-seq or DNA-seq, PE or SE). It can trim adapters automatically for both PE and SE data, which means that you don't have to supply the adapter sequences.
Besides trimming adapters, this tool also performs quality filtering and other operations to improve your data quality, and most of the features are automated. All you have to do is install fastp and run:
Where is fastp getting the adapter sequences from? OP is using a kit that has specific instructions about removing additional bases from front/end of reads.
For paired-end data, adapters are removed by finding the insert length (cycles beyond the insert length are treated as adapter).
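The idea can be sketched in a few lines of Python. This is only an illustration of overlap-based trimming, not fastp's implementation: it assumes exact matches (a real tool has to tolerate sequencing errors), and `revcomp`, `find_insert_length`, `trim_pair` and the toy sequences are all made up for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_insert_length(r1: str, r2: str, min_overlap: int = 10):
    """Return the insert length if it is shorter than the reads, else None.

    If the insert has length L < read length, the first L bases of R1 equal
    the reverse complement of the first L bases of R2.
    """
    n = min(len(r1), len(r2))
    for length in range(min_overlap, n):
        if r1[:length] == revcomp(r2[:length]):
            return length
    return None


def trim_pair(r1: str, r2: str, min_overlap: int = 10):
    """Cut both mates at the insert length; everything beyond it is adapter."""
    length = find_insert_length(r1, r2, min_overlap)
    if length is None:
        return r1, r2
    return r1[:length], r2[:length]


# Toy pair: a 20-nt insert sequenced with 30 cycles, so 10 nt of adapter
# read-through on each mate (adapter strings here are illustrative).
insert = "ACGTTGCAAGGCTTAACCGG"
r1 = (insert + "AGATCGGAAGAGCACACGTC")[:30]
r2 = (revcomp(insert) + "AGATCGGAAGAGCGTCGTGT")[:30]
print(trim_pair(r1, r2))
```

Note that nothing in this approach needs the adapter sequence itself; only the agreement between the two mates is used.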
For single end data, the adapter can be specified in the command line, or detected automatically if not specified. I developed an algorithm to detect adapter sequence by doing a simple assembly for the high frequency last 10 bp. See my code: https://github.com/OpenGene/fastp/blob/master/src/evaluator.cpp (string Evaluator::evaluateRead1Adapter() ). The detected adapter may be a bit shorter than the real one, but it's enough to trim most adapters.