2.5 years ago
glady
Hello,
I'm trying to use RSEM with the STAR aligner to map paired-end data against a human reference. After trimming the adapter sequences with cutadapt, I align the reads with RSEM using the following command:
rsem-calculate-expression -p 6 --paired-end --phred64-quals --output-genome-bam --star --star-path /home/Downloads/tools/STAR-2.5.3a/source --estimate-rspd --append-names sample1-R1_trimmed.fastq sample1-R2_trimmed.fastq ref_genome/human_ref sample1.expression
I get this error:
Warning: Read A00902:595:HC27FDRX2:2:2226:26476:24236 is ignored due to at least one of the mates' length < seed length (= 25)!
How can I solve this issue? Is it because the adapter trimming was not done correctly, or is there a problem with the RSEM command I used?
Thanks!
It's a warning, not an error. If you trimmed adapters, you could also filter reads by length (cutadapt has an option for this, -m/--minimum-length, and other tools have equivalents).
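To illustrate what length filtering does for paired-end data, here is a minimal sketch in POSIX shell and awk. The function name filter_pairs_by_length and the file names are hypothetical, and the sketch assumes plain 4-line FASTQ records with mates in the same order in both files. It keeps a pair only if both mates reach the minimum length, so the two files stay in sync.

```shell
# Sketch: drop read pairs in which either mate is shorter than MINLEN.
# filter_pairs_by_length is a hypothetical helper, not part of cutadapt or RSEM.
# Usage: filter_pairs_by_length R1.fastq R2.fastq MINLEN OUT_R1.fastq OUT_R2.fastq
filter_pairs_by_length() {
    # Create (or empty) both outputs so they exist even if no pair passes.
    : > "$4"
    : > "$5"
    awk -v r2="$2" -v min="$3" -v o1="$4" -v o2="$5" '
    {
        # Read one 4-line record from R1 (the main input) ...
        h1 = $0; getline s1; getline p1; getline q1
        # ... and the matching 4-line record from R2.
        getline h2 < r2; getline s2 < r2; getline p2 < r2; getline q2 < r2
        # Keep the pair only if both sequences are long enough.
        if (length(s1) >= min && length(s2) >= min) {
            printf "%s\n%s\n%s\n%s\n", h1, s1, p1, q1 > o1
            printf "%s\n%s\n%s\n%s\n", h2, s2, p2, q2 > o2
        }
    }' "$1"
}
```

For the reads in this thread, a call such as filter_pairs_by_length sample1-R1_trimmed.fastq sample1-R2_trimmed.fastq 25 R1.filtered.fastq R2.filtered.fastq would remove the pairs that trigger the seed-length warning (the output names are placeholders). In practice, adding -m 25 to the existing cutadapt command should achieve the same thing during trimming.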
Okay, thank you for the reply. So is it okay to ignore this warning and proceed?
Another issue is the diagnostic plot, which shows 98% of the reads as unaligned. But if you open the genome.bam file in FastQC, you can see that most of the reads are present.
98% unaligned is bad. The reads are there, but they're not aligned, so the output is pretty useless.
How can I solve the issue? Is it because of the warning from RSEM, or was the adapter trimming not done properly?
Could be several things. Try manually BLASTing some unmapped reads to see where they fall (non-coding regions, rRNA, another genome, etc.). Run FastQC after trimming to see if you have adapters left (although that is less likely).
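One possible way to pull out a handful of unmapped reads for BLAST is sketched below. The helper unmapped_to_fasta is hypothetical; it reads SAM text on standard input and prints reads whose FLAG has the "unmapped" bit (0x4) set as FASTA. It assumes the unmapped reads are actually stored in the BAM, which the FastQC observation above suggests but does not guarantee.

```shell
# Sketch: print up to N unmapped reads from a SAM stream as FASTA.
# unmapped_to_fasta is a hypothetical helper, not a samtools or RSEM command.
# Usage: <SAM text on stdin> | unmapped_to_fasta [N]   (N defaults to 10)
unmapped_to_fasta() {
    awk -F'\t' -v n="${1:-10}" '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 {             # FLAG bit 0x4 set: read is unmapped
            print ">" $1                   # QNAME becomes the FASTA header
            print $10                      # SEQ column
            if (++count >= n) exit         # stop after N reads
        }'
}
```

With samtools installed, something like samtools view genome.bam | unmapped_to_fasta 20 > unmapped.fa would give a small FASTA file to paste into BLAST (genome.bam here stands for whatever genome BAM the run produced).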
I don't see any adapters after trimming with cutadapt.
The issue is the adapters and Illumina indices: in the FastQC report, some sequences are flagged as TruSeq Adapter, Index 15.
But even after trimming these sequences with cutadapt, I still get a poor mapping percentage!
What could be the problem? Should I try a different tool instead of cutadapt?
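Before switching tools, it may help to check directly whether adapter sequence is still present in the trimmed files. The sketch below counts reads containing a given substring. The helper count_reads_with_seq is hypothetical, and the sequence AGATCGGAAGAGC (the commonly cited start of the TruSeq adapter) is an assumption that should be replaced with the exact sequence FastQC reports.

```shell
# Sketch: count how many reads in a 4-line FASTQ file contain a given sequence.
# count_reads_with_seq is a hypothetical helper, not part of cutadapt or FastQC.
# Usage: count_reads_with_seq reads.fastq SEQUENCE
count_reads_with_seq() {
    awk -v seq="$2" '
        NR % 4 == 2 {                          # line 2 of each record is the sequence
            total++
            if (index($0, seq) > 0) hits++     # exact substring match only
        }
        END { printf "%d of %d reads contain %s\n", hits, total, seq }
    ' "$1"
}
```

For example, count_reads_with_seq sample1-R1_trimmed.fastq AGATCGGAAGAGC reports how many trimmed forward reads still carry that sequence. If the count is near zero and mapping is still poor, the adapters are probably not the cause. Note that this only finds exact matches, so adapters with sequencing errors are not counted.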