Hi there,
I have a series of RNA-seq samples and, when I align them with STAR (version 2.7.9a), only a small percentage of the reads align...
FastQC shows a high number of overrepresented sequences related to rRNA (determined manually by BLAST) and a weird "Per base sequence content". I have two questions:
- Summing the rRNA reads gives no more than 3M reads, so I would not expect them to affect the overall alignment rate.
- Why is the "Per base sequence content" so "unstable"? Could it be the cause of the low mapping rate?
Is there any way to improve these mapping rates? Thanks in advance!
EDIT
The protocol used for rRNA depletion is Illumina Stranded Total RNA Prep with Ribo-Zero Plus...
and this is the STAR command used:
base_name=${filename%_R?_001.cutadapt.fastq.gz}
STAR --runThreadN 20 \
--outFilterMismatchNmax 3 \
--alignEndsType Local \
--outFilterMultimapNmax 10 \
--outMultimapperOrder Random \
--genomeDir $reference \
--readFilesIn ${base_name}_R1_001.cutadapt.fastq.gz ${base_name}_R2_001.cutadapt.fastq.gz \
--readFilesCommand zcat \
--outFileNamePrefix ${base_name}.star. \
--outSAMtype BAM SortedByCoordinate \
--outBAMsortingThreadN 10 \
--genomeLoad LoadAndKeep \
--outSAMunmapped Within \
--limitBAMsortRAM 40000000000
Could you post the STAR command you are running so we can double-check it, along with more detail from the STAR log so we can see whether rRNA contamination could be contributing to this? Also, how was rRNA depleted in your library prep? Was it a ribosomal depletion kit, poly-dT priming, etc.?
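For the log detail, the mapping summary is in the Log.final.out file that STAR writes next to the BAM (with your prefix it should be ${base_name}.star.Log.final.out). Here is a small sketch for pulling out the relevant lines; the function name star_mapping_summary is just something I made up for illustration:

```shell
# Print the mapping-rate lines from a STAR Log.final.out file.
# star_mapping_summary is a hypothetical helper name, not part of STAR.
star_mapping_summary() {
    grep -E 'Number of input reads|Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads mapped to too many loci|% of reads unmapped' "$1" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]*\|[[:space:]]*/: /'
}

# Summarise every STAR log in the current directory (skipped if none exist).
for log in *.star.Log.final.out; do
    [ -e "$log" ] || continue
    echo "== $log"
    star_mapping_summary "$log"
done
```

If most reads land in "% of reads unmapped: too short", STAR found alignments but they covered too little of the read pair, which usually points away from rRNA. rRNA reads tend to show up as multi-mapped or "mapped to too many loci" instead, though that depends on how rRNA is represented in your reference.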
Once the above has been checked, you can get a more accurate estimate of rRNA contamination by filtering your reads against rRNA sequences with a program such as BBDuk.
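A rough sketch of what that BBDuk step could look like for your paired-end files. The rRNA FASTA, the output file names, and the function name split_rrna_reads are all assumptions on my side, and k=31 is just a commonly used k-mer length, so adjust as needed:

```shell
# Split read pairs into rRNA-matching and non-matching sets with BBDuk.
# Assumptions: bbduk.sh (BBTools) is on PATH, and the second argument is a
# FASTA of rRNA sequences that you provide yourself.
split_rrna_reads() {
    sample="$1"   # e.g. the ${base_name} from the STAR command
    ref="$2"      # hypothetical rRNA reference FASTA
    bbduk.sh \
        in="${sample}_R1_001.cutadapt.fastq.gz" \
        in2="${sample}_R2_001.cutadapt.fastq.gz" \
        out="${sample}_R1_001.norRNA.fastq.gz" \
        out2="${sample}_R2_001.norRNA.fastq.gz" \
        outm="${sample}_R1_001.rRNA.fastq.gz" \
        outm2="${sample}_R2_001.rRNA.fastq.gz" \
        ref="$ref" \
        k=31 \
        stats="${sample}.rRNA_stats.txt"
}

# Example call (file names are placeholders):
# split_rrna_reads "$base_name" rRNA_reference.fa
```

The stats file and the summary BBDuk prints at the end should tell you how many reads matched the rRNA reference, which you can compare against your 3M estimate. You could also align the non-matching pairs with STAR to see whether the mapping rate changes.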
The per base sequence content looks typical of RNA-seq, so I wouldn't worry about that part.
Thanks for the reply. I've just updated the post (good suggestion).
Also, try running FastQ Screen to check whether it is actually rRNA contamination. ^_^