Hi,
Using the STAR aligner, I am getting a very low mapping percentage (5-10%) for my single-cell RNA-seq data. The majority of my reads (>90%) are reported as unmapped because they are "too short". My current parameters are:
STAR --genomeDir --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --outReadsUnmapped Fastx --outSAMstrandField intronMotif --readFilesCommand zcat --readFilesIn *.fq.gz --runThreadN 6
I am also trimming the reads with Trim Galore as follows:
trim_galore $R2_file --trim-n -a AAAAAAAA --clip_R1 9 -o $dir_name
Does anyone have a hypothesis for why we are getting such a low percentage of mapped reads? I am particularly interested in assessing contamination. Is there good software for quickly checking whether my samples could be contaminated? I have no good idea what they could be contaminated with.
Thanks!
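With paired-end data, you should never trim the two reads independently: if a read is discarded from one file but its mate is kept in the other, the files fall out of sync. You are also not scanning for or removing Illumina adapters.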
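To see whether independent trimming has already left two FASTQ files out of sync, a quick check along these lines may help. This is only a sketch: `count_mate_mismatches` is a hypothetical helper name, it assumes uncompressed FASTQ with four lines per record, and it assumes mate names differ only after a space or a `/`.

```shell
# Hypothetical helper: count records whose read names disagree between two
# FASTQ files. Assumes uncompressed, 4-line-per-record FASTQ.
count_mate_mismatches() {
    # $1: read 1 FASTQ, $2: read 2 FASTQ
    paste "$1" "$2" | awk -F '\t' 'NR % 4 == 1 {
        # Keep only the part of each header before the first space or slash
        split($1, a, "[ /]")
        split($2, b, "[ /]")
        if (a[1] != b[1]) bad++
    } END { print bad + 0 }'
}
```

A result of 0 means every record has a matching mate at the same position. Anything above 0 suggests the files were filtered separately and need to be re-paired or re-trimmed together.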
My presumption is that this is something like CEL-Seq2 data and the OP is trying to remove the poly(A) tail from read 2 (if it's still there it will get soft-clipped, so I think that's excess effort). If that's the case, read 1 is mostly poly(A) plus the UMI/cell barcode, which I imagine is what is causing the mapping issues.
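One rough way to test this guess is to measure how many reads in a file are dominated by a single base. The sketch below is not part of any tool: `homopolymer_fraction` is a hypothetical helper, the 80% cutoff is an arbitrary assumption, and depending on the protocol and strand the run may show up as T rather than A, so the base is passed as an argument.

```shell
# Hypothetical helper: fraction of reads (FASTQ on stdin) in which one base
# makes up at least 80% of the sequence. The 0.8 cutoff is an assumption.
homopolymer_fraction() {
    base=${1:-A}    # base to count, defaults to A
    awk -v base="$base" 'NR % 4 == 2 {
        total++
        seq = toupper($0)
        len = length(seq)
        n = gsub(base, "", seq)    # gsub returns how many copies it removed
        if (len > 0 && n / len >= 0.8) hits++
    } END {
        if (total == 0) { print "0.000"; exit }
        printf "%.3f\n", hits / total
    }'
}
```

For example, `zcat reads_R1.fq.gz | head -n 400000 | homopolymer_fraction A` (the file name is a placeholder) looks at the first 100,000 reads. A value close to 1 for read 1 would be consistent with that read carrying mostly barcode and homopolymer sequence rather than mappable transcript.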