Hello,
While running quality control on a public raw scRNA-seq dataset, I noticed that some of the FASTQ files show an unusually high T content in the middle of the reads (approximately positions 30-70). Other FASTQ files from the same dataset do not show this. In addition, almost every FASTQ file fails the "Per sequence GC content" module in the FastQC reports.
I would like to ask:
- What could be the cause of the sudden increase in T-base frequency, and how should I address it?
- Should I perform GC bias correction for single-cell RNA-seq data?
Also, can scRNA-seq FASTQ files be processed with fastp using its default parameters for quality control? I have mostly worked with bulk RNA-seq data, so I'm quite new to scRNA-seq.
If there are any papers that provide a thorough yet accessible discussion of practical challenges in scRNA-seq data processing and analysis, I would greatly appreciate the recommendation!
Thank you!
Here are the details:
SRR18015167_1_fastqc
SRR18015167_2_fastqc
The FastQC HTML reports are available here: https://github.com/coopertdx/Biostars
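In case it helps anyone reproduce the per-position T content outside of FastQC, here is a minimal Python sketch of how it could be recomputed directly from a FASTQ file. The function name `per_position_base_fraction`, the `max_reads` cap, and the example file name are my own assumptions for illustration, not part of any tool; it also assumes standard 4-line FASTQ records.

```python
import gzip
from collections import Counter


def per_position_base_fraction(fastq_path, base="T", max_reads=100_000):
    """Return the fraction of `base` at each read position (0-based).

    Hypothetical helper: assumes 4-line FASTQ records and reads at most
    `max_reads` records so it stays fast on large files.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    base_counts = Counter()  # position -> number of reads with `base` there
    totals = Counter()       # position -> number of reads covering it
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # the sequence is the 2nd line of each record
                continue
            if line_no // 4 >= max_reads:
                break
            seq = line.strip().upper()
            for pos, nt in enumerate(seq):
                totals[pos] += 1
                if nt == base:
                    base_counts[pos] += 1
    return [base_counts[p] / totals[p] for p in range(len(totals))]


if __name__ == "__main__":
    # Assumed file name, based on the SRR accession above.
    fractions = per_position_base_fraction("SRR18015167_2.fastq.gz")
    for pos, frac in enumerate(fractions, start=1):
        print(f"{pos}\t{frac:.3f}")
```

Running this on an affected and an unaffected file should make it easy to compare the T fraction at positions 30-70 side by side.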
Get it! Thank you!