Question

Unexpected High T-Base Content in scRNA-seq Fastq Files and GC Bias Correction


6 weeks ago

Cooper • 0

Hello,

While performing quality control on a public scRNA-seq raw dataset, I noticed that some of the fastq files exhibit an unusually high T-base content in the middle portion of the sequences (approximately positions 30-70 in the read). However, this issue does not appear in other fastq files. Additionally, almost every fastq file shows a warning for failed per-sequence GC content in the quality control reports.

I would like to ask:

1. What could be the cause of the sudden increase in T-base frequency, and how should I address it?
2. Should I perform GC bias correction for single-cell RNA-seq data?

Additionally, can scRNA-seq fastq files be processed with fastp using default parameters for quality control? I have mostly worked with bulk RNA-seq data, so I'm quite new to scRNA-seq.

If there are any papers that provide a thorough yet accessible discussion of practical challenges in scRNA-seq data processing and analysis, I would greatly appreciate the recommendation!

Thank you!

Here are the details:

SRR18015167_1_fastqc

SRR18015167_2_fastqc

The FastQC HTML reports are here: https://github.com/coopertdx/Biostars

Fastq Fastqc scRNA-seq quality-control • 373 views


score 1 · Answer 1 · 2024-10-13


6 weeks ago

ATpoint 85k

What you see is normal and expected. In 10x Genomics 3' scRNA-seq with v3 chemistry (which this is), the first 28 bp of R1 hold the cell barcode (CB, 16 bp) and the UMI (12 bp). After that comes a poly(T) stretch, which is what you are seeing so clearly: it is the part of the capture oligo that binds the poly(A) tail of transcripts to prime cDNA synthesis. Nothing needs to be done about it. Just run the files through Cell Ranger or an alternative scRNA-seq pipeline.
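To make the R1 layout concrete, here is a minimal Python sketch that splits an R1 sequence into cell barcode, UMI, and everything after the first 28 bp, then measures the T fraction of that remainder. The function names (`split_r1`, `t_fraction`, `read_fastq_sequences`) and the toy read are made up for illustration, and the 16 + 12 bp split assumes 10x 3' v3 chemistry. It is not how Cell Ranger or any other pipeline does this internally.

```python
from typing import Iterable, Iterator, NamedTuple

CB_LEN = 16   # cell barcode length (assumption: 10x 3' v3 chemistry)
UMI_LEN = 12  # UMI length (assumption: 10x 3' v3 chemistry)


class R1Parts(NamedTuple):
    cell_barcode: str
    umi: str
    tail: str  # bases after the first 28 bp, expected to be mostly poly(T)


def split_r1(seq: str) -> R1Parts:
    """Split an R1 sequence into cell barcode, UMI, and trailing bases."""
    umi_end = CB_LEN + UMI_LEN
    return R1Parts(seq[:CB_LEN], seq[CB_LEN:umi_end], seq[umi_end:])


def t_fraction(seq: str) -> float:
    """Return the fraction of T bases in seq (0.0 for an empty sequence)."""
    return seq.upper().count("T") / len(seq) if seq else 0.0


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


# Toy FASTQ record (invented): 16 bp barcode + 12 bp UMI + 30 T + 2 other bases.
toy_seq = "ACGTACGTACGTACGT" + "GGCCAATTGGCC" + "T" * 30 + "CA"
toy_fastq = ["@toy_read", toy_seq, "+", "I" * len(toy_seq)]

for seq in read_fastq_sequences(toy_fastq):
    parts = split_r1(seq)
    print(parts.cell_barcode, parts.umi, round(t_fraction(parts.tail), 3))
```

On real data you would feed the lines of a decompressed R1 file (for example via `gzip.open(path, "rt")`) into `read_fastq_sequences`. If the tail of most reads is T-rich, that matches the poly(T) explanation above and is not a quality problem.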