Question

trimmomatic on scRNA seq data

3

2.3 years ago

friguiahlem8 ▴ 30

Hello,

-rw-r--r-- 1 5062 5000 753851810 Nov 2 2018 xxxx_L001_R1_001.fastq.gz
-rw-r--r-- 1 5062 5000 1772725195 Nov 2 2018 xxxx_L001_R2_001.fastq.gz
-rw-r--r-- 1 5062 5000 748651163 Nov 2 2018 xxxxx_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 5062 5000 1763628623 Nov 2 2018 xxxxx_L002_R2_001.fastq.gz

Question: shouldn't xxxx_L001_R1_001.fastq.gz and xxxxx_L001_R2_001.fastq.gz have the same size?

Thank you in advance.

single trimmomatic cell sequencing rna • 2.8k views

updated 2.1 years ago by ATpoint 88k • written 2.3 years ago by friguiahlem8 ▴ 30

0

Get some background first and follow guided tutorials. There is no need to use trimmomatic on regular single-cell/10x data. Also familiarize yourself with what these libraries look like and what R1 and R2 are (CB/UMI, cDNA), etc. There is lots of online material on that.

2.3 years ago by ATpoint 88k

0

Hi ATpoint, would you mind explaining why we don't have to trim 10x scRNA-seq data? I noticed that the Cell Ranger workflow takes care of the TSO and poly-A, but I'm not sure it trims general Illumina adapter sequences, although STAR will soft-clip. So I was wondering why there's no need to use trimming tools for 10x single-cell data. I've seen the Illumina universal adapter sequence reported by FastQC in 10x scRNA-seq 3' data.

2.1 years ago by jkim ▴ 190

1

For R1, CellRanger uses only the CB and UMI positions (for example, the first 28 bp in a v3 chemistry dataset), so it ignores everything beyond that. For R2, the STAR aligner (which CellRanger uses) can soft-clip parts of the read that do not align properly. That's why trimming is not mandatory, though you can do it if you feel safer. But people generally don't for 10x data, afaik.
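To make the R1 part concrete, here is a minimal Python sketch of taking only the CB and UMI positions and ignoring the rest of the read. This is not CellRanger's code; `split_r1` is a hypothetical helper, and the lengths are assumptions based on the usual 10x 3' layout (16 bp barcode, then a 10 bp UMI for v2 or 12 bp for v3).

```python
# Hypothetical helper, not part of CellRanger.
# Assumed layout for 10x 3' libraries: 16 bp cell barcode, then the UMI.
CB_LEN = 16
UMI_LEN = {"v2": 10, "v3": 12}  # assumed UMI lengths per chemistry


def split_r1(seq, chemistry="v3"):
    """Return (cell_barcode, umi) from an R1 sequence.

    Any bases after the barcode and UMI are ignored, which is why
    adapter sequence further along R1 does not matter here.
    """
    needed = CB_LEN + UMI_LEN[chemistry]
    if len(seq) < needed:
        raise ValueError(f"R1 is {len(seq)} bp, need at least {needed}")
    return seq[:CB_LEN], seq[CB_LEN:needed]
```

Calling `split_r1(r1_sequence)` on a longer R1 read returns only the first 28 bases split into two pieces, so whatever follows them never enters the analysis.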

2.1 years ago by ATpoint 88k

score 1 · Answer 1 · 2023-03-10

Is it normal to find that the size of R1 is not the same as R2 in this case?

Yes. For 10x data, Read 1 generally contains the cell barcode plus UMI (26 or 28 bp, depending on the chemistry version), while R2 contains the actual RNA (cDNA) read. R2 reads are longer, so the R2 files are larger.
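If you want to check this on your own files, a rough sketch like the one below tallies read lengths from the start of a gzipped FASTQ. The function name `read_length_counts` and the `max_reads` cutoff are my own choices for illustration, and it assumes a standard 4-line-per-record FASTQ.

```python
import gzip
from collections import Counter


def read_length_counts(path, max_reads=10000):
    """Count sequence lengths among the first max_reads records of a gzipped FASTQ.

    Assumes plain 4-line records (header, sequence, '+', qualities).
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

Running it on an R1 file and on the matching R2 file should show short reads for R1 and longer ones for R2, while the number of records is the same in both.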

You should use tools designed for scRNA-seq data, such as Cell Ranger from 10x Genomics, alevin, or STARsolo, and follow a workflow such as https://bioconductor.org/books/release/OSCA/. Do not use standard RNA-seq analysis tools on single-cell data.