Question

scRNAseq - infer type of single-cell chemistry from fastq

0

4.8 years ago

predeus ★ 2.1k

Hello all,

It's well known that there are many variants of single-cell experiments: for example, kb --list from kallisto-bustools lists at least 10 different ones. If you're processing a reasonably large collection of scRNA-seq experiments, it would make sense to try to infer the chemistry automatically.

Is there by any chance a tool that can guess the chemistry from the reads?

Thank you in advance, as always.

scRNA-seq single cell RNA-Seq • 1.7k views

updated 4.2 years ago by bosun1988 • 0 • written 4.8 years ago by predeus ★ 2.1k

1

With 10x data, perhaps the only option (assuming the sequencing recommendations were followed strictly) is to check the read lengths.
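As a rough illustration of the read-length check, here is a minimal Python sketch. It assumes R1 was sequenced to exactly the recommended length (16 bp barcode plus 10 bp UMI for 3' v2, 16 bp barcode plus 12 bp UMI for 3' v3). The function names and the length table are my own for illustration and do not come from any existing tool. Read length alone cannot separate chemistries that share the same R1 layout, and it says nothing if R1 was sequenced longer than recommended.

```python
import gzip
from collections import Counter
from itertools import islice

# Assumed R1 layouts: 16 bp barcode + 10 bp UMI (v2), 16 bp barcode + 12 bp UMI (v3).
# Other chemistries can share these lengths, so treat the label as a hint only.
R1_LENGTH_TO_CHEMISTRY = {
    26: "10x 3' v2 (or another 16+10 layout)",
    28: "10x 3' v3 (or another 16+12 layout)",
}


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def most_common_read_length(path, n_reads=10000):
    """Return the most frequent sequence length among the first n_reads records."""
    lengths = Counter()
    with open_fastq(path) as handle:
        # The sequence is the second line of every 4-line FASTQ record.
        for seq in islice(handle, 1, n_reads * 4, 4):
            lengths[len(seq.rstrip("\r\n"))] += 1
    if not lengths:
        raise ValueError(f"no reads found in {path}")
    return lengths.most_common(1)[0][0]


def guess_10x_chemistry(r1_path, n_reads=10000):
    """Map the dominant R1 length to a chemistry hint, or 'unknown'."""
    length = most_common_read_length(r1_path, n_reads)
    return R1_LENGTH_TO_CHEMISTRY.get(length, "unknown")
```

Calling guess_10x_chemistry on an R1 file (the path is whatever your run uses) returns one of the hints above, or "unknown" when the dominant length is neither 26 nor 28.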

4.8 years ago by GenoMax 154k

0

Thank you. 10x is easier, yes - although 5' is still tricky.

4.8 years ago by predeus ★ 2.1k

score 0 · Answer 1 · 2021-09-02

Hi,

I don't know if you have solved this yourself already, but I wrote this package:

https://github.com/Nusob888/fasterqParseR/

Hope it helps. At the moment it only accepts "SRR" run names as input filenames (it isn't written to handle complex underscore naming).

A workaround is to remove all the underscores and prepend SRR. The package renames such files to the 10x Cell Ranger naming convention, so the names will be corrected back if you need them for Cell Ranger (which I assume you don't, since Cell Ranger has an auto-detect feature).

The one bug I have noticed is that CITE-seq assays break it. It will not return a chemistry for barcodes that are not on a whitelist.
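To show why off-whitelist barcodes yield no call, here is a minimal Python sketch of the general whitelist-overlap idea. It is not the actual fasterqParseR code. The function name, the whitelist labels and the 0.5 threshold are hypothetical choices for illustration. The idea is to take the barcode portion of a sample of R1 reads, count how many fall in each known whitelist, and report the best match only if enough of them hit.

```python
def guess_chemistry_by_whitelist(barcodes, whitelists, min_fraction=0.5):
    """Return the whitelist name matching the largest fraction of barcodes.

    barcodes: iterable of barcode strings sampled from R1.
    whitelists: dict mapping a chemistry label to a set of valid barcodes.
    min_fraction: arbitrary cutoff (an assumption); below it, return None.
    """
    barcodes = list(barcodes)
    if not barcodes:
        return None
    best_name, best_fraction = None, 0.0
    for name, whitelist in whitelists.items():
        hits = sum(1 for bc in barcodes if bc in whitelist)
        fraction = hits / len(barcodes)
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    # No whitelist explains enough barcodes: make no call at all.
    return best_name if best_fraction >= min_fraction else None


# Toy whitelists with made-up 4 bp barcodes, purely for demonstration.
toy_whitelists = {
    "chemistry_A": {"AAAA", "CCCC", "GGGG"},
    "chemistry_B": {"TTTT", "ACGT", "TGCA"},
}
sampled = ["AAAA", "CCCC", "AAAA", "NNNN"]
print(guess_chemistry_by_whitelist(sampled, toy_whitelists))
```

With the toy data, three of the four sampled barcodes are in the chemistry_A set, so that label is returned. A sample whose barcodes match no whitelist returns None, which is the "no chemistry" outcome described above.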

Additionally, it will correct the R1, R2 and I1 file assignments, since I noticed that for many uploads to SRA the order is incorrect, which also breaks kb-python.