Question

Separating each cell in a fastq file

1

4.4 years ago

sidrah.maryam ▴ 70

Hello, I want to analyze single-cell RNA-seq data that I took from a paper. The paper mentions that it has 8000 single cells and that the library was made using the cellranger software, version 2.1.0. The data were submitted to NCBI as 22 fastq files in total. When I aligned the fastq files with STAR, I got 22 bam files, which made it look as if there were only 22 cells: each fastq file was treated as one cell. I want to separate the cells present in each fastq file into multiple fastq or multiple bam files. Please suggest how to proceed. Since I have to do cell-level analysis, I want each of the 8000 cells to have a separate identity.

Any help is highly appreciated. Thank you in advance.

RNA-Seq sequencing • 3.8k views

ADD COMMENT • link updated 2.7 years ago by MYousry ▴ 20 • written 4.4 years ago by sidrah.maryam ▴ 70

1

Just run it through the cellranger pipeline.

ADD REPLY • link 4.4 years ago by Pappu ★ 2.1k

0

Isn't that information encoded in the fastq headers somehow? Otherwise there might be some kind of barcode.

ADD REPLY • link 4.4 years ago by Michael 55k

0

It is encoded via the cellular barcodes (CBs) in R1, and it requires specialized software: for example the mentioned CellRanger, alternatives such as STARsolo (a STAR extension from Alex Dobin, the STAR author), or lightweight tools such as salmon/alevin or kallisto/bustools. The crux is to distinguish true, unique barcodes from noisy or degenerate ones, either by comparing with the 10X whitelist or by some kind of machine learning that reliably separates real CBs from noisy/nonsense CBs. Please do yourself a favor and use dedicated software for this non-trivial task. Having one bam file per cell sounds incredibly tedious; is this really what you need?
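To make the whitelist idea concrete, here is a minimal Python sketch, not what CellRanger or STARsolo actually do internally. It assumes a 16 bp cell barcode followed by a 10 bp UMI at the start of R1 (the layout I would expect for the 10x v2 chemistry) and rescues barcodes that are exactly one mismatch away from a single whitelist entry. The function names, the toy whitelist, and the toy reads are all made up for illustration; real tools also use base qualities and barcode abundance.

```python
from collections import Counter

CB_LEN = 16   # assumed cell-barcode length
UMI_LEN = 10  # assumed UMI length (v2-style layout)

def split_r1(seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Return (cell_barcode, umi) taken from the start of an R1 sequence."""
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]

def correct_barcode(cb, whitelist):
    """Return cb if whitelisted, else the unique whitelist entry one
    mismatch away, else None (no match or ambiguous)."""
    if cb in whitelist:
        return cb
    hits = set()
    for i in range(len(cb)):
        for base in "ACGT":
            if base != cb[i]:
                candidate = cb[:i] + base + cb[i + 1:]
                if candidate in whitelist:
                    hits.add(candidate)
    return hits.pop() if len(hits) == 1 else None

def count_reads_per_cell(r1_seqs, whitelist):
    """Count reads per corrected cell barcode, dropping unassignable reads."""
    counts = Counter()
    for seq in r1_seqs:
        cb, _umi = split_r1(seq)
        fixed = correct_barcode(cb, whitelist)
        if fixed is not None:
            counts[fixed] += 1
    return counts

# Toy whitelist and toy R1 reads (hypothetical, for illustration only).
whitelist = {"AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC"}
reads = [
    "AAACCTGAGAAACCAT" + "ACGTACGTAC",  # exact whitelist match
    "AAACCTGAGAAACCAA" + "TTTTTTTTTT",  # one mismatch from the first entry
    "GGGGGGGGGGGGGGGG" + "ACGTACGTAC",  # nowhere near the whitelist
]
counts = count_reads_per_cell(reads, whitelist)
print(counts)
```

Even this toy version shows why the task is non-trivial: every read has to be assigned to a cell through its barcode before any per-cell analysis is possible, and a plain STAR run never looks at the barcode read.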

ADD REPLY • link 4.4 years ago by ATpoint 87k

0

Hi! I am trying to do a similar thing: separate each cell in a BAM file (after running STARsolo to map the reads) or in a FASTQ file. Is there a way to do that? I am a student and new to this. Any help would be appreciated! Thank you!
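One possible approach, sketched in Python under several assumptions: the BAM carries a `CB:Z:` tag per read (which, as far as I know, STARsolo only writes when asked to via `--outSAMattributes CB UB` with sorted BAM output), the BAM has been converted to SAM text (for example with `samtools view`), and reads without a corrected barcode carry `CB:Z:-`. The helper names and the toy records below are hypothetical.

```python
from collections import defaultdict

def cell_barcode(sam_line):
    """Return the CB:Z: tag value of a SAM record, or None if it is
    absent or set to '-' (assumed to mean: no corrected barcode)."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("CB:Z:"):
            value = field[len("CB:Z:"):]
            return None if value == "-" else value
    return None

def group_by_cell(sam_lines):
    """Group SAM alignment lines by cell barcode, skipping header lines."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        cb = cell_barcode(line)
        if cb is not None:
            groups[cb].append(line)
    return dict(groups)

# Toy SAM records (made up): two reads from one cell, one from another,
# and one read without a usable barcode.
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\tUB:Z:GGTT",
    "r2\t0\tchr1\t120\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\tUB:Z:GGTA",
    "r3\t0\tchr1\t150\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:TTTG\tUB:Z:CCAA",
    "r4\t0\tchr1\t180\t255\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:-\tUB:Z:-",
]
groups = group_by_cell(toy_sam)
print({cb: len(lines) for cb, lines in groups.items()})
```

Writing each group to its own file would give one file per cell, but with thousands of cells that quickly becomes unwieldy; for most analyses the per-cell count matrix produced by the dedicated tools is the more practical output.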

ADD REPLY • link 2.7 years ago by MYousry ▴ 20

0

Are you completely sure that you have 22 single-end files, and not 11 pairs of files? Have you looked at how 10XGenomics fastqs are arranged, and compared that to yours? And yes, have you looked at your read names, to see if there is a possibility that cell barcodes and UMIs are embedded in the names?
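A quick way to do that inspection is to look at the first header and the read lengths of each file. The Python sketch below assumes that a 10x barcode read is 26 bp (16 bp barcode + 10 bp UMI) or 28 bp (16 bp + 12 bp UMI), so a file dominated by one of those lengths is probably the barcode read rather than cDNA. The function names and the toy file are made up, and files downloaded from SRA may have been trimmed or merged, so treat the result as a hint only.

```python
import gzip
import os
import tempfile
from collections import Counter
from itertools import islice

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")

def peek_fastq(path, n_reads=1000):
    """Return (first header line, Counter of read lengths) for the
    first n_reads records of a FASTQ file."""
    first_header = None
    lengths = Counter()
    with open_fastq(path) as fh:
        for _ in range(n_reads):
            record = list(islice(fh, 4))  # a FASTQ record is 4 lines
            if len(record) < 4:
                break
            if first_header is None:
                first_header = record[0].rstrip("\n")
            lengths[len(record[1].rstrip("\n"))] += 1
    return first_header, lengths

def looks_like_barcode_read(lengths, expected=(26, 28)):
    """True if the most common read length matches an assumed
    barcode+UMI length."""
    if not lengths:
        return False
    most_common_length, _ = lengths.most_common(1)[0]
    return most_common_length in expected

# Toy demonstration on a temporary file with two made-up 26 bp reads.
toy = "@SRR0000001.1 1 length=26\n" + "A" * 26 + "\n+\n" + "I" * 26 + "\n"
toy += "@SRR0000001.2 2 length=26\n" + "C" * 26 + "\n+\n" + "I" * 26 + "\n"
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    tmp.write(toy)
header, lengths = peek_fastq(tmp.name)
os.remove(tmp.name)
print(header, dict(lengths), looks_like_barcode_read(lengths))
```

Running `peek_fastq` on both files of a pair should show one short read (barcode + UMI) and one longer read (cDNA) if the data really are in the 10x layout.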

ADD REPLY • link 4.4 years ago by swbarnes2 14k

0

@swbarnes2 Yes, I looked through all of it. I have 22 pairs of fastq files, and the barcodes and UMIs are not given in the names. The naming simply uses the SRR ID.

@Michael No such barcode information is given. Being extremely new to this, I am finding it difficult to understand the actual problem.

ADD REPLY • link 4.4 years ago by sidrah.maryam ▴ 70

0

Can you post the relevant bioproject number?

ADD REPLY • link 4.4 years ago by GenoMax 150k

0

Hi Sidrah, did you find a way to separate each cell in a FASTQ or BAM file?

ADD REPLY • link 2.7 years ago by MYousry ▴ 20

score 3 · Accepted Answer · 2020-11-04

3

4.4 years ago

swbarnes2 14k

If you've read the 10XGenomics information on the format of their fastqs, and you really have 22 pairs of fastqs, and you see from inspection that they look like 10XGenomics fastqs, you should know that you don't run them through STAR. Run them through cellranger. (Cellranger uses STAR internally, but it does other things as well, such as handling the cell barcodes and UMIs.)
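As a rough sketch of what that could look like for files named by SRR ID: cellranger expects Illumina-style file names such as `<sample>_S1_L001_R1_001.fastq.gz`, so the SRA-style names usually need renaming first. The Python below only builds the new file names and the command line; it does not run anything. It assumes that `_1` is the barcode read (R1) and `_2` the cDNA read (R2), which should be checked for each dataset. The accession `SRR0000001`, the paths, and the helper names are placeholders, and the options shown (`--id`, `--fastqs`, `--transcriptome`, `--sample`) should be verified against the documentation of the installed cellranger version.

```python
import re
import shlex

def tenx_fastq_name(filename, lane=1):
    """Map an SRA-style name such as 'SRR0000001_1.fastq.gz' to the
    Illumina-style name 'SRR0000001_S1_L001_R1_001.fastq.gz'.
    Assumes _1 is R1 (barcode + UMI) and _2 is R2 (cDNA)."""
    m = re.fullmatch(r"(?P<sample>[^_]+)_(?P<read>[12])\.fastq(?P<gz>\.gz)?", filename)
    if m is None:
        raise ValueError(f"unexpected FASTQ file name: {filename}")
    gz = m["gz"] or ""
    return f"{m['sample']}_S1_L{lane:03d}_R{m['read']}_001.fastq{gz}"

def cellranger_count_cmd(run_id, fastq_dir, transcriptome, sample):
    """Build (but do not run) a 'cellranger count' command as an argument list."""
    return [
        "cellranger", "count",
        f"--id={run_id}",
        f"--fastqs={fastq_dir}",
        f"--transcriptome={transcriptome}",
        f"--sample={sample}",
    ]

# Placeholder accession and paths, for illustration only.
for name in ("SRR0000001_1.fastq.gz", "SRR0000001_2.fastq.gz"):
    print(name, "->", tenx_fastq_name(name))

cmd = cellranger_count_cmd(
    run_id="SRR0000001_count",
    fastq_dir="fastqs",
    transcriptome="refdata",
    sample="SRR0000001",
)
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` once the renamed files and a reference transcriptome are in place, once per SRR pair.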