Hello, I want to analyze single-cell RNA-seq data that I took from a paper. The paper mentions that it has 8000 single cells and that the library was processed using cellranger 2.1.0. The data were submitted to NCBI as a total of 22 fastq files. When I took the fastq files and aligned them using STAR, I got 22 bam files, which suggests there are only 22 cells: each fastq file was treated as one cell. I want to separate the cells present in each fastq file into multiple fastq or multiple bam files during conversion. Please suggest how to proceed. Since I have to do cell-level analysis, I want each of the 8000 cells to have a separate identity.
Any help is highly appreciated. Thank you in advance.
Just run it through the cellranger pipeline.
Isn't that information encoded in the fastq headers somehow? Otherwise there might be some kind of barcode.
It is encoded via the Cellular Barcodes (CBs) in R1, and decoding it requires specialized software, for example the mentioned CellRanger, or alternatives such as STARsolo (a STAR extension from Alex Dobin, the STAR author), or lightweight tools such as salmon/alevin or kallisto/bustools. The crux is to distinguish true and unique barcodes from noisy/degenerate barcodes, either by comparing with the 10X whitelist or by some kind of machine learning that reliably distinguishes real CBs from noisy/nonsense ones. Please do yourself a favor and use dedicated software for this non-trivial task. Having one bam file per cell sounds incredibly tedious; is this really what you need?
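To make the barcode idea concrete, here is a minimal Python sketch of what these tools do at the first step: cut the cell barcode and UMI out of R1, then rescue barcodes that are one mismatch away from a whitelist entry. The lengths (16 bp barcode, 10 bp UMI) are an assumption based on the 10x 3' v2 chemistry, and the function names and the tiny whitelist are made up for illustration. Real tools are more careful than this, so treat it as an explanation, not a replacement for them.

```python
CB_LEN = 16   # assumed cell barcode length (10x 3' v2 chemistry)
UMI_LEN = 10  # assumed UMI length (v2; other chemistries differ)


def split_r1(seq):
    """Split an R1 sequence into (cell barcode, UMI)."""
    return seq[:CB_LEN], seq[CB_LEN:CB_LEN + UMI_LEN]


def correct_barcode(cb, whitelist):
    """Return cb if it is whitelisted; otherwise the single whitelist
    entry at Hamming distance 1; otherwise None (no match or ambiguous)."""
    if cb in whitelist:
        return cb
    candidates = set()
    for i, base in enumerate(cb):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = cb[:i] + alt + cb[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Only accept the correction when exactly one whitelist entry fits.
    return candidates.pop() if len(candidates) == 1 else None


# Hypothetical two-entry whitelist, for illustration only.
whitelist = {"AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC"}

r1 = "TAACCTGAGAAACCAT" + "ACGTACGTAC" + "TTTT"
raw_cb, umi = split_r1(r1)
print(raw_cb, umi, correct_barcode(raw_cb, whitelist))
```

A barcode that is one mismatch away from two different whitelist entries is dropped here rather than guessed, which is the conservative choice for a sketch like this.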
Hi! I am trying to do a similar thing (separate each cell in a BAM file (after running STARsolo to map the reads) or a FASTQ file). Is there a way to do that? I am a student and new to this. Any help would be appreciated! Thank you!
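If per-cell splitting of an aligned file is really needed, one possible approach is to group alignments by their CB tag. The sketch below works on SAM text lines (for example the output of samtools view -h) and uses only the standard library. It assumes the aligner wrote the corrected barcode as a CB:Z: tag and uses "-" for reads without a valid barcode; for STARsolo that means requesting CB and UB in --outSAMattributes with coordinate-sorted BAM output, so check your own run. The function names are hypothetical.

```python
from collections import defaultdict


def cell_barcode(sam_line):
    """Return the CB:Z: tag value of a SAM record, or None if it is
    missing or set to '-' (assumed to mean 'no valid barcode')."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("CB:Z:"):
            value = field[5:]
            return None if value == "-" else value
    return None


def group_by_cell(sam_lines):
    """Group alignment lines by cell barcode, skipping header lines
    and reads without a usable barcode."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        cb = cell_barcode(line)
        if cb is not None:
            groups[cb].append(line)
    return dict(groups)


def make_record(name, cb):
    """Build a toy SAM record (illustration only)."""
    return "\t".join([name, "0", "chr1", "100", "255", "4M", "*", "0", "0",
                      "ACGT", "IIII", "CB:Z:" + cb, "UB:Z:ACGTACGTAC"])


toy_sam = ["@HD\tVN:1.6",
           make_record("read1", "AAACCTGAGAAACCAT"),
           make_record("read2", "AAACCTGAGAAACCGC"),
           make_record("read3", "AAACCTGAGAAACCAT"),
           make_record("read4", "-")]

for cb, records in group_by_cell(toy_sam).items():
    print(cb, len(records))
```

With thousands of cells this produces thousands of groups, so for most analyses the gene-by-cell count matrix that the dedicated tools already write is the more practical output.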
Are you completely sure that you have 22 single end files? And not 11 pairs of files? Have you looked at how 10XGenomics fastqs are arranged, and compared that to yours? And yes, have you looked at your read names, to see if there is a possibility that cell barcodes and UMIs are embedded in the names?
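A quick way to check how the files are arranged is to look at read lengths: in 10x data the barcode+UMI read is short, while the cDNA read is much longer. The sketch below tallies read lengths from the first records of a FASTQ and makes a rough guess. The 30 bp cutoff and the function names are assumptions for illustration, and files downloaded from SRA may have been trimmed or merged, so treat the result as a hint only.

```python
from collections import Counter
from itertools import islice


def read_lengths(fastq_lines, n_reads=1000):
    """Count sequence lengths over the first n_reads FASTQ records."""
    lengths = Counter()
    for i, line in enumerate(islice(fastq_lines, 4 * n_reads)):
        if i % 4 == 1:  # second line of each 4-line record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


def guess_role(lengths, cutoff=30):
    """Rough guess based on the most common read length.
    The cutoff is an assumed threshold, not an official value."""
    if not lengths:
        return "unknown"
    most_common_length = lengths.most_common(1)[0][0]
    if most_common_length <= cutoff:
        return "barcode+UMI read"
    return "cDNA read"


def toy_fastq(read_len, n):
    """Build a toy FASTQ as a list of lines (illustration only)."""
    lines = []
    for k in range(n):
        lines += ["@read%d\n" % k, "A" * read_len + "\n", "+\n", "I" * read_len + "\n"]
    return lines


print(guess_role(read_lengths(toy_fastq(26, 3))))
print(guess_role(read_lengths(toy_fastq(98, 3))))
```

For real files, pass an open file handle instead of the toy list, for example gzip.open(path, "rt") for a compressed FASTQ.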
@swbarnes2 Yes, I looked through all of it. I have 22 pairs of fastq files, and the barcodes and UMIs are not given in the read names. The files are simply named by SRR ID.
@Michael I have not been given any such barcode information. Being extremely new to this, I am finding it difficult to understand the actual problem.
Can you post the relevant bioproject number?
Hi Sidrah, were you able to find a way to separate each cell in a FASTQ or BAM file?