Dear Biostar community,
I am fairly new to Dropseq analysis (10x sequenced files, if I am not mistaken). I followed the standard CellRanger protocol and received an aligned BAM file of the samples, plus the files for downstream analysis with Seurat.
I wonder if there is an efficient way to split the resulting BAM file into multiple files by cell barcode (stored in the BAM file as the special tag "CB"). I tried to do it using samtools, but because of the large number of output files it was not very efficient.
To summarize: my input is a single-cell BAM file, and the desired output is a set of separate BAM files, one for each cell barcode. Do you know a tool that can do this, or an efficient way to do it?
Much appreciated!
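For reference, the single-pass idea behind the question can be sketched in plain Python. This is only an illustration under stated assumptions: it works on SAM text lines (for example, the output of `samtools view -h`), it assumes the barcode sits in a `CB:Z:` optional field, and the function names `cell_barcode` and `split_by_barcode` are hypothetical. It keeps everything in memory, so it suits small inputs only; a real run on a full BAM would need to write to disk as it goes.

```python
from collections import defaultdict


def cell_barcode(sam_line):
    """Return the CB tag value of one SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for field in fields[11:]:
        if field.startswith("CB:Z:"):
            return field[len("CB:Z:"):]
    return None


def split_by_barcode(sam_lines):
    """Group alignment lines by cell barcode in a single pass.

    Returns (header_lines, {barcode: [alignment lines]}).
    Reads without a CB tag are skipped.
    """
    header = []
    per_cell = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines
            header.append(line)
            continue
        barcode = cell_barcode(line)
        if barcode is not None:
            per_cell[barcode].append(line)
    return header, per_cell
```

Each per-cell list, written out with the header in front of it, is a valid SAM file that `samtools view -b` can convert back to BAM.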
Hi, thanks for the answer! Sadly, this is not quite what I am trying to do :( If I may quote from the manual (of subset-bam):
This tool is very useful for creating pseudo-bulk files from multiple cells. In my case, however, I need to separate each single cell into its own file. I could create a temporary single-barcode CSV for each cell, but I wonder whether that approach is really efficient (compared to the naive one using samtools).
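The per-barcode CSV approach mentioned above could look roughly like the sketch below. It only prepares the inputs: one single-line CSV per barcode and the matching `subset-bam` command as an argument list. The helper names and the output layout are hypothetical, and the `--bam`, `--cell-barcodes` and `--out-bam` options are taken from the subset-bam documentation, so please check them against your installed version.

```python
from pathlib import Path


def write_barcode_csvs(barcodes, csv_dir):
    """Write one single-line CSV per barcode and return the file paths."""
    csv_dir = Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for barcode in barcodes:
        path = csv_dir / f"{barcode}.csv"
        path.write_text(barcode + "\n")
        paths.append(path)
    return paths


def subset_bam_command(bam, csv_path, out_dir):
    """Build the subset-bam call for one barcode CSV (not executed here)."""
    out_bam = Path(out_dir) / (Path(csv_path).stem + ".bam")
    return [
        "subset-bam",
        "--bam", str(bam),
        "--cell-barcodes", str(csv_path),
        "--out-bam", str(out_bam),
    ]
```

Each command list can be passed to `subprocess.run`, or joined into a line of text and handed to a job runner. Note that every call re-reads the whole input BAM, which is why this route scales poorly to thousands of barcodes.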
Looks like sinto is multi-threaded, so it may be more performant: https://timoast.github.io/sinto/basic_usage.html#filter-cell-barcodes-from-bam-file
How many cell barcodes were you planning to use?
Hmmm... all of them :) To be more exact, all the barcodes that resulted from CellRanger. I think on average ~8,000-10,000 barcodes.
You may want to run subset-bam using GNU parallel, where each temporary single-line CSV is read in through parallel's ::: option. I am no expert in GNU parallel, so take my advice with a grain of salt.
This is a rather late response, but Sinto worked really well. It does require a barcode list and (maybe it is a system limitation) could only work in batches of 500 cells, but it did work and was the most efficient way.
Hi there, could you please share how exactly you did it with sinto? I have multiple scRNA BAM files and a list of cell barcodes I want to extract, and the way I am doing it will take ages. So, please share it.
Definitely! I am so sorry for not seeing this on time, I hope this is still relevant. I will provide the code that works in my case:
The usual sinto input requires:
The cell annotation for example:
I changed it to be:
It seems that Sinto can only handle batches of roughly 100-500 barcodes at a time, depending on your system.
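A minimal sketch of the batching idea, not the exact code used above. It assumes sinto's cell annotation file is tab-delimited (barcode, then group) and that each barcode is used as its own group, so that every cell ends up in its own output BAM. The helper names, the batch file naming and the default batch size of 500 are hypothetical. The `-b`, `-c` and `-p` options are those shown in the sinto documentation for `filterbarcodes`.

```python
from pathlib import Path


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_sinto_batches(barcodes, out_dir, batch_size=500):
    """Write one annotation file per batch; each line is 'barcode<TAB>barcode'."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = []
    for index, batch in enumerate(chunked(list(barcodes), batch_size)):
        path = out_dir / f"cells_batch_{index:03d}.tsv"
        with path.open("w") as handle:
            for barcode in batch:
                # Assumption: using the barcode as its own group name
                # gives one output BAM per cell.
                handle.write(f"{barcode}\t{barcode}\n")
        files.append(path)
    return files


def sinto_command(bam, cells_file, nproc=4):
    """Build the sinto call for one batch file (not executed here)."""
    return ["sinto", "filterbarcodes", "-b", str(bam),
            "-c", str(cells_file), "-p", str(nproc)]
```

With ~10,000 barcodes and a batch size of 500 this gives 20 annotation files, and therefore 20 sinto runs over the same BAM. If a batch fails on your system, lowering `batch_size` is the first thing to try.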