Separate single cell BAM file by the cell barcode
16 months ago
zbidav ▴ 30

Dear Biostar community,

I am fairly new to Drop-seq analysis (10x-sequenced files, if I am not mistaken). I followed the standard CellRanger protocol and obtained an aligned BAM file for the samples, along with the files for downstream analysis with Seurat.

I wonder if there is an efficient way to split the resulting BAM file into multiple files by cell barcode (stored in the BAM file as the "CB" tag). I tried doing it with samtools, but because of the large number of output files it was not very efficient.

In summary: my input is a single-cell BAM file, and the desired output is a set of separate BAM files, one per cell barcode. Do you know of a tool that can do this, or an efficient way to do it?

Much appreciated!

BAM scRNAseq single-cell • 2.9k views
16 months ago
GenoMax 147k

10x makes a tool available: https://github.com/10XGenomics/subset-bam

Also: https://github.com/timoast/sinto


Hi, thanks for the answer! Sadly, this is not quite what I am trying to do :( If I may quote from the subset-bam manual:

subset-bam ... takes a 10x Genomics BAM file, a CSV file defining the subset of cells you want to isolate, and produces a new BAM file with only alignments associated with those cells.

This tool is very useful for creating pseudo-bulk files from multiple cells. In my case, however, I need to write each single cell to a different file. I could create a temporary single-barcode CSV file for each cell, but I wonder whether that approach is really more efficient than the naive one using samtools.


Looks like sinto is multi-threaded so it may be more performant: https://timoast.github.io/sinto/basic_usage.html#filter-cell-barcodes-from-bam-file

How many cell barcodes were you planning to use?


Hmmm... all of them :) To be more exact, all the barcodes that CellRanger reported, which I think is ~8,000-10,000 barcodes on average.


You may want to run subset-bam using GNU parallel, with each temporary single-line CSV passed in through parallel's ::: option. I am no expert in GNU parallel, so take my advice with a grain of salt.
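A minimal sketch of that idea, under stated assumptions: the helper names `write_single_barcode_csvs` and `split_with_subset_bam` are hypothetical, the barcode list is assumed to hold one barcode per line, and the per-barcode file names are piped to GNU parallel on standard input instead of through `:::`, to avoid a very long argument list with thousands of barcodes. Each subset-bam job still scans the whole BAM, so this spreads the work over cores without reducing it.

```shell
# Hypothetical helper: write one single-line CSV per barcode into csv_dir.
write_single_barcode_csvs() (
    barcodes="$1"   # text file, one cell barcode per line
    csv_dir="$2"
    mkdir -p "$csv_dir"
    while IFS= read -r bc || [ -n "$bc" ]; do
        if [ -n "$bc" ]; then
            printf '%s\n' "$bc" > "$csv_dir/$bc.csv"
        fi
    done < "$barcodes"
)

# Hypothetical helper: run one subset-bam job per CSV, "jobs" at a time.
# {} is the CSV path; {/.} is its basename without the .csv extension.
split_with_subset_bam() (
    input_bam="$1"
    csv_dir="$2"
    out_dir="$3"
    jobs="$4"
    mkdir -p "$out_dir"
    find "$csv_dir" -name '*.csv' |
        parallel -j "$jobs" \
            subset-bam --bam "$input_bam" --cell-barcodes {} \
                       --out-bam "$out_dir/{/.}.bam"
)

# Example call (file names are placeholders):
# write_single_barcode_csvs barcodes.txt per_cell_csv
# split_with_subset_bam possorted_genome_bam.bam per_cell_csv per_cell_bam 8
```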


This is a somewhat late response, but Sinto worked really well. It does require a barcode list and (maybe it is a system limitation) could only work in batches of 500 cells, but it did work and was the most efficient way.


Hi there, could you please share exactly how you did it with sinto? I have multiple scRNA BAM files and a list of cell barcodes I want to extract, and the way I am doing it will take ages.


Definitely! I am so sorry for not seeing this in time; I hope it is still relevant. Here is the code that works in my case:

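# Position-sorted BAM produced by CellRanger
input_bam="possorted_genome_bam.bam"
# Directory that receives one BAM file per group
output_dir="splitedFile/"
mkdir -p "$output_dir"
# Tab-separated file: cell barcode, then group name
out_cell_barcodes_annot="cell_barcodes_tabular_sinto.tsv"
sinto filterbarcodes --bam "$input_bam" --cells "$out_cell_barcodes_annot" --outdir "$output_dir" --nproc 20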

The usual sinto input requires:

  • BAM file
  • Cell annotation file

For example, the cell annotation file normally looks like this:

TTGGATGTCGCACGAC-1      epithelial
TTGGGCGAGTGACCTT-1      endothelial
TTGGGTATCTGGCCAG-1      endothelial
TTGTGTTCAACCAACT-1      endothelial

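I changed it so that the second column repeats the barcode, which makes each barcode its own group (and so its own output file):

TTGGATGTCGCACGAC-1      TTGGATGTCGCACGAC-1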
TTGGGCGAGTGACCTT-1      TTGGGCGAGTGACCTT-1
TTGGGTATCTGGCCAG-1      TTGGGTATCTGGCCAG-1
TTGTGTTCAACCAACT-1      TTGTGTTCAACCAACT-1
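A file in this barcode-as-its-own-group layout can be generated from a plain barcode list. The sketch below is one way to do it, not necessarily how the file above was made; `make_self_annotation` is a hypothetical helper name, and the input is assumed to be a text file (optionally gzipped) with one barcode per line.

```shell
# Hypothetical helper: write "barcode<TAB>barcode" for every input barcode.
make_self_annotation() (
    barcodes_in="$1"   # one barcode per line, plain text or .gz
    annot_out="$2"     # tab-separated output for sinto's --cells option
    case "$barcodes_in" in
        *.gz) gzip -dc "$barcodes_in" ;;
        *)    cat "$barcodes_in" ;;
    esac | awk 'NF { printf "%s\t%s\n", $1, $1 }' > "$annot_out"
)

# Example call (the input path is an assumption about the CellRanger output layout):
# make_self_annotation filtered_feature_bc_matrix/barcodes.tsv.gz cell_barcodes_tabular_sinto.tsv
```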

It seems that Sinto can only do this in batches of ~100-500 barcodes, depending on your system.
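One way to handle the batching is to split the annotation file into fixed-size chunks and run sinto once per chunk. This is only a sketch: `split_annotation` and `run_sinto_in_batches` are hypothetical helper names, and the batch size is something to tune for your system. Each batch is a full pass over the BAM, so 10,000 barcodes in batches of 500 means about 20 passes.

```shell
# Hypothetical helper: cut the annotation file into chunks of batch_size lines.
split_annotation() (
    annot="$1"
    batch_size="$2"
    batch_dir="$3"
    mkdir -p "$batch_dir"
    split -l "$batch_size" "$annot" "$batch_dir/batch_"
)

# Hypothetical helper: run sinto filterbarcodes once per chunk,
# writing all per-barcode BAM files into the same output directory.
run_sinto_in_batches() (
    input_bam="$1"
    batch_dir="$2"
    output_dir="$3"
    nproc="$4"
    mkdir -p "$output_dir"
    for batch in "$batch_dir"/batch_*; do
        sinto filterbarcodes --bam "$input_bam" --cells "$batch" \
            --outdir "$output_dir" --nproc "$nproc"
    done
)

# Example call (file names as in the code above):
# split_annotation cell_barcodes_tabular_sinto.tsv 500 sinto_batches
# run_sinto_in_batches possorted_genome_bam.bam sinto_batches splitedFile/ 20
```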
