Question

Parsing BAM file per cell from Smart-Seq2 dataset

4 months ago

brutonk • 0

I was able to process a Smart-Seq2 dataset with the following code:

STAR --runThreadN $CPUS \
  --genomeDir /path/to/dir/ \
  --readFilesCommand gunzip -c \
  --outFileNamePrefix /path/to/dir/ \
  --soloType SmartSeq \
  --readFilesManifest /path/to/tsv \
  --soloUMIdedup Exact NoDedup \
  --outSAMtype BAM SortedByCoordinate \
  --soloStrand Unstranded \
  --outBAMsortingBinsN 200

I need to work with the BAM file "Aligned.sortedByCoord.out.bam" for some downstream analyses. Since the "CB" tag isn't applicable to Smart-Seq2 data, it's not clear how I can determine which read comes from which cell. For example, here is one alignment from the BAM file:

LH00244:248:22VJNCLT3:4:1141:9968:28382 99 chr1 3000720 255 118M33S = 3000866 297 CTTTATTTCATCATTGACCAAGCTATCATTAAGTAGAGTATTGTTCCGTTTCCAAGTGAACGTTTGCTTTCTCTTATTTCTGCTGTTCTTTAAGATCAGCCTTCGTCCGTAGTTCTCTTAAAAGATGCACGGGAAAACTTCCATATTTTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9II9III9I**I999I99II9II99999**9999**9999999**9999999999999999999999*99999* NH:i:1 HI:i:1 AS:i:247 nM:i:10

Any suggestions on how I can add this information or parse the file into separate BAM files per cell?

Tags: starsolo, star, BAM, smartseq2, alignment

updated 4 months ago by dsull ★ 7.6k • written 4 months ago by brutonk • 0

Based on your use of UMIs, you appear to have Smart-seq3 data.

Since this is single-cell data, I don't know if plain STAR is appropriate here. You may need to use STARsolo instead.

You could also use kallisto (Analysis of Smart-Seq3 data with kallisto-bustools) or alevin-fry (https://combine-lab.github.io/alevin-fry-tutorials/2021/sci-rna-seq3/).

reply 4 months ago by GenoMax 152k

score 1 · Answer 1 · 2025-03-15

Look at the STARsolo documentation on GitHub: it shows how to write your manifest.tsv file so that the RG tag of each read in the BAM file contains the cell identifier.

Essentially, the manifest is a tab-separated file with one line per cell and three columns: the R1 FASTQ path, the R2 FASTQ path (or "-" for single-end reads), and the cell ID.

https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md
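For many cells, generating the manifest with a small loop is less error-prone than writing it by hand. Below is a minimal bash sketch, assuming a hypothetical naming convention where each cell's reads sit in one directory as <cellID>_R1.fastq.gz and <cellID>_R2.fastq.gz; the function name make_manifest and that convention are illustrative only, so adjust both to your files. It prints one tab-separated line per cell (R1 path, R2 path, cell ID) and skips, with a warning, any cell whose R2 file is missing.

```bash
#!/usr/bin/env bash
# Sketch: build a STAR --readFilesManifest file for paired-end Smart-seq data.
# Assumption (hypothetical): files are named <cellID>_R1.fastq.gz / <cellID>_R2.fastq.gz.
make_manifest() {
  local fastq_dir="$1"
  local r1 r2 cell
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # glob matched nothing
    cell=$(basename "$r1" _R1.fastq.gz)       # strip the suffix to get the cell ID
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # matching mate file
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 for $cell, skipping" >&2
      continue
    fi
    # Columns: R1 path, R2 path, read-group (cell) ID, separated by tabs.
    printf '%s\t%s\t%s\n' "$r1" "$r2" "$cell"
  done
}
```

For example, `make_manifest /abs/path/to/fastq > manifest.tsv`, then pass `--readFilesManifest manifest.tsv` to STAR. Giving an absolute directory keeps the manifest paths absolute, so they resolve regardless of STAR's working directory. According to the STAR manual's description of --readFilesManifest, a third column that does not start with "ID:" gets "ID:" prepended, so each cell ID should become the read-group ID.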

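By the way, you also need to add the following:

--outSAMattributes RG

Note that this replaces STAR's default attribute set (NH HI AS nM, the tags visible in your example alignment). To keep those as well, list them explicitly:

--outSAMattributes NH HI AS nM RG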
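Once the BAM records carry an RG:Z:<cell ID> tag, splitting per cell amounts to grouping records by that tag. If the BAM header also contains the matching @RG lines, `samtools split -f 'per_cell/%!.bam' Aligned.sortedByCoord.out.bam` should do this directly (%! expands to the read-group ID; create per_cell/ first). As a dependency-free sketch of the same idea, the bash function below reads SAM text on stdin (e.g. from `samtools view -h`) and writes one SAM file per RG value. The function name split_sam_by_rg, the <RG>.sam file layout and the no_RG.sam fallback are illustrative assumptions, not part of STAR or samtools.

```bash
#!/usr/bin/env bash
# Sketch: split SAM text into one SAM file per read group (RG:Z: tag).
# Assumption: each alignment has at most one RG:Z: tag; untagged reads go to no_RG.sam.
split_sam_by_rg() {
  local outdir="$1"
  mkdir -p "$outdir"
  awk -F '\t' -v outdir="$outdir" '
    # Buffer header lines so every per-cell file starts with a valid header.
    /^@/ { header = header $0 "\n"; next }
    {
      rg = "no_RG"
      # Optional SAM fields (tags) start at column 12.
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "RG:Z:") { rg = substr($i, 6); break }
      }
      out = outdir "/" rg ".sam"
      # First record for this cell: write the header, then append records.
      if (!(out in seen)) { printf "%s", header > out; seen[out] = 1 }
      print > out
    }
  '
}
```

For example: `samtools view -h Aligned.sortedByCoord.out.bam | split_sam_by_rg per_cell`, then convert each file with `samtools view -b -o per_cell/<cell>.bam per_cell/<cell>.sam`. Input order is preserved, so each per-cell file stays coordinate-sorted. awk keeps one open file per cell; gawk copes with many open files, but other awk implementations may hit the open-file limit with hundreds of cells, in which case samtools split is the safer route.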