STARsolo sorted bam contains 'non-whitelisted' barcodes
11 hours ago
mk ▴ 310

How can I deal with reads that have an empty barcode (CB:Z:-) in the sorted BAM output of STARsolo?

Expected behavior

About 90% of the reads in the sorted BAM output have both a CR:Z and a CB:Z barcode string, like:

(base) [mkarikom@gl3338 mkarikom]$ samtools view -@ 36 /scratch/welchjd_root/welchjd5/mkarikom/bican_fastqs_processed/2024-05-14_v10_DFC_5P_Ex50pAS_truncsmall_20241213_160222_3226046/alignment/rxn1.0_STARsolo/rxn1.0_Aligned.sortedByCoord.out.bam |grep 'CB'|head
LH00146:352:22KFLFLT3:5:1224:41147:21188        419     chr1    11749   0       90M     =       12042   402     TTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAG      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII      NH:i:7  HI:i:6  nM:i:0  AS:i:197        CR:Z:CGCACATAGCTGTAGC   UR:Z:GAGCAAGTGTTC       GX:Z:-  GN:Z:-  sS:Z:CGCACATAGCTGTAGCGAGCAAGTGTTCTTTCTTATATGGGAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTG       sQ:Z:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII     sM:i:0  CB:Z:CGCACATAGCTGTAGC   UB:Z:GAGCAAGTGTTC
LH00146:352:22KFLFLT3:5:1224:41147:21188        339     chr1    12042   0       109M41S =       11749   -402    CAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTCCCATATAAGAAAGAACACTTGCTCGCTACAGCTATGTGCG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  NH:i:7  HI:i:6  nM:i:0  AS:i:197        CR:Z:CGCACATAGCTGTAGC     UR:Z:GAGCAAGTGTTC       GX:Z:-  GN:Z:-  sS:Z:CGCACATAGCTGTAGCGAGCAAGTGTTCTTTCTTATATGGGAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTG     sQ:Z:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII     sM:i:0  CB:Z:CGCACATAGCTGTAGC   UB:Z:GAGCAAGTGTTC
LH00146:352:22KFLFLT3:5:2253:34508:6319 163     chr1    12625   255     90M     =       183143  170583  GCCAGGCATGCCCTTCCCTAGCATCAGGTCTCCAGAGCTGCAGAAGACGACGGCCGACTTGGATCACACTCTTGTGAGTGTCCCCAGTGT      IIIII9IIIIIIIIIIIIIIIIIIIIIIIII-IIIII9IIIIIIIIIIIIIIIIIII9IIIIIIIIIIII9IIIIIIIIIII-IIIIIII      NH:i:1  HI:i:1  nM:i:2  AS:i:147        CR:Z:CCTCTTACCTCTGCAA   UR:Z:CACTGGGGACAC       GX:Z:-  GN:Z:-  sS:Z:CCTCTTACCTCTGCAACACTGGGGACACTCACAAGAGTGTGATCCAAGTCGGCCGTCGTCTTCTGCAGCTCTGGAGACCTGATGCTAGGGAAGGGCATGCCTGGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGAACTTAGAA       sQ:Z:IIIIIIIIIIIIIIIII9IIIIIIIIIII9IIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-II99IIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIII-     sM:i:-1 CB:Z:-  UB:Z:-
LH00146:352:22KFLFLT3:3:2202:24781:3211 163     chr1    13483   255     90M     =       13583   208     GCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTG      IIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIII9IIIIIIIIIIIII9IIIIIIIIIIIIIII9II      NH:i:1  HI:i:1  nM:i:0  AS:i:196        CR:Z:CTGTACGCACAGTTTA   UR:Z:TCCTGTAGGTGG       GX:Z:-  GN:Z:-  sS:Z:CTGTACGCACAGTTTATCCTGTAGGTGGTTTCTTATATGGGTTGGCCAGGACCCACCATTT

Problem?

However, the other 10% of reads in the sorted BAM have an empty corrected barcode (CB:Z:-) even though the CR:Z tag is populated, like:

(base) [mkarikom@gl3338 mkarikom]$ samtools view /scratch/rxn1_S1_L001_Aligned.sortedByCoord.out.bam | grep 'CB:Z:-' | head
A00708:13:HTC7MDSXX:4:1220:23891:28213  419     chr1    12006   0       101M    =       12372   428     GCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGAT   FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   NH:i:6  HI:i:1  nM:i:0  AS:i:161        CR:Z:CTACACTCAGCGATTG   UR:Z:CGAAGATTAT GX:Z:-  GN:Z:-  sS:Z:CTACACTCAGCGATTGCGAAGATTATTTCTTATATGGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA        sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFF,,FFFF,FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      sM:i:-1 CB:Z:-  UB:Z:-
A00708:13:HTC7MDSXX:4:1220:22544:28761  419     chr1    12006   0       101M    =       12372   428     GCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGAT   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF   NH:i:6  HI:i:1  nM:i:0  AS:i:161        CR:Z:CTACACTCAGCGATTG   UR:Z:CGAAGATTAT GX:Z:-  GN:Z:-  sS:Z:CTACACTCAGCGATTGCGAAGATTATGCCTTAGAGGGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA        sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFF:,FF,:,:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF      sM:i:-1 CB:Z:-  UB:Z:-
A00708:13:HTC7MDSXX:4:2650:18313:29168  163     chr1    12031   0       101M    =       12372   403     CTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCC   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFF   NH:i:7  HI:i:1  nM:i:0  AS:i:161        CR:Z:CTACACTCAGCGATTG   UR:Z:CGAAGATTAT GX:Z:-  GN:Z:-  sS:Z:CTACACTCAGCGATTGCGAAGATTATTACTTATATGGTGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA        sQ:Z:FFFFFFFFFFFFFFFFFFFFFF:FFF,,FFF:,FFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      sM:i:-1 CB:Z:-  UB:Z:-
A00708:13:HTC7MDSXX:4:2262:27317:11287  163     chr1    12065   0       101M    =       12372   369     GTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGG   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   NH:i:7  HI:i:1  nM:i:0  AS:i:161        CR:Z:CTACACTCAGCGATTG   UR:Z:CGAAGATTAT GX:Z:-  GN:Z:-  sS:Z:CTACACTCAGCGATTGCGAAGATTATTTCTTATATTGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA        sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      sM:i:-1 CB:Z:-  UB:Z:-
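
To put a number on the 90%/10% split rather than eyeballing `grep` output, the tags can be tallied directly. Below is a minimal Python sketch, assuming SAM text as produced by `samtools view`; the function name `tally_cb` and the three category labels are made up for illustration and are not part of STARsolo or samtools.

```python
from collections import Counter


def tally_cb(sam_lines):
    """Count SAM records by the state of their CB (corrected barcode) tag.

    Categories (hypothetical labels, chosen for this sketch):
      corrected   - CB:Z: holds a barcode string
      uncorrected - CB:Z:- (tag present but empty)
      no_CB_tag   - record carries no CB tag at all
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at column 12 and look like TAG:TYPE:VALUE
        tags = {f[:2]: f[5:] for f in fields[11:] if len(f) >= 5 and f[2] == ":"}
        cb = tags.get("CB")
        if cb is None:
            counts["no_CB_tag"] += 1
        elif cb == "-":
            counts["uncorrected"] += 1
        else:
            counts["corrected"] += 1
    return counts
```

One way to use it is to pipe `samtools view` output into a small script that calls `tally_cb(sys.stdin)` and prints the resulting counts.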
9 hours ago
dsull ★ 7.0k

If I'm not mistaken, CR:Z: holds the raw barcode sequence exactly as it appeared in the sequencing read, whereas CB:Z: holds the barcode after correction against the "whitelist". If CB:Z: is empty (-), it most likely means the raw barcode could not be corrected to any whitelist entry.

In other words, STARsolo could not determine which whitelist barcode that sequence belongs to.

For example, if the sequence has two or more substitution errors, it may not be possible to tell which whitelist barcode it came from. There is nothing you can do to recover those reads.
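
To make the idea concrete, here is a simplified Python sketch of whitelist correction by Hamming distance. It is only an illustration of the general principle, not STARsolo's actual implementation; the function names, the one-mismatch limit, and the rule of rejecting ambiguous matches are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(raw, whitelist, max_mismatches=1):
    """Return the whitelist barcode for `raw`, or '-' if it cannot be resolved.

    Assumed rule for this sketch: an exact match wins; otherwise accept a
    whitelist entry only if it is the single entry within `max_mismatches`.
    """
    if raw in whitelist:
        return raw
    candidates = [
        wl for wl in whitelist
        if len(wl) == len(raw) and hamming(raw, wl) <= max_mismatches
    ]
    # Zero candidates (too many errors) or several (ambiguous) -> unresolved
    return candidates[0] if len(candidates) == 1 else "-"


# Toy 4-base barcodes instead of real 16-base ones
print(correct_barcode("AAAT", {"AAAA", "CCCC"}))  # one mismatch from AAAA only
print(correct_barcode("AATT", {"AAAA", "CCCC"}))  # two mismatches from AAAA
print(correct_barcode("AAAT", {"AAAA", "AATT"}))  # one mismatch from both entries
```

Under these assumptions the first call resolves to a whitelist entry, while the second (too many errors) and third (ambiguous) both come back as `-`, which is the same situation as a populated CR:Z with an empty CB:Z.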
