How can I address empty (CB:Z:-) barcodes in the sorted BAM output of STARsolo?
Expected behavior
About 90% of the reads in the sorted BAM output have both a CR:Z and a CB:Z barcode string, for example:
(base) [mkarikom@gl3338 mkarikom]$ samtools view -@ 36 /scratch/welchjd_root/welchjd5/mkarikom/bican_fastqs_processed/2024-05-14_v10_DFC_5P_Ex50pAS_truncsmall_20241213_160222_3226046/alignment/rxn1.0_STARsolo/rxn1.0_Aligned.sortedByCoord.out.bam |grep 'CB'|head
LH00146:352:22KFLFLT3:5:1224:41147:21188 419 chr1 11749 0 90M = 12042 402 TTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:7 HI:i:6 nM:i:0 AS:i:197 CR:Z:CGCACATAGCTGTAGC UR:Z:GAGCAAGTGTTC GX:Z:- GN:Z:- sS:Z:CGCACATAGCTGTAGCGAGCAAGTGTTCTTTCTTATATGGGAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTG sQ:Z:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII sM:i:0 CB:Z:CGCACATAGCTGTAGC UB:Z:GAGCAAGTGTTC
LH00146:352:22KFLFLT3:5:1224:41147:21188 339 chr1 12042 0 109M41S = 11749 -402 CAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTCCCATATAAGAAAGAACACTTGCTCGCTACAGCTATGTGCG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:7 HI:i:6 nM:i:0 AS:i:197 CR:Z:CGCACATAGCTGTAGC UR:Z:GAGCAAGTGTTC GX:Z:- GN:Z:- sS:Z:CGCACATAGCTGTAGCGAGCAAGTGTTCTTTCTTATATGGGAAGTTACATGCAGACAACAGGGGCCAGAAGATGAACAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTG sQ:Z:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII sM:i:0 CB:Z:CGCACATAGCTGTAGC UB:Z:GAGCAAGTGTTC
LH00146:352:22KFLFLT3:5:2253:34508:6319 163 chr1 12625 255 90M = 183143 170583 GCCAGGCATGCCCTTCCCTAGCATCAGGTCTCCAGAGCTGCAGAAGACGACGGCCGACTTGGATCACACTCTTGTGAGTGTCCCCAGTGT IIIII9IIIIIIIIIIIIIIIIIIIIIIIII-IIIII9IIIIIIIIIIIIIIIIIII9IIIIIIIIIIII9IIIIIIIIIII-IIIIIII NH:i:1 HI:i:1 nM:i:2 AS:i:147 CR:Z:CCTCTTACCTCTGCAA UR:Z:CACTGGGGACAC GX:Z:- GN:Z:- sS:Z:CCTCTTACCTCTGCAACACTGGGGACACTCACAAGAGTGTGATCCAAGTCGGCCGTCGTCTTCTGCAGCTCTGGAGACCTGATGCTAGGGAAGGGCATGCCTGGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGAACTTAGAA sQ:Z:IIIIIIIIIIIIIIIII9IIIIIIIIIII9IIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-II99IIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIII- sM:i:-1 CB:Z:- UB:Z:-
LH00146:352:22KFLFLT3:3:2202:24781:3211 163 chr1 13483 255 90M = 13583 208 GCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTG IIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIII9IIIIIIIIIIIII9IIIIIIIIIIIIIII9II NH:i:1 HI:i:1 nM:i:0 AS:i:196 CR:Z:CTGTACGCACAGTTTA UR:Z:TCCTGTAGGTGG GX:Z:- GN:Z:- sS:Z:CTGTACGCACAGTTTATCCTGTAGGTGGTTTCTTATATGGGTTGGCCAGGACCCACCATTT
Problem?
However, the other 10% of reads in the sorted BAM have an empty barcode (CB:Z:-) but a populated CR:Z tag, for example:
(base) [mkarikom@gl3338 mkarikom]$ samtools view /scratch/rxn1_S1_L001_Aligned.sortedByCoord.out.bam | grep 'CB:Z:-' | head
A00708:13:HTC7MDSXX:4:1220:23891:28213 419 chr1 12006 0 101M = 12372 428 GCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGAT FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:1 nM:i:0 AS:i:161 CR:Z:CTACACTCAGCGATTG UR:Z:CGAAGATTAT GX:Z:- GN:Z:- sS:Z:CTACACTCAGCGATTGCGAAGATTATTTCTTATATGGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFF,,FFFF,FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF sM:i:-1 CB:Z:- UB:Z:-
A00708:13:HTC7MDSXX:4:1220:22544:28761 419 chr1 12006 0 101M = 12372 428 GCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF NH:i:6 HI:i:1 nM:i:0 AS:i:161 CR:Z:CTACACTCAGCGATTG UR:Z:CGAAGATTAT GX:Z:- GN:Z:- sS:Z:CTACACTCAGCGATTGCGAAGATTATGCCTTAGAGGGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFF:,FF,:,:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF sM:i:-1 CB:Z:- UB:Z:-
A00708:13:HTC7MDSXX:4:2650:18313:29168 163 chr1 12031 0 101M = 12372 403 CTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFF NH:i:7 HI:i:1 nM:i:0 AS:i:161 CR:Z:CTACACTCAGCGATTG UR:Z:CGAAGATTAT GX:Z:- GN:Z:- sS:Z:CTACACTCAGCGATTGCGAAGATTATTACTTATATGGTGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA sQ:Z:FFFFFFFFFFFFFFFFFFFFFF:FFF,,FFF:,FFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF sM:i:-1 CB:Z:- UB:Z:-
A00708:13:HTC7MDSXX:4:2262:27317:11287 163 chr1 12065 0 101M = 12372 369 GTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7 HI:i:1 nM:i:0 AS:i:161 CR:Z:CTACACTCAGCGATTG UR:Z:CGAAGATTAT GX:Z:- GN:Z:- sS:Z:CTACACTCAGCGATTGCGAAGATTATTTCTTATATTGGGTGGCAGCGATGGCCTGCCTGATCTTCCACCTGCTCTCCCAGGGCCAAAGCCAGACCTGCTGA sQ:Z:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF sM:i:-1 CB:Z:- UB:Z:-
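To put an exact number on the 90%/10% split instead of eyeballing `grep | head`, a rough sketch like the following could tally the CB tag over the whole file. It assumes tab-separated SAM text as printed by `samtools view`; the script name `cb_summary.py` and the helper names `get_tag` and `summarize` are made up for illustration and are not part of STAR or samtools.

```python
import sys
from collections import Counter


def get_tag(fields, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    prefix = tag + ":"
    # Optional TAG:TYPE:VALUE fields start at column 12 of a SAM record.
    for field in fields[11:]:
        if field.startswith(prefix):
            return field.split(":", 2)[2]
    return None


def summarize(lines):
    """Count reads by CB status and tally raw CR barcodes of empty-CB reads."""
    status = Counter()
    raw_of_empty = Counter()
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        cb = get_tag(fields, "CB")
        if cb is None:
            status["missing"] += 1
        elif cb == "-":
            status["empty"] += 1
            raw_of_empty[get_tag(fields, "CR")] += 1
        else:
            status["assigned"] += 1
    return status, raw_of_empty


# Run only when a path is given: "-" means read SAM text from stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    handle = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    status, raw_of_empty = summarize(handle)
    total = sum(status.values())
    for key in ("assigned", "empty", "missing"):
        share = 100.0 * status[key] / total if total else 0.0
        print(f"{key}\t{status[key]}\t{share:.2f}%")
    print("Most frequent CR values among reads with CB:Z:-")
    for barcode, n in raw_of_empty.most_common(10):
        print(f"{barcode}\t{n}")
```

It would be run as `samtools view rxn1.0_Aligned.sortedByCoord.out.bam | python cb_summary.py -`. Listing the most frequent CR values among the CB:Z:- reads makes it easy to look those raw barcodes up in the whitelist passed to STARsolo.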