2.0 years ago
ashwini
Hi everyone,
I have generated an alignment file from a scRNA-seq demultiplexing pipeline and now need to make the tags in my BAM file match the format of those in files generated by CellRanger. Specifically, I need to append a sample identifier to the cell barcode (CB) tag and duplicate the UMI tag, because 10X files have both UR and UB.
Current BAM output:
01_01_14__R__49_1_14__CTGCTTTG_AACGTGAT_AACCGAGA__GGCGCTTTTT__221014Su_CAGATC 0 hg38_2111123897 255 20S94M * 0 0 GTGGTATCAACGCAGAGTGAAAGGGGACAGCTGCCCCCACGGCAGCCCTCAGGGCCCGCTGGCCCCACCTGCCAGCCCTGGCCCTTTTGCTACCAGATCCCCGCTTTTCATCTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:92 nM:i:0 GX:Z:ENSG00000153094 GN:Z:BCL2L11 pN:Z:GGCGCTTTT CR:Z:CTGCTTTG_AACGTGAT_AACCGAGA CB:Z:01_01_14 pB:Z:49_1_14 pS:Z: RE:A:I
I need to append _s1 to the end of the CB tag (CB:Z:01_01_14_s1) and duplicate the pN tag so that its value appears under both UR and UB.
Is there a simple way to do this using either sed in Bash or pysam? Thanks!
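For illustration, here is a minimal sketch of the kind of transformation described above. It is written in plain Python and works on SAM text rather than through pysam. The script name `retag.py` and the function name `retag_sam_line` are hypothetical. It also assumes that the original pN tag should be kept alongside the new UR and UB tags, which may not be what CellRanger-style tools expect.

```python
import sys


def retag_sam_line(line, suffix):
    """Append `suffix` to the CB tag and copy the pN value to UR and UB.

    Assumption: the original pN tag is kept; UR and UB are appended
    at the end of the record. Header lines are returned unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; the rest are tags.
    core, tags = fields[:11], fields[11:]
    umi = None
    new_tags = []
    for tag in tags:
        if tag.startswith("CB:Z:"):
            tag = tag + suffix
        elif tag.startswith("pN:Z:"):
            umi = tag[len("pN:Z:"):]
        new_tags.append(tag)
    if umi is not None:
        new_tags.append("UR:Z:" + umi)
        new_tags.append("UB:Z:" + umi)
    return "\t".join(core + new_tags) + "\n"


# Stream SAM from stdin to stdout when a suffix is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    for sam_line in sys.stdin:
        sys.stdout.write(retag_sam_line(sam_line, sys.argv[1]))
```

Something like this could then be placed between two samtools calls, for example `samtools view -h in.bam | python retag.py _s1 | samtools view -b -o out.bam -`, where `in.bam` and `out.bam` are placeholder file names.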