I have a large BAM file that I need to split into separate BAM files by the unique values of one of the optional TAG:TYPE:VALUE fields. As a simplified example, say these are the optional fields in my example.bam file (i.e. the fields following the 11 mandatory fields):
NH:i:6 HI:i:3 AS:i:94 nM:i:1 NM:i:1 CR:Z:TGCCCATCACTCAGGC CY:Z:CCCBBFGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT UY:Z:GGGGGGGGGG UB:Z:ACGCCCAGGT BC:Z:TGCGCAGC QT:Z:CCCCCGGG RG:Z:XXXXXX
NH:i:6 HI:i:3 AS:i:94 nM:i:1 NM:i:1 CR:Z:TGCCCATCACTCAGGC CY:Z:CCCBBFGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT
NH:i:7 HI:i:3 AS:i:96 nM:i:0 NM:i:0 CR:Z:TGCCCATCACTCAGGC CY:Z:BCBBCFGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT
NH:i:7 HI:i:3 AS:i:96 nM:i:0 NM:i:0 CR:Z:TGCCCATCACTCAGGC CY:Z:CCCCCGGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT
NH:i:3 HI:i:2 AS:i:96 nM:i:0 NM:i:0 CR:Z:TGCCCATCACTCAGGC CY:Z:CCCCCGGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT
NH:i:3 HI:i:2 AS:i:96 nM:i:0 NM:i:0 CR:Z:TGCCCATCACTCAGGC CY:Z:CCCCCGGGGGGGGGGG CB:Z:TGCCCATCACTCAGGC-1 UR:Z:ACGCCCAGGT
NH:i:6 HI:i:4 AS:i:96 nM:i:0 NM:i:0 CR:Z:CCGGTAGGTCATACTG CY:Z:CCCCCGGGGGGGGGGG CB:Z:CCGGTAGGTCATACTG-1 UR:Z:GCCGCCTTCT
NH:i:6 HI:i:4 AS:i:96 nM:i:0 NM:i:0 CR:Z:CCGGTAGGTCATACTG CY:Z:CCCCCGGGGGGGGGGG CB:Z:CCGGTAGGTCATACTG-1 UR:Z:GCCGCCTTCT
NH:i:6 HI:i:4 AS:i:96 nM:i:0 NM:i:0 CR:Z:CCGGTAGGTCATACTG CY:Z:CCCCCGGGGGGGGGGG CB:Z:CCGGTAGGTCATACTG-1 UR:Z:GCCGCCTTCT
NH:i:6 HI:i:4 AS:i:96 nM:i:0 NM:i:0 CR:Z:CCGGTAGGTCATACTG CY:Z:BBBCBGGEGGGGGGGG CB:Z:CCGGTAGGTCATACTG-1 UR:Z:GCCGCCTTCT
NH:i:6 HI:i:4 AS:i:96 nM:i:0 NM:i:0 CR:Z:CCGGTAGGTCATACTG CY:Z:CCCCCGGGGGGGGGGG CB:Z:CCGGTAGGTCATACTG-1 UR:Z:GCCGCCTTCT
How would I split this BAM file by unique CB:Z: values, so that the result is 2 BAM files as follows:
TGCCCATCACTCAGGC-1.bam (contains 6 reads)
CCGGTAGGTCATACTG-1.bam (contains 5 reads)
Thanks! D
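As a quick check on the expected split, here is a minimal Python sketch that parses the TAG:TYPE:VALUE fields from the example above and counts records per CB value. It assumes whitespace-separated optional fields exactly as shown in the question (not a real BAM file), and the function names `tag_value` and `count_by_tag` are hypothetical.

```python
from collections import Counter


def tag_value(fields, tag):
    """Return VALUE of the first TAG:TYPE:VALUE field whose TAG matches, else None."""
    prefix = tag + ":"
    for field in fields:
        if field.startswith(prefix):
            # maxsplit=2 keeps any ':' that appears inside VALUE intact
            return field.split(":", 2)[2]
    return None


def count_by_tag(lines, tag="CB"):
    # Each line is assumed to hold only optional fields, as in the question.
    # For full tab-separated SAM records, pass the fields after the 11
    # mandatory columns instead, so QNAME, SEQ, etc. are never matched.
    counts = Counter()
    for line in lines:
        value = tag_value(line.split(), tag)
        if value is not None:  # lines without the tag are ignored
            counts[value] += 1
    return counts


# The question's 11 records, trimmed to their NH and CB fields for brevity.
example = [
    "NH:i:6 CB:Z:TGCCCATCACTCAGGC-1",
    "NH:i:6 CB:Z:TGCCCATCACTCAGGC-1",
    "NH:i:7 CB:Z:TGCCCATCACTCAGGC-1",
    "NH:i:7 CB:Z:TGCCCATCACTCAGGC-1",
    "NH:i:3 CB:Z:TGCCCATCACTCAGGC-1",
    "NH:i:3 CB:Z:TGCCCATCACTCAGGC-1",
] + ["NH:i:6 CB:Z:CCGGTAGGTCATACTG-1"] * 5

print(dict(count_by_tag(example)))
# {'TGCCCATCACTCAGGC-1': 6, 'CCGGTAGGTCATACTG-1': 5}
```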
See this past thread: Sorting .bam file by tag
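Building on that idea: once records are sorted by the tag, every CB value forms one contiguous run, so a splitter only ever needs a single output file open at a time. The sketch below assumes your samtools release supports `samtools sort -t CB -o sorted.bam example.bam` (sort by tag value, then position) and that its SAM text is piped in via `samtools view -h sorted.bam`. The function names are hypothetical, tag values are assumed to be safe file names, and each `<value>.sam` output would still need converting, e.g. with `samtools view -b -o <value>.bam <value>.sam`.

```python
import itertools
import os


def tag_of(record, tag="CB"):
    """Return the value of `tag` in a tab-separated SAM record, or None."""
    prefix = tag + ":"
    # Optional TAG:TYPE:VALUE fields start after the 11 mandatory columns.
    for field in record.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return field.split(":", 2)[2]
    return None


def split_sorted_sam(lines, outdir, tag="CB"):
    """Write one <value>.sam file per tag value; return {value: n_records}.

    Assumes records are grouped by tag value (e.g. after sorting by the
    tag), so only one output file is open at any moment. Records without
    the tag are skipped.
    """
    os.makedirs(outdir, exist_ok=True)
    lines = iter(lines)
    header, first = [], None
    for line in lines:  # SAM header lines ('@...') precede all records
        if line.startswith("@"):
            header.append(line)
        else:
            first = line
            break
    counts = {}
    if first is None:
        return counts
    records = itertools.chain([first], lines)
    for value, run in itertools.groupby(records, key=lambda r: tag_of(r, tag)):
        if value is None:
            continue
        # Mode "x" raises FileExistsError if a value shows up in two separate
        # runs (i.e. the input was not grouped) or the file already exists.
        with open(os.path.join(outdir, value + ".sam"), "x") as out:
            out.writelines(header)
            n = 0
            for record in run:
                out.write(record)
                n += 1
        counts[value] = n
    return counts
```

In a script fed by `samtools view -h sorted.bam`, this could be called as `split_sorted_sam(sys.stdin, "by_cb")` (the directory name is only an example).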
Looks like bamtools (https://github.com/pezmaster31/bamtools/wiki) may be able to do this:
bamtools split -in test.bam -tag CB
I tried this, but the problem is that there are too many output files (one per CB value), and bamtools fails because it can't open the BAM files for writing.
See https://github.com/pezmaster31/bamtools/issues/135
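If the failure comes from the operating system's limit on open files, raising it with `ulimit -n` may help where your system allows. Another workaround, when the input is not sorted by the tag, is to cap the number of simultaneously open outputs: close the least recently used file when the cap is reached and reopen it in append mode when that CB value shows up again. A minimal Python sketch, assuming SAM text from `samtools view -h` on stdin and per-value `.sam` outputs that are converted to BAM afterwards; the function names are hypothetical, and `max_open=256` is an arbitrary default rather than anything taken from bamtools.

```python
import os
from collections import OrderedDict


def tag_of(record, tag="CB"):
    """Return the value of `tag` in a tab-separated SAM record, or None."""
    prefix = tag + ":"
    for field in record.rstrip("\n").split("\t")[11:]:  # skip 11 mandatory columns
        if field.startswith(prefix):
            return field.split(":", 2)[2]
    return None


def split_unsorted_sam(lines, outdir, tag="CB", max_open=256):
    """Split SAM text by tag value with at most `max_open` files open at once."""
    os.makedirs(outdir, exist_ok=True)
    header, counts = [], {}
    handles = OrderedDict()  # value -> open file, least recently used first
    try:
        for line in lines:
            if line.startswith("@"):
                header.append(line)  # header lines precede records in SAM
                continue
            value = tag_of(line, tag)
            if value is None:
                continue  # records without the tag are skipped
            out = handles.get(value)
            if out is None:
                if len(handles) >= max_open:
                    _, oldest = handles.popitem(last=False)
                    oldest.close()  # evict the least recently used file
                path = os.path.join(outdir, value + ".sam")
                if value in counts:
                    out = open(path, "a")  # seen before: reopen and append
                else:
                    out = open(path, "w")  # first record: start with the header
                    out.writelines(header)
                handles[value] = out
            else:
                handles.move_to_end(value)  # mark as most recently used
            out.write(line)
            counts[value] = counts.get(value, 0) + 1
    finally:
        for out in handles.values():
            out.close()
    return counts
```

A larger `max_open` means fewer close/reopen cycles; it only needs to stay below the per-process limit reported by `ulimit -n`.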