You can try a quick-and-dirty approach, such as awk.
samtools view outs/possorted_bam.bam |head -2
D00733:486:CE5HCANXX:8:2109:16462:81597 1123 chr1 9998 0 50M = 10029 81 CCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC 3:3<AE;=@@GGB>0E@11E>F1EB0C1:@1::CDG1EGGGG1<<F>1<: NM:i:1 MD:Z:1G48 MC:Z:50M AS:i:48 XS:i:50 CR:Z:AAACGAACACACATGT CY:Z:<33:@;/;/00//00@ CB:Z:AAACGAACACACATGT-1 BC:Z:AGGCTACC QT:Z:3A30A1?F GP:i:9997 MP:i:10078 MQ:i:0 RG:Z:High:MissingLibrary:1:CE5HCANXX:8
D00733:486:CE5HCANXX:8:1316:13949:92544 163 chr1 9998 0 50M = 10016 68 CCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC BCBBBCGGGGBGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGEFGGGGB NM:i:1 MD:Z:1G48 MC:Z:50M AS:i:48 XS:i:50 CR:Z:CTCTACGGTTAGGAGC CY:Z:<?BB>G=BCG<G>DB0 CB:Z:CTCTACGGTTAGGAGC-1 BC:Z:CTAGCTGT QT:Z:BBC??FF@ GP:i:9997 MP:i:10065 MQ:i:0 RG:Z:High:MissingLibrary:1:CE5HCANXX:8-46FD2F59
You can then pull out the fields you need (assuming the CB tag stores the barcode information and is always the 19th field, as in the records above):
samtools view outs/possorted_bam.bam | head | awk -v OFS="\t" -F" " '{ print $3,$4,$4,$1";"$19}'
chr1 9998 9998 D00733:486:CE5HCANXX:8:2109:16462:81597;CB:Z:AAACGAACACACATGT-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:1316:13949:92544;CB:Z:CTCTACGGTTAGGAGC-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:1315:2269:7307;CB:Z:ACAGGCCTCGTCCCTA-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:2206:20350:83100;CB:Z:GAAACAAGTACGGATG-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:2104:3289:54151;CB:Z:GTAGACTTCAGAGTGG-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2211:2487:73034;CB:Z:TCCATCGCATACCCGG-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2201:15714:3416;CB:Z:TGCCTCAAGTCGAAAT-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2201:11407:27253;CB:Z:CCTGCTAGTAGCTGTT-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:1105:15836:96636;CB:Z:AGCCAGCCACATTGCA-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2302:12853:92500;CB:Z:CGTTCCATCTGCGTCT-1
The last column is readname;barcode.
A better approach is to look for the CB:Z: tag by name using perl (match CB:Z: and capture all the following characters until you hit a tab), so the result does not depend on the tag's position in the line.
samtools view outs/possorted_bam.bam | head |
perl -nle '@reads=split(/\t/,$_); if (m/CB:Z:([^\t\n]+)/) { print "$reads[2]\t","$reads[3]\t","$reads[3]\t","$reads[0]_",$1; }'
chr1 9998 9998 D00733:486:CE5HCANXX:8:2109:16462:81597_AAACGAACACACATGT-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:1316:13949:92544_CTCTACGGTTAGGAGC-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:1315:2269:7307_ACAGGCCTCGTCCCTA-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:2206:20350:83100_GAAACAAGTACGGATG-1
chr1 9998 9998 D00733:486:CE5HCANXX:8:2104:3289:54151_GTAGACTTCAGAGTGG-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2211:2487:73034_TCCATCGCATACCCGG-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2201:15714:3416_TGCCTCAAGTCGAAAT-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2201:11407:27253_CCTGCTAGTAGCTGTT-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:1105:15836:96636_AGCCAGCCACATTGCA-1
chr1 9999 9999 D00733:486:CE5HCANXX:8:2302:12853:92500_CGTTCCATCTGCGTCT-1
If you want more control, such as extending the read position based on strand or selecting reads with specific features, you will need pysam or something similar so that you can print exactly the output you want.
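As a rough illustration of that kind of control, here is a minimal sketch in plain Python. It is not pysam: it parses the text that samtools view prints, so it only needs the standard library. The function names (sam_to_bed, convert) and the options (extend_to, min_mapq) are made up for this example. It assumes tab-separated SAM records and that the barcode is in the CB tag. Unlike the one-liners above, it writes 0-based, half-open BED intervals with the end computed from the CIGAR string.

```python
import re
import sys

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = frozenset("MDN=X")  # CIGAR operations that consume the reference


def sam_to_bed(line, extend_to=None, min_mapq=0):
    """Turn one SAM text line into a BED6-style tuple, or None if skipped.

    extend_to and min_mapq are hypothetical options for this sketch.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        return None  # not a complete SAM record
    flag, mapq = int(f[1]), int(f[4])
    if flag & 0x4 or mapq < min_mapq:
        return None  # unmapped, or below the requested mapping quality
    # Optional fields look like TAG:TYPE:VALUE, e.g. CB:Z:AAACGAACACACATGT-1
    tags = {t[:2]: t[5:] for t in f[11:]}
    barcode = tags.get("CB")
    if barcode is None:
        return None  # no cell barcode on this read
    start = int(f[3]) - 1  # SAM POS is 1-based, BED start is 0-based
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(f[5]) if op in REF_OPS)
    end = start + ref_len
    strand = "-" if flag & 0x10 else "+"
    if extend_to is not None:
        # Grow the interval in the direction the read points.
        if strand == "+":
            end = max(end, start + extend_to)
        else:
            start = max(0, min(start, end - extend_to))
    return (f[2], start, end, f"{f[0]}_{barcode}", mapq, strand)


def convert(stream, out=sys.stdout, **options):
    """Write one BED line per usable record read from stream."""
    for line in stream:
        if line.startswith("@"):
            continue  # header line, present when samtools view -h is used
        record = sam_to_bed(line, **options)
        if record is not None:
            out.write("\t".join(map(str, record)) + "\n")
```

To use it, you could save it under any name (say cb_to_bed.py), add a call such as convert(sys.stdin, extend_to=200) at the bottom, and run samtools view outs/possorted_bam.bam | python cb_to_bed.py. The value 200 is arbitrary here. With pysam you would read the same information from the record object instead of splitting text.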
I don't know if this is a good approach, but you could also try bamtobed from bedtools.