I have paired-end reads where Read 1 has the following structure:
>Read1
5'-6N-20CB-XXXXXXXXXX....XXXXGS-3'
The 6N is the UMI, 20CB is the 20 bp cell barcode, and X..XGS is the genomic sequence.
I want to copy the UMI into the read header but NOT remove it from the read (i.e. reattach it, similar to how umi_tools treats X positions), and discard the 20 bp cell barcode, so that the read structure becomes:
>Read1::NNNNNN
5'-6N-XXXXXXXXXX....XXXXGS-3'
Is this possible to do with umi_tools? If not, what would you suggest?
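If umi_tools cannot do this directly, the transformation can be done in a few lines of plain Python, independently of umi_tools. The sketch below assumes 4-line FASTQ files (optionally gzipped), a fixed layout of a 6 bp UMI immediately followed by a 20 bp cell barcode at the 5' end of Read 1, and that Read 2 only needs the UMI copied into its header so the mates stay paired. The function names and the `_` separator are illustrative choices, not part of umi_tools (`_` is umi_tools' default UMI separator; pass `sep="::"` to get exactly the header shown above).

```python
import gzip
from typing import Iterator, TextIO, Tuple

UMI_LEN = 6   # assumed: 6N UMI at the very start of Read 1
CB_LEN = 20   # assumed: 20 bp cell barcode directly after the UMI

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, qualities)


def open_fastq(path: str, mode: str = "rt") -> TextIO:
    """Open plain or gzipped FASTQ in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; stops at end of file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def add_umi_to_header(header: str, umi: str, sep: str = "_") -> str:
    """Append the UMI to the read name (the first whitespace-delimited token)."""
    parts = header.split(maxsplit=1)
    parts[0] += sep + umi
    return " ".join(parts)


def process_read1(rec: Record, sep: str = "_") -> Tuple[Record, str]:
    """Copy the UMI to the header, keep it in the read, drop the cell barcode."""
    header, seq, plus, qual = rec
    if len(seq) < UMI_LEN + CB_LEN:
        raise ValueError(f"read too short for UMI + barcode: {header}")
    umi = seq[:UMI_LEN]
    # Keep UMI bases, skip the barcode bases; trim qualities identically.
    new_seq = seq[:UMI_LEN] + seq[UMI_LEN + CB_LEN:]
    new_qual = qual[:UMI_LEN] + qual[UMI_LEN + CB_LEN:]
    return (add_umi_to_header(header, umi, sep), new_seq, plus, new_qual), umi


def extract(r1_in: str, r2_in: str, r1_out: str, r2_out: str, sep: str = "_") -> None:
    """Process both mates in lockstep; Read 2 only gets the UMI in its header."""
    with open_fastq(r1_in) as f1, open_fastq(r2_in) as f2, \
         open_fastq(r1_out, "wt") as o1, open_fastq(r2_out, "wt") as o2:
        # zip() stops at the shorter file, so the inputs must be properly paired.
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            new1, umi = process_read1(rec1, sep)
            h2, s2, p2, q2 = rec2
            new2 = (add_umi_to_header(h2, umi, sep), s2, p2, q2)
            o1.write("\n".join(new1) + "\n")
            o2.write("\n".join(new2) + "\n")
```

With hypothetical file names, a call would look like `extract("R1.fastq.gz", "R2.fastq.gz", "R1.umi.fastq.gz", "R2.umi.fastq.gz")`. Slicing by fixed offsets is the simplest approach for a fixed read structure; reads with indels in the barcode region would need a pattern-based approach instead.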
Do you mean that you don't want the cell barcode to be transferred to the read header?