Can I use umi_tools to discard cell barcodes and extract the UMI while keeping the UMI in the read?
18 months ago
cag104 ▴ 10

I have paired end reads where the read structure is as follows:

>Read1
5'-6N-20CB-XXXXXXXXXX....XXXXGS-3'

The 6N corresponds to the UMI, the 20CB corresponds to 20 basepairs of cell barcode and X..XGS corresponds to genomic sequence.

I want to extract the UMIs and place them in the header, but NOT discard them (reattach them to the read similarly to how umi_tools treats X's) and discard the 20bp of cell barcode so that the read structure becomes:

>Read1::NNNNNN
5'-6N-XXXXXXXXXX....XXXXGS-3'

Is this possible to do with umi_tools? If not, what would you suggest?

umitools umi_tools • 999 views

Do you mean that you don't want the cell barcode to be transferred to the read header?

18 months ago

If you were happy to transfer the CB to the read header as well, then you could do:

$ umi_tools extract -I read1.fastq.gz --read2-in=read2.fastq.gz --stdout=read1.cb.fastq.gz --read2-out=read2.cb.fastq.gz --bc-pattern='^.{6}(?P<umi_1>.{20})' --extract-method=regex --log=cb.log

$ umi_tools extract -I read1.cb.fastq.gz --read2-in=read1.cb.fastq.gz --stdout=/dev/null --read2-out=read1.umi.fastq.gz --bc-pattern=NNNNNN --log=read1.umi.log

$ umi_tools extract -I read1.cb.fastq.gz --read2-in=read2.cb.fastq.gz --stdout=/dev/null --read2-out=read2.umi.fastq.gz --bc-pattern=NNNNNN --log=read2.umi.log

If you only have single-end reads, you can skip the final step and drop the --read2-in and --read2-out options from the first step.
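
If you would rather not carry the CB in the header at all, the same edit can be sketched outside umi_tools with a small awk filter. This is only a rough sketch under assumptions that are not from the thread: the helper name trim_cell_barcode is hypothetical, the input is plain 4-line FASTQ on stdin, and every read starts with exactly 6 UMI bases followed by 20 CB bases. It appends the UMI to the read ID after an underscore, keeps the UMI at the start of the read, and drops the 20 CB bases together with their quality values.

```shell
# Hypothetical helper, not part of umi_tools: reads 4-line FASTQ records on
# stdin and writes the trimmed records to stdout.
# Assumes a fixed layout: umi_len UMI bases, then cb_len cell barcode bases.
trim_cell_barcode() {
  awk -v umi_len=6 -v cb_len=20 '
    NR % 4 == 1 {                 # header line: hold it until the sequence is seen
      header = $0
      next
    }
    NR % 4 == 2 {                 # sequence line
      umi = substr($0, 1, umi_len)
      # Append _UMI to the read ID, i.e. before the first space if there is one.
      sp = index(header, " ")
      if (sp > 0)
        header = substr(header, 1, sp - 1) "_" umi substr(header, sp)
      else
        header = header "_" umi
      print header
      # Keep the UMI, skip the cell barcode, keep the rest of the read.
      print umi substr($0, umi_len + cb_len + 1)
      next
    }
    NR % 4 == 3 { print; next }   # "+" separator line, passed through unchanged
    {                             # quality line: drop the same positions as the CB
      print substr($0, 1, umi_len) substr($0, umi_len + cb_len + 1)
    }
  '
}
```

You would run it as something like: zcat read1.fastq.gz | trim_cell_barcode | gzip > read1.umi.fastq.gz. Note that this only touches read 1; if downstream tools need matching read names in read 2, its headers would have to get the same suffix by some other means, which the three umi_tools commands above handle for you.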
