Umi_tools regex blank output
22 days ago

Hello,

I am looking for assistance using regex for UMI extraction with umi_tools.

My reads contain a 23 bp universal sequence followed by a 12 bp UMI.

I know these are at the 5' end of my read, because when I extract the first 35 bp with umi_tools to visually inspect, the sequences are exactly as expected.

However, when I attempt to use regex to pull out the UMI (which I would prefer to do for multiple reasons), no matches are found.

The code I am using:

umi_tools extract --stdin=test.filt.fastq \
--extract-method=regex\
--bc-pattern='(?P<discard_1>TCTTACGATTACGCCAACCACTG{e<=2})(?P<umi_1>.{12})'--stdout test.processed.fastq

Any guidance is greatly appreciated!

Thank you

regex umi_tools • 324 views
22 days ago

I think you may be missing the part of the regex that matches the remainder of the read (e.g. .* or .{1,120}): --bc-pattern='(?P<discard_1>TCTTACGATTACGCCAACCACTG{e<=2})(?P<umi_1>.{12}).*'

22 days ago

You need a space between (?P<discard_1>TCTTACGATTACGCCAACCACTG{e<=2})(?P<umi_1>.{12}) and --stdout. Currently the pattern is being parsed as including the --stdout, and since none of your reads include --stdout, they are being rejected. A .* isn't needed, but wouldn't hurt.
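
To see how the shell splits the two variants, here is a minimal sketch, assuming a POSIX-style shell such as bash. show_args is a hypothetical helper written for this illustration (it is not part of umi_tools); it only prints each argument it receives in brackets, so no reads are processed.

```shell
# Hypothetical helper: print every argument on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# As posted: no space after the closing quote, so the quoted pattern and
# --stdout are fused into a single argument.
broken=$(show_args --bc-pattern='(?P<discard_1>TCTTACGATTACGCCAACCACTG{e<=2})(?P<umi_1>.{12})'--stdout test.processed.fastq)

# With the space: the pattern and --stdout arrive as separate arguments.
fixed=$(show_args --bc-pattern='(?P<discard_1>TCTTACGATTACGCCAACCACTG{e<=2})(?P<umi_1>.{12})' --stdout test.processed.fastq)

echo "As posted:"
echo "$broken"
echo "With the space:"
echo "$fixed"
```

The first variant prints two arguments, with --stdout at the end of the pattern; the second prints three. The same joining happens at a trailing backslash: if the next line is not indented, put a space before the backslash so the options are not glued together.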


Thank you SO much! I apologize for not catching that error.
