Question

Rapid BaseQ drop in adaptor regions with UMI

5 weeks ago

geneticatt ▴ 140

Hi everyone,

I am using UMIs for the first time, and the library design places the UMI barcode between the P7 cluster-forming adapter sequence and the P7 sequencing-primer-binding adapter sequence. This means that I need to extract UMIs and deduplicate using information from the 3' end of the R1 reads.

I am trying to use umi_tools for this purpose, but I found that pattern matching was yielding far fewer matches than expected. When I assessed the BaseQ at the 3' ends of the reads, I saw the pattern shown in the image: the BaseQ decreases dramatically once the adaptor region is reached.
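For anyone wanting to reproduce this kind of check outside FastQC, here is a minimal sketch of how the mean BaseQ over the last few cycles could be computed. The function names, the 20-base window, and the Phred+33 offset are assumptions for illustration, not anything taken from the original analysis.

```python
import gzip
from typing import Iterable, Iterator, List


def fastq_qualities(path: str) -> Iterator[str]:
    """Yield the quality line of every record in a (optionally gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:  # 4th line of each record holds the qualities
                yield line.rstrip("\n")


def tail_mean_quality(quality_lines: Iterable[str], tail: int = 20,
                      offset: int = 33) -> List[float]:
    """Mean Phred score at each of the last `tail` positions.

    Reads shorter than `tail` are skipped. `offset=33` assumes Phred+33 encoding.
    """
    sums = [0] * tail
    count = 0
    for qual in quality_lines:
        qual = qual.rstrip("\n")
        if len(qual) < tail:
            continue
        for i, ch in enumerate(qual[-tail:]):
            sums[i] += ord(ch) - offset
        count += 1
    return [s / count for s in sums] if count else []
```

Calling `tail_mean_quality(fastq_qualities("reads_R1.fastq.gz"))` (a hypothetical file name) gives one mean per position, so a sharp drop at the adaptor boundary shows up as a step in the list.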

I am wondering if this is a common issue and what the cause could be. First, it's puzzling to me to choose to sequence UMIs at the end of the reads where baseQ drop normally occurs, but I think that in this case, the magnitude of the drop is far beyond the normal reduction from phasing issues. Could anyone make a suggestion on what to test for and how to proceed? I do not think I can use these reads with such low quality UMIs.

Thank you,

Alex

Representative BaseQ dip in samples.

adapter fastqc umi adaptor • 348 views

ADD COMMENT • link updated 5 weeks ago by i.sudbery 20k • written 5 weeks ago by geneticatt ▴ 140

score 0 · Answer 1 · 2024-11-12

5 weeks ago

GenoMax 148k

I am wondering if this is a common issue and what the cause could be.

Such Q score drops are common when the sequencer encounters low nucleotide diversity. Are your UMIs before the drop we see (which is likely caused by sequencing into the adapter), or are they in the drop region?

First, it's puzzling to me to choose to sequence UMIs at the end of the reads where baseQ drop normally occurs, but I think that in this case, the magnitude of the drop is far beyond the normal reduction from phasing issues.

Why is that puzzling? It looks like the UMIs were designed to be in that location. Low nucleotide diversity = poor Q scores. Generally, adding PhiX or other neutral DNA to the run will help counter this drop.
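One way to see whether low diversity lines up with the quality drop is to measure base composition per cycle. The sketch below uses Shannon entropy as the diversity measure, which is an assumption on my part (FastQC's per-base sequence content plot conveys similar information); the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def per_cycle_entropy(seqs: Iterable[str]) -> List[float]:
    """Shannon entropy (bits) of the base composition at each cycle.

    About 2.0 means all four bases are equally represented; values near 0.0
    mean almost every read shows the same base at that cycle (for example
    inside a fixed adapter sequence). Only positions shared by all reads are scored.
    """
    seqs = list(seqs)
    if not seqs:
        return []
    length = min(len(s) for s in seqs)
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(abs(h))  # abs() only tidies up -0.0
    return entropies
```

Cycles inside a constant adapter should come out near 0 bits, while cycles covering a well-randomised UMI should be close to 2 bits.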

I do not think I can use these reads with such low quality UMIs.

As long as your UMIs have no Ns or weird repeats, the sequences should be fine to use. I assume you will simply trim off the adapter on the 3' side. You may want to do that before using umi_tools.
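A quick sanity filter along those lines might look like the sketch below. Interpreting "weird repeats" as long homopolymer runs, and the cut-off of 5 identical bases in a row, are both assumptions for illustration; the function name is hypothetical.

```python
import re


def umi_looks_usable(umi: str, max_homopolymer: int = 5) -> bool:
    """Return True if the UMI has only A/C/G/T and no long single-base run."""
    umi = umi.upper()
    if not umi or re.search(r"[^ACGT]", umi):
        return False  # empty, or contains N / other ambiguity codes
    # Length of the longest run of one repeated base.
    longest_run = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", umi))
    return longest_run <= max_homopolymer
```

Counting how many extracted UMIs fail this check gives a rough idea of whether the low Q scores actually translate into unusable barcodes.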

ADD COMMENT • link 5 weeks ago by GenoMax 148k

My understanding from the OP's post is that the UMIs are AFTER the adaptor, not before it (and would therefore be after the drop in quality, not before it).

If that understanding is correct, then the problem would be that it is difficult to find the UMI without an intact adaptor sequence to register to.
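To illustrate the registration problem, here is a minimal sketch that anchors on the adaptor and takes the bases immediately after it as the UMI. The adaptor sequence and the 8 nt UMI length are placeholders, not the OP's actual design, and the named groups simply mirror the `umi_1` / `discard_1` naming style used by umi_tools' regex extraction mode.

```python
import re
from typing import Optional, Tuple

ADAPTER = "AGATCGGAAGAGC"  # placeholder adaptor sequence (assumption)
UMI_LEN = 8                # placeholder UMI length (assumption)

# insert, then the exact adaptor, then UMI_LEN unambiguous bases
PATTERN = re.compile(
    rf"(?P<insert>.*?)(?P<discard_1>{ADAPTER})(?P<umi_1>[ACGT]{{{UMI_LEN}}})"
)


def split_read(seq: str) -> Optional[Tuple[str, str]]:
    """Return (insert, umi) if the adaptor is found intact, else None."""
    match = PATTERN.search(seq)
    if match is None:
        return None
    return match.group("insert"), match.group("umi_1")
```

Because the match is exact, a single miscalled base in the adaptor makes `split_read` return `None`, which would be consistent with far fewer pattern matches than expected when the adaptor falls in a low-quality region.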

My understanding was that when UMIs were in this position, the idea was that they would be present in the barcode read, not either of the insert reads.

ADD REPLY • link 5 weeks ago by i.sudbery 20k

the idea was that they would be present in the barcode read, not either of the insert reads.

If that is the case, then it should be possible to get the index reads in a separate file, and the data can then be trimmed normally. Would umi_tools be able to use the UMIs present in the index read file, assuming the basecalls are not compromised?
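As a rough sketch of one possible route, the UMI could be copied from the index read into the name of the matching insert read, since umi_tools dedup by default looks for the UMI at the end of the read name after an underscore. The function names, the UMI position within the index read, and the assumption that both files list reads in the same order are all hypothetical here.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines without trailing newlines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield [line.rstrip("\n") for line in record]


def tag_reads_with_umi(insert_handle: TextIO, index_handle: TextIO,
                       out_handle: TextIO, umi_start: int = 0,
                       umi_len: int = 8) -> int:
    """Append the UMI from each index read to the matching insert read name.

    Assumes both files hold the same reads in the same order. The header
    comment (everything after the first space) is dropped from the output.
    """
    written = 0
    for ins, idx in zip(fastq_records(insert_handle), fastq_records(index_handle)):
        name_ins = ins[0].split()[0]
        name_idx = idx[0].split()[0]
        if name_ins != name_idx:
            raise ValueError(f"read name mismatch: {name_ins} vs {name_idx}")
        umi = idx[1][umi_start:umi_start + umi_len]
        out_handle.write(f"{name_ins}_{umi}\n{ins[1]}\n+\n{ins[3]}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo with made-up reads.
    inserts = io.StringIO("@r1 1:N:0:1\nACGT\n+\nIIII\n")
    indexes = io.StringIO("@r1 1:N:0:1\nAACCGGTTAA\n+\nIIIIIIIIII\n")
    out = io.StringIO()
    tag_reads_with_umi(inserts, indexes, out)
    print(out.getvalue(), end="")
```

The tagged reads could then be adaptor-trimmed and aligned as usual, with the UMI travelling along in the read name.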