Question

Detecting presence of UMI from BAM/fastq files without prior knowledge

13 months ago

artemd ▴ 20

Hi all, I have many BAM files and their FASTQ files from several different sources of whole-genome sequencing experiments. Some of those sources might have used UMIs in their workflow, while others didn't. Where UMIs were used, I don't know their structure/format.

Is there a way to detect whether fastq/bam files have UMIs?

I found that umi-tools or fastp could be used, from this thread: Use fastp to preprocess FASTQ data with unique molecular identifer (UMI) integrated. But in both cases the user needs to specify the format of the UMIs, which I don't have. In my case I only want to detect whether UMIs were used in each of my samples (I don't need to remove them).

Any suggestions would be appreciated. Thanks.

fastq bam picard sequencing • 632 views

score 1 · Answer 1 · 2024-03-12

13 months ago

i.sudbery 21k

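The only thing I can suggest is aligning the reads with a local aligner, and then checking whether the same number of non-aligning bases is soft-clipped from the start/end of each read. A consistent clip length across reads would point to a fixed-length non-genomic sequence, such as a UMI, at that end.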
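For what it's worth, here is a rough sketch of how that soft-clip check could be tallied from SAM-format text. It only parses the FLAG and CIGAR columns, so it needs no extra libraries. The function names (`clip_lengths`, `clip_profile`, `dominant_clip`) and the 80% threshold are my own assumptions for illustration, not part of any tool, and a consistent clip could also come from adapters or other fixed sequence rather than a UMI.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an op code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar, is_reverse=False):
    """Return (five_prime, three_prime) soft-clip lengths in read orientation."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) sit outside any soft clips; ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    # CIGAR is written in reference orientation, so for reverse-strand
    # alignments the read's 5' end is the right-hand side of the CIGAR.
    return (right, left) if is_reverse else (left, right)


def clip_profile(sam_lines):
    """Count 5' and 3' soft-clip lengths over primary mapped alignments."""
    five, three = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x904 or cigar == "*":
            continue
        a, b = clip_lengths(cigar, bool(flag & 0x10))
        five[a] += 1
        three[b] += 1
    return five, three


def dominant_clip(counter, min_fraction=0.8):
    """Return the nonzero clip length seen in >= min_fraction of reads, else None.

    min_fraction is an arbitrary, assumed cutoff; tune it for your data.
    """
    total = sum(counter.values())
    if total == 0:
        return None
    length, n = counter.most_common(1)[0]
    return length if length > 0 and n / total >= min_fraction else None
```

One way to use it would be to pipe the output of `samtools view` for a sample into a script that calls `clip_profile(sys.stdin)` and then `dominant_clip` on each counter. Note this only works if the alignment was done in a local (soft-clipping) mode; an end-to-end alignment will show no clipping regardless.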
