Detecting presence of UMI from BAM/fastq files without prior knowledge
8 months ago
artemd ▴ 20

Hi all, I have many BAM files and their corresponding FASTQ files from whole-genome sequencing experiments run by several different sources. Some of those sources might have used UMIs in their workflow, while others didn't. Where UMIs were used, I don't know their structure or format.

Is there a way to detect whether FASTQ/BAM files contain UMIs?

I found that umi-tools or fastp could be used, based on this thread: "Use fastp to preprocess FASTQ data with unique molecular identifer (UMI) integrated". But in both cases the user needs to specify the format of the UMIs, which I don't have. I only want to detect whether UMIs were used in each of my samples; I don't need to remove them.

Any suggestions would be appreciated. Thanks.

fastq bam picard sequencing
8 months ago

The only thing I can suggest is aligning the reads with a local aligner, and then checking if the same amount of non-aligning sequence is soft-clipped from the start/end of each read.
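
A rough sketch of that check, in case it helps. It assumes the reads were aligned with soft-clipping enabled and that the alignments are available as SAM text (for example from `samtools view`). The function names and the idea of reporting the single most common clip length are my own choices for illustration, not an established method. Note that adapters and low-quality ends also cause soft-clipping, so a dominant clip length is only a hint that a fixed-length UMI may be present.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so drop them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0
    lead = ops[0][0] if ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def tally_read_start_clips(sam_lines):
    """Count soft-clip lengths at the original 5' start of each primary alignment."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or cigar == "*":
            continue
        lead, trail = soft_clip_lengths(cigar)
        # Reverse-strand reads (0x10) are stored reverse-complemented,
        # so the start of the original read is at the end of the CIGAR.
        counts[trail if flag & 0x10 else lead] += 1
    return counts


def dominant_clip(counts):
    """Return (most common clip length, fraction of reads with that length)."""
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    length, n = counts.most_common(1)[0]
    return length, n / total
```

To use it, pass the lines of `samtools view your.bam` (or an open SAM file) to `tally_read_start_clips` and inspect the result with `dominant_clip`. If most reads in a sample share the same non-zero clip length at the read start, that sample is worth a closer look; if the most common length is 0, a UMI embedded at the start of the read is unlikely. The same tally could be repeated for the other end of the read by swapping `lead` and `trail`.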
