
Tool:umitools - working with UMI incorporated data


10.4 years ago

Joe Brown ▴ 70

Availability: https://github.com/brwnj/umitools

umitools facilitates the processing of data that has incorporated a unique molecular identifier (UMI). It assumes the UMI is incorporated as part of the read.

Using the IUPAC sequence design of the UMI, strip the sequence from the 5' end of the fastq:

umitools trim --end 5 unprocessed_fastq.gz NNNNNV > out.fq
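To illustrate what matching an IUPAC UMI design against the 5' end involves, here is a minimal Python sketch. It is not the umitools source: the function names, the `:UMI_<sequence>` read-name suffix, and the choice to return `None` for reads that do not fit the design are all assumptions made for this example.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def umi_regex(pattern):
    """Compile an IUPAC UMI design such as NNNNNV into an anchored regex."""
    return re.compile("^" + "".join(IUPAC[base] for base in pattern.upper()))


def trim_5prime(name, seq, qual, pattern):
    """Strip the UMI from the 5' end and append it to the read name.

    Returns (new_name, trimmed_seq, trimmed_qual), or None when the start of
    the read does not fit the design. The ':UMI_<seq>' suffix is an assumed
    naming convention for this sketch, not necessarily what umitools writes.
    """
    match = umi_regex(pattern).match(seq)
    if match is None:
        return None
    umi = match.group(0)
    n = len(umi)
    return f"{name}:UMI_{umi}", seq[n:], qual[n:]
```

With the design `NNNNNV`, a read starting `ACGTAC` would be trimmed by six bases, while one starting `ACGTAT` would be rejected because `V` excludes `T`.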

The UMI sequence of each read is appended to the read name and is used again after the reads are mapped. Duplicate UMIs at any given start site need to be removed:

umitools rmdup unprocessed.bam out.bam > before_after.bed
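The idea behind removing duplicates can be sketched in a few lines of Python. This is an illustration only, not the umitools implementation: alignments are represented as plain dictionaries with hypothetical keys (`chrom`, `strand`, `start`, `umi`), and keeping the first alignment seen for each key is an assumed tie-breaking rule.

```python
def rmdup(alignments):
    """Keep one alignment per (chrom, strand, start, UMI) combination.

    `alignments` is an iterable of dicts with the hypothetical keys
    'chrom', 'strand', 'start' and 'umi'. The first alignment seen for
    each combination is kept; later ones are treated as duplicates.
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["strand"], aln["start"], aln["umi"])
        if key in seen:
            continue  # same UMI at the same start site: duplicate
        seen.add(key)
        kept.append(aln)
    return kept
```

Reads that share a start site but carry different UMIs are all kept, which is the point of using UMIs instead of position-only duplicate marking.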

EDIT:

I've updated this to account for mismatches among the UMI sequences observed at a given start site. This lets the user merge very similar UMIs into fewer representative sequences.

umitools rmdup --mismatches 1 unprocessed.bam out.bam > before_after.bed
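One way to picture merging similar UMIs is a greedy pass over the UMIs seen at a single start site, most abundant first, folding each one into an existing representative when it is within the allowed Hamming distance. The sketch below assumes that strategy and equal-length UMIs; the actual clustering used by `--mismatches` may differ, and the function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_umis(umis, mismatches=1):
    """Collapse UMIs from one start site into representative sequences.

    Returns a dict mapping each representative UMI to the total number of
    reads assigned to it. Greedy, abundance-ordered merging is an assumed
    strategy for this sketch.
    """
    counts = Counter(umis)
    representatives = {}
    # Most abundant first; ties broken alphabetically for determinism.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if hamming(umi, rep) <= mismatches:
                representatives[rep] += n  # fold into an existing group
                break
        else:
            representatives[umi] = n  # no close match: new representative
    return representatives
```

For example, with one allowed mismatch, `AAAAAC` (seen twice) and `AAAAAG` (seen once) would collapse into a single representative, while `TTTTTC` would remain separate.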

UMI sequencing • 6.3k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 10.4 years ago by Joe Brown ▴ 70


Does umitools support paired-end data? (PE is popular in NGS analysis.)

ADD REPLY • link 10.1 years ago by xfliwz ▴ 50


PE is popular? What are you trying to do? What's your UMI incorporation design?

ADD REPLY • link 10.1 years ago by Joe Brown ▴ 70


Hello, in my PE reads, both 1.fq and 2.fq have UMIs.

1.fq: UMI1=============
2.fq: UMI2=============

To take advantage of the UMIs, I need to take both of them into consideration.

So, can umitools solve my problem?

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.8 years ago by xfliwz ▴ 50


An unexpected problem with this tool: paired-end reads end up with different names, which causes BWA-MEM to quit. What aligner do you use downstream of umitools that does not require paired reads to have the same name?

ADD REPLY • link 9.4 years ago by sowalsky • 0


I could make this work on PE reads, but it's unclear how I would be counting the UMIs at a given start. Would you want to remove R1s independently of R2s?

If you're interested in sharing data with me, I think we can get it worked out. If you've already solved it and made the code available somewhere, I'd love to check it out!

ADD REPLY • link 9.0 years ago by Joe Brown ▴ 70