Question

PICARD MarkDuplicates with random barcodes

7.9 years ago

Nicolas Rosewick 11k

Hi,

I have paired-end 2x100 bp targeted DNA-seq reads that span multiple regions of the genome. Read 2 contains two barcodes:

bp 1-10 : barcode 1
bp 11-19 : barcode 2
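
As a minimal sketch of the layout above, read 2 could be split by fixed offsets. The function name `split_read2` and the `SplitRead` tuple are hypothetical names used only for illustration; the sketch assumes the barcodes always sit at exactly these positions, with no indels in the barcode region.

```python
from typing import NamedTuple

BC1_LEN = 10  # bp 1-10 of read 2: barcode 1
BC2_LEN = 9   # bp 11-19 of read 2: barcode 2


class SplitRead(NamedTuple):
    """Hypothetical container for the pieces of one read 2 record."""
    barcode1: str
    barcode2: str
    sequence: str  # read 2 with both barcodes trimmed off
    quality: str   # matching base qualities


def split_read2(sequence: str, quality: str) -> SplitRead:
    """Cut barcode 1 and barcode 2 off the start of a read 2 record."""
    end = BC1_LEN + BC2_LEN
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) < end:
        raise ValueError("read is shorter than the two barcodes")
    return SplitRead(
        barcode1=sequence[:BC1_LEN],
        barcode2=sequence[BC1_LEN:end],
        sequence=sequence[end:],
        quality=quality[end:],
    )
```

The 1-based positions 1-10 and 11-19 become the 0-based slices `[0:10]` and `[10:19]`. Trimming the barcodes before alignment keeps them from being aligned as if they were genomic sequence.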

These barcodes are useful to distinguish between the different samples (barcode 2) and between DNA fragments (barcode 1). What I want is a BAM file for each sample, with the duplicate reads (same barcode 1 and same alignment position) removed. I saw that Picard MarkDuplicates has barcode options:

BARCODE_TAG (String) Barcode SAM tag (ex. BC for 10X Genomics) Default value: null.

READ_ONE_BARCODE_TAG (String) Read one barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

READ_TWO_BARCODE_TAG (String) Read two barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

But I'm a bit lost as to how to tell Picard which positions within read 2 to check. Any ideas?

If Picard is not suited to this task, I thought of parsing R2 to extract barcodes 1 and 2, then removing the duplicates by checking the alignment position and barcode information.
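
A rough sketch of that fallback, assuming each aligned read has already been annotated with its two barcodes. The `AlignedRead` record and `remove_duplicates` function are hypothetical and not part of Picard or any library. The sketch keeps the first read seen for each key, requires exact barcode matches (no tolerance for sequencing errors in the barcode), and treats strand as part of the alignment position, which is an assumption.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal record for one aligned read pair."""
    name: str
    sample_barcode: str    # barcode 2: identifies the sample
    fragment_barcode: str  # barcode 1: identifies the DNA fragment
    chrom: str
    pos: int
    strand: str


def remove_duplicates(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (sample, position, strand, fragment barcode)."""
    seen = set()
    kept = []
    for read in reads:
        key = (
            read.sample_barcode,
            read.chrom,
            read.pos,
            read.strand,
            read.fragment_barcode,
        )
        if key in seen:
            continue  # same fragment barcode at the same position: duplicate
        seen.add(key)
        kept.append(read)
    return kept
```

Grouping the kept reads by `sample_barcode` afterwards would give the per-sample split. A real implementation would more likely choose the best-quality read in each group rather than the first one encountered.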

Thanks

edit : I just found this paper discussing barcodes (or UMIs) : http://genome.cshlp.org/content/early/2017/01/18/gr.209601.116.abstract . A good start

picard barcode • 4.0k views


edit : I just found this paper discussing barcodes (or UMIs)

The PhD thesis of Kasper Karlsson is also a very good read about UMIs.

7.9 years ago by Charles Plessy ★ 2.9k