PICARD MarkDuplicates with random barcodes
7.5 years ago

Hi,

I have paired-end 2x100bp targeted DNA-seq reads that span multiple regions in the genome. Read 2 contains 2 barcodes:

  • bp 1-10 : barcode 1
  • bp 11-19 : barcode 2
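
To make the layout concrete, here is a minimal Python sketch of slicing a read 2 sequence according to the coordinates listed above. The names `split_read2` and `SplitRead` are hypothetical, and the sketch assumes the barcodes always sit at exactly these fixed positions with no offset or indel. The matching quality string would need to be trimmed the same way.

```python
from typing import NamedTuple

BARCODE1_LEN = 10  # bp 1-10 of read 2
BARCODE2_LEN = 9   # bp 11-19 of read 2


class SplitRead(NamedTuple):
    barcode1: str
    barcode2: str
    insert: str  # bases remaining after both barcodes


def split_read2(sequence: str) -> SplitRead:
    """Split a read 2 sequence into barcode 1, barcode 2 and the rest."""
    total = BARCODE1_LEN + BARCODE2_LEN
    if len(sequence) < total:
        raise ValueError("read 2 is shorter than the two barcodes")
    return SplitRead(
        barcode1=sequence[:BARCODE1_LEN],
        barcode2=sequence[BARCODE1_LEN:total],
        insert=sequence[total:],
    )
```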

These barcodes are useful to distinguish between the different samples (barcode 2) and between DNA fragments (barcode 1). What I want is a BAM file for each sample, with the duplicate reads (same barcode 1 and same alignment position) removed. I saw that Picard MarkDuplicates has barcode options:

BARCODE_TAG (String) Barcode SAM tag (ex. BC for 10X Genomics) Default value: null.

READ_ONE_BARCODE_TAG (String) Read one barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

READ_TWO_BARCODE_TAG (String) Read two barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

But I'm a little lost on how to tell Picard which positions within read 2 to check. Any ideas?

If Picard is not suited for this task, I thought I would parse R2, extract barcodes 1 and 2, and remove the duplicates by checking alignment position and barcode information.
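
A rough Python sketch of that fallback idea, under several assumptions: the barcodes have already been extracted and attached to each aligned read pair, a duplicate is defined as the same sample barcode, chromosome, alignment position and barcode 1, barcodes are compared exactly (no tolerance for sequencing errors), and the first pair seen is the one kept. `AlignedPair` and `dedup_by_sample` are hypothetical names for illustration, not part of Picard or any other tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class AlignedPair(NamedTuple):
    name: str
    chrom: str
    pos: int                # alignment position of the pair
    molecule_barcode: str   # barcode 1 (bp 1-10 of read 2)
    sample_barcode: str     # barcode 2 (bp 11-19 of read 2)


def dedup_by_sample(pairs: Iterable[AlignedPair]) -> Dict[str, List[AlignedPair]]:
    """Group pairs by sample barcode, keeping one pair per (position, barcode 1)."""
    seen: Set[Tuple[str, str, int, str]] = set()
    kept: Dict[str, List[AlignedPair]] = defaultdict(list)
    for pair in pairs:
        key = (pair.sample_barcode, pair.chrom, pair.pos, pair.molecule_barcode)
        if key in seen:
            continue  # same sample, position and barcode 1: treat as duplicate
        seen.add(key)
        kept[pair.sample_barcode].append(pair)
    return dict(kept)
```

Each list in the returned dictionary would then correspond to one per-sample output file. Pairs at the same position but with a different barcode 1 are kept, which is the point of using the barcode in addition to the alignment position.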

Thanks

edit : I just found this paper discussing barcodes (or UMIs) : http://genome.cshlp.org/content/early/2017/01/18/gr.209601.116.abstract . A good start

picard barcode • 3.8k views

edit : I just found this paper discussing barcodes (or UMIs)

The Ph.D. thesis of Kasper Karlsson is also a very good read about UMIs.
