I have multiplexed paired-end FASTQ reads with dual barcodes. The issue is that one barcode is in the header and the other is at the beginning of the read. I need a method to demultiplex this data, but both barcodes are required to assign a read to an individual, because the same barcodes are shared between individuals. There seem to be packages that demultiplex using either the header barcode or in-line barcodes, but not both.
Example reads:
@700819F:525:HT235BCXX:2:1101:1139:2144 1:N:0:ATCACGAT
CGAATTGCAGATTTTTTCTGAATAAAGCAGTGCAATAAAATTCCCCGCAAAAACACTTNANNNGNNNNNNNNNNNNNNNNNNNNANNNNNNGNTAATAAA
+
GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII#<###.####################.######<#.<<GGGI
@700819F:525:HT235BCXX:2:1101:1212:2172 1:N:0:ATCACGAT
AAGGATGCAGGGCATCTCCCTCAGGCTGCGCTCTATCGAAGTCATCCCAGAATTAGATTCCGACCACAGACCAGTCTTAGTCAAACTAGGACCCGAGTGT
+
GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIGIIIIIIGGIIIGIIIIIIIIIIIIIGGIGIGIIIGIGI
@700819F:525:HT235BCXX:2:1101:1110:2173 1:N:0:ATCACGAT
GGTTGTGCAGAAAGAGTTGCTGATAAACTTAGCCATGCAGAACAGAATTATGAGTTAGAAGTATGTATATATATACCAATCACTATATCAACCCATTACC
+
<G.G<GGIIIG.GA.GAGAAGG<AA<A<.<<<GA<.<<.G<.G<<A<GGAA....<G.G..<<.GA..<A.<GG<<<.<..<<.GG.A..G..<<<.<<G
Thanks in advance.
First, use a program that demultiplexes by header. After that, use a program that demultiplexes by inline barcode.
I've tried that approach, but was unsuccessful. In order to assign reads to an individual, both barcodes are required. Neither barcode alone is unique to an individual, but the combination is, and it can be used to assign reads to a sample.
I fail to see how this approach doesn't work. Suppose you have two barcodes in the header and two inline barcodes, identifying four individuals. In step [1] you demultiplex by header; in step [2] you demultiplex each of the two resulting FASTQ files separately by inline barcode.
You end up with your four individuals identified.
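The two-step split can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a replacement for a dedicated tool: the sample sheet `SAMPLES`, the sample names, the second header barcode, and the 5-base inline barcodes are hypothetical, matching is exact (no mismatches allowed), and the inline barcode is assumed to have a fixed length at the start of the read.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical sample sheet: (header barcode, inline barcode) -> sample name.
# Barcodes and sample names here are made up for illustration.
SAMPLES = {
    ("ATCACGAT", "CGAAT"): "individual_1",
    ("ATCACGAT", "AAGGA"): "individual_2",
    ("TTAGGCAT", "CGAAT"): "individual_3",
    ("TTAGGCAT", "AAGGA"): "individual_4",
}


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        header, seq, _plus, qual = record
        yield header, seq, qual


def header_barcode(header):
    """Return the index sequence, i.e. the text after the last colon of the header."""
    return header.rsplit(":", 1)[-1]


def demultiplex(records, samples, inline_len):
    """Assign reads to samples using the header barcode, then the inline barcode."""
    # Step [1]: split the reads by the barcode found in the header.
    by_header = defaultdict(list)
    for record in records:
        by_header[header_barcode(record[0])].append(record)

    # Step [2]: within each header group, split by the inline barcode.
    by_sample = defaultdict(list)
    for header_bc, group in by_header.items():
        for header, seq, qual in group:
            sample = samples.get((header_bc, seq[:inline_len]))
            if sample is None:
                # No exact match: keep the read untouched.
                by_sample["unassigned"].append((header, seq, qual))
            else:
                # Trim the inline barcode from both sequence and quality string.
                by_sample[sample].append((header, seq[inline_len:], qual[inline_len:]))
    return dict(by_sample)
```

With a file opened as `handle`, `demultiplex(read_fastq(handle), SAMPLES, inline_len=5)` returns a dictionary mapping each sample name to its reads. Because the lookup key is the pair of barcodes, a barcode that is shared between individuals on its own is still resolved unambiguously.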
Looking at the example reads in the question, step [1] appears to be already done.
For step [2] you may want to look at fastp (see "fastp, the ultra-fast FASTQ preprocessing tool, is now on BioConda").