Demultiplex paired-end FASTQ reads with barcode 2 in the identifier line
6.9 years ago
cb1579 • 0

I have multiplexed paired-end FASTQ reads with dual barcodes. The issue is that one barcode is in the header and the other is at the beginning of the read. I need a way to demultiplex these data, but assigning a read to an individual requires both barcodes, because neither barcode on its own is unique to an individual. There seem to be packages that demultiplex using either the header barcode or in-line barcodes, but not both.

Example reads:

@700819F:525:HT235BCXX:2:1101:1139:2144 1:N:0:ATCACGAT
CGAATTGCAGATTTTTTCTGAATAAAGCAGTGCAATAAAATTCCCCGCAAAAACACTTNANNNGNNNNNNNNNNNNNNNNNNNNANNNNNNGNTAATAAA
+
GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII#<###.####################.######<#.<<GGGI
@700819F:525:HT235BCXX:2:1101:1212:2172 1:N:0:ATCACGAT
AAGGATGCAGGGCATCTCCCTCAGGCTGCGCTCTATCGAAGTCATCCCAGAATTAGATTCCGACCACAGACCAGTCTTAGTCAAACTAGGACCCGAGTGT
+
GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIGIIIIIIGGIIIGIIIIIIIIIIIIIGGIGIGIIIGIGI
@700819F:525:HT235BCXX:2:1101:1110:2173 1:N:0:ATCACGAT
GGTTGTGCAGAAAGAGTTGCTGATAAACTTAGCCATGCAGAACAGAATTATGAGTTAGAAGTATGTATATATATACCAATCACTATATCAACCCATTACC
+
<G.G<GGIIIG.GA.GAGAAGG<AA<A<.<<<GA<.<<.G<.G<<A<GGAA....<G.G..<<.GA..<A.<GG<<<.<..<<.GG.A..G..<<<.<<G

Thanks in advance.

sequence demultiplex fastq • 3.3k views

First, use a program that demultiplexes by header. After that, use a program that demultiplexes by inline barcode.


I've tried that approach, but was unsuccessful. In order to assign reads to an individual, both barcodes are required. The barcodes alone are not unique to individuals, but in combination, they are and can be used to assign reads to a sample.


I fail to see why this approach doesn't work. Suppose you have two barcodes in the header and two barcodes inline, identifying four individuals. First you demultiplex by header; then you demultiplex each of the two resulting FASTQ files separately by inline barcode:

                  [1]                 [2]
original.fastq ___----> header1.fastq ----> header1_inline1.fastq
                  |                   |
                  |                   |_--> header1_inline2.fastq
                  |
                  |_--> header2.fastq ----> header2_inline1.fastq
                                      |
                                      |_--> header2_inline2.fastq

You end up with your four individuals identified.
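The two-level split in the diagram can also be sketched as a single awk pass that bins each read by the pair (header barcode, inline barcode). This is only a rough illustration under several assumptions: the function name `demux_two_step` is hypothetical, the inline barcode is assumed to have a fixed known length, the header barcode is taken to be the text after the last colon of the identifier line, and reads are binned by exact string with no mismatch tolerance, so sequencing errors in a barcode will create extra small files.

```shell
# Hypothetical helper (not from any existing package).
# Usage: demux_two_step reads.fq INLINE_LEN OUTDIR
# Writes OUTDIR/<headerBarcode>_<inlineBarcode>.fastq, appending to
# any file of that name that already exists.
demux_two_step() {
  fq=$1
  inline_len=$2
  outdir=$3
  mkdir -p "$outdir"
  awk -v len="$inline_len" -v dir="$outdir" '
    # Line 1 of each record: identifier; the header barcode follows the last colon.
    NR % 4 == 1 { n = split($0, part, ":"); hdr = part[n]; name = $0; next }
    # Line 2: sequence; its first len bases are treated as the inline barcode.
    NR % 4 == 2 { seq = $0; next }
    # Line 3: the "+" separator.
    NR % 4 == 3 { plus = $0; next }
    # Line 4: qualities; write the whole record to the file for this barcode pair.
    {
      out = dir "/" hdr "_" substr(seq, 1, len) ".fastq"
      print name "\n" seq "\n" plus "\n" $0 >> out
      close(out)   # avoid running out of file handles when many pairs occur
    }
  ' "$fq"
}
```

For example, `demux_two_step original.fastq 5 demux_out` would split on the header barcode plus the first five bases of each read; the length of 5 is used here purely for illustration and must be replaced by the real inline barcode length.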


Looking at the example above, step [1] is already done.

For step [2], you may want to look at fastp, an ultra-fast FASTQ preprocessing tool that is now available on Bioconda.

6.9 years ago
Charles Plessy ★ 2.9k

How about pasting the header barcode onto the reads and demultiplexing with virtual barcodes that represent all the combinations of barcodes 1 and 2? Here is one way to paste.

paste -d '' <(awk 'NR % 4 == 1' test.fq | sed 's/.*://' | perl -ne 'chomp; print "\n$_\n\n", "I" x length($_), "\n"') test.fq
@700819F:525:HT235BCXX:2:1101:1139:2144 1:N:0:ATCACGAT
ATCACGATCGAATTGCAGATTTTTTCTGAATAAAGCAGTGCAATAAAATTCCCCGCAAAAACACTTNANNNGNNNNNNNNNNNNNNNNNNNNANNNNNNGNTAATAAA
+
IIIIIIIIGGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII#<###.####################.######<#.<<GGGI
etc...

The command is a bit cryptic, but it reads as follows: paste onto the original file, without a delimiter, a virtual file made by extracting the barcode sequence from the read names and, for each barcode, outputting an empty line, then a line containing the barcode, then another empty line, then a quality line with one "I" per base in the barcode.
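For readers who find the one-liner hard to follow, the same idea can be written as one explicit awk pass, together with a small helper that builds the matching table of virtual barcodes. Both function names are hypothetical, and the helper assumes a sample sheet with three whitespace-separated columns (sample name, header barcode, inline barcode), which is an assumption about how the sample information is stored and not something given in this thread. The two-column name/barcode output is likewise only a common layout; check what your demultiplexer expects.

```shell
# Hypothetical helper: prepend the header barcode to every read.
# Records are located by line position (4 lines per record), so quality
# lines that happen to start with "@" are not mistaken for identifiers.
prepend_header_barcode() {
  awk '
    # Identifier line: the barcode is the text after the last colon.
    NR % 4 == 1 { n = split($0, part, ":"); bc = part[n]; print; next }
    # Sequence line: barcode first, then the original read.
    NR % 4 == 2 { print bc $0; next }
    # "+" separator line: unchanged.
    NR % 4 == 3 { print; next }
    # Quality line: one "I" per barcode base, then the original qualities.
    { qual = bc; gsub(/./, "I", qual); print qual $0 }
  ' "$@"
}

# Hypothetical helper: turn "sample header_bc inline_bc" rows into
# "sample<TAB>header_bc+inline_bc concatenated" rows (the virtual barcodes).
make_virtual_barcodes() {
  awk 'NF >= 3 && $1 !~ /^#/ { print $1 "\t" toupper($2 $3) }' "$@"
}
```

Usage would be along the lines of `prepend_header_barcode test.fq > test.pasted.fq` and `make_virtual_barcodes samples.txt > barcodes.txt`, where `samples.txt` is the assumed three-column sample sheet.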


Thanks, this is what I was looking for. I am feeding multiplexed RADseq reads into ipyrad to look for SNPs, so this is ideal because it works nicely with the program.


Go ahead and accept this answer (green check mark) to provide closure to this thread.
