Question

How to demultiplex a pooled FASTQ file and extract each sample's sequences

19 months ago

rishav513 ▴ 30

Hello all,

I have a pooled sequence file named "ERR1806550_1.fastq.gz" containing single-end reads. I want to demultiplex this file and extract the sequences of the 37 samples I am interested in. These are the barcode sequences of the samples I want to extract from "ERR1806550_1.fastq.gz":

AACACC
AACCAG
AACGGA
CCGTTA
CCTAAG
CCTTCT
CGAGTT
GAACGT
GACTTC
GAGTCA
GATGAC
GCACTA
GCCATT
GCTTGA
GGAGAA
GGATCT
GTAACC
TATGCG
AAGAGG
AAGCCT
AAGGTC
TCAGAG
TCCTTG
TCGACT
AATCGC
ACAACG
ACCGAT
TCTAGC
TGACCA
TGGAAG
ACCTCA
CAACTC
CAAGCA
TGGTGA
TGTGTC
TTCCGT

So far, I have used this command to extract the sample sequences:

grep -B1 -A2 "^AGCACTGTAG" file.fastq | grep -v "^--$" > out.fq

However, after extracting the sample sequences and processing them in dada2, I get this error:

Error in add(bin) : record does not start with '@'

I checked every file, and each one starts with @.

I think I may not be demultiplexing the file correctly. Any help with this would be appreciated.
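
Checking only the first line of a file would not reveal a broken record further down. grep matches individual lines rather than four-line records, so one possible cause (an assumption, not confirmed in this thread) is a match on a line that is not a sequence line, which shifts the `-B1 -A2` context. A minimal sketch for locating the first malformed record, assuming unwrapped four-line FASTQ; `first_bad_record` and `check_fastq` are hypothetical helper names, not part of dada2 or any library:

```python
import gzip


def first_bad_record(lines):
    """Return (record_number, reason) for the first malformed record, or None.

    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        count += 1
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@"):
            return count, "header does not start with '@'"
        if not plus.startswith("+"):
            return count, "third line does not start with '+'"
        if len(seq) != len(qual):
            return count, "sequence and quality lengths differ"
    if record:
        # Leftover lines that do not form a complete record.
        return count + 1, "truncated record"
    return None


def check_fastq(path):
    """Run the check on a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_bad_record(handle)
```

For example, `check_fastq("out.fq")` returns `None` when every record is well formed, or the number of the first record that is not.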

Where are these barcodes located? Are they within the actual sequence or are they in the header?

19 months ago by GenoMax 147k

Sorry, they are actually within the sequences themselves.

19 months ago by rishav513 ▴ 30

score 1 · Answer 1 · 2023-05-02

Use demuxbyname.sh from the BBMap suite.

See the in-line help:

demuxbyname.sh in=<file> out=<outfile> delimiter=: prefixmode=f
This will split on colons, and use the last substring as the name; useful for
demuxing by barcode for Illumina headers in this format:
@A00178:73:HH7H3DSXX:4:1101:13666:1047 1:N:0:ACGTTGGT+TGACGCAT

out=<file>      Output files for reads with matched headers (must contain % symbol).
                For example, out=out_%.fq with names XX and YY would create out_XX.fq and out_YY.fq.
                If twin files for paired reads are desired, use the # symbol.  For example,
                out=out_%_#.fq in this case would create out_XX_1.fq, out_XX_2.fq, out_YY_1.fq, etc.
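
Because the barcodes in this thread are inside the reads rather than in the header, a record-aware split on the read prefix is another option. A rough sketch, assuming each barcode sits at the very start of the read, must match exactly (no mismatches tolerated), and the FASTQ is unwrapped (four lines per record); the function names and the `sample_{}.fastq` output pattern are made up for illustration and are not part of BBMap:

```python
import gzip


def read_fastq(lines):
    """Yield (header, seq, plus, qual) from an unwrapped 4-line-per-record stream."""
    it = iter(lines)
    # Pulling from the same iterator four times groups the lines into records.
    for record in zip(it, it, it, it):
        yield tuple(field.rstrip("\n") for field in record)


def split_by_barcode(lines, barcodes, trim=False):
    """Yield (barcode, record) for reads whose sequence starts with a listed barcode."""
    wanted = set(barcodes)
    lengths = sorted({len(b) for b in wanted}, reverse=True)
    for header, seq, plus, qual in read_fastq(lines):
        for n in lengths:
            if seq[:n] in wanted:
                barcode = seq[:n]
                if trim:
                    # Drop the barcode and the matching quality scores.
                    seq, qual = seq[n:], qual[n:]
                yield barcode, (header, seq, plus, qual)
                break


def demultiplex(in_path, barcodes, out_pattern="sample_{}.fastq", trim=False):
    """Write one FASTQ per barcode; return the number of reads written per barcode."""
    opener = gzip.open if in_path.endswith(".gz") else open
    outputs, counts = {}, {}
    try:
        with opener(in_path, "rt") as handle:
            for barcode, record in split_by_barcode(handle, barcodes, trim):
                if barcode not in outputs:
                    outputs[barcode] = open(out_pattern.format(barcode), "w")
                outputs[barcode].write("\n".join(record) + "\n")
                counts[barcode] = counts.get(barcode, 0) + 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

A call such as `demultiplex("ERR1806550_1.fastq.gz", ["AACACC", "AACCAG"])` would write `sample_AACACC.fastq` and `sample_AACCAG.fastq`. Reads that start with none of the listed barcodes are skipped, and every record is written as a complete four-line unit, so each output begins with `@`.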