Hello all,
I have a pooled sequence file named "ERR1806550_1.fastq.gz" containing single-end reads. I want to demultiplex this file and extract the reads for the 37 samples I am interested in. These are the barcode sequences of the samples I want to extract from "ERR1806550_1.fastq.gz":
AACACC
AACCAG
AACGGA
CCGTTA
CCTAAG
CCTTCT
CGAGTT
GAACGT
GACTTC
GAGTCA
GATGAC
GCACTA
GCCATT
GCTTGA
GGAGAA
GGATCT
GTAACC
TATGCG
AAGAGG
AAGCCT
AAGGTC
TCAGAG
TCCTTG
TCGACT
AATCGC
ACAACG
ACCGAT
TCTAGC
TGACCA
TGGAAG
ACCTCA
CAACTC
CAAGCA
TGGTGA
TGTGTC
TTCCGT
So far, I have used this command to extract the sample sequences:
grep -B1 -A2 "^AGCACTGTAG" file.fastq | grep -v "^--$" > out.fq
However, after extracting the sample sequences and processing them in dada2, I get this error:
Error in add(bin) : record does not start with '@'
I checked every output file, and each one starts with @.
I suspect I am not demultiplexing the file correctly. Any help would be appreciated.
Where are these barcodes located? Are they within the actual sequence or are they in the header?
Sorry, they are actually within the sequences themselves.
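If the barcodes are inline, one option is to read the FASTQ four lines at a time instead of matching single lines with grep, so every record written out stays a complete header/sequence/+/quality unit. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes each barcode sits at the very start (5' end) of the read and must match exactly, and that the barcode should be trimmed from both the sequence and the quality string. The function names and the "sample_" output prefix are made up for illustration. It also keeps all matched reads in memory, which may not suit a very large file.

```python
import gzip
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples, four lines per record."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield tuple(lines)


def assign_barcode(sequence, barcodes):
    """Return the barcode the read starts with, or None if there is no match."""
    for barcode in barcodes:
        if sequence.startswith(barcode):
            return barcode
    return None


def demultiplex(handle, barcodes, trim=True):
    """Group records by leading barcode; reads without a match are skipped."""
    bins = {barcode: [] for barcode in barcodes}
    for header, seq, plus, qual in read_fastq(handle):
        barcode = assign_barcode(seq, barcodes)
        if barcode is None:
            continue
        if trim:
            # Assumption: remove the barcode from the sequence AND the quality string
            seq, qual = seq[len(barcode):], qual[len(barcode):]
        bins[barcode].append((header, seq, plus, qual))
    return bins


def write_bins(bins, prefix="sample_"):
    """Write one FASTQ file per barcode (hypothetical naming scheme)."""
    for barcode, records in bins.items():
        with open(f"{prefix}{barcode}.fastq", "w") as out:
            for record in records:
                out.write("\n".join(record) + "\n")


# Example usage (extend the list with the remaining barcodes):
# barcodes = ["AACACC", "AACCAG", "AACGGA"]
# with gzip.open("ERR1806550_1.fastq.gz", "rt") as fh:
#     write_bins(demultiplex(fh, barcodes))
```

Parsing by record rather than by line also avoids relying on what a line starts with: in the common Phred+33 encoding a quality line can itself begin with @, and quality characters include the letters A, C, G and T. If the barcodes can appear elsewhere in the read or with mismatches, a dedicated demultiplexing tool is probably a better fit than this sketch.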