I have a large FASTQ file with 100-base reads from a pooled barcoding experiment. I did not generate this data, so I have limited options.
The barcodes are 21-mers, and there are up to 100,000 different barcodes in the FASTQ.
The barcode is flanked by two static sequences from the constructs such that:
Flanking 1           Barcode               Flanking 2
[...]AAAGGACGAAACACC NNNNNNNNNNNNNNNNNNNNN GTTTCAGAGCTATGC[...]
I need to scan every read in the FASTQ file for barcodes that fit this pattern and either count them directly or output just the barcode sequences so I can count them myself.
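To make the goal concrete, here is a rough Python sketch of the extraction I have in mind. It assumes the flanks match exactly, the construct only appears on the forward strand, and the file uses standard four-line FASTQ records; the function names are just mine for illustration.

```python
import gzip
import re
import sys
from collections import Counter
from itertools import islice

# Flanking sequences and barcode length as described above.
FLANK_5 = "AAAGGACGAAACACC"
FLANK_3 = "GTTTCAGAGCTATGC"
BARCODE_LEN = 21

# Assumption: exact flank matches only, and barcodes containing N are skipped.
PATTERN = re.compile(f"{FLANK_5}([ACGT]{{{BARCODE_LEN}}}){FLANK_3}")


def extract_barcode(seq):
    """Return the 21-mer between the two flanks, or None if the read has no match."""
    match = PATTERN.search(seq.upper())
    return match.group(1) if match else None


def count_barcodes(lines):
    """Count barcodes from an iterable of FASTQ lines (four lines per record)."""
    counts = Counter()
    # The sequence is the second line of every four-line record.
    for seq in islice(lines, 1, None, 4):
        barcode = extract_barcode(seq.strip())
        if barcode is not None:
            counts[barcode] += 1
    return counts


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fastq(sys.argv[1]) as handle:
        # Print one "barcode<TAB>count" line per barcode, most frequent first.
        for barcode, n in count_barcodes(handle).most_common():
            print(f"{barcode}\t{n}")
```

Run as a script with the FASTQ path as its only argument, this would print a tab-separated barcode/count table. I would still prefer an established tool, particularly one that tolerates mismatches in the flanks or handles reverse-complemented reads.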
I can use bbduk.sh to match a single known barcode against the data. I also looked at PoolQ, but it seems to require a reference file of barcodes, which I can't generate.
I only have the information included here and the FASTQ file.
Can someone suggest a tool to do this?