I have a set of Illumina files in which the barcode is stored in the sequence identifier rather than in the sequence itself, so we cannot use fastx_barcode_splitter.pl or similar scripts. Example:
@HWI-ST132_459:6:2208:20745:200766#AGTTCC/1
CCCAGGGGGTTGCTAGGTTGAAAGAGAAGAACTAAGCTTAAATTTGTTGTACATTGTATATAATTACAAAGTGTTATGTTATTATTATTAAAAAAAAAAA
+
ca^WcZX[D_T]GQI^]^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST132_459:6:2208:21328:200860#AGTTCC/1
CATTTTGGTGGGTTGTGGTTTTGGGGGGTTTGTTGTTGGGTTTTATAAGGTGGTTTTTTTTAATAAGTAAAAATAAAAAAAAAAATTAAGAATAAAAAAA
+
]TPKODYF[TSHWUQRRGZV`N_Y`c\abc]]D_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Does anybody know of a program that can split/demultiplex these datasets?
Are the barcodes and read names delimited by something? It looks like the "_" character separates the barcode and the read name?
In the example above the barcode is AGTTCC; it always comes after the number sign (#) and before the slash. In some other datasets, however, it appears as the last six characters of the identifier line.
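In case no ready-made tool turns up, here is a minimal Python sketch of how such a split could be done. It is not an existing program. It assumes plain 4-line FASTQ records, a barcode made of A/C/G/T/N between "#" and "/1" or "/2", and it falls back to the last six characters of the identifier line for the other datasets. The function names and the output file naming (prefix_BARCODE.fastq) are made up for illustration.

```python
import re
from collections import defaultdict

# Barcode between '#' and '/1' or '/2' at the end of the identifier line,
# e.g. @HWI-ST132_459:6:2208:20745:200766#AGTTCC/1
BARCODE_RE = re.compile(r"#([ACGTN]+)/[12]\s*$")


def extract_barcode(header, fallback_len=6):
    """Return the barcode found in a FASTQ identifier line."""
    match = BARCODE_RE.search(header)
    if match:
        return match.group(1)
    # Assumption: otherwise the barcode is the last six characters of the line.
    return header.rstrip()[-fallback_len:]


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples; assumes 4-line records."""
    while True:
        header = handle.readline()
        if header == "":
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        yield header, handle.readline(), handle.readline(), handle.readline()


def demultiplex(in_path, out_prefix):
    """Write each read to <out_prefix>_<barcode>.fastq and return read counts."""
    handles = {}
    counts = defaultdict(int)
    try:
        with open(in_path) as fin:
            for record in read_fastq(fin):
                barcode = extract_barcode(record[0])
                if barcode not in handles:
                    handles[barcode] = open(f"{out_prefix}_{barcode}.fastq", "w")
                handles[barcode].writelines(record)
                counts[barcode] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

It could be called as demultiplex("reads.fastq", "sample"), where both names are placeholders. One output file is kept open per distinct barcode, so a dataset with many erroneous barcodes may hit the open-file limit; filtering against a list of expected barcodes first would avoid that.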