I have HiSeq paired-end run data for an assay that uses our own dual barcodes to identify samples, and I need to demultiplex it before sample-specific analysis.
I'm having trouble finding a tool for the job. The person who used to do this would basically use single-barcode demultiplexing software and run it multiple times, but that seems a little messy to me.
I've tried to write my own tool, which works but feels pretty slow. I'm just using my own computer, so I have no idea how long this kind of process usually takes / if people generally do this on a dedicated machine / cluster.
Any suggestions?
I think I'm being unclear. I have samples that have already been demultiplexed by the Illumina machine by index. When we designed our PCR primers, we put a Golay or Hamming barcode in front of each actual primer sequence. The combination of forward and reverse primer barcodes uniquely identifies a sample name and assay.
So, I need to demultiplex based on the first 12 or 8 nt from the 5' end of each paired-end read.
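To make the requirement concrete, here is a minimal sketch of that prefix-based assignment in Python. It assumes exact barcode matches only (no Golay or Hamming error correction), a single fixed barcode length on both reads, and R1/R2 files that list reads in the same order. The names `read_fastq`, `assign_pairs`, and `sample_by_barcodes` are hypothetical, not from any existing tool.

```python
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a FASTQ text handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual

def assign_pairs(r1_handle, r2_handle, sample_by_barcodes, bc_len=12):
    """Yield (sample, r1_record, r2_record) for each read pair.

    sample_by_barcodes maps (R1 barcode, R2 barcode) to a sample name.
    The barcode is trimmed from the sequence and quality strings;
    sample is None when the barcode pair is not in the table.
    """
    pairs = zip(read_fastq(r1_handle), read_fastq(r2_handle))
    for (h1, s1, q1), (h2, s2, q2) in pairs:
        sample = sample_by_barcodes.get((s1[:bc_len], s2[:bc_len]))
        yield (sample,
               (h1, s1[bc_len:], q1[bc_len:]),
               (h2, s2[bc_len:], q2[bc_len:]))
```

Gzipped files can be passed in as text handles via `gzip.open(path, "rt")`. Because the reads are streamed one pair at a time, memory use stays flat regardless of file size; the cost per pair is a single dictionary lookup.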
Just to verify, can I use these tools to demultiplex .fastq files based on our own barcodes from our primers? The reads have already been demultiplexed by the machine using the indices.
Oh, you're wanting to start with the FASTQ files and split them out based on the index value in the first line of each read, instead of starting from the basecall data?
If you're starting with FASTQ data, can you please post the output of
zcat [fastq_file] | head
(if the data are gzipped) or just
head [fastq_file]
(if not)?
I want to split them out based on the first 12 nt (or 8 nt) in each read, not on the Illumina indices. Unfortunately, all possible combinations of 12/8 are on the table too (i.e. 12 nt on R1, 12 nt on R2; 12 nt on R1, 8 nt on R2...)
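One way to cope with the mixed 12/8 case is to test each read against every known barcode length instead of fixing one length per file. The sketch below assumes exact matching only, and assumes that trying the longer length first is the right tie-break when an 8 nt barcode happens to equal the start of a 12 nt one. `match_prefix`, `assign_sample`, and both lookup tables are hypothetical names.

```python
def match_prefix(seq, barcodes_by_length):
    """Return (barcode, length) for a known barcode that prefixes seq.

    barcodes_by_length maps a barcode length (e.g. 12 or 8) to the set of
    barcodes of that length. Longer lengths are tried first. Returns
    (None, 0) when no known barcode matches.
    """
    for length in sorted(barcodes_by_length, reverse=True):
        prefix = seq[:length]
        if prefix in barcodes_by_length[length]:
            return prefix, length
    return None, 0

def assign_sample(seq1, seq2, barcodes_by_length, sample_by_pair):
    """Return (sample, trim1, trim2) for one read pair.

    trim1 and trim2 are the number of bases to cut from R1 and R2;
    sample is None when either barcode or the pair is unknown.
    """
    bc1, n1 = match_prefix(seq1, barcodes_by_length)
    bc2, n2 = match_prefix(seq2, barcodes_by_length)
    return sample_by_pair.get((bc1, bc2)), n1, n2
```

Each read costs at most one set lookup per distinct barcode length, so covering every 12/8 combination adds very little work compared with the fixed-length case.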
Answer updated.