I have a problem: we did a paired-end (PE) run that was not demultiplexed automatically. I can see barcode sequences at the end of the reverse reads, so demultiplexing should be possible. Are there any tools for this? I could write my own script, but it would be very inefficient and slow (my FASTQ files are 70+ GB). Thanks in advance for any help!
Your best option is to ask the sequencing provider, as they do this routinely and it should take them little to no effort. If that is not possible, please show an example of the data so we can understand where the barcodes are. Are they in the header? Is it a single "unassigned" fastq file? There have been quite a few posts here on this, so please also use the search function. See for example: Demultiplexing fastq.gz files
Thank you for the reply. It seems I have found a partial solution with the FASTX-Toolkit barcode splitter, but it only handles the reverse reads, which contain the barcodes. Is there an easy way to extract the forward reads from a big fastq that are paired with the properly demultiplexed reverse reads?
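One way to avoid the separate "extract the mates" step is to read the forward and reverse files in lockstep and assign both mates in a single pass. Below is a minimal Python sketch of that idea, not a tested production tool. It assumes the two files list the pairs in the same order, that records are plain 4-line FASTQ, and that the barcode is matched exactly at the very end of the reverse read. The function names, the barcode-to-sample mapping and the output file names are all hypothetical.

```python
import gzip
from collections import Counter
from itertools import zip_longest
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_records(handle):
    """Yield FASTQ records as lists of four lines (assumes unwrapped records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0].strip():  # end of file (or trailing blank line)
            return
        yield record


def demultiplex_pairs(fwd_path, rev_path, barcodes, out_dir):
    """Split read pairs by the barcode at the end of the reverse read.

    barcodes maps barcode sequence -> sample name (hypothetical names).
    Pairs without an exact barcode match go to "unassigned".
    Returns a Counter with the number of pairs written per sample.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # sample name -> (forward output, reverse output)
    counts = Counter()
    try:
        with open_fastq(fwd_path) as fwd, open_fastq(rev_path) as rev:
            pairs = zip_longest(read_records(fwd), read_records(rev))
            for fwd_rec, rev_rec in pairs:
                if fwd_rec is None or rev_rec is None:
                    raise ValueError("files contain different numbers of records")
                rev_seq = rev_rec[1].rstrip("\n")
                sample = "unassigned"
                for barcode, name in barcodes.items():
                    if rev_seq.endswith(barcode):
                        sample = name
                        break
                if sample not in handles:
                    handles[sample] = (
                        open(out_dir / f"{sample}_R1.fastq", "w"),
                        open(out_dir / f"{sample}_R2.fastq", "w"),
                    )
                out_fwd, out_rev = handles[sample]
                out_fwd.writelines(fwd_rec)  # forward mate follows its partner
                out_rev.writelines(rev_rec)  # barcode is left on the read here
                counts[sample] += 1
    finally:
        for out_fwd, out_rev in handles.values():
            out_fwd.close()
            out_rev.close()
    return counts
```

It would be called as something like `demultiplex_pairs("run_R1.fastq.gz", "run_R2.fastq.gz", {"ACGT": "sampleA"}, "demux_out")`, with your own file names and barcodes. The files are streamed, so memory use stays small, but pure Python will still take a while on 70+ GB, and this sketch neither trims the barcode nor tolerates mismatches.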
I can see barcode sequences at the end of reverse reads,
That means you are using barcodes that are an integral part of the sequence reads (in-line barcodes). If that is the case, then you need something like sabre: https://github.com/najoshi/sabre
What application is this for? It's possible that the people who designed the protocol also wrote a demultiplexer for it.