6 months ago
LDT
Hello Biostars Community,
I have a FASTQ file that contains sequences with UMI indices in the headers. I have already counted the UMI indices and saved a list of the top 1000 UMIs in a text file named top_UMIs.txt.
I would like to split my initial FASTQ file into 1000 separate FASTQ files, where each file contains the sequences corresponding to each UMI.
Could anyone provide guidance or code to accomplish this task?
You left out a critical bit of info. Where are these UMI currently? Inside sequences, in fastq headers? Show examples if possible.
Thank you, GenoMax! They are in the header.
Can you show an example?
It may be best to start with original data (if you moved the UMI to header) and use reaper (LINK), which was recommended by UMI-tools author in a prior thread. Reaper works on barcodes in the sequence, not in the header.
I don't know of any tool that can do this out of the box. You will probably need to write some custom code.
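For the custom-code route, here is a minimal Python sketch. Since no example header was posted, it assumes a UMI-tools style read name such as `@READ1_ACGTAC 1:N:0:1`, where the UMI is the text after the last underscore of the read ID; adjust `umi_from_header` if your headers differ. It also assumes an uncompressed, 4-line-per-record FASTQ and one UMI per line in top_UMIs.txt (only the first column is read). The function names, the output directory, and the `flush_every` parameter are made up for illustration. Records are buffered and appended to per-UMI files in batches, so the script never holds 1000 files open at once, which can exceed the default open-file limit on some systems.

```python
import os
from itertools import islice


def umi_from_header(header):
    # ASSUMPTION: UMI-tools style header, e.g. "@READ1_ACGTAC 1:N:0:1".
    # The UMI is taken as the text after the last "_" in the read ID.
    read_id = header[1:].split()[0]
    return read_id.rsplit("_", 1)[-1]


def split_fastq_by_umi(fastq_path, umi_list_path, out_dir, flush_every=100_000):
    """Write one <UMI>.fastq per listed UMI; return the UMIs actually written."""
    with open(umi_list_path) as fh:
        # Use the first whitespace-separated column, in case counts follow the UMI.
        wanted = {line.split()[0] for line in fh if line.strip()}

    os.makedirs(out_dir, exist_ok=True)
    buffers = {}     # UMI -> list of buffered FASTQ lines
    started = set()  # UMIs whose output file was already created in this run
    buffered = 0

    def flush():
        for umi, lines in buffers.items():
            # Overwrite on first write of this run, append afterwards.
            mode = "a" if umi in started else "w"
            with open(os.path.join(out_dir, f"{umi}.fastq"), mode) as out:
                out.writelines(lines)
            started.add(umi)
        buffers.clear()

    with open(fastq_path) as fq:
        while True:
            record = list(islice(fq, 4))  # header, sequence, "+", qualities
            if len(record) < 4:
                break
            umi = umi_from_header(record[0])
            if umi in wanted:
                buffers.setdefault(umi, []).extend(record)
                buffered += 1
                if buffered >= flush_every:
                    flush()
                    buffered = 0
    flush()
    return sorted(started)
```

It could be called as `split_fastq_by_umi("reads.fastq", "top_UMIs.txt", "by_umi")`, where `reads.fastq` and `by_umi` are placeholder names. Reads whose UMI is not in the list are skipped.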