How to Split FASTQ File by UMI Indices into Multiple Files

6 months ago

LDT ▴ 340

Hello Biostars Community,

I have a FASTQ file that contains sequences with UMI indices in the headers. I have already counted the UMI indices and saved a list of the top 1000 UMIs in a text file named top_UMIs.txt.

I would like to split my initial FASTQ file into 1000 separate FASTQ files, one per UMI, with each file containing the reads that carry that UMI.

Could anyone provide guidance or code to accomplish this task?

SeqKit UMI split fastq • 441 views

ADD COMMENT • link updated 6 months ago by GenoMax 147k • written 6 months ago by LDT ▴ 340

You left out a critical bit of info. Where are these UMIs currently: inside the sequences or in the FASTQ headers? Show examples if possible.

ADD REPLY • link 6 months ago by GenoMax 147k

Thank you, GenoMax! They are in the header.

ADD REPLY • link 6 months ago by LDT ▴ 340

Can you show an example?

It may be best to start with the original data (if you moved the UMI to the header) and use reaper (LINK), which was recommended by the UMI-tools author in a prior thread.

ADD REPLY • link 6 months ago by GenoMax 147k

Reaper works on barcodes in the sequence, not in the header.

I don't know of any tool that can do this out of the box. It might be a case of writing some custom code.

ADD REPLY • link 6 months ago by i.sudbery 20k
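
As a rough illustration of the custom-code route mentioned above, here is a minimal Python sketch. No example header was posted in the thread, so the header layout is an assumption: the UMI is taken to be the text after the last underscore of the read name (e.g. `@read1_ACGTACGT 1:N:0`). The sketch also assumes an uncompressed FASTQ with exactly four lines per record and one UMI per line in top_UMIs.txt. The function names and the `flush_every` parameter are hypothetical. Records are buffered in memory and appended to the per-UMI files in batches, so 1000 output files never need to be open at the same time.

```python
import os
from collections import defaultdict


def umi_from_header(header):
    """Return the UMI from a FASTQ header line.

    ASSUMPTION: the UMI is the text after the last underscore of the read
    name (the first whitespace-delimited token), e.g. '@read1_ACGT 1:N:0'.
    Adjust this function if your headers are laid out differently.
    """
    fields = header.split()
    if not fields:
        return ""
    return fields[0].rsplit("_", 1)[-1]


def split_fastq_by_umi(fastq_path, umi_list_path, out_dir, flush_every=100_000):
    """Write one FASTQ file per listed UMI and return reads kept per UMI."""
    with open(umi_list_path) as fh:
        # ASSUMPTION: the UMI is the first column of each non-empty line.
        wanted = {line.split()[0] for line in fh if line.strip()}

    os.makedirs(out_dir, exist_ok=True)
    # Start from empty files so a rerun does not append to old output.
    for umi in wanted:
        open(os.path.join(out_dir, f"{umi}.fastq"), "w").close()

    buffers = defaultdict(list)        # UMI -> list of buffered records
    counts = dict.fromkeys(wanted, 0)  # UMI -> number of reads kept
    buffered = 0

    def flush():
        # Append buffered records, opening only one output file at a time.
        for umi, records in buffers.items():
            with open(os.path.join(out_dir, f"{umi}.fastq"), "a") as out:
                out.writelines(records)
        buffers.clear()

    with open(fastq_path) as fq:
        while True:
            # ASSUMPTION: every record is exactly four lines long.
            record = [fq.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            umi = umi_from_header(record[0])
            if umi in wanted:
                buffers[umi].append("".join(record))
                counts[umi] += 1
                buffered += 1
                if buffered >= flush_every:
                    flush()
                    buffered = 0
    flush()
    return counts
```

A call might look like `split_fastq_by_umi("reads.fastq", "top_UMIs.txt", "by_umi")`, where `reads.fastq` and `by_umi` are placeholder names. Reads whose UMI is not in the list are skipped. For gzipped input, replacing `open(fastq_path)` with `gzip.open(fastq_path, "rt")` should work the same way.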
