How to Split FASTQ File by UMI Indices into Multiple Files
6 months ago
LDT ▴ 340

Hello Biostars Community,

I have a FASTQ file that contains sequences with UMI indices in the headers. I have already counted the UMI indices and saved a list of the top 1000 UMIs in a text file named top_UMIs.txt.

I would like to split my initial FASTQ file into 1000 separate FASTQ files, where each file contains the sequences corresponding to each UMI.

Could anyone provide guidance or code to accomplish this task?

SeqKit UMI split fastq • 441 views

You left out a critical bit of info. Where are these UMIs currently: inside the sequences, or in the FASTQ headers? Show examples if possible.


Thank you, GenoMax! They are in the header.


Can you show an example?

It may be best to start with the original data (if you moved the UMI to the header) and use reaper (LINK), which was recommended by the UMI-tools author in a prior thread.


Reaper works on barcodes in the sequence, not in the header.

I don't know of any tool that can do this out of the box. It might be a case of writing some custom code.
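As a rough idea of what such custom code could look like, here is a minimal Python sketch. It rests on assumptions the thread does not confirm: that the UMI is the text after the last underscore in the read ID (for example `@READ_ACGTACGT`), that the FASTQ is uncompressed with four lines per record, and that the first column of top_UMIs.txt holds one UMI per line. The function names and output layout are hypothetical, so adjust `umi_from_header` to match your real headers.

```python
import os
from contextlib import ExitStack


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a strict 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        yield header, handle.readline(), handle.readline(), handle.readline()


def umi_from_header(header):
    """ASSUMPTION: header looks like '@READID_UMI optional comment'."""
    read_id = header.split()[0]          # drop any comment after whitespace
    return read_id.rsplit("_", 1)[-1]    # text after the last underscore


def split_by_umi(fastq_path, umi_list_path, out_dir):
    """Write one <UMI>.fastq per listed UMI; return reads written per UMI."""
    # ASSUMPTION: the first whitespace-separated column of each line is the UMI.
    with open(umi_list_path) as fh:
        wanted = {line.split()[0] for line in fh if line.strip()}

    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    outputs = {}
    # ExitStack closes every output file when the block ends.
    with ExitStack() as stack, open(fastq_path) as fq:
        for record in read_fastq(fq):
            umi = umi_from_header(record[0])
            if umi not in wanted:
                continue  # reads with unlisted UMIs are skipped
            if umi not in outputs:
                path = os.path.join(out_dir, f"{umi}.fastq")
                outputs[umi] = stack.enter_context(open(path, "w"))
            outputs[umi].writelines(record)
            counts[umi] = counts.get(umi, 0) + 1
    return counts
```

A call such as `split_by_umi("reads.fastq", "top_UMIs.txt", "by_umi")` (with `reads.fastq` standing in for your input) would write files like `by_umi/ACGTACGT.fastq`. Note that this keeps one file handle open per UMI, so 1000 UMIs may hit the operating system's open-file limit (`ulimit -n`); raising the limit or reopening files in append mode per record are two ways around that.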
