Question

Biopython - appending single fastq records to existing file

2.9 years ago

mbabic • 0

I'm not sure this is actually possible, but is there a way to add a FASTQ record to an existing FASTQ file without affecting previously written data?

That is the whole question, but here is some context in case it helps anyone suggest an approach. For complicated reasons, I'm working through multiple independent, very large FASTQ read files whose reads then need to be sorted, by barcode, into much smaller subfiles for alignment.

That is, I need to go through whatever.fastq and pull out all reads that carry BarcodeX into a barcodex.fastq file. Then I need to go through alsothis.fastq and repeat the process, appending more BarcodeX reads to barcodex.fastq, and so on for possibly hundreds of independent files. Concatenating the source FASTQ files into one file is not feasible because of the total resulting size, and new data also needs to be added from time to time.
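The workflow described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes (hypothetically) that each barcode is an exact match on the first bases of the read sequence, and the function name `sort_by_barcode` and the `barcodes` mapping are illustrative only. It uses Biopython's `FastqGeneralIterator`, which yields plain `(title, sequence, quality)` strings, and opens each per-barcode output file once in append mode (`"a"`), so reads written by earlier runs are preserved.

```python
from contextlib import ExitStack

from Bio.SeqIO.QualityIO import FastqGeneralIterator


def sort_by_barcode(input_paths, barcodes, out_template="{name}.fastq"):
    """Append reads that start with a known barcode to per-barcode FASTQ files.

    ASSUMPTION (hypothetical): each barcode is an exact match on the first
    bases of the read; `barcodes` maps barcode sequence -> output name.
    Reads are written unchanged (the barcode is not trimmed).
    Returns the number of reads appended per output name.
    """
    counts = {name: 0 for name in barcodes.values()}
    with ExitStack() as stack:
        # Open every output once in append mode ("a"), so reads written by
        # earlier runs stay in place and new reads go to the end.
        handles = {
            name: stack.enter_context(open(out_template.format(name=name), "a"))
            for name in counts
        }
        for path in input_paths:
            with open(path) as in_handle:
                # FastqGeneralIterator yields plain (title, seq, qual) strings,
                # which is faster than building full SeqRecord objects.
                for title, seq, qual in FastqGeneralIterator(in_handle):
                    for barcode, name in barcodes.items():
                        if seq.startswith(barcode):
                            handles[name].write(f"@{title}\n{seq}\n+\n{qual}\n")
                            counts[name] += 1
                            break  # first matching barcode wins
    return counts
```

Calling it again later with new input files, e.g. `sort_by_barcode(["alsothis.fastq"], {"ACGTAC": "barcodex"})` (a made-up barcode), adds more reads to the end of barcodex.fastq without rewriting what is already there. Real barcode layouts (inline vs. in the header, mismatches allowed, trimming) would need their own matching logic.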

updated 2.9 years ago by GenoMax 148k • written 2.9 years ago by mbabic • 0

You can append to a file by opening it in append mode ("a") in Python when you write. See here.
You can use Biopython for the reading and screening to get each individual sequence record, convert it to a FASTQ-formatted string with record.format("fastq") (a plain str() cast gives a human-readable summary, not FASTQ), and then use pure Python for the writing/appending step.
The question then becomes how fast this needs to run, and at what scale (hundreds of files may be fine with Python). It may be better to use pure shell, in which case you'd use >> to append, whereas > overwrites the file. See here.
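Following the approach above, appending a single record could look like this minimal sketch. `append_fastq_record` is a hypothetical helper name; it assumes the record carries Phred qualities in `letter_annotations["phred_quality"]`, as records parsed with `SeqIO.parse(..., "fastq")` do.

```python
from Bio.SeqRecord import SeqRecord


def append_fastq_record(record: SeqRecord, path: str) -> None:
    """Append one record to a FASTQ file, leaving existing contents untouched."""
    # "a" (append) mode always writes at the end of the file;
    # "w" would truncate it and lose previously written reads.
    with open(path, "a") as handle:
        # record.format("fastq") returns the four FASTQ lines as a string;
        # str(record) would return a human-readable summary instead.
        handle.write(record.format("fastq"))
```

Inside a screening loop over `SeqIO.parse(handle, "fastq")`, each matching read can be passed straight to `append_fastq_record(record, "barcodex.fastq")`. Opening and closing the file per record keeps the example simple but costs time with millions of reads; keeping one handle open in append mode for a whole run avoids that.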

2.9 years ago by Wayne ★ 2.1k

which then need to be sorted into much smaller subfiles for alignment, based on particular barcodes.

You may want to use a tool designed for this, e.g. sabre (LINK).

2.9 years ago by GenoMax 148k