I'm not sure if this is actually possible, but - is there a way to add a FASTQ record to an existing FASTQ file, without affecting previously written data?
That is the entirety of the question, but here is some context in case someone has ideas for a different approach. For complicated reasons, I'm going through multiple independent, gigantic FASTQ read files, whose reads then need to be sorted into much smaller subfiles for alignment, based on particular barcodes.
I.e. I need to go through whatever.fastq and sort all reads that have BarcodeX into a barcodex.fastq file. Then I need to go through alsothis.fastq and repeat the process, adding more BarcodeX reads to barcodex.fastq. And so on, and so forth, for possibly hundreds of independent files. Concatenating the source .fastq files into one file is not feasible due to the total resulting size, and data needs to be added from time to time as well.
You can append to a file in Python by opening it in append mode ("a") when you write. See here.
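A minimal sketch of append mode using only the standard library. The helper name, the file name and the record contents are made up for illustration, and it assumes plain four-line FASTQ records.

```python
import os
import tempfile


def append_fastq_record(path, name, seq, qual):
    """Append one four-line FASTQ record to `path`."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    # Mode "a" creates the file if it is missing and always writes at the end,
    # so records that are already in the file are left as they are.
    with open(path, "a") as handle:
        handle.write(f"@{name}\n{seq}\n+\n{qual}\n")


# Hypothetical example data, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "barcodex.fastq")
append_fastq_record(demo_path, "read1", "ACGT", "IIII")
append_fastq_record(demo_path, "read2", "TTGA", "FFFF")
```

After the second call the file holds both records, with read1 still first.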
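You can use Biopython for the reading and screening step to get each individual sequence record, convert that record to a FASTQ-formatted string (record.format("fastq") does this; a plain str() cast gives a human-readable summary rather than FASTQ), and then just use pure Python for the writing/appending step.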
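Here is a rough sketch of that read, screen and append loop. To stay dependency-free it reads four-line records by hand instead of using Biopython, and it assumes the barcode is the first few bases of the read; both are assumptions on my part, as are the function names and the toy data, so adjust the matching rule to wherever your barcodes actually sit.

```python
import itertools
import os
import tempfile


def read_fastq(path):
    """Yield (header, seq, plus, qual) tuples, assuming unwrapped 4-line records."""
    with open(path) as handle:
        while True:
            record = list(itertools.islice(handle, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            yield tuple(line.rstrip("\n") for line in record)


def split_by_barcode(source_path, barcodes, out_dir):
    """Append reads starting with a known barcode to <out_dir>/<name>.fastq.

    `barcodes` maps an output name to a barcode sequence. Returns the number
    of reads appended per name. Prefix matching is an assumption.
    """
    handles = {}
    counts = {name: 0 for name in barcodes}
    try:
        for header, seq, plus, qual in read_fastq(source_path):
            for name, barcode in barcodes.items():
                if seq.startswith(barcode):
                    if name not in handles:
                        out_path = os.path.join(out_dir, name + ".fastq")
                        # Append mode keeps reads written by earlier runs.
                        handles[name] = open(out_path, "a")
                    handles[name].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                    counts[name] += 1
                    break
    finally:
        for handle in handles.values():
            handle.close()
    return counts


# Toy data standing in for the large source files (hypothetical contents).
work_dir = tempfile.mkdtemp()
sources = {
    "whatever.fastq": "@r1\nAAAACGT\n+\nIIIIIII\n@r2\nCCCCGTA\n+\nIIIIIII\n",
    "alsothis.fastq": "@r3\nAAAATTT\n+\nFFFFFFF\n",
}
barcodes = {"barcodex": "AAAA"}
totals = {"barcodex": 0}
for filename, text in sources.items():
    source_path = os.path.join(work_dir, filename)
    with open(source_path, "w") as fh:
        fh.write(text)
    counts = split_by_barcode(source_path, barcodes, work_dir)
    totals["barcodex"] += counts["barcodex"]
```

Each source file is streamed once and only one read is held in memory at a time, so the size of the inputs should not matter much. New source files can be processed later with the same call, and their reads land at the end of the existing barcodex.fastq.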
The question, though, may become how fast you need this to run, and at what scale (hundreds of files may be fine with Python). It may be better to use pure shell, in which case you'd use >> to append, whereas you usually use > to just write to a file. See here.
You may want to use a tool meant to do this, e.g. sabre (LINK).