Dear all,
I have a 16S NGS FASTQ file that combines reads from two separate sequencing runs, with header lines like these:
Run 1 header: @HISEQ:791:HCGHLBCX2:1:1101:5407:3025 1:N:0:AGTGGTCA
Run 2 header: @HISEQ:781:HCMFJBCX2:1:2101:15224:51333 1:N:0:ATTGAGGA
I would like to split the sequences into two separate files based on the header information.
I have searched the forum for QIIME or sed scripts but couldn't manage to do it.
Any idea how to do it?
Thanks.
You can grep them out, something like this:
But these are read headers. The flow cell IDs and run numbers differ between the two runs (in the OP's examples), so you can grep based on them.
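To see which header fields differ between the two runs, here is a minimal sketch that splits a header into its fields. It assumes the standard Illumina CASAVA 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`) and that both the read ID and the space-separated comment are present; `parse_header` and `IlluminaHeader` are hypothetical names, not part of any tool.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of a CASAVA 1.8+ style Illumina read header (assumed layout)."""
    instrument: str
    run_number: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: str
    filtered: str
    control: str
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split e.g. '@HISEQ:791:HCGHLBCX2:1:1101:5407:3025 1:N:0:AGTGGTCA'."""
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    # Read ID and comment are separated by a single space.
    read_id, _, comment = line[1:].strip().partition(" ")
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), read, filtered, control, index)


if __name__ == "__main__":
    for h in ("@HISEQ:791:HCGHLBCX2:1:1101:5407:3025 1:N:0:AGTGGTCA",
              "@HISEQ:781:HCMFJBCX2:1:2101:15224:51333 1:N:0:ATTGAGGA"):
        parsed = parse_header(h)
        print(parsed.run_number, parsed.flowcell)
```

For the two example headers this prints `791 HCGHLBCX2` and `781 HCMFJBCX2`, so either the run number or the flow cell ID is enough to tell the runs apart.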
Thanks for the reply. I have tried it; however, it just gives me a list of headers without the sequences or quality scores. I think I did not state my question clearly at first, so I will try to clarify below. Thanks.
I have a FASTQ file that contains a few hundred sequences like the ones below. However, reads from two separate runs (781 and 791) are mixed in this one FASTQ file. I would like to separate them, including the sequences and quality scores, into separate FASTQ files.
@OP: I guess the sequences were copied and pasted from a web page, Word, or some other formatted document, and the quality strings have been altered in the process. Please post them exactly as they are; otherwise, tools fail to parse the FASTQ properly.
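One quick way to spot damage like this is a per-record sanity check. This is a minimal sketch, assuming Phred+33 quality encoding (Sanger/Illumina 1.8+) and exactly four lines per record; `check_record` is a hypothetical helper. A word processor replacing `'` with a curly quote, for example, shows up as a character outside the printable ASCII range.

```python
from typing import List


def check_record(record: List[str]) -> List[str]:
    """Return problems found in one 4-line FASTQ record (empty list if none).

    Assumes Phred+33 quality encoding (printable ASCII 33..126).
    """
    header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
    problems = []
    if not header.startswith("@"):
        problems.append("header does not start with '@'")
    if not plus.startswith("+"):
        problems.append("third line does not start with '+'")
    if len(seq) != len(qual):
        problems.append(f"sequence length {len(seq)} != quality length {len(qual)}")
    # Characters outside '!'..'~' cannot be valid Phred+33 scores.
    bad = sorted({c for c in qual if not 33 <= ord(c) <= 126})
    if bad:
        problems.append(f"non-Phred+33 characters in quality: {bad!r}")
    return problems
```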
input (copy/pasted from above and formatted):
output:
with seqkit:
for Ubuntu/GNU parallel users, try (test.fastq is the input FASTQ):