Hello everyone,
I have some Illumina reads from a metagenomic project in a FASTQ file, and I am trying to "fish out" some particular sequences from all the mess. I ran a BLAST search on the reads and got the sequences I am interested in, in FASTA format. The thing is, I need those sequences in FASTQ format for assembly. How can I extract them from the original FASTQ file using the BLAST FASTA file as a reference? Or should I just convert my BLAST output from FASTA to FASTQ?
Thanks in advance.
Great, I will try it.
Thanks
Are you sure you want the whole sequence from the FASTQ file? If you have quality-trimmed your reads and/or are really just interested in the parts matching the reference sequences, then the grep approach may not be what you want.
Yes, you are right. My solution will only work if the read IDs (the header info) have been preserved between the FASTA and FASTQ files, which I am assuming is the case.
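For anyone who wants something more explicit than grep, here is a minimal Python sketch of the same ID-based idea. It is only an illustration, not a tested pipeline: the function names `fasta_ids` and `filter_fastq` and the demo reads are made up for this post, and it assumes the FASTQ has strict 4-line records and that the first whitespace-separated token of each header is identical in both files (so no extra /1 or /2 suffix on one side only).

```python
def fasta_ids(fasta_lines):
    """Collect read IDs (first token after '>') from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_fastq(fastq_lines, wanted):
    """Yield 4-line FASTQ records whose ID is in `wanted`.

    Assumes every record is exactly 4 lines (header, sequence, '+', quality).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header = record[0]
            fields = header[1:].split()
            if header.startswith("@") and fields and fields[0] in wanted:
                yield record
            record = []


# Made-up demo data standing in for the BLAST FASTA output and the original FASTQ.
demo_fasta = [">read2 some description", "ACGT", ">read3", "GGCC"]
demo_fastq = [
    "@read1 1:N:0", "AAAA", "+", "IIII",
    "@read2 1:N:0", "ACGT", "+", "FFFF",
    "@read3 1:N:0", "GGCC", "+", "####",
]

wanted = fasta_ids(demo_fasta)
hits = list(filter_fastq(demo_fastq, wanted))
for rec in hits:
    print("\n".join(rec))
```

With real data you would pass open file handles instead of the demo lists, since both functions just iterate over lines. Note that this pulls the full, untrimmed record from the FASTQ file.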
Right, the IDs would need to be the same, though that is not what I was referring to. I meant that if you quality trim a file and want to extract those reads from another file (or just keep the blast query string), then pulling reads from a file with the IDs alone won't work (in that case, the trimming and match information would be lost). Hopefully that is clear.