Dear All,
I would like to ask about extracting 100,000 sequences from a large FASTA file. There are plenty of scripts on the forum that extract sequences by ID, but I could not find one for this purpose. Is there a script or bash command that extracts the first and/or last 100,000 sequences from a FASTA file?
Many thanks in advance for all your help!
If not, you could write one easily enough with Biopython or BioPerl...
Well, getting the first X records is easier than getting the last X records, but still.
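To make that concrete, here is a rough sketch in plain Python (standard library only, so not the Biopython or BioPerl route mentioned above). The function names `read_fasta`, `first_records` and `last_records` are made up for this example. It assumes a well-formed FASTA file in which every record starts with a ">" header line. The first N records can stop reading as soon as N have been seen, while the last N require a pass over the whole file, keeping only the most recent N in memory.

```python
from collections import deque
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def first_records(lines, n):
    """Return the first n records; reading stops once n have been parsed."""
    return list(islice(read_fasta(lines), n))


def last_records(lines, n):
    """Return the last n records; reads everything but holds only n in memory."""
    return list(deque(read_fasta(lines), maxlen=n))
```

With a file, something like `with open("input.fasta") as handle: records = first_records(handle, 100000)` should work, where `input.fasta` is a placeholder name. Each record can then be written back out as ">" plus the header, followed by the sequence.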
You can have a look at this. It may be helpful for you:
Extract sequence with header from a fasta file with specific ID given in another file
That is not an answer to the question that was originally asked.