Hello,
I'm an undergraduate who has been working with quite a few FASTA files for the past few months, and something that has been irking me is my inability to quickly take out a select few sequences from a 'master' fasta and put them into their own dataset.
I have a 'master' fasta file that has ~200 sequences in it, each labelled with its name, like >NZ_123456. I've been constructing smaller datasets out of the plasmids in the larger one by copying and pasting the sequences I need into a new file, and that has gotten tedious. Is there a way for me to list the sequences I need and have them copied from the master file into a new dataset? I imagine this could be done from the terminal or with Python, but I am a novice in the field, so I would appreciate suggestions. Thank you and have a good day!
Please search the forum for similar posts (or google "subset fasta by id"). This question has been answered multiple times.
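For reference, here is a minimal sketch of the kind of solution those posts describe, using only the Python standard library. It assumes the wanted IDs sit one per line in a text file, and that a sequence's ID is the first whitespace-delimited word after the ">" in its header (so ">NZ_123456 some description" has the ID NZ_123456). The function names and any file names are made up for illustration, not taken from an existing tool.

```python
def read_fasta(path):
    """Yield (header, sequence_lines) for each record in a FASTA file.

    The header is returned without its leading '>'.
    """
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, lines
                header, lines = line[1:], []
            elif header is not None and line:
                # Sequence line belonging to the current record.
                lines.append(line)
    if header is not None:
        yield header, lines


def subset_fasta(master_path, ids_path, out_path):
    """Copy the records whose ID is listed in ids_path into out_path.

    Assumption: the ID is the first word of the header line.
    Returns the set of requested IDs that were not found.
    """
    with open(ids_path) as handle:
        # One ID per line; blank lines and a leading '>' are tolerated.
        wanted = {line.strip().lstrip(">") for line in handle if line.strip()}

    found = set()
    with open(out_path, "w") as out:
        for header, lines in read_fasta(master_path):
            words = header.split()
            seq_id = words[0] if words else ""
            if seq_id in wanted:
                found.add(seq_id)
                out.write(">" + header + "\n")
                for seq_line in lines:
                    out.write(seq_line + "\n")
    return wanted - found
```

With hypothetical file names, `subset_fasta("master.fasta", "ids.txt", "subset.fasta")` would write the matching records to subset.fasta in the order they appear in the master file, and return any IDs from ids.txt that were not found, so typos in the ID list are easy to spot.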
Ah, good to know. Thanks! I'll take down my post.
Please do not delete posts that have received feedback (GenoMax has answered your question already). Instead, accept GenoMax's answer to mark the post as resolved.