I have a large multifasta file (about 125,000 sequences) and a smaller multifasta file (about 100 sequences). All sequences in the smaller file are found in the larger file, but the headers are different. I have thousands of such smaller multifasta files. How can I search the larger file for the sequences found in the smaller one and then swap in the headers from the larger file? Ideally, I would print out a smaller multifasta file identical to the one I started with, just with the headers found in the larger file. All sequences in both files have been linearized; that is, each sequence is on a single line. Thanks!
Thanks! Is there any way to change the grep command to only print out the headers themselves, instead of also including the associated sequences? I am not very familiar with grep. Thanks again.
You could use grep "^>", which would get you just the headers. Looks like the original grep solution worked? I am curious since you had large files.
I couldn't do it locally because grep runs out of memory, but I can run it successfully on the server that I have access to. I need to learn more about grep, awk, sed, etc. They seem quick, powerful, and really simple. Maybe I am deceiving myself about the simple part though! Do you have any resources to suggest for learning more about these types of commands? Thanks again.
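For what it's worth, since grep ran out of memory locally, here is a minimal awk sketch of the header swap that may be lighter on resources. It is not the grep solution discussed above, just one possible alternative. It assumes both files are linearized as described (one header line, then exactly one sequence line), that sequences match exactly (same case, no trailing whitespace or Windows line endings), and that if a sequence occurs more than once in the large file, the last header seen for it is used. The function name swap_headers and the file names below are placeholders.

```shell
# swap_headers LARGE SMALL
# Prints SMALL with each header replaced by the header that LARGE
# uses for the identical sequence. Output order follows SMALL.
# Assumes linearized FASTA: one header line, then one sequence line.
swap_headers() {
    awk '
        # First file (large): remember the header seen for each sequence.
        NR == FNR {
            if (/^>/) { hdr = $0 } else { by_seq[$0] = hdr }
            next
        }
        # Second file (small): hold the old header until its sequence is read.
        /^>/ { old = $0; next }
        {
            # Keep the original header if the sequence is not in the large file.
            print (($0 in by_seq) ? by_seq[$0] : old)
            print
        }
    ' "$1" "$2"
}
```

You would call it as swap_headers large.fa small.fa > small.renamed.fa, and piping the output through grep "^>" gives just the new headers. Only the large file's sequence-to-header table is held in memory, so it should cope with 125,000 sequences, though I have not tested it on files of that size. For thousands of small files you could wrap the call in a shell loop.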