Hi,
I would like to filter out some sequences from a FASTA file using a specific pattern.
For example, I have this file:
>input1
UGAGGUAGUAGG
>input2
CUAUGCUUACC
>out1
UCCCUGAGACCGUGA
>out2
CUCCGGGUACC
>desc1
ACUUCCUUACAUGCCC
I already know how to extract all the FASTA sequences matching a specific pattern into a new file using awk.
But what I would like to do is remove all entries matching a specific pattern from the original FASTA file and save the result as a new file. In my file above, for example, I would like to remove all sequences whose header contains the pattern "out" and save only the others to a new file.
Is there a tool for doing that, or is it possible with awk, sed, or even grep?
Thanks
Assa
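A minimal Python sketch of one way to do this, not a reproduction of any answer in this thread: read the file line by line, decide at each ">" header whether the record should be dropped, and apply that decision to every following line until the next header. Because the decision is carried across lines, sequences wrapped over several lines are dropped together with their header. The function name drop_records is hypothetical, and the pattern is treated as a plain substring of the header, not a regular expression.

```python
from typing import Iterable, Iterator


def drop_records(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose header contains `pattern`.

    Assumption: `pattern` is a plain substring (use re.search for regexes).
    The keep/skip decision is taken at each '>' line and applies to all lines
    up to the next header, so multi-line sequences are handled as well.
    """
    skip = False
    for line in lines:
        if line.startswith(">"):
            # Header line: decide whether this whole record is dropped.
            skip = pattern in line[1:]
        if not skip:
            yield line
```

For example, with hypothetical file names, open input.fasta for reading as src and filtered.fasta for writing as dst, then call dst.writelines(drop_records(src, "out")). Lines read from a file keep their trailing newlines, so the kept records are written back unchanged. On the example file above, this keeps input1, input2 and desc1.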
Thanks. That was fast :-)
Hi Pierre! Thanks for your command.
I have a question on this issue. Instead of a single header pattern ("out" in your case), I have many patterns stored in a file. What should I do? Thanks in advance for your help.
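One possible way to handle a file of patterns, sketched in Python under the assumption that the patterns file holds one pattern per line: load the patterns into a list, then drop a record when its header contains any of them. The names load_patterns and drop_records_any are hypothetical, and matching is plain substring matching, so a pattern such as "out" would also match a header like "without".

```python
from typing import Iterable, Iterator, List


def load_patterns(lines: Iterable[str]) -> List[str]:
    """Read one pattern per line, ignoring blank lines and surrounding whitespace."""
    return [ln.strip() for ln in lines if ln.strip()]


def drop_records_any(lines: Iterable[str], patterns: List[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping records whose header contains any pattern.

    The skip decision made at a '>' header also covers the sequence lines
    that follow it, including sequences wrapped over several lines.
    """
    skip = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            skip = any(p in header for p in patterns)
        if not skip:
            yield line
```

With a hypothetical patterns.txt, the patterns can be loaded by passing the open file handle to load_patterns. If the file instead holds exact sequence IDs and is large, putting them in a set and testing header.split()[0] against that set avoids comparing every pattern with every header.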
You could modify my answer below:
See also this post: How to remove some fasta sequences by header information from a large fasta file, any command and script please?
Thank you, Pierre, for your answer. I used your command, and it seems to remove the headers matching a particular pattern from the FASTA file. I was wondering whether this command also removes the whole sequence associated with each header, since my FASTA file is not linearized and the sequences span multiple lines.
Also, do you have any idea how I can save the sequences targeted for deletion to another file, please?
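The command referred to above is not shown in this thread, so the following does not describe its behaviour. As a hedged Python sketch of one way to address both questions: partition the records into a "kept" list and a "removed" list, deciding at each ">" header and sending every following line to the same list until the next header. Wrapped, non-linearized sequences therefore stay with their header without any linearization step. The function name split_records is hypothetical, and the pattern is a plain substring match.

```python
from typing import Iterable, List, Tuple


def split_records(lines: Iterable[str], pattern: str) -> Tuple[List[str], List[str]]:
    """Partition FASTA lines into (kept, removed) by a header substring.

    Every line after a '>' header goes to the same list as that header,
    so multi-line sequences are never separated from their header.
    """
    kept: List[str] = []
    removed: List[str] = []
    target = kept  # lines before any header (if any) are kept
    for line in lines:
        if line.startswith(">"):
            target = removed if pattern in line[1:] else kept
        target.append(line)
    return kept, removed
```

Writing each list to its own file (for example with writelines on two handles opened for writing) gives one file without the matching records and a second file containing only the removed ones. For very large inputs, the same idea can write straight to two open files instead of building lists in memory.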