Maybe my question is very simple, but, I'm not familiar with the programming language. I use the following script to extract sequence from fasta file. How can I write similar command to remove the sequences by sequence IDs from fasta file.
do you want to remove the duplicate sequences? Botht the sequences look duplicate to me:
if you want to remove dups, $ seqkit rmdup --quiet -i -n test.fa. If it is an example file, then try $ grep -A 1 -w '> c50249_g1_i3' input.fa. This assumes that sequence is in a single row next to id.
please be more patient. We all doing this here in our free time. So don't expect to get a ready-to-use-solution within some minutes.
Which file do you downloaded from seqkit? What platform are you using (windows, linux distribution ...)? What are you meaning with "but did not worked"? What have you done and what was the result of your action?
If your output file is empty and you didn't get any error message means that all id's in your fasta matches to the id's in your Tran_Cod.txt. Could this be?
What happens if you remove the -v?
Could please post the output of head Tran_Cod.txt?
are these fasta files flattened i.e is the sequence in a single line after each ID? Then you can use:
grep -A 1 -w <ID> input.fasta
eg: output:
input:
hi cpad0112
my fasta file is like this:
ATCAAAAATAGTTTCAGTTTGTGAAAAAATTCATTCTCTCAATCTCTTGTTTTACTTTTG AATTATAAACTCGAGGCAAAGAAAAATGTTCATTCAAGAAGATTGATACCCAGTGTGCTC ATAATGAAGAAGAAATTCGAAAGTAGAACATTCGGTATTTGGTCAAAGAAGAAATGTCGT CTTCTCAAGTTACAGCTGTTCCAACTTCACCACCAAAAACTATTCGTCATACTGCTGATT TCCATCCCACAATATGGGGAGATCAATTCCTCAAAGATACTTCTGAACTCAAGGTAATTA CACGTACACACTCTATATGAGATTAAATTTCTAAATGCACCCACCCTATGCATGCACATC ACAATAAACG
ATCAAAAATAGTTTCAGTTTGTGAAAAAATTCATTCTCTCAATCTCTTGTTTTACTTTTG AATTATAAACTCGAGGCAAAGAAAAATGTTCATTCAAGAAGATTGATACCCAGTGTGCTC ATAATGAAGAAGAAATTCGAAAGTAGAACATTCGGTATTTGGTCAAAGAAGAAATGTCGT CTTCTCAAGTTACAGCTGTTCCAACTTCACCACCAAAAACTATTCGTCATACTGCTGATT TCCATCCCACAATATGGGGAGATCAATTCCTCAAAGATACTTCTGAACTCAAGGTAATTA CACGTACACACTCTATATGAGATTAAATTTCTAAATGCACCCACCCTATGCATGCACATC ACAATAAACG
What is the solution to my problem??
do you want to remove the duplicate sequences? Botht the sequences look duplicate to me: if you want to remove dups,
$ seqkit rmdup --quiet -i -n test.fa
. If it is an example file, then try$ grep -A 1 -w '> c50249_g1_i3' input.fa
. This assumes that sequence is in a single row next to id.I have one IDs list that want to remove their related sequences. and my fasta file is like that with several lines. now, do you have command for me??