I parsed the titles of the sequences I want from a FASTA file. The titles are rather long, so I want to trim them a bit with the Unix sed command. A title looks like:
AC166615 weakly similar to UniRef100_A1EGX0 Cluster: 1-aminocyclopropane-1-carboxylate bla bla bla
I would like to trim the title to:
AC166615 1-aminocyclopropane-1-carboxylate bla bla bla
I tried the sed command below:
sed 's/^.*\:/'$'\t''/g' seqTitle.txt
The output looks like the line below, with the sequence ID removed as well, but I want to keep the sequence ID.
1-aminocyclopropane-1-carboxylate bla bla bla
Could someone please give me some guidance on using sed for this?
I can give you a small hack: if all the titles look like that, you can cut the ID first and then merge it with the description that you get from your own sed code.
So,
cut -f1 -d" " seqTitle.txt > id && sed 's/^.*\:/'$'\t''/g' seqTitle.txt > desc
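To finish the merge, the two files can be pasted back together. This is only a minimal sketch, assuming one title per line with the ID as the first space-separated field and the description after the last ":". It works in a temporary directory on a copy of the sample title, and it strips the space after the ":" rather than inserting a tab as the sed command above does:

```shell
# Work in a throwaway directory so no existing files are overwritten.
tmp=$(mktemp -d)

# Re-create the sample title from the question.
printf '%s\n' 'AC166615 weakly similar to UniRef100_A1EGX0 Cluster: 1-aminocyclopropane-1-carboxylate bla bla bla' > "$tmp/seqTitle.txt"

# First column: the ID (first space-separated field).
cut -d' ' -f1 "$tmp/seqTitle.txt" > "$tmp/id"

# Second column: everything after the last ':' with leading spaces dropped.
sed 's/^.*: *//' "$tmp/seqTitle.txt" > "$tmp/desc"

# Join the two files line by line, separated by a single space.
merged=$(paste -d' ' "$tmp/id" "$tmp/desc")
printf '%s\n' "$merged"
```

Since paste joins by line number, this only works if both files have exactly one line per title.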
But I also managed to do it with sed alone, although I doubt this is the best way to do it:
sed 's/\(\)\ .\+\(:\)/\1\2/'
I can't point you to a guide, except maybe O'Reilly's "sed & awk", but I found this list of explained one-liners particularly useful:
The same site also has tips on awk and perl one-liners, in case you are interested.
Random, thanks for your suggestion. The sed command you mentioned leaves the ":" in the output:
AC166615: 1-aminocyclopropane-1-carboxylate bla bla bla
I think that should be fine. Thanks.
For some reason I had assumed you wanted the ":" to separate the two fields. If not, you can still use awk and do:
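A minimal awk sketch along those lines, assuming the ID is the first whitespace-separated field and the description is whatever follows the last ":" on the line (the sample title is re-created in a variable here so the snippet runs on its own):

```shell
# Sample title from the question.
awk_title='AC166615 weakly similar to UniRef100_A1EGX0 Cluster: 1-aminocyclopropane-1-carboxylate bla bla bla'

# Save the ID, delete everything up to the last ':' (plus trailing spaces),
# then print the ID and what is left of the line.
awk_trimmed=$(printf '%s\n' "$awk_title" |
  awk '{ id = $1; sub(/^.*: */, ""); print id, $0 }')

printf '%s\n' "$awk_trimmed"
```

On the real file the same awk program would presumably be run as awk '...' seqTitle.txt. A line without any ":" would come out with its ID duplicated in front, so this relies on every title having the "Cluster:" part.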
Or use sed and do:
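A minimal sed sketch that keeps the ID and drops the ":" as well, under the same assumptions (ID first, no spaces inside the ID, description after the last ":"). It uses only basic regular expression syntax, so it should not depend on GNU extensions such as \+:

```shell
# Sample title from the question.
sed_title='AC166615 weakly similar to UniRef100_A1EGX0 Cluster: 1-aminocyclopropane-1-carboxylate bla bla bla'

# \([^ ]*\) captures the ID (everything up to the first space);
# " .*: *" then swallows the rest up to the last ':' and any spaces after it;
# the replacement puts back the ID followed by a single space.
sed_trimmed=$(printf '%s\n' "$sed_title" | sed 's/^\([^ ]*\) .*: */\1 /')

printf '%s\n' "$sed_trimmed"
```

Lines that contain no ":" do not match the pattern and pass through unchanged.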
Thanks for the suggestion.