How to replace the # annotation after a FASTA header with a strain/gene name
2.3 years ago
Neel ▴ 20

Hi, I have multiple FASTA files in which I want to replace the #-delimited annotation in each header with the strain name and gene name. I also want to remove this part of the header line: -1 # ID=1_2660;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5bp;gc_cont=0.687.

>CP077971.1_2660 # 2813973 # 2814887 # -1 # ID=1_2660;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5bp;gc_cont=0.687 MexT
ATGAACCGAAACGACCTGCGCCGCGTCGATCTGAACCTGCTGATCGTGTTCGAGACCCTGATGCACGAACGCAGCGTGACCCGCGCCGCAGAGAAACTGTTCCTCGGCCAGCCGGCCATCAGCGCCGCGCTGTCGCGCCTGCGCACGCTGTTCGACGACCCGCTGTTCGTCCGTACCGGACGCAGCATGGAGCCCACCGCGCGAGCCCAGGAAATCTTCGCCCACCTGTCGCCGGCGCTGGATTCCATCTCCACCGCCATGAGTCGCGCCAGCGAGTTCGATCCGGCGACCAGCACCGCGGTGTTCCGCATCGGCCTTTCCGACGACGTCGAGTTCGGCCTGTTGCCGCCCCTGCTCCGCCGCCTGCGCGCGGAGGCGCCGGGGATCGTCCTCGTCGTGCGCCGCGCCAACTATCTATTGATGCCGAACCTGCTGGCCTCGGGGGAGATCTCGGTGGGCGTCAGCTACACCGACGAACTGCCGGCCAACGCCAAGCGCAAGACCGTGCGCCGCAGCAAGCCGAAGATCCTCCGCGCCGACTCCGCGCCCGGCCAGCTGACCCTCGACGACTATTGCGCGCGACCGCACGCGCTGGTGTCCTTCGCCGGCGACCTCAGCGGCTTCGTCGACGAGGAGCTGGAAAAATTCGGCCGCAAGCGCAAGGTGGTCCTGGCGGTGCCGCAGTTCAACGGCCTCGGCACCCTCCTGGCCGGCACCGACATCATCGCCACCGTGCCCGACTACGCCGCCCAGGCGCTGATCGCCGCCGGCGGCCTACGCGCCGAGGACCCACCGTTCGAGACCCGCGCCTTCGAACTGTCGATGGCTTGGCGCGGCGCCCAGGACAACGATCCGGCCGAACGCTGGCTGCGCTCGCGGATCAGCATGTTCATCGGCGATCCGGACAGTCTCTGA
>CP077971.1_2661 # 2815108 # 2816127 # 1 # ID=1_2661;partial=00;start_type=ATG;rbs_motif=GAGG;rbs_spacer=6bp;gc_cont=0.669  MexS
ATGTCCCGAGTGATCCGTTTTCATCAGTTTGGCCCGCCAGAGGTCCTCAAATGCGAAGAGCTGCCGACCCCGGCGCCAGCCGCAGGGGAAGTCCTGGTGCGTGTCCAGGCGATCGGCGTGAGCTGGAAGGATGTGCTCTGGCGTCAGAACCTGGCCCCGGAGCAGGCTGCGCTGCCGTCCGGTCTCGGCTTCGAACTGGCCGGCGAGGTGCTGGCGGTCGGCGCCGGCGTCGGCGACCTGCCGCTGGGTTCCCGCGTGGCCAGTTTCCCCGCCCATACCCCCGATCATTATCCGGCCTATGGCGACGTGGTGCTGATGCCGCGCGCGGCCCTGGCGGTCTACCCCGAGGTACTCACCCCGGTGGAGGCCAGCGTCTACTACACCGGCCTGCTGGTGGCCTATTTCGGCCTGGTCGACCTGGCCGGGTTGAAGGCCGGGCAGACCGTGCTGATCACCGAGGCGGCGCGCATGTACGGGCCGGTCTCGATCCAGTTGGCCAAGGCTCTCGGCGCGCGGGTGATCGCTTCCACCAAGTCCGCCGAGGAGCGCGAGTTCCTCCGCGAGCAGGGCGCCGACAAGGTGGTGGTGACCGACGAGCAGGACCTGGTCCTGGAAGTCGAGCGCTTCACCGAGGGCAAGGGCGTCAATGTCATCCTCGACGAATTGGGCGGTCCGCAGATGACCCTGCTCGGCGATGTCTCCGCCACCCGCGGCAAGCTGGTGCTGTATGGCTGCAACGGCGGCAACGAGTCGGCGTTCCCGGCCTGCGCCGCGTTCAAGAAGCACCTGCAGTTCTACCGCCACTGCCTGATGGATTTCACCGGTCATCCGGAGATGGGCCTGGAACGCAACGACGAGTCGGTGAGCAAGGCCCTCGCGCACATCGAGCAACTGACCCGCGATCGCCTGCTCAAACCGGTGGTCGACCGGGTATTCGAGTTCGACCAGGTGGTCGAGGCGCACCGCTACATGGAAACCTGTCCGAAGCGCGGCCGGGTGGTGATCCACGTCGCCGATTGA

For example, the desired output is:

>CP077971.1_2660 |PA_PAO1|MexT  ATGAACCGAAACGACCTGCGCCGCGTCGATCTGAACCTGCTGATCGTGTTCGAGACCCTGATGCACGAACGCAGCGTGACCCGCGCCGCAGAGAAACTGTTCCTCGGCCAGCCGGCCATCAGCGCCGCGCTGTCGCGCCTGCGCACGCTGTTCGACGACCCGCTGTTCGTCCGTACCGGACGCAGCATGGAGCCCACCGCGCGAGCCCAGGAAATCTTCGCCCACCTGTCGCCGGCGCTGGATTCCATCTCCACCGCCATGAGTCGCGCCAGCGAGTTCGATCCGGCGACCAGCACCGCGGTGTTCCGCATCGGCCTTTCCGACGACGTCGAGTTCGGCCTGTTGCCGCCCCTGCTCCGCCGCCTGCGCGCGGAGGCGCCGGGGATCGTCCTCGTCGTGCGCCGCGCCAACTATCTATTGATGCCGAACCTGCTGGCCTCGGGGGAGATCTCGGTGGGCGTCAGCTACACCGACGAACTGCCGGCCAACGCCAAGCGCAAGACCGTGCGCCGCAGCAAGCCGAAGATCCTCCGCGCCGACTCCGCGCCCGGCCAGCTGACCCTCGACGACTATTGCGCGCGACCGCACGCGCTGGTGTCCTTCGCCGGCGACCTCAGCGGCTTCGTCGACGAGGAGCTGGAAAAATTCGGCCGCAAGCGCAAGGTGGTCCTGGCGGTGCCGCAGTTCAACGGCCTCGGCACCCTCCTGGCCGGCACCGACATCATCGCCACCGTGCCCGACTACGCCGCCCAGGCGCTGATCGCCGCCGGCGGCCTACGCGCCGAGGACCCACCGTTCGAGACCCGCGCCTTCGAACTGTCGATGGCTTGGCGCGGCGCCCAGGACAACGATCCGGCCGAACGCTGGCTGCGCTCGCGGATCAGCATGTTCATCGGCGATCCGGACAGTCTCTGA

Thank you!

2.3 years ago

The first step is straightforward: sed "/^>/s/#.*$//g" deletes everything from the first # to the end of the line, on header lines only. This leaves you with a file like this:

>CP077971.1_2660
MexT
ATGAACCGAAACGACCTGCGCCGCGTCGATCTGAACCTGCTGATCGTGTTCGAGACCCTGATGCACGAACGCAGCGTGACCCGCGCCGCAGAGAAACTGTTCCTCGGCCAGCCGGCCATCAGCGCCGCGCTGTCGCGCCTGCGCACGCTGTTCGACGACCCGCTGTTCGTCCGTACCGGACGCAGCATGGAGCCCACCGCGCGAGCCCAGGAAATCTTCGCCCACCTGTCGCCGGCGCTGGATTCCATCTCCACCGCCATGAGTCGCGCCAGCGAGTTCGATCCGGCGACCAGCACCGCGGTGTTCCGCATCGGCCTTTCCGACGACGTCGAGTTCGGCCTGTTGCCGCCCCTGCTCCGCCGCCTGCGCGCGGAGGCGCCGGGGATCGTCCTCGTCGTGCGCCGCGCCAACTATCTATTGATGCCGAACCTGCTGGCCTCGGGGGAGATCTCGGTGGGCGTCAGCTACACCGACGAACTGCCGGCCAACGCCAAGCGCAAGACCGTGCGCCGCAGCAAGCCGAAGATCCTCCGCGCCGACTCCGCGCCCGGCCAGCTGACCCTCGACGACTATTGCGCGCGACCGCACGCGCTGGTGTCCTTCGCCGGCGACCTCAGCGGCTTCGTCGACGAGGAGCTGGAAAAATTCGGCCGCAAGCGCAAGGTGGTCCTGGCGGTGCCGCAGTTCAACGGCCTCGGCACCCTCCTGGCCGGCACCGACATCATCGCCACCGTGCCCGACTACGCCGCCCAGGCGCTGATCGCCGCCGGCGGCCTACGCGCCGAGGACCCACCGTTCGAGACCCGCGCCTTCGAACTGTCGATGGCTTGGCGCGGCGCCCAGGACAACGATCCGGCCGAACGCTGGCTGCGCTCGCGGATCAGCATGTTCATCGGCGATCCGGACAGTCTCTGA
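
If the gene name actually sits at the end of the header line rather than on a line of its own, the command above would remove it together with the annotation. A minimal sketch that keeps both the sequence ID and the name, assuming the name is always the last whitespace-separated field of the header (the function name strip_header_keep_name is made up for this example):

```shell
# Hypothetical helper: drop the "# ... # ..." annotation but keep the ID and
# the trailing gene name. Assumes every header ends with a gene name.
strip_header_keep_name() {
  # \1 = ">ID" (text before the first " #"), \2 = last field on the line
  sed -E '/^>/s/^(>[^ ]+) #.*[[:space:]]([^[:space:]]+)[[:space:]]*$/\1 \2/' "$@"
}

printf '%s\n' \
  '>CP077971.1_2660 # 2813973 # 2814887 # -1 # ID=1_2660;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5bp;gc_cont=0.687 MexT' \
  'ATGAACC' | strip_header_keep_name
# prints:
# >CP077971.1_2660 MexT
# ATGAACC
```

A header without a trailing name would have its last annotation field kept instead, so it is worth checking a few headers by eye first.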

You could then reshape this file with paste -d \| - - - , which joins every three consecutive lines into one, separated by |. However, I suspect that the example you provided is somehow mangled, because MexT on a separate line is not valid FASTA. Also, where does the PA_PAO1 in the desired output come from?
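
The two steps can be chained. The sketch below assumes every record is exactly three lines (header, gene name, sequence), as in the cleaned file shown earlier; the sed pattern is slightly extended to also remove the spaces before the first #, and the function name is only illustrative:

```shell
# Hypothetical helper: strip the annotation, then join each 3-line record.
# Assumes exactly three lines per record: header, gene name, sequence.
strip_and_join() {
  sed '/^>/s/ *#.*$//' "$@" | paste -d '|' - - -
}

printf '%s\n' \
  '>CP077971.1_2660 # 2813973 # 2814887 # -1 # ID=1_2660;partial=00' \
  'MexT' \
  'ATGAACC' | strip_and_join
# prints: >CP077971.1_2660|MexT|ATGAACC
```

The result is one line per record, which is convenient for tabular processing but, like the three-line input, is not valid FASTA.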


Thank you so much for your time. I actually want to add the strain name as well, so that I can later track which gene came from which strain.
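
Given that clarification, one possible way to carry the strain along is sketched below. It assumes there is one FASTA file per strain, that the strain label can be taken from the file name (for example a hypothetical PA_PAO1.fasta), and that the gene name is the last whitespace-separated field of each header. The function name, the .renamed.fa suffix, and the >ID|strain|gene layout are illustrative choices, not something fixed in this thread:

```shell
# Hypothetical helper: rewrite headers as ">ID|strain|gene".
# Usage: add_strain_label STRAIN [FILE...]   (reads stdin if no FILE is given)
add_strain_label() {
  strain=$1
  shift
  awk -v strain="$strain" '
    /^>/ { print $1 "|" strain "|" $NF; next }  # header: ID, strain, last field
    { print }                                   # sequence lines pass through
  ' "$@"
}

# Assumed layout: one file per strain, named <strain>.fasta
for f in *.fasta; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  add_strain_label "${f%.fasta}" "$f" > "${f%.fasta}.renamed.fa"
done
```

Because the header and the sequence stay on separate lines, the output remains valid FASTA, and the strain can later be recovered by splitting the header on |.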
