How can I do this using Python or a shell command?
Suppose I have 200 protein sequences of different lengths in a FASTA file named 'proto.fasta'.
I want to find the maximum sequence length and then append the character Z to the end of every shorter sequence until it reaches that length. Some sequences need only one extra character, while others need many more. In the end, all sequences will have the same length, equal to the maximum.
After that, I want to save the result to a text file or a FASTA file.
The following is only an example, but I want to do the same thing on my own sequence file:
>sp|P08112|MASY_ECOLI Malate synthase B OS=Escherichia coli (strain K12) GN=aceB PE=1 SV=1
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTPQRNKLLAARIQQQQDID
NGTLPDFISETASIRDADWKIRGIPADLEDRRVEITGPVERKMVINA
LNANVKVFMADFED
>gi|84383531|gb|AER967412.1| C OS=Escherichia coli (strain K14)
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTPQRN
KLLAARIQQQQDIDNGTLPD
>np|M04142|MASY_ECOLI tra synthase D OS=Escherichia coli (strain K16) GN=aceB
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTP
>np|S08112|MASY_ECOLI kw synthase S OS=Escherichia coli (strain K16) GN=aceB
MTEQA
The result should look like this. Since the first sequence is the longest, Z is appended to all the other sequences until they are as long as the first one:
>sp|P08112|MASY_ECOLI Malate synthase B OS=Escherichia coli (strain K12) GN=aceB PE=1 SV=1
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTPQRNKLLAARIQQQQDIDNGTLPDFISETASIRDADWKIRGIPADLEDRRVEITGPVERKMVINALNANVKVFMADFED
>gi|84383531|gb|AER967412.1| C OS=Escherichia coli (strain K14)
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTP
QRNKLLAARIQQQQDIDNGTLPDZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
>np|M04142|MASY_ECOLI tra synthase D OS=Escherichia coli (strain K16) GN=aceB
MTEQATTTDELAFTRPYGEQEKQILTAEAVEFLTELVTHFTPZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
>np|S08112|MASY_ECOLI kw synthase S OS=Escherichia coli (strain K16) GN=aceB
MTEQAZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
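The steps described above (join wrapped sequence lines, find the longest sequence, right-pad the rest with Z, write the result) could be sketched in plain Python roughly as follows. This is only a minimal sketch under some assumptions: the function names `parse_fasta`, `pad_records`, `write_fasta` and `pad_fasta_file` are made up for illustration, the output file name is hypothetical, and the file is assumed to be small enough to hold in memory.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) pairs from an iterable of lines.

    Sequence lines that were wrapped over several lines are joined, so the
    length of each sequence can be measured correctly.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pad_records(records, pad_char="Z"):
    """Right-pad every sequence with pad_char up to the longest length.

    pad_char must be a single character (a requirement of str.ljust).
    """
    max_len = max((len(seq) for _, seq in records), default=0)
    return [(header, seq.ljust(max_len, pad_char)) for header, seq in records]


def write_fasta(records, path, width=None):
    """Write records to path; wrap sequences at `width` columns if given."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(header + "\n")
            if width:
                for start in range(0, len(seq), width):
                    handle.write(seq[start:start + width] + "\n")
            else:
                handle.write(seq + "\n")


def pad_fasta_file(in_path, out_path, pad_char="Z", width=None):
    """Read in_path, pad all sequences to equal length, write to out_path."""
    with open(in_path) as handle:
        records = parse_fasta(handle)
    write_fasta(pad_records(records, pad_char), out_path, width)
```

With this sketch, calling pad_fasta_file("proto.fasta", "proto_padded.fasta") would write one padded sequence per record on a single line; passing a width such as 60 would wrap the output lines instead. One caveat: Z is also the IUPAC ambiguity code for glutamate or glutamine, so downstream tools may interpret the padding as real residues.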
Can I ask why? And why the Z character specifically?
Also, a "protein" of length 5 looks very dubious to me.