2.9 years ago
Confused_human
Hello all,
I have a file of protein FASTA sequences, and I want to get the length of each protein.
Which Linux command can do this? Please suggest one.
Example fasta:
>Zymomonas_mobilis_peg_0001__lcl|NC_006526.2_prot_WP_011239989.1_1_[locus_tag=ZMO_RS00005]_[db_xref=GeneID:58025911]_[protein=TonB-dependent_receptor]_[protein_id=WP_011239989.1]_[location=147..2516]_[gbkey=CDS]
MKNFIKKGGFLFSFSWFFLQYPVEAATDTVSNTSNNDAIIVTGTRETHKKIRDSLSPIEVLNNRELLETGQTNVTSALAQLVPSITQPAVGQFVAAPTNFVSLRGLNPNQTLVLVNGKRRHNSSFLYIDGFADAATPTDLDLIAPELIDHIEVLKDGAAAQYGSDAIAGVVNIILKSDNHGGSARSQIGQTYAGDGLVGQAGFNKGFKIGHSGFFDVAFDFRHQNHTSRDGIDSRTQRHSLKVVGDPMATRYNLAINAGYDFGNGIEIYTTDSYSHRNSEVNQVYRTADRFPEVYPDGFMPIQKLSENDFSASLGLRGDNALGIHWDVNSTYGGNFIRNDLDKTANLGLYAATGSTPLSVHLNNYSTTQLNNTLDLSKELALPVIYSPLTLAAGFAHRYETYKTGAGDPASYLYGGTQARTGIIPAVAGSHSRNVYSGYLDLSAHLTKRWQVDLAGRYEYYSDFGSTLNGKASTRYDFTDQFALRATFSSGTRAPSLANEYFSTLSVGPDSASGTLGANSAAAKLLGAVPLKPEKATNITAGLVYSPLKNLHFTLDSYQIAIRNRIVSGPGISGEAALAALNAQGVTVASTLEAQNISAWFFTNGASTRTRGLDFTASYHSRFENFGIVDWDIALNINATTIRHVNQLANGMSALNAQTRAYLTSSTPKNRITFGGRWESFSKKWDVSLHEQRFGQTTDEMTWYQGPNAYSMTDFNRIHNHPRWITNLEIGYRPIEKLRVAIGANNLFNAHTTRIPAANGYYGSGKYDGAASQIGVNGGFYYLQTSYQF
Expected output:
>Zymomonas_mobilis_peg_0001__lcl|NC_006526.2_prot_WP_011239989.1_1_[locus_tag=ZMO_RS00005]_[db_xref=GeneID:58025911]_[protein=TonB-dependent_receptor]_[protein_id=WP_011239989.1]_[location=147..2516]_[gbkey=CDS] 790
The output should have the FASTA header followed by the sequence length.
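One possible approach, sketched in Python rather than as a single shell command: read the file line by line, treat every line starting with ">" as a header, and add up the lengths of the sequence lines that follow it. The function names `fasta_lengths` and `print_lengths` and the file name `proteins.fasta` are made up for illustration. The sketch assumes a plain-text FASTA file in which a sequence may be wrapped over several lines.

```python
def fasta_lengths(lines):
    """Yield (header, length) for each record in an iterable of FASTA lines."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, length
            header, length = line, 0
        elif header is not None:
            # Sequence line: add its residues to the current record.
            length += len(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, length


def print_lengths(path):
    """Print each header followed by its sequence length."""
    with open(path) as handle:
        for header, length in fasta_lengths(handle):
            print(header, length)


# Hypothetical usage (file name is an assumption):
# print_lengths("proteins.fasta")
```

Each output line is the header as it appears in the file, a space, and the total number of residues for that record, which matches the layout requested above.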
Are you sure this is a FASTA file? In your example it does not have the opening header character ">".
Mean Length Of Fasta Sequences