Linux script for counting the length of each protein in a FASTA file
2.9 years ago

Hello all,

I have a file of protein FASTA sequences and I want to get the length of each protein.

Which Linux command can do this? Any suggestions are welcome.

Example FASTA:

>Zymomonas_mobilis_peg_0001__lcl|NC_006526.2_prot_WP_011239989.1_1_[locus_tag=ZMO_RS00005]_[db_xref=GeneID:58025911]_[protein=TonB-dependent_receptor]_[protein_id=WP_011239989.1]_[location=147..2516]_[gbkey=CDS]
MKNFIKKGGFLFSFSWFFLQYPVEAATDTVSNTSNNDAIIVTGTRETHKKIRDSLSPIEVLNNRELLETGQTNVTSALAQLVPSITQPAVGQFVAAPTNFVSLRGLNPNQTLVLVNGKRRHNSSFLYIDGFADAATPTDLDLIAPELIDHIEVLKDGAAAQYGSDAIAGVVNIILKSDNHGGSARSQIGQTYAGDGLVGQAGFNKGFKIGHSGFFDVAFDFRHQNHTSRDGIDSRTQRHSLKVVGDPMATRYNLAINAGYDFGNGIEIYTTDSYSHRNSEVNQVYRTADRFPEVYPDGFMPIQKLSENDFSASLGLRGDNALGIHWDVNSTYGGNFIRNDLDKTANLGLYAATGSTPLSVHLNNYSTTQLNNTLDLSKELALPVIYSPLTLAAGFAHRYETYKTGAGDPASYLYGGTQARTGIIPAVAGSHSRNVYSGYLDLSAHLTKRWQVDLAGRYEYYSDFGSTLNGKASTRYDFTDQFALRATFSSGTRAPSLANEYFSTLSVGPDSASGTLGANSAAAKLLGAVPLKPEKATNITAGLVYSPLKNLHFTLDSYQIAIRNRIVSGPGISGEAALAALNAQGVTVASTLEAQNISAWFFTNGASTRTRGLDFTASYHSRFENFGIVDWDIALNINATTIRHVNQLANGMSALNAQTRAYLTSSTPKNRITFGGRWESFSKKWDVSLHEQRFGQTTDEMTWYQGPNAYSMTDFNRIHNHPRWITNLEIGYRPIEKLRVAIGANNLFNAHTTRIPAANGYYGSGKYDGAASQIGVNGGFYYLQTSYQF

Expected output:

>Zymomonas_mobilis_peg_0001__lcl|NC_006526.2_prot_WP_011239989.1_1_[locus_tag=ZMO_RS00005]_[db_xref=GeneID:58025911]_[protein=TonB-dependent_receptor]_[protein_id=WP_011239989.1]_[location=147..2516]_[gbkey=CDS]  790

The output should contain the FASTA header followed by the sequence length.


Are you sure this is a FASTA file? In your example, the header does not start with the ">" character.
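A quick way to check the point raised here is to test whether the first non-blank line of the file begins with ">". This is only a minimal sketch: the function name `looks_like_fasta` is hypothetical, and it inspects nothing beyond that first line.

```shell
# Hypothetical helper: succeed (status 0) if the first non-blank line of the
# input starts with ">", i.e. the file at least begins like a FASTA record;
# return 1 otherwise (including for empty input).
looks_like_fasta() {
  awk '
    /^[ \t\r]*$/ { next }              # skip leading blank lines
    { ok = ($0 ~ /^>/); exit }         # inspect only the first non-blank line
    END { exit !ok }
  ' "$@"
}
```

Usage: `looks_like_fasta input.fa && echo "starts with a FASTA header"`.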

2.9 years ago
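$ bioawk -c fastx '{print ">"$name, length($seq)}' input.fa
$ awk -v OFS="\t" '/^>/ {getline seq; print $0, length(seq)}' input.fa

The plain awk version assumes each sequence sits on a single line directly after its header; bioawk parses multi-line records correctly. Note that bioawk's $name stops at the first whitespace in the header (the rest goes into $comment); the headers in this example contain none.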
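If bioawk is not installed and the sequences may be wrapped over several lines, plain awk can add up the length across all sequence lines of a record. A minimal sketch, assuming a hypothetical function name `fasta_lengths`: spaces, tabs and carriage returns are stripped before counting, but a terminal "*" stop symbol, if present, would still be counted as a residue.

```shell
# Hypothetical helper: print each FASTA header, a tab, and the total length of
# its sequence, summing over wrapped (multi-line) sequence lines.
fasta_lengths() {
  awk -v OFS='\t' '
    /^>/ {
      if (hdr != "") print hdr, len    # flush the previous record
      hdr = $0; len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # count residues on this line
    END { if (hdr != "") print hdr, len }        # flush the last record
  ' "$@"
}
```

Usage: `fasta_lengths input.fa` prints one tab-separated header/length pair per record and reads standard input when no file is given.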
