Multi-FASTA file: calculate % composition of amino acids
5.9 years ago
Biogeek ▴ 470

I've come across ExPASy ProtParam, but it is limited in that you can only paste in one sequence at a time.

Is there an alternative command-line approach to calculate the percentage composition of each residue for every >sequence in a file? I have never ventured into Biopython; I mostly use R and the command line. All help appreciated. Thanks.

5.9 years ago
GenoMax 147k

You can use pepstats from EMBOSS; see its documentation for the available options. You will need to download and install EMBOSS first.

5.9 years ago

using awk:

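# Print each residue with its count and its fraction of the sequence length n
# (multiply by 100 for a percentage). for-in order is unspecified in awk,
# so the residues come out in arbitrary order.
function dump(arr, n,    i)
    {
    for (i in arr)
        {
        printf("%s %d %f\n", i, arr[i], arr[i] / n);
        }
    }
# Header line: report the previous record (nothing is printed before the
# first header), echo the header, then reset the counters.
/^>/ { dump(array, N); print; delete array; N = 0; next; }
# Sequence line: count every character.
    {
    for (i = 1; i <= length($0); i++) { array[substr($0, i, 1)]++; N++; }
    }
# Report the last record.
END {
    dump(array, N);
    }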

usage (with the script above saved as script.awk):

$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=CAA68495.1,CAA64262.1,CAA46742.1&rettype=fasta"   |\
awk -f script.awk 


>CAA68495.1 unnamed protein product [Rotavirus]
N 35 0.088161
A 27 0.068010
C 3 0.007557
P 20 0.050378
Q 17 0.042821
D 19 0.047859
E 19 0.047859
R 25 0.062972
F 25 0.062972
S 28 0.070529
G 19 0.047859
T 30 0.075567
H 7 0.017632
I 27 0.068010
V 26 0.065491
W 5 0.012594
K 8 0.020151
Y 11 0.027708
L 34 0.085642
M 12 0.030227
>CAA64262.1 NSP2 [Rotavirus]
N 24 0.075710
A 17 0.053628
P 10 0.031546
C 5 0.015773
Q 11 0.034700
D 13 0.041009
R 14 0.044164
E 22 0.069401
S 20 0.063091
F 15 0.047319
G 11 0.034700
T 16 0.050473
H 10 0.031546
V 23 0.072555
I 22 0.069401
W 4 0.012618
K 29 0.091483
Y 14 0.044164
L 30 0.094637
M 7 0.022082
>CAA46742.1 viral non structural protein NS5 [Rotavirus]
N 15 0.048077
A 21 0.067308
P 9 0.028846
C 5 0.016026
Q 9 0.028846
D 22 0.070513
R 15 0.048077
E 18 0.057692
S 20 0.064103
F 14 0.044872
G 10 0.032051
T 25 0.080128
H 8 0.025641
I 17 0.054487
V 23 0.073718
W 3 0.009615
K 28 0.089744
Y 14 0.044872
L 27 0.086538
M 9 0.028846