I've come across ExPASy ProtParam, but it is limited in that you can only copy/paste one sequence at a time.
Is there a command-line alternative for calculating the alphabet (residue) % composition of each sequence, i.e. each > record in a FASTA file? I have never ventured into Biopython; I mostly use R and the command line. All help appreciated. Thanks.
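# Print each residue with its count and its fraction of the sequence length.
function dump(arr, n,    i) {
    for (i in arr) {
        printf("%s %d %f\n", i, arr[i], arr[i] / n);
    }
}
# Header line: report the previous sequence, echo the header, reset the counters.
/^>/ { dump(array, N); print; delete array; N = 0; next; }
# Sequence line: count every character.
{
    for (i = 1; i <= length($0); i++) { array[substr($0, i, 1)]++; N++; }
}
# After the last line: report the final sequence.
END { dump(array, N); }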
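Usage (output columns: residue, count, fraction of the sequence length; multiply the fraction by 100 for a percentage; residues come out in awk's arbitrary array order):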
$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=CAA68495.1,CAA64262.1,CAA46742.1&rettype=fasta"|\
awk -f script.awk
>CAA68495.1 unnamed protein product [Rotavirus]
N 35 0.088161
A 27 0.068010
C 3 0.007557
P 20 0.050378
Q 17 0.042821
D 19 0.047859
E 19 0.047859
R 25 0.062972
F 25 0.062972
S 28 0.070529
G 19 0.047859
T 30 0.075567
H 7 0.017632
I 27 0.068010
V 26 0.065491
W 5 0.012594
K 8 0.020151
Y 11 0.027708
L 34 0.085642
M 12 0.030227
>CAA64262.1 NSP2 [Rotavirus]
N 24 0.075710
A 17 0.053628
P 10 0.031546
C 5 0.015773
Q 11 0.034700
D 13 0.041009
R 14 0.044164
E 22 0.069401
S 20 0.063091
F 15 0.047319
G 11 0.034700
T 16 0.050473
H 10 0.031546
V 23 0.072555
I 22 0.069401
W 4 0.012618
K 29 0.091483
Y 14 0.044164
L 30 0.094637
M 7 0.022082
>CAA46742.1 viral non structural protein NS5 [Rotavirus]
N 15 0.048077
A 21 0.067308
P 9 0.028846
C 5 0.016026
Q 9 0.028846
D 22 0.070513
R 15 0.048077
E 18 0.057692
S 20 0.064103
F 14 0.044872
G 10 0.032051
T 25 0.080128
H 8 0.025641
I 17 0.054487
V 23 0.073718
W 3 0.009615
K 28 0.089744
Y 14 0.044872
L 27 0.086538
M 9 0.028846
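
Since the question mentions R, here is a minimal sketch of a variant that writes one tab-separated row per residue, with the percentage already computed, so the whole result can be loaded as a single table. It is not part of the original script: the function name aa_composition and the column layout (id, residue, count, percent) are my own choices, and it assumes the first word of each header is a usable sequence id. Rows are sorted by id and then residue, so sequences may not come out in input order.

```shell
# Hypothetical helper (name and column layout are assumptions):
# reads FASTA on stdin, writes "id<TAB>residue<TAB>count<TAB>percent".
aa_composition() {
  awk '
    # Emit one row per residue for the sequence that was just read.
    function dump(id, arr, n,    r) {
      for (r in arr)
        printf("%s\t%s\t%d\t%.2f\n", id, r, arr[r], 100 * arr[r] / n)
    }
    /^>/ {
      if (n > 0) dump(id, count, n)
      id = substr($1, 2)        # first word of the header, without ">"
      split("", count)          # portable way to empty an array
      n = 0
      next
    }
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)   # drop whitespace, "*", gap characters
      for (i = 1; i <= length(seq); i++) { count[substr(seq, i, 1)]++; n++ }
    }
    END { if (n > 0) dump(id, count, n) }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2
}
```

For example, `aa_composition < proteins.fasta > composition.tsv` (file names are placeholders) produces a file that R can read with read.delim("composition.tsv", header = FALSE).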