Dear All
I am trying to use simple for loop in shell script to execute bacth process of codeml to calculate dn/ds values for 1500 Orthologues but I am getting this error.
#!/bin/bash
for file in *.aln
do
`echo $file | sed 's/\.aln//'`
python codemlScript.py /home/tulasi/Desktop/Tools/Tools/AutoPAML/Alignments/\.aln /home/tulasi/Desktop/Tools/Tools/AutoPAML/Trees/\.tree /home/tulasi/Desktop/Tools/Tools/AutoPAML/Out/\.out
done
But I am getting this out put with error
Saureus1202.aln
Saureus1202.aln
Saure_omega.out
25 kappa | kappa 1.60
16 model | model 0.00
CODONML in paml version 4.8, March 2014
----------------------------------------------
Phe F TTT | Ser S TCT | Tyr Y TAT | Cys C TGT
TTC | TCC | TAC | TGC
Leu L TTA | TCA | *** * TAA | *** * TGA
TTG | TCG | TAG | Trp W TGG
----------------------------------------------
Leu L CTT | Pro P CCT | His H CAT | Arg R CGT
CTC | CCC | CAC | CGC
CTA | CCA | Gln Q CAA | CGA
CTG | CCG | CAG | CGG
----------------------------------------------
Ile I ATT | Thr T ACT | Asn N AAT | Ser S AGT
ATC | ACC | AAC | AGC
ATA | ACA | Lys K AAA | Arg R AGA
Met M ATG | ACG | AAG | AGG
----------------------------------------------
Val V GTT | Ala A GCT | Asp D GAT | Gly G GGT
GTC | GCC | GAC | GGC
GTA | GCA | Glu E GAA | GGA
GTG | GCG | GAG | GGG
----------------------------------------------
Nice code, uuh?
Ambiguity character definition table:
T (1): T
C (1): C
A (1): A
G (1): G
U (1): T
Y (2): T C
R (2): A G
M (2): C A
K (2): T G
S (2): C G
W (2): T A
H (3): T C A
B (3): T C G
V (3): C A G
D (3): T A G
- (4): T C A G
N (4): T C A G
? (4): T C A G
processing fasta file
reading seq# 1 A27919 1359 sites
reading seq# 2 A15262 1359 sites
reading seq# 3 A23133 1359 sites
reading seq# 4 A10245 1359 sites
reading seq# 5 A2708 1359 sites
reading seq# 6 A12710 1359 sites
reading seq# 7 A5063 1359 sites
reading seq# 8 A1 1359 sites
reading seq# 9 A7706 1359 sites
reading seq#10 A20825 1359 sites
reading seq#11 A17684 1359 sites
reading seq#12 A25250 1359 sites
ns = 12 ls = 1359
Reading sequences, sequential format..
Reading seq #12: A25250
Sequences read..
Counting site patterns.. 0:00
71 patterns at 453 / 453 sites (100.0%), 0:00
Counting codons..
NG distances for seqs.:
1
2 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 11.0 + 1.0 = 12.0 2
3 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
3 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 0.0 = 4.0 3
4 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
4 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 0.0 = 4.0
4 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 0.0 + 0.0 = 0.0 4
5 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
5 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 0.0 = 4.0
5 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 0.0 + 0.0 = 0.0
5 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 0.0 + 0.0 = 0.0 5
6 1:Sites 298.8 + 1060.2 = 1359.0 Diffs 0.0 + 0.0 = 0.0
6 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 11.0 + 1.0 = 12.0
6 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
6 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
6 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0 6
7 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 1.0 = 5.0
7 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 11.0 + 0.0 = 11.0
7 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 0.0 = 7.0
7 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 0.0 = 7.0
7 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 0.0 = 7.0
7 6:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 1.0 = 5.0 7
8 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 10.0 + 1.0 = 11.0
8 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
8 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
8 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
8 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
8 6:Sites 298.7 + 1060.3 = 1359.0 Diffs 10.0 + 1.0 = 11.0
8 7:Sites 298.6 + 1060.4 = 1359.0 Diffs 10.0 + 0.0 = 10.0 8
9 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 1.0 = 4.0
9 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 10.0 + 0.0 = 10.0
9 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 0.0 = 6.0
9 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 0.0 = 6.0
9 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 0.0 = 6.0
9 6:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 1.0 = 4.0
9 7:Sites 298.6 + 1060.4 = 1359.0 Diffs 1.0 + 0.0 = 1.0
9 8:Sites 298.6 + 1060.4 = 1359.0 Diffs 9.0 + 0.0 = 9.0 9
10 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 1.0 = 7.0
10 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 0.0 = 7.0
10 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
10 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
10 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 0.0 = 3.0
10 6:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 1.0 = 7.0
10 7:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 0.0 = 6.0
10 8:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 0.0 = 6.0
10 9:Sites 298.7 + 1060.3 = 1359.0 Diffs 5.0 + 0.0 = 5.0 10
11 1:Sites 298.7 + 1060.3 = 1359.0 Diffs 8.0 + 2.0 = 10.0
11 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 11.0 + 1.0 = 12.0
11 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
11 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
11 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
11 6:Sites 298.7 + 1060.3 = 1359.0 Diffs 8.0 + 2.0 = 10.0
11 7:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 1.0 = 7.0
11 8:Sites 298.7 + 1060.3 = 1359.0 Diffs 10.0 + 1.0 = 11.0
11 9:Sites 298.7 + 1060.3 = 1359.0 Diffs 5.0 + 1.0 = 6.0
11 10:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 1.0 = 7.0 11
12 1:Sites 298.8 + 1060.2 = 1359.0 Diffs 0.0 + 0.0 = 0.0
12 2:Sites 298.7 + 1060.3 = 1359.0 Diffs 11.0 + 1.0 = 12.0
12 3:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
12 4:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
12 5:Sites 298.7 + 1060.3 = 1359.0 Diffs 7.0 + 1.0 = 8.0
12 6:Sites 298.8 + 1060.2 = 1359.0 Diffs 0.0 + 0.0 = 0.0
12 7:Sites 298.7 + 1060.3 = 1359.0 Diffs 4.0 + 1.0 = 5.0
12 8:Sites 298.7 + 1060.3 = 1359.0 Diffs 10.0 + 1.0 = 11.0
12 9:Sites 298.7 + 1060.3 = 1359.0 Diffs 3.0 + 1.0 = 4.0
12 10:Sites 298.7 + 1060.3 = 1359.0 Diffs 6.0 + 1.0 = 7.0
12 11:Sites 298.7 + 1060.3 = 1359.0 Diffs 8.0 + 2.0 = 10.0 12
528 bytes for distance
69296 bytes for conP
0 bytes for fhK
5000000 bytes for space
. (Continuation of codeml output)
.
.
w = 0.02373 dN = 0.00000 dS = 0.00001 d4 = 0.00000 (189.6 four-fold sites)
dN*= 0.00000 dS*= 0.00001 S* = 315.11 N* =1043.89
end of tree file.
Time used: 0:10
Saureus1203.aln
Saureus1203.aln
Saure_omega.out
Traceback (most recent call last):
File "codemlScript.py", line 45, in <module>
results = cml.run(verbose=True)
File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/PAML/codeml.py", line 185, in run
raise IOError("The specified tree file does not exist.")
IOError: The specified tree file does not exist.
This is the pyhthon script I got from github
#!/usr/bin/python
###################################################################################################################
#This script was written by Nathan Whelan.
# THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE CONTRIBUTORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
# OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
# SOFTWARE.
##########################################################################################################################
##This script utilizes codeml in PAML by Ziheng Yang. If you use this script please also cite PAML.
##BIOPYTHON IS REQUIRED FOR THIS SCRIPT!
##This script can be used to automate running the codeml PAML package(e.g. if you have hundreds of genes you want to fit to a model).
#A codeml ctl file should be in your working directory with the parameters you wish to use.
from __future__ import division
from Bio.Phylo.PAML import codeml ##Biopython PAML
import sys
#This program takes three inputs: the alignment in phylip format, a treefile in Newick format, and outputfile name
#It returns a nested dictionary with various results depending on analysis
if len(sys.argv) != 4:
print "Error. There should be three inputs. An alignment in pyhlip format, a tree file, and the name for the output file"
quit()
cml = codeml.Codeml()
cml.read_ctl_file("codeml.ctl") ##CTL File. See PAML manual for format.
cml.alignment = sys.argv[1]
cml.tree = sys.argv[2]
cml.out_file = sys.argv[3]
name=sys.argv[1]
print name
name=sys.argv[1]
print name
name2=name[0:5]+"omega.out"
print name2
results1 = cml.run(verbose=True)
I would like to know where I am doing mistake..
I guess now we need small modification in python script but i don't know where. Please try to solve
Thank you
Why are you referring to your files like:
/Trees/\.tree
? You aren't passing the$file
variable in to the python function? Is the tree file one of your inputs or one of the script outputs? (Sorry haven't read the code fully as its too early here :P )Your script isn't finding the tree file, which tells you there's something wrong with how you're specifying the file path. Using the
.tree
(I would guess) is creating a hidden file called.tree
Just try:
for file in *.aln ; do python ....blah blah blah.... $file ; done