PAML Batch File?
9.3 years ago
zgayk ▴ 90

Hello,

Sorry to ask so many questions, but this is related to a problem I am trying to solve with PAML. I have no experience with PAML beyond a few test analyses, but I would like to use it to calculate pairwise dN/dS ratios for a large number (1000+) of orthologs or gene fragments. I was hoping I could submit a single batch file to PAML containing all the aligned pairs, with the number of sequences and the alignment length above each pair (PHYLIP format). Because each aligned pair has its own header line, I thought PAML would give me an output file with omega values for each pair. So far, however, I have not been able to batch process these orthologs, and it seems I will have to split this large file into one file per alignment.
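A minimal sketch of the splitting step, assuming each alignment in the batch file starts with a PHYLIP header line containing only two integers (number of sequences and alignment length), as in the pasted data. The function names and the `aln_0001.phy` naming scheme are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# A dataset header in sequential PHYLIP is a line holding exactly two integers:
# the number of sequences and the alignment length, e.g. "2 1011".
HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def split_phylip_batch(text):
    """Split concatenated PHYLIP alignments into one string per dataset."""
    datasets, current = [], []
    for line in text.splitlines():
        # A new header closes the previous dataset (ignoring leading blank lines).
        if HEADER.match(line) and any(l.strip() for l in current):
            datasets.append("\n".join(current).strip() + "\n")
            current = []
        current.append(line)
    if any(l.strip() for l in current):
        datasets.append("\n".join(current).strip() + "\n")
    return datasets


def write_alignments(batch_path, out_dir):
    """Write each dataset to its own numbered .phy file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    blocks = split_phylip_batch(Path(batch_path).read_text())
    for i, block in enumerate(blocks, start=1):
        path = out / f"aln_{i:04d}.phy"
        path.write_text(block)
        paths.append(path)
    return paths
```

For example, `write_alignments("orthologs.phy", "alignments")` (hypothetical file names) would produce `alignments/aln_0001.phy`, `alignments/aln_0002.phy`, and so on.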

The question I am left with is: how do large genome projects, which have far too many coding-sequence alignments to handle by hand, usually calculate dN/dS ratios in PAML or other programs? Once my file is split into individual alignments, I assume I'll have to write some sort of loop, but PAML requires the control file to be updated for each of the 1000 orthologs I wish to analyze. So I am a bit confused about how to analyze this much data. Any tips would be very useful.
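One way to handle the per-alignment control file is to generate it from a template, which is the same idea as a find/replace on a template file. The sketch below assumes codeml in pairwise mode (`runmode = -2`); the option values are illustrative rather than recommendations, so check each one against the PAML manual. It writes every control file into its own directory, because codeml writes auxiliary files with fixed names (such as `rst`) into the working directory, and parallel runs sharing one directory would overwrite each other. `CTL_TEMPLATE` and `write_control_files` are hypothetical names.

```python
from pathlib import Path
from string import Template

# Hypothetical minimal codeml control file for pairwise dN/dS (runmode = -2).
# Only $seqfile and $outfile change between alignments; verify the other
# options against the PAML manual before relying on them.
CTL_TEMPLATE = Template("""\
      seqfile = $seqfile
      outfile = $outfile
        noisy = 0
      verbose = 0
      runmode = -2
      seqtype = 1
    CodonFreq = 2
        model = 0
      NSsites = 0
        icode = 0
    fix_kappa = 0
        kappa = 2
    fix_omega = 0
        omega = 0.4
    cleandata = 1
""")


def write_control_files(alignments, work_root):
    """Create one working directory per alignment, each with its own codeml.ctl."""
    ctl_paths = []
    for aln in map(Path, alignments):
        work = Path(work_root) / aln.stem
        work.mkdir(parents=True, exist_ok=True)
        ctl = work / "codeml.ctl"
        # Absolute seqfile path, so codeml can run from inside `work`.
        ctl.write_text(CTL_TEMPLATE.substitute(
            seqfile=aln.resolve(), outfile=f"{aln.stem}.mlc"))
        ctl_paths.append(ctl)
    return ctl_paths
```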

Thanks,
Zach

2 1011
seq11
CCCCGTCTGCTAAATGGGATGAGGATTGCAGCTGCGTAGCCTGATTTTTGTGGGCGAGATAATACATCAGTGAGCAAAGCCTGGCAACATGGTCCTTAAAAGCCAGGAGATCTTTGCT-GCTCGCGAGACACTCGCAGCACACC-GCAGCGTGAGGAAT--AAAAGCGGGCGTGTGGGACTTTCTAGGAGATTTTTCTTGGCAATGAGTCTTGCTTCAGACGTAAAACCGGTAGCTGTGCCACTGGTAGGGCTCCAGGTCGCTGTATGTAGAGCAACGCATGTCTGATGCTCTGACCTACAGGCAAGAAGGAGGAGAGGAACCAGGAAAACCACGTTTGTCTGTACCTTTGGGCCTGCACAGGGCCACATTGATGG-CAACACAG-ACCCGTGTTTCTGGCAGGGCAGAAGCCATGGGTGGGGATTTGCACGA-CGGGAGCTCCAGGCAGTTTAGGATTTGGCACAGCGATCTCAGAGAGGAGAACTCCTCTCCAAGAAGAGCAGCCCTTTGACAAGGTTACCCCTCATTTCCT-CAGTCGAACTGCCCTTTGCAAGGAGCATC-CCTCAGCTGCCACAGCCTATCGCATCTTCCTCCTGAAATCTGATGCTACCCTGCTTTGATGGTGATCAGTTGTTGACTGCAGAAAAGCTGAATCATGAAACAGAGGTTATAAACAAGGTTATTTTTAACGAAGGAAATAAATCCCACATCACGCAGAAGTC-GGAGCAAGGGACTGTCAGCTAGGGCTCGCAGATCTCACTCCTGCCCTGATATTCTCCATGTCCTGGAGCAAATTACAATCTCTCTAACCTTTAATTTACCTGCTTGTAACATGGACAAAGCAGCAGACTCCAGTTGTGCTTTCTTTGTACAGCAAAGCTCCTGGTGTGACAGTAAAGCGGTGCTGGGAGCCGTGCGATGCCA-GGG--G-CT-GC-CTGGC----G-C-C-TTGCAGAGCTATTTTCAGCCACACAAAATGATCGTACACGGTATTTGAACAATGTTCATAGACGTTTTGAATGCAGAGAGAGACTACCCAGTCTGATTGCC-CTCTCCCGTATTTCACAGTCTTACATTTCAATTCCCACATGCAGCCCTGTAACTTCTGCTTCATTTTGCTCACTGAAAACTCAAAACAGACAAGTTTTGTAGCCCAGAGTCTCACG-TCGTATCCTTCAGGCACTGACGTGCTGCAGGGTGTGTGTTGTAGATATCCAGCCCAGGGGGAGGGAAGATGAGTCTGGTACACCTACAGCTA-ACTGGGAAATTTCCTGATTGTGCCACTTCTTATCTCTCTCTCATACCTAAATTCTTCTCA-GAAAAGCGTATTTCATTTTCCCTGGGAAAGATGCCTCAACTCAGGGAAGGAGGCGATTGAGTCTGGGACTGGCCAGGCTTCCCTGTGCTGCCTGAGGAATACGAACAACCCCACAGAGAGACCCTTGCAAACCCATCTTCTCTGCT-CTACGCCCAAGTTAACAAGTGCTTTGGGTGTGTCT-CC-AACTTTACCCAGATAACGACTAAGGTTAAAGAACTCTAGCTTTTTTGAAA-GTATATTGTAAATAT-CTGTTTTTTTAAATATATATACATAAATGTACACACACACATAAATTTATGAAAACTTCAGGAGACTGGAAACTTTTCCTGCCGTACTTTATTTACAGCCCTGCTAA-G-GCTCTTT-GCTGCTGAAGGAGCTTACCTTCCTCGCTCAAGTTTTCTTTGAGCTAAATAGTGATTTCCACAAAGCAAATATCAAATAAAAACCAAGGCATCGATGGATTGGAT-CCAGAGTGGGTTCTGCTCCCTCTGTCCATTCACTGGTGCAAAGCCCTGGCCCACCTTCACCTGCCTGCTGTGGAAGCAGACACCAAACCTGCACACGCTCGGGACACACAGCAGCATT-TTGGCACAGGTGCCCGGGCACCTCCAGCGC-GGCCGTGAGAAGTGGAACAGGACCAACCTTGCAAGCTAGTTCC
GTCTTGCAGGTGCCTGCAAGGACACATGGCTGCGTGCCAGCGGATGGAATAAGCATCGCCCCGTGCTAGTTATGGGGTAGGGATGCGGCAAGGCT--GCGTGAACTAGTGCTTGAGACGCCTCTGTGATCTGTAGATGAAAGGTGATTTACAGCTGCAAATTATTGCTCTTCCT-CATGTGAAGTCTGTTATTTTCTGGTGTCCCTCATTTTGACTCATGCCACGTCCCGTATTTGGCGCGGGGCAGGCTCCTCTCTATG-GGTGGCTGTATAGGTTCTTCACATCTCCAGCTGGCTGCATGCTTTAGAAGCGTGTGATCTGCTCACAGCATCCATTCGAGGCGGCCCGGCAGAGCCCATGCTTGGGTAACTTGGGCTGGGATTTGCACAAAACTGCAGCACTT-AGCAGATTGCTTAGCCGGTGGGTTTAGAGGGCCTGTATTTGAG-GCTGGATACCCAGCTGGGTTGTGCACTCAACTCTGTGAACAGGCAGAGGACCAGCTGGGAGAACCTTAAATCCTGTGGCAGAGAGGAGAAACAGGAATATCCCGAGCCTCTGTGCTTGGGGCTGCCCCAGGGAAGGGGAGGAAGCCAGAGAGCACTAGACTGGGGGCC-CTCAGTCTGCAGCTTCTCATCTGCGTCGCAAACCTCCTGGGCAGGAATTACAAAGCAGGAGAGGTTTGTATCTTTGGCAGGCGTCTGGAGGGAAGGGGTATTAGGGTGGTTTTGTGAACAGCCCATGGACAGCAGAAGGAGCCGGTAGATTCGATGTCCTTCCCGCATTATGCACCATGGCATGCTCACCCCATCAGGCTGCAGAGAGCAAGCGACCTTTTGCTCTTTACCGCTAAATAAAACAGCAGAAAAT-AC-CATGTG-GTATAAGATAGTTATAACGATGGAGGAAGAAATATCCCTGATGCTC
seq12
CCCAGTCTGCTAAACGGCATGAGGATTGCAGCTGTGTAGCCTGATTTTTGCAGGCAACATAATGCAGCAGTTAGCAAACCCTCACAACATGGTCATTAGAAGCCTGGGGAGCTGTGCTGGCT-GTGAGACA--CG--TCACACCAG-AGTGCGAGGAATAAAAAAGTGGGCGTGTGGGACTTTCTAGGAGATTTTTCTTGGCAATGAGTCTTGCTTTAAATGTAGAACTGGTAGCTGTGCCACTGGTAGGGCCCCAGGTCGCCGCATGTAGAGCAGCGCATGTCTGATGCTCTGACCTACAGGCAAGAAGGAGGAGAGGAACCAGGAAAACCACGTTTGTCTGTAGCTTTGGGCCTGCACGGGGCCACATTCATGGTGATGA-GGCACTGGTGTTTCTGGCAGGGCAGAAGCCATGGGTGGGGATTTGCAC-AGCAGGAACTCCAGGTAGTGTAGGATTTGGCGCAGCAATTTCAGACAGGAGAACTCCTCTCCAAAAAGAGCAGCCCTTTGACAAGG-CATCCCTCATTTCCTCCA-TCAAACTGCCCTTTGCAAGGAGCA-CGCCTGAGCTGCAACAGCCTATTGCATCTTCCTCCTGAAATCTGATGCTGCCCTGCTTTGATGGCGATCAGTTGATGACTGCAGAAAAGCTGAATCATGAAAGAGAGGTTATAAACAAGGTTATTTTTAACAAAGGAAATGAATCCCACATCATTCAGAAGTCAGG-GCAAGGGACTGTCAGCTAGGGCTTGCAGGTCCCACTTCTGCCCCGACGTTCTTTATGTCCTGGAGCAAATTAAAATCTCTCTAGCCTTTAATTTACCTCCTTGTAGCATGGACAATGCAGCAGACTCCAGTTGTGCTTTCTTTGTACAGCAAAGCTCCTGGTGTGACAATAAAGCGATGCTGGGAGCTGTGTGATGCCAAGGGCTGGCTTGCACTGCCACCCGGCACCTTGCAGAGCTATTTTCAGCCATGCAAAACGATCGCACGCAGTATTTCAACAATGTTCATAGACATTTCAAATGCAGAGAGAGACTAACCAGTCTGATT-CCTCTCTCCCATATTTCACAGTCTTACATTTCAATTCCCACATGAAGCCCAGTAACTTCTGCTTCATTTTGCTCACTGTACACTCAAAATAGACAAGTTTTGCAACCCAGAGACTCATGCT-GTGTCCTTCAGGCACTGGGGTGCTGCTGA-TGTGA-TA-TAGGTAACCAGCCCAGGGGGAAGGAAGACGGGTCTGGTTTACCTACAGCTACA-TGGGAAATTTCCTGA--GT---A-TACT-A-CTTTCTCGCATT--TAAATTCTTC-CAAGAAAAGCATATTTCATTTTTCCTACAAAAGATGACTCCACTCAGAGAAGCAGGAGATTGAGGCTGGGGCTGGCCAGGCTTCCAAG---------A--AATACGAACAACTCCACATAGAGATCCTTGCAAACCTGTCTTCTCTGCTTC-ATGCCCAAGTTAACAAGTGTTTTGGGTGTGTCTTCCCAACATTACAGAGATAACTACCAAGGTTAAAGAACTCTAGATTTTTTTTTATGT-T-TTGTAAATATTC-GTATTTT-AA-TATATACACATG--TGTA----------T---TT-ACAAAAACTTTAGGAGATTGTAAACTTC-CCTGCCATACTT-ACTCAAAATGCTGTTAAAGAGGACTTTTGCTCCTGGAGGAGCTCACTTTGCTCACTCAAGTTTTCTTTAAG-TAG---G-GACTTCCACACAGAAAATATCAGATAGAACCCAAAGCAGAAATGGGTTGGATGCCA-AGTA--T---G-T---TCTGT---T-C-C-----C----CC-TGGCCCACCTTCACCTGCCTGCTGTGGAAGCAGACACCAAACCTGCACTGGCTTGGGGGACACAGCAG-AGCCTTAGCACAGGTGCCCAGGAACCTCCAGC-CTGGTTGGGAGAAGTGGAACAGGACCAACCCTGCAAGCCAGTTCC
CTCTTGCTGGTGCCTGCAAGGACATGTGGCTGTGTGCCAGCAGATGGAATAACCATTGCCCCGTGCTAGTTATGGGGCAGGGATGCTGCACGGCTCTGC-T-AATTAGTGCCTGAGACGCCTCTGTGATCTGTAGATGAAAGGTGATTTACAGCTGCTAATTATTGCTCT-CCTTCATGTGAAGTCTATTATTTCCTGGTGTCCCTCATTTTGACTCATGCCACATCCCGTATTTGGCACAGGGCAGGTTCCTCA-TA-GAGGTGG-TGC-T---TTG--CA--TCTCCAGCTGCCTGCACACTGAACAAGCCTGAGATCTGCTCTCAGCGTCCATTTGAGGCAGCCTGGCTGTGCCCATGCTTGGGAAACTTGGCCTGGAATTTGCATAAAGTTACAACATTTGAGCAGGT-GCTTAGCTAGTGGATTGAGGGGGC-TGCATTC-AGTGCTGGGTACTCAGCTGGGCTGTGCACTCAACTCTGTGA---G-CAGAGGACCAGCTGGGTGAACC--A------G----AGAG-G-AGAAACAGGAATATCCCGAACTTCTCTGCTCGGGACTGCCCCAGGGAAGGGGAAGAAGCCAGAGAGGACCAGACTAGGG-CCTCTTAGTCTATGGCTTCTCATCTGTGTCCCAAACCTCCTGGGCAGGAGTTACAAAGCAGGAGAGGTTCGTGTCTCTGGCAGGTGTCTGGAGGGAAGGGATATTAACGTGGTTTTCTGAACAGCCCATGGATAGCAGAAGGAACAAGTATTTTCAATGTCCTTCCCATGTTATGCCCCATGGCA--C--AT---A-CAGGTTGCAAA-AGCAAGTGACATTTTGCTCTT-ATTGCAAAATAAAAGAGCACAAAGTGACACATG-GCGTA-AGG-TAGTTGTAACGACAGAGGATGAAATATCCCTGATACTC

2 1015
seq21
GTCCCTGTAGCTTATAGCAAAGCATGGCACTGAAGATGCCAAGACGGTTGCCTTC-ATCATACCCAGGGACAAAAGACTTAGTCCTAACCTTACAGTTAATTCTTGCTAAACATATACATGCAAGTATCCGCGCACCAGTGTAAATGCCCTCAATCTCTTGCTTGCAAGACAAAGGAGCGGGTATCAGGCACACCTGTAATTGAACCGTAGCCCAAGACGCCTTGCTTAGCCACACCCCCACGGGTATTCAGCAGTAGTTAACATTAAGCAATAAGTGTAAACTTGACTTAGTTATAGCAACACTCAGGGTCGGTAAATCTTGTGCCAGCCACCGCGGTCACACAAGAGGCCCAAATTAACCGTATACACGGCGTAAAGAGTGGTACCATGCTATCCCATCAACTAGGATCAAAGTGCAACTGAGCTGTCGTAAGCCCAAGATGCATTAAAAGCCACCCTCAAGACGATCTTAGCACCCCCGATCAATTGAACCCCACGAAAGCTGGGACACAAACTGGGATTAGATACCCCACTATGCCCAGCCCTAAATCTTGATGCTTACCCCACTGAAGCATCCGCCTGAGAACTACGAGCACAAACGCTTAAAACTCTAAGGACTTGGCGGTGCCCCAAACCCACCTAGAGGAGCCTGTTCTGTAATCGATAACCCACGATACACCCAACCGTCCCTTGCCACAGCAGCCTACATACCGCCGTCGCCAGCTCACCTCTACCTGAGAGTGCA-A-CAGTGAGCACAATAGCCCTAC-G-C--CGCTAACAAGACAGGTCAAGGTATAGCTCATGGGGCGGAAGAAATGGGCTACATTTTCTAAG-ATAGAAAACACGAAAAGGGGTATGAAACTACCCCTGGAAGGCGGATTTAGCAGTAAAGCGGGACAATAAAGCCCCCTTTAAGTCGGCCCTGGAGCACGTACATACCGCCCGTCACCCTCCTCATAAGCCCCTATTGCTCATAACTAATACACCTACCAGCTGAAGATGAGGTAAGTCGTAACAAGGTAAGTGTACCGGAAGGTGCACTTAGCACACCAAGATGTAGCTAAACGTAAAGCATTCAGCTTACACCTGAAAGATATCTGCC-TCTTACCGGATCATCTTGAAG-CCAACTCTAGCCCAACCATATTACTAATAGAGCACACCA-AAAAAATCCACTCCACC-ACCAAATTAAAACATTTTTTCCACAACTTAGTATAGGCGATAGAAAAGATACTTTGGCGCTATAGAGATATTTGTACCGCAAGGGAAAGATGAAATAACAATGAAAAACTCAAGCAACAAATAGCAAAGATAAGCCCTTGTACCTTTTGCATCATGATTTAGCAAGAACCACCAAGCAAAATGAATTTTAGCTTGCCACCCCGAAACCTGAGCGAGCTACTTACAAGCAGCTATCCTAGAGCGAACCCGTCTCTGTTGCAAAAGAGTGGGAAGACTTGCCAGTAGAGGTGAAAAGCCTACCGAGCCAGGTGATAGCTGGTTGCCTGTGAAACGAATCTAAGTTCCCTCTTAATTTTCCTCTACGGACCCCACCCAACCCCCAACGTAGTGAATCAAGAGCTATTTAAAGGGGGTACAGCCCCTTTAAAGAAGGACACGCCTTCCCTAGCGGATAACTTACCCAACCCCACCCCCTAAACTTGTAGGCCCTTAAGCAGCCATCAGCAAAGAGTGCGTCAAAGCTCCACAC--CCCAAAAATCTGAAGACTGTACGACTCCCTTACCACCAACAGGCCAACCTATAACAATAGAAGGATTAATGCTAAAATAAGTAACTAGGGCCTCTCACCCTCTCAGGCGCAAGCTTACATGATTCCATTATTAACAGGCTAACTAATACCGCAACTTTGACAAGACAAAATATTGAACCCGTC-CTGTTAACCCAACTCAGGAGCGCCCATAAGAAAGATTTAAATCTGCAGAAGGAACTAGGCAAACCCAAGGCCCGACTGTTTACCAAAAACATAGCCT
TCAGCCAACCAAGTATTGAAGGTGATGCCTGCCCAGTGACCCCACGTTCAACGGCCGCGGTATCCTAACCGTGCGAAGGTAGCGCAATCAATTGTCCCATAAATCGAGACTTGTATGAATGGCTAAACGAGGTCTTAACTGTCTCCTGTAGATAATCAGTGAAATTGATCTTCCTGTGCAAAAGCAGGAATAGGCACATAAGACGAGAAGACCCTGTGGAACTTAAAAATCAGCGGCCACCACACATTTA-ACTCCTAAGCCTACTAGGCCCGCACACCCCC-TCCAAACACTGGCCCGCATTTTTCGGTTGGGGCGACCTTGGAGAAAAACGAATCCTCCAAAAATAAGACCACACCTCTTAACCAAGAGCAACATCTCAACGTACCAACAGTAACCAGACCCAGCACAAGCCTGACTAATGGACCAAGCTACCCCAGGGATAACAGCGCAATCTCCTTCAAGAGCCCATATCGACGAGGAGGTTTACGACCTCGATGTTGGATCAGGACATCCTAATGGTGCAGCCGCTATTAAGGGTTCGTTTGTTCAACGATTAACAGTCCTACGTGATCTGAGTTCAGACCGGAGCAATCCAGGTCGGTTTCTATCTATGAC-GAACTTTTCCTAGTACGAAAGGACCGGAAAAGTAGAGCCAATACTACAAGCATGCCCTCCCTCTAAGCAGTGAATCCAACTAAACTGCCAAAAGGACACCCACAA-CCCC-TACATCCTAGAAAAGGACCGCTAGCGTGGCAGAGCTCGGCAAATGCAAAAGGCTTAAGCCCTTTACCCA
seq22
GTCCCTGTAGCTTACAGCAAAGCATGGCACTGAAGATGCCAAGACGGTTGTC-TCTATCATACCCAAGGACAAAAGACTTAGTCCTAACCTTACAGTTAATTCTTGCTAGACATATACATGCAAGTATCCGCGCACCAGTGTAAATGCCCTCAATCTCTTGCTTGCAAGACAAAGGAGCGGGCATCAGGCACACCCATGATTAAATCGTAGCCCAAGACGCCTTGCTTAGCCACACCCCCACGGGTATTCAGCAGTAATTAACATTAAGCAATAAGTGTAAACTTGACTTAGTTATAGCAGCCCTTAGGGTCGGTAAATCTTGTGCCAGCCACCGCGGTCACACAAGAGACCCAAATTAACTGTA-ATACGGCGTAAAGAGTGGCATCATGTTATCCCACCAACTAAGATCAAAGTGCAACTGAGCTGTCACAAGCCCAAGATGCATTAAAAACCACCCTCAAGACGGTCTTAGCACTCACGATCGATTGAATCCCACGAAAGCTGGGGCACAAACTGGGATTAGATACCCCACTATGCCCAGCCCTAAATCTTGATGCTTACCCTACTGAAGCATCCGCCTGAGAACTACGAGCACAAACGCTTAAAACTCTAAGGACTTGGCGGTGCCCCAAACCCACCTAGAGGAGCCTGTTCTATAATCGATAACCCACGATACACCCAACCATCCCTTGCCACAGCAGCCTACATACCGCCGTCGCCAGCTCACCTCTACCTGAGA--GCATAGCAGTGAGCGCAATAGCCCAACAGACATCGCTAACAAGACAGGTCAAGGTATAGCCCATGGGACGGAAGAAATGGGCTACATTTTCT-AGAATAGAAAACACGAAAAGGGGTGTGAAACTACCCCTGGAAGGCGGATTTAGCAGTAAAGCGGGACAATAAAGCCCCCTTTAAGTTGGCCCTGGGGCACGTACATACCGCCCGTCACCCTCCTCATAAGCCCCCATTACTTATAACTAATACATTTACAAGCTGAAGATGAGGTAAGTCGTAACAAGGTAAGTGTACCGGAAGGTGCACTTAGCACACCAAGATGTAGCTAAACATAAAGCATTCAGCTTACGCCTGAAAGATATCTACCATC-TATCGGATCATCTTGAAGCCCAACTCTAGCCCGACCATATCAATAA-CGAG-ACA-CACTAAGAAGCTACTCC-CCTACCAGATTAAACCA-TTTTTCCACAACTTAGTATAGGCGATAGAAAGGACACTTTGGCGCGATAGAGATATCTGTACCGCAAGGGAAAAATGAAATAATAATGAAAAACTCAAGCAACAAACAGCAAAGATAAACCCTTGTACCTTTCGCATCATGATTTAGCAAGAACAACCAAGCAAAATGAATTTTAGCCTGCCATCCCGAAACCTGAGCGAGCTACTTACAAGCAGCTACCCCAGAGCGAACCCGTCTCTGTTGCAAAAGAGTGGGAAGACTTGCCAGTAGAGGTGAAAAGCCTACCGAGCCAGGTGATAGCTGGTTGCCTGTGAAATGAATCTAAGTTCCCTCTTAATTTTCCTCTACGGAGCCCACCTAA-CCCCAACGTAGTGAATCAAGAGCTATTTAAAGGGGGTACAGCCCCTTTAAAAAAGGACACACCTCCCCTAGCGGATAA-TTACCCAACCTTACGTCCT-AACTTGTAGGCCCTTAAGCAGCCACCAGCAAAGAGTGCGTCAAAGCTCCACACATCAAAAAAATCTGAAAACCACATGACTCCCTTACCACTAACAGGCCAACCTATAACAATAGGAGAATCAATGCTAGAATAAGTAACTAGGGCCCCTCACCCTCTCAGGCGCAAGCTTACATCATTATATTATTAACAGACCAACTAATACCACAACTTTAACAAGATAGAATATTAAACCC-ACTCTGTTAACCCAACCCAGGAGCGCCCATAAGAAAGATTTAAATCTACAAAAGGAACTAGGCAAACCCAAGGCCCGACTGTTTACCAAAAACATAGCCT
TCAGCCAACCAAGTATTGAAGGTGATGCCTGCCCAGTGACCCCACGTTTAACGGCCGCGGTATCCTAACCGTGCGAAGGTAGCGCAATCAATTGTCCCATAAATCGAGACTTGTATGAATGGCTAAACGAGGTCTTAACTGTCTCCTGTAGATAATCAGTGAAATTGATCTTCCTGTGCAAAAGCAGGAATAAACACATAAGACGAGAAGACCCTGTGGAACTTAAAAATCAGCAGCCACCACACAAC-AGACTCCCAAGCCTACCAGGCCCACATACCCCCCTCCAAACACTGGCCTGCATTTTTCGGTTGGGGCGACCTTGGAGAAAAACGAATCCTCCAAAAACAAGACCACACCTCTTAACCAAGAGCAACACCTCGACGTACTAACAGTACCCAGACCCAGCACAAGTCTGACCAATGGACCAAGCTACCCCAGGGATAACAGCGCAATCTCCTTCAAGAGCCCATATCGACAAGGAGGTTTACGACCTCGATGTTGGATCAGGACATCCTAATGGTGCAGCCGCTATTAAGGGTTCGTTTGTTCAACGATTAACAGTCCTACGTGATCTGAGTTCAGACCGGAGCAATCCAGGTCGGTTTCTATCTATGACAGA-CTTTTCCTAGTACGAAAGGACCGGAGAAGTAGGGCCAATGCTGCAGGTACGCCCTCCCCC-AAGCAATGAATCCAACTAAACCGCTAAAAGGACACACATAAACCCCGTACATCCTAGAAAAGGATCGCTAGCGTGGCAGAGCTCGGCAAATGCAAAAGGCTTAAGCCCTTTACCCA
9.3 years ago
Brice Sarver ★ 3.8k

I get around this by splitting the dataset into genes/transcripts of interest, then using sed (or any other find/replace tool) on a template control file to fill in the appropriate output filename, tree, and dataset for each one. I then submit these control files either in batches on stand-alone systems or all at once to a distributed cluster. I've submitted up to 60k this way (how many you can run simultaneously depends on your cluster's settings or your stand-alone machine's resources). Afterwards, I build summary tables and delete any extraneous files so as not to bog down the filesystem.
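For the stand-alone case, a rough Python sketch of running the jobs in bounded batches and collecting a summary table might look like the following. It assumes each job lives in its own directory with its own control file, that `codeml` is on the PATH and accepts the control file name as its argument, and that the pairwise estimates appear on a line of the form shown in the comment. That output pattern is an assumption, so compare it with your own output files first. All function names are hypothetical.

```python
import csv
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# ASSUMED shape of codeml's pairwise estimate line, e.g.
# "t= 0.0563  S=   212.3  N=   582.7  dN/dS=  0.2031  dN = 0.0118  dS = 0.0582"
NUM = r"(-?\d+(?:\.\d+)?)"
PAIR_LINE = re.compile(
    rf"dN/dS\s*=\s*{NUM}\s+dN\s*=\s*{NUM}\s+dS\s*=\s*{NUM}"
)


def run_codeml(ctl, exe="codeml"):
    """Run one codeml job inside its control file's own directory."""
    result = subprocess.run(
        [exe, ctl.name], cwd=ctl.parent,
        stdin=subprocess.DEVNULL,   # never wait for keyboard input
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True,
    )
    return ctl, result.returncode


def run_all(ctl_paths, max_jobs=4, exe="codeml"):
    """Run all jobs, at most `max_jobs` at a time; returns (ctl, returncode) pairs."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda c: run_codeml(Path(c), exe), ctl_paths))


def parse_pairwise(text):
    """Return (dN/dS, dN, dS) from the last matching line, or None if absent."""
    matches = list(PAIR_LINE.finditer(text))
    if not matches:
        return None
    return tuple(float(g) for g in matches[-1].groups())


def write_summary(out_paths, out_csv):
    """Write one CSV row per output file; 'NA' where no estimate was found."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["alignment", "dN/dS", "dN", "dS"])
        for path in map(Path, out_paths):
            values = parse_pairwise(path.read_text())
            writer.writerow([path.stem, *(values or ("NA", "NA", "NA"))])
```

With one directory per alignment under a hypothetical `work/` folder, usage could be `run_all(sorted(Path("work").glob("*/codeml.ctl")), max_jobs=8)` followed by `write_summary(sorted(Path("work").glob("*/*.mlc")), "summary.csv")`. On a cluster, each control file would instead be submitted as its own job through the scheduler.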

Hope this helps.
