I'm new to the 454 pipeline racket and have been looking around for the best pipeline to analyze the 16S sequences we'll be getting in the next couple of months. RDP and Mothur have cropped up on occasion, but the PANGEA paper gives several reasons why it is better than RDP (last couple of paragraphs of the paper above).
The highlights are that the pipeline is stored and run locally at your own site, so sequences don't need to be uploaded, and that the whole pipeline runs from one command.
I was just wondering if anyone out there has had any successes or pitfalls with it. I'm currently setting it up to try with some Sanger data we have knocking around, but some real-life experiences would be helpful!
In remdup.py I had to add lines declaring seq and name as global variables.
In remove_uncultured.py I had to add import sys.
I'm new to Python, so I don't know if these are the most correct fixes, but they allowed me to run the scripts to generate the TaxCollector database.
Hope this helps.
(all run on biolinux)
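For anyone hitting the same errors, here is a minimal sketch of what those two kinds of fix look like in Python. It is not the actual contents of remdup.py or remove_uncultured.py: the functions reset, read_line, remove_duplicates and main are hypothetical and only mirror the variable names seq and name mentioned above. The point is that a function which assigns to module-level variables needs a global declaration, and that anything using sys.argv or sys.stderr needs import sys at the top of the file.

```python
import sys  # needed for sys.stderr below; omitting it raises NameError at call time

# Module-level state shared between functions (names taken from the post,
# everything else in this sketch is an assumption).
name = None
seq = ""


def reset():
    """Clear the shared record state."""
    global name, seq  # without this line the assignment would only create locals
    name, seq = None, ""


def read_line(line):
    """Feed one FASTA line; return the finished (name, seq) when a new header starts."""
    global name, seq
    finished = None
    if line.startswith(">"):
        if name is not None:
            finished = (name, seq)
        name, seq = line[1:].strip(), ""
    else:
        seq += line.strip()
    return finished


def remove_duplicates(lines):
    """Keep the first record seen for each distinct sequence."""
    reset()
    finished = [read_line(line) for line in lines]
    if name is not None:
        finished.append((name, seq))  # the last record has no header after it
    seen, unique = set(), []
    for record in finished:
        if record is not None and record[1] not in seen:
            seen.add(record[1])
            unique.append(record)
    return unique


def main(argv):
    """Hypothetical command-line entry point: de-duplicate one FASTA file."""
    if len(argv) != 2:
        print("usage: remdup_sketch.py input.fasta", file=sys.stderr)
        return 1
    with open(argv[1]) as handle:
        for rec_name, rec_seq in remove_duplicates(handle):
            print(f">{rec_name}\n{rec_seq}")
    return 0
```

To run it as a script you would add sys.exit(main(sys.argv)) at the bottom. Passing the values around as arguments and return values would avoid the globals altogether, but the global declaration is the smallest change to a script that already relies on shared variables.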
Written 13.9 years ago by Daniel; updated 5.2 years ago by Ram.
Thanks for pointing out the typo. I just finished writing a Rakefile to fetch the required databases, filter them, and create the TaxCollected version of RDP. https://github.com/audy/taxcollector
There is a difference between this version of TaxCollector and the one described in the paper: this one considers species and subspecies/strain separately. Before, different strains were considered different species, which resulted in a lower number of sequences being classified to the species level.
I apologise for the time it's taken for me to get back to this. Busy busy busy.
The V2 TaxCollector works perfectly and I am a big fan. The Rakefile is immensely useful. I can't remember where I got the original, sorry.
Re: PANGEA, I've found a few case-sensitivity issues which you may want to pass on to whoever it concerns (being a total Mac noob, I guess this isn't a problem on there?). They are differences between the filename referenced in a script and the actual filename. Look at clustertable.pl and 1.4_Barcode.
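A quick way to track down this kind of mismatch is to compare the name a script refers to against the directory listing, ignoring case. The sketch below is only an illustration: find_case_mismatch is a hypothetical helper, not part of PANGEA, and the directory and filenames you pass in are up to you. As background, Linux filesystems are normally case-sensitive while the default macOS filesystem is not, which would be consistent with the problem showing up on BioLinux but not on a Mac.

```python
import os


def find_case_mismatch(directory, referenced_name):
    """Return the on-disk name that equals referenced_name except for case, or None.

    None is also returned when an exact match exists, because then there is
    nothing to fix.
    """
    entries = os.listdir(directory)
    if referenced_name in entries:
        return None
    wanted = referenced_name.lower()
    for entry in entries:
        if entry.lower() == wanted:
            return entry
    return None
```

For example, find_case_mismatch(".", "clustertable.pl") returns the differently-cased filename if one is present in the current directory, which tells you whether to rename the file or edit the reference in the script.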
The documentation for Pangea seems severely lacking; that's quite worrisome.
I agree. Also, the TaxCollector database used for taxonomic descriptions is proving difficult to set up, with dead-end weblinks and some faltering Python (I'm only Perl-native), which is hampering my ability to report back. Shall do when I crack it!
I wrote TaxCollector and someone in my lab wrote Pangea. What dead-end web-links?
In the setup file readme.md the command:
directs to the wrong ftp address. It's actually found at:
I had to add lines to the remdup.py script to use seq and name as global variables. I also had to add import sys to remove_uncultured.py.
I'm new to python so don't know if these are the most correct fixes, but these allowed me to run the scripts to generate the taxcollector database.
(all run on biolinux)
I chose an answer, as the thread keeps getting bumped by Community and it's annoying me. I'd very much like to commend TaxCollector on its usefulness, though.