Experiences With Pangea

13.9 years ago

Daniel ★ 4.0k

Ref: http://www.nature.com/ismej/journal/v4/n7/full/ismej201016a.html

I'm new to the 454 pipeline racket and have been looking around for the best pipeline to analyze the 16S sequences we'll be getting in the next couple of months. RDP and mothur have cropped up on occasion, but the last couple of paragraphs of the paper above give several reasons why PANGEA is better than RDP.

The highlights are that the pipeline is installed and run on your own machine, so sequences don't need to be uploaded, and that the whole pipeline runs from a single command.

I was just wondering whether anyone out there has had any successes or pitfalls with it. I'm currently setting it up to try with some Sanger data we have knocking around, but some real-life experiences would be helpful!

pipeline • 3.2k views

The documentation for Pangea seems severely lacking; that's quite worrisome.

Reply by Istvan Albert 102k, 13.9 years ago

I agree. Also, the TaxCollector database used for taxonomic descriptions is proving difficult to set up, with dead-end web links and some faltering Python (I'm only Perl-native), which is hampering my ability to report back. I'll do so when I crack it!

Reply by Daniel ★ 4.0k, 13.9 years ago

I wrote TaxCollector, and someone in my lab wrote Pangea. What dead-end web links?

Reply by Science_Robot ★ 1.1k, 13.9 years ago

In the setup file readme.md the command:

curl ftp://ftp.ncbi.nih.gov/pub/taxdump.tar.gz | gunzip | tar -xvf names.dmp nodes.dmp

points to the wrong FTP address. The file is actually at:

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
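A minimal Python sketch of the same download, assuming the corrected URL above is still served over FTP (swap in another URL if it is not). It streams the archive and keeps only names.dmp and nodes.dmp. Note also that in the quoted README command, -xvf names.dmp makes tar treat names.dmp as the archive name instead of reading the pipe, so the shell form needs a - after -f, e.g. curl ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz | tar -xzvf - names.dmp nodes.dmp. The function names extract_members and download_taxdump are made up for this example.

```python
import os
import shutil
import tarfile
import urllib.request

# Corrected path from this thread; the README pointed at /pub/taxdump.tar.gz.
TAXDUMP_URL = "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"
WANTED = ("names.dmp", "nodes.dmp")


def extract_members(fileobj, wanted=WANTED, dest="."):
    """Stream a .tar.gz from `fileobj`, writing only the members named in `wanted`.

    Returns the sorted names of the files written to `dest`.
    """
    written = []
    # "r|gz" reads the archive front to back, so a non-seekable network
    # stream works, just like piping curl into tar.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            # basename() drops any leading directory, so nothing is written
            # outside `dest`.
            name = os.path.basename(member.name)
            if not member.isfile() or name not in wanted:
                continue
            src = tar.extractfile(member)
            with open(os.path.join(dest, name), "wb") as out:
                # Copy in chunks instead of loading large .dmp files into memory.
                shutil.copyfileobj(src, out)
            written.append(name)
    return sorted(written)


def download_taxdump(url=TAXDUMP_URL, dest="."):
    """Fetch the NCBI taxonomy dump and keep only names.dmp and nodes.dmp."""
    with urllib.request.urlopen(url) as response:
        return extract_members(response, dest=dest)
```

Calling download_taxdump() writes the two files into the current directory; extract_members can also be tried offline on any local .tar.gz opened in binary mode.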

I had to add lines to the remdup.py script to declare seq and name as global variables.
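For anyone else new to Python: this kind of fix works because a function that assigns to a name makes that name local unless it is declared global. The sketch below is hypothetical (it is not the code of remdup.py, which isn't shown here); it only reproduces that pattern in a small FASTA de-duplicator, and the names flush and remove_duplicates are invented.

```python
# Hypothetical sketch, NOT the real remdup.py: it only reproduces the pattern
# where functions reassign module-level `seq` and `name`.
seq = ""
name = ""
records = {}  # sequence -> first header seen


def flush():
    """Store the pending record unless its sequence was already seen."""
    # Without this declaration, the reassignment at the end makes `seq` and
    # `name` local to flush(), and `if seq` raises UnboundLocalError.
    global seq, name
    if seq and seq not in records:
        records[seq] = name
    seq, name = "", ""


def remove_duplicates(lines):
    """Return {sequence: header} for FASTA lines, keeping the first copy only."""
    global seq, name  # both are reassigned below
    seq, name = "", ""
    records.clear()  # mutating a global dict needs no `global` statement
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            name = line[1:]
        elif line:
            seq += line
    flush()
    return dict(records)
```

Passing state as arguments or wrapping it in a class would avoid the globals altogether, but a global statement is the smallest change to an existing script.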

I also had to add import sys to remove_uncultured.py.
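Likewise, any script that uses sys.argv, sys.stdin or sys.stdout without import sys fails with NameError. The filter below is a hypothetical stand-in for remove_uncultured.py (its real logic and arguments may differ), assuming it drops FASTA records whose header mentions "uncultured"; filter_uncultured and main are invented names.

```python
import sys  # the missing line: without it, every sys.* use raises NameError


def filter_uncultured(lines):
    """Yield FASTA lines, skipping every record whose header mentions 'uncultured'."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # A new header decides whether this record's lines are kept.
            keep = "uncultured" not in line.lower()
        if keep:
            yield line


def main(argv=None):
    """Hypothetical CLI: filter the FASTA file in argv[0] (or stdin) to stdout."""
    argv = sys.argv[1:] if argv is None else argv
    if argv:
        with open(argv[0]) as handle:
            sys.stdout.writelines(filter_uncultured(handle))
    else:
        sys.stdout.writelines(filter_uncultured(sys.stdin))
    return 0
```

To run it as a script, add the usual if __name__ == "__main__": sys.exit(main()) guard at the bottom.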

I'm new to Python, so I don't know whether these are the most correct fixes, but they allowed me to run the scripts that generate the TaxCollector database.

(All run on Bio-Linux.)

Reply by Daniel ★ 4.0k, 13.9 years ago (updated 5.2 years ago by Ram 44k)

I chose an answer because the question keeps getting bumped by Community and it's annoying me. I'd very much like to commend TaxCollector on its usefulness, though.

Reply by Daniel ★ 4.0k, 13.2 years ago

Ram · Accepted Answer · 2011-01-18

13.9 years ago

Daniel ★ 4.0k

(Posted as an answer rather than a comment for formatting reasons.) Re: audyyy

In the setup file readme.md the command:

curl ftp://ftp.ncbi.nih.gov/pub/taxdump.tar.gz | gunzip | tar -xvf names.dmp nodes.dmp

points to the wrong FTP address. The file is actually at:

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

To remdup.py, I had to add lines declaring seq and name as global variables.

To remove_uncultured.py, I had to add import sys.

I'm new to Python, so I don't know whether these are the most correct fixes, but they allowed me to run the scripts that generate the TaxCollector database.

Hope this helps.

(All run on Bio-Linux.)

Answer by Daniel ★ 4.0k, 13.9 years ago (updated 5.2 years ago by Ram 44k)

Thanks for pointing out the typo. I just finished writing a Rakefile to fetch the required databases, filter them, and create the TaxCollected version of RDP. https://github.com/audy/taxcollector

This version of TaxCollector differs from the one described in the paper: it treats species and subspecies/strain as separate ranks. Previously, different strains were treated as different species, which resulted in fewer sequences being classified to the species level.
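One way to see why folding strains into species lowers species-level calls: if a read's best hits are combined by a consensus rule (the deepest rank on which all hits agree), two hits to different strains of one species disagree at the species rank under the old scheme but agree once strain is its own rank. This is a simplified sketch; the rank list, the consensus rule and the example lineages are assumptions for illustration, not TaxCollector's header format or Pangea's actual classifier.

```python
# Assumed rank order for this illustration only.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species", "strain"]


def consensus(lineages):
    """Return the deepest (rank, name) shared by every lineage, or None.

    zip(*lineages) walks all lineages rank by rank and stops at the shortest.
    """
    best = None
    for rank, names in zip(RANKS, zip(*lineages)):
        if len(set(names)) != 1:
            break  # the hits disagree here, so nothing deeper can be called
        best = (rank, names[0])
    return best


# Two best hits for one read: E. coli strains K-12 and O157:H7 (made-up example).
SHARED = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Enterobacteriales", "Enterobacteriaceae", "Escherichia"]

# Old scheme: the strain is folded into the species name.
OLD_STYLE = [SHARED + ["Escherichia coli K-12"], SHARED + ["Escherichia coli O157:H7"]]

# New scheme: species and strain are separate ranks.
NEW_STYLE = [SHARED + ["Escherichia coli", "K-12"], SHARED + ["Escherichia coli", "O157:H7"]]
```

With these example lineages, consensus(OLD_STYLE) stops at ("genus", "Escherichia"), while consensus(NEW_STYLE) reaches ("species", "Escherichia coli").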

I hope this helps.

Reply by Science_Robot ★ 1.1k, 13.9 years ago

Thanks, Daniel.

Reply by Science_Robot ★ 1.1k, 13.9 years ago

Where did you get this copy of TaxCollector? The one on SourceForge works.

Also, I've been maintaining TaxCollector on GitHub (I don't touch the SourceForge version). Try https://github.com/audy/taxcollector/tree/1.0.0

Version 2.0.0 has a Rakefile that creates the TaxCollector database if you just type rake.

Reply by Science_Robot ★ 1.1k, 13.8 years ago

Also, there's a ready-made database with instructions here: http://www.microgator.org/taxcollector/

Reply by Science_Robot ★ 1.1k, 13.8 years ago

I apologise for taking so long to get back to this. Busy, busy, busy. TaxCollector v2 works perfectly and I'm a big fan; the Rakefile is immensely useful. I can't remember where I got the original, sorry.

Re: PANGEA, I've found a few case-sensitivity issues that you may want to pass on to whoever it concerns (being a total Mac noob, I guess this isn't a problem there?): the file names referenced in the scripts differ in case from the actual file names. Look at clustertable.pl and 1.4_Barcode.
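Mismatches like these go unnoticed on macOS because its default filesystem is case-insensitive, but they break on most Linux filesystems. A rough Python check, assuming the referenced files sit in the same directory as the scripts and that any name.ext token is a file reference (both are simplifications; case_mismatches and scan_directory are hypothetical helpers, not part of Pangea):

```python
import os
import re

# Rough heuristic: any token shaped like name.ext is treated as a file reference.
FILE_TOKEN = re.compile(r"[\w.-]+\.\w+")
SCRIPT_EXTENSIONS = (".pl", ".py", ".sh")


def case_mismatches(script_text, filenames):
    """Return (referenced, actual) pairs whose names differ only by letter case.

    `filenames` lists the names that really exist, e.g. os.listdir(path).
    """
    existing = set(filenames)
    by_lower = {name.lower(): name for name in filenames}
    found = []
    for token in FILE_TOKEN.findall(script_text):
        actual = by_lower.get(token.lower())
        # Exact matches are fine; flag only case-insensitive matches.
        if actual is not None and token not in existing and (token, actual) not in found:
            found.append((token, actual))
    return found


def scan_directory(path="."):
    """Map each script in `path` to the case mismatches found in its text."""
    names = os.listdir(path)
    report = {}
    for name in names:
        if not name.endswith(SCRIPT_EXTENSIONS):
            continue
        with open(os.path.join(path, name), errors="replace") as handle:
            hits = case_mismatches(handle.read(), names)
        if hits:
            report[name] = hits
    return report
```

scan_directory("path/to/pangea") returns a dict mapping each script to the (referenced, actual) name pairs that differ only in case; the example path is a placeholder.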

Hope this helps.

Reply by Daniel ★ 4.0k, 13.7 years ago