Question

Program to plot a 60,000-sequence 16S rRNA dataset for phylogenetic assessment?

4.2 years ago

prfsullivan • 0

Hello, I'm looking to do a large-scale phylogenetic analysis. I plan to build a PCA plot with 60,000+ DNA sequences. I'd be doing a beta-diversity analysis with one sample composed of 60,000 sequences, while the other three contain fewer than 200 sequences each. I want every individual sequence to be included in the plot, rather than data points representing whole samples.
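
For a plot with one point per sequence, the input has to be a sequence-by-sequence distance matrix rather than a sample-level beta-diversity table. A minimal sketch of such a matrix, assuming the sequences are already aligned to a common length and using the simple uncorrected p-distance (the proportion of differing sites). The function name `p_distance_matrix` and the choice to skip gap/`N` columns pairwise are illustrative assumptions, not the behaviour of any particular tool.

```python
import numpy as np

def p_distance_matrix(aligned):
    """Pairwise uncorrected p-distances between aligned sequences.

    Assumes all sequences have the same length. For each pair, columns where
    either sequence has a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    arr = np.array([list(s.upper()) for s in aligned])  # n x L array of characters
    valid = ~np.isin(arr, ["-", "N"])                   # usable (non-gap, non-N) sites
    n = len(aligned)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = valid[i] & valid[j]                  # sites usable in both sequences
            compared = both.sum()
            diffs = (arr[i][both] != arr[j][both]).sum()
            d = diffs / compared if compared else np.nan
            dist[i, j] = dist[j, i] = d                 # matrix is symmetric
    return dist
```

The pure-Python double loop only illustrates the definition; with 60,000 sequences there are about 1.8 billion pairs, so a vectorised implementation or a dedicated tool would be needed in practice.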

I've been looking at Parallel-Meta and QIIME. Does anyone have any other suggestions? I'd be running it on a machine with 16 GB of RAM and 8 threads.

Thanks, Peter

What is the length of the individual sequences? If the sequences are redundant, there is no point in using all of them as is.

Reply by GenoMax, 4.2 years ago

The sequences are ~1,000 bp long. I was planning to remove redundancy, which would reduce the number of sequences, but my guess is the dataset would still contain 10,000-20,000 sequences.

Reply by prfsullivan, 4.2 years ago
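
Removing exact duplicates, as suggested above, can be sketched in a few lines. This is a minimal version assuming the sequences are plain strings already in memory; `dereplicate` is a hypothetical helper name. Keeping the counts matters if abundance is to be shown later. Note that this only collapses identical sequences; merging near-identical ~1,000 bp sequences would need identity-based clustering, which tools such as VSEARCH or CD-HIT provide.

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse exact duplicate sequences.

    Returns (unique_sequence, count) pairs, most abundant first; ties keep
    first-seen order. Sequences are upper-cased before comparison, so
    'acgt' and 'ACGT' are treated as identical (an assumption).
    """
    counts = Counter(s.upper() for s in seqs)
    return counts.most_common()
```

For example, `dereplicate(["ACGT", "acgt", "TTTT", "ACGT"])` returns `[("ACGT", 3), ("TTTT", 1)]`.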

Answer 1 · 2020-10-20

4.2 years ago

h.mon 35k

Beta-diversity is a ratio between regional and local species diversity (abundance), and a PCA plot would depict the distances, in some estimate of beta-diversity, between the sampling units. Therefore, you can't include individual sequences in a PCA plot depicting beta-diversity, because individual sequences are neither diversity measures nor abundances.
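
To make the distinction concrete, here is a small sketch assuming a hypothetical samples x taxa count table. Whittaker's multiplicative beta diversity (gamma / mean alpha) gives one number for the whole set of samples, and Bray-Curtis dissimilarity is one common choice for the between-sample distances that beta-diversity ordinations are built on. In both cases the units are samples (rows), not individual sequences.

```python
import numpy as np

def whittaker_beta(table):
    """Whittaker's beta diversity: gamma richness / mean alpha richness.

    `table` is a samples x taxa abundance matrix; richness counts taxa with
    non-zero abundance.
    """
    table = np.asarray(table)
    alpha = (table > 0).sum(axis=1)         # richness of each sample (local)
    gamma = (table.sum(axis=0) > 0).sum()   # richness of all samples pooled (regional)
    return gamma / alpha.mean()

def bray_curtis(table):
    """Samples x samples Bray-Curtis dissimilarity matrix."""
    table = np.asarray(table, dtype=float)
    n = table.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(table[i] - table[j]).sum()
            den = (table[i] + table[j]).sum()
            d[i, j] = d[j, i] = num / den if den else 0.0
    return d
```

With four samples, the Bray-Curtis matrix is only 4 x 4, which is why an ordination of beta-diversity yields four points regardless of how many sequences each sample contains.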

Maybe you want a PCA biplot depicting the beta-diversity relationships among samples, together with how each species (or whatever other taxonomic unit you are using) relates to the beta-diversity differences among samples?
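
A PCA biplot of this kind can be sketched with a singular value decomposition: the samples become points (scores) and each taxon becomes an arrow (loading). This is a minimal sketch assuming a hypothetical samples x taxa count table and a Hellinger transform (square root of relative abundances), which is one common choice before PCA of community data, not the only one. With four samples, at most three axes carry any variance, so the samples remain four points while the arrows show which taxa drive their differences.

```python
import numpy as np

def pca_biplot_coords(table):
    """PCA of a samples x taxa table after a Hellinger transform.

    Returns (sample_scores, taxon_loadings, explained_variance_ratio).
    """
    table = np.asarray(table, dtype=float)
    rel = table / table.sum(axis=1, keepdims=True)  # relative abundances per sample
    hell = np.sqrt(rel)                             # Hellinger transform
    centered = hell - hell.mean(axis=0)             # centre each taxon column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                                  # sample coordinates (plotted as points)
    loadings = vt.T                                 # taxon directions (plotted as arrows)
    explained = s**2 / (s**2).sum()                 # share of variance per axis
    return scores, loadings, explained
```

Plotting the first two columns of `scores` as points and the first two columns of `loadings` as arrows from the origin gives the usual biplot layout.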

Yeah, I realized that I don't want to do a beta-diversity analysis after all. I'm basically looking to analyze the phylogenetic relatedness of a small subset of sequences to each other, compared with a global population.

I essentially want to run a standard phylogenetic analysis (most likely maximum likelihood). But rather than depicting the data as a tree, I want to depict it in a PCA-like plot. I don't need the tree topology. Does that make sense?
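
What is described here amounts to a principal coordinates analysis (PCoA, classical multidimensional scaling) of a sequence-by-sequence distance matrix: for example, patristic (tip-to-tip branch-length) distances taken from the maximum likelihood tree, or pairwise distances computed directly from the alignment. A minimal NumPy sketch, assuming such a square, symmetric matrix has already been produced; building the tree and extracting the distances are not shown, and `pcoa` is a hypothetical helper name.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis (Torgerson/Gower scaling).

    dist: square, symmetric distance matrix. Returns (coords, eigenvalues),
    with eigenvalues sorted from largest to smallest.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    # Double-centre the squared distances using row/column means, which avoids
    # building an explicit n x n centring matrix.
    b = -0.5 * (d2 - d2.mean(axis=0, keepdims=True)
                - d2.mean(axis=1, keepdims=True) + d2.mean())
    eigvals, eigvecs = np.linalg.eigh(b)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = np.clip(eigvals[:n_axes], 0, None) # negative eigenvalues give no real axis
    coords = eigvecs[:, :n_axes] * np.sqrt(keep)
    return coords, eigvals
```

Tree-derived distances are generally not perfectly Euclidean, so some eigenvalues may be negative; the sketch simply keeps the leading positive axes. For 10,000-20,000 sequences a full `eigh` is expensive, and computing only the leading eigenvectors (for instance with `scipy.sparse.linalg.eigsh`, which also accepts dense arrays) is a common alternative.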

I'm also a bit concerned about the size of the dataset.

Reply by prfsullivan, 4.2 years ago
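
On the size concern: for any distance-based approach, the dominant memory cost is the n x n matrix itself. A back-of-the-envelope sketch, assuming 8-byte float64 or 4-byte float32 values and 1 GB = 10^9 bytes; `distance_matrix_gb` is a hypothetical helper.

```python
def distance_matrix_gb(n, bytes_per_value=8, condensed=False):
    """Approximate memory (GB, 1 GB = 1e9 bytes) of an n x n distance matrix.

    condensed=True counts only the n*(n-1)/2 upper-triangle entries.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_value / 1e9

for n in (60_000, 20_000, 10_000):
    full64 = distance_matrix_gb(n)
    full32 = distance_matrix_gb(n, bytes_per_value=4)
    tri64 = distance_matrix_gb(n, condensed=True)
    print(f"n={n:,}: full float64 {full64:.1f} GB, "
          f"full float32 {full32:.1f} GB, condensed float64 {tri64:.1f} GB")
```

For 60,000 sequences, a full float64 matrix is about 28.8 GB, and even a float32 or upper-triangle-only version is about 14.4 GB, which does not fit comfortably in 16 GB of RAM. At 10,000-20,000 sequences after removing redundancy, the full float64 matrix is about 0.8-3.2 GB, which is workable, although many implementations hold several copies at once.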