Rapid Querying Of Metagenomic Data Given A Reference Genome
10.7 years ago
weslfield ▴ 90

I would like to automate searching metagenomic data with a reference genome, or with a list of genes belonging to one organism. For each gene, the output should be a multiple sequence alignment of that gene with the metagenomic sequences that match it above a given identity threshold. I am aware of both JGI's IMG/M and MG-RAST as potential data sources. MG-RAST has an API, while IMG/M does not appear to have one. MG-RAST's API uses synchronous GET requests, so I was planning to use Python's Requests package to make the requests and Clustal Omega to build the alignments. Speed is obviously a major concern. Has anyone done something similar? I need to do this without downloading whole metagenomes, since these don't come with lookup tables and searching them would be extremely slow. Thanks for any suggestions.
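To make the alignment half of this plan concrete, here is a minimal sketch of grouping hits by gene, applying the identity cutoff, and preparing one FASTA file per gene for Clustal Omega. The hit record layout (`gene_id`, `seq_id`, `identity`, `sequence`) and the function names are assumptions made for illustration; they are not the MG-RAST response format. The Clustal Omega command is only built here, not executed.

```python
import os
import tempfile
from collections import defaultdict


def group_hits_by_gene(hits, min_identity):
    """Keep hits at or above min_identity (percent) and group them by gene."""
    grouped = defaultdict(list)
    for hit in hits:
        if hit["identity"] >= min_identity:
            grouped[hit["gene_id"]].append(hit)
    return dict(grouped)


def write_gene_fasta(gene_id, reference_seq, gene_hits, out_dir):
    """Write the reference gene plus its matching sequences to one FASTA file."""
    path = os.path.join(out_dir, f"{gene_id}.fasta")
    with open(path, "w") as handle:
        handle.write(f">{gene_id}_reference\n{reference_seq}\n")
        for hit in gene_hits:
            handle.write(f">{hit['seq_id']}\n{hit['sequence']}\n")
    return path


def clustalo_command(fasta_path, aln_path, threads=1):
    """Build (but do not run) a Clustal Omega command line."""
    return ["clustalo", "-i", fasta_path, "-o", aln_path,
            "--outfmt=clu", f"--threads={threads}", "--force"]


# Hypothetical hits, in an assumed record layout (not a real API response).
hits = [
    {"gene_id": "geneA", "seq_id": "read1", "identity": 97.5, "sequence": "ATGGCC"},
    {"gene_id": "geneA", "seq_id": "read2", "identity": 82.0, "sequence": "ATGACC"},
    {"gene_id": "geneB", "seq_id": "read3", "identity": 91.0, "sequence": "ATGTTT"},
]
references = {"geneA": "ATGGCC", "geneB": "ATGTTT"}

grouped = group_hits_by_gene(hits, min_identity=90.0)
out_dir = tempfile.mkdtemp()
commands = []
for gene_id, gene_hits in grouped.items():
    fasta_path = write_gene_fasta(gene_id, references[gene_id], gene_hits, out_dir)
    aln_path = os.path.join(out_dir, f"{gene_id}.aln")
    commands.append(clustalo_command(fasta_path, aln_path))
```

Each entry in `commands` could then be passed to `subprocess.run`, assuming `clustalo` is installed and on the PATH. Since each gene is aligned independently, the alignments can run in parallel.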

python • 2.6k views

For large-scale analysis, it is best to run all tools locally rather than connect to web resources, which are rarely designed to withstand high-throughput querying. What do you need from the remote source that you could not produce more efficiently yourself?


Mostly the metagenome data itself, which is very large and very computationally intensive to search. JGI's IMG can do something similar in a very short time, probably by leveraging lookup tables and annotated genomes/metagenomes: https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=GenomeHits
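As a rough illustration of why a prebuilt lookup table makes this kind of search fast, here is a toy k-mer index. It is a generic sketch, not a description of how IMG is implemented; the function names, the sequences, and the choice of `k` are all made up for the example.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-mer to the set of sequence IDs that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidate_hits(index, query, k):
    """Count how many distinct query k-mers each indexed sequence shares."""
    counts = defaultdict(int)
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return dict(counts)


# Toy data, invented for the example.
sequences = {
    "read1": "ATGGCCATTG",
    "read2": "TTTTTTTTTT",
    "read3": "GGCCATTGAA",
}
index = build_kmer_index(sequences, k=4)
shared = candidate_hits(index, "ATGGCCATTG", k=4)
```

The index is built once, and each query then touches only the sequences that share at least one k-mer with it. Only those candidates need a full alignment, which is the expensive step.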
