Hello,
While trying to type bacterial strains from metagenomic samples, I recently came across this tool, StrainEst:
https://github.com/compmetagen/strainest
https://www.nature.com/articles/s41467-017-02209-5
It maps reads against an alignment of representative strains and uses SNV profiles to predict the relative abundances of individual strains in metagenomic samples. As the species I am interested in does not have a pre-built database, the first step was to construct one. I reduced around 3500 reference strains to 650 clusters (representatives), which is still quite a lot compared to what is used in the paper. I am now stuck at generating the alignments and am also wondering whether the following steps will be feasible at all with this number.
Does anyone have experience with this tool and know what kind of resources are required for database construction and for running the tool with a larger number of representative strains? Also, are there any comparable tools that you can recommend for the same purpose (identifying and tracking a large number of bacterial strains in a few metagenomic samples)?
My testing:
Stuck at this point (at 92% for hours, so close...):
root@cccebff7163a:/strainest/representative_strains# strainest mapgenomes *fasta SR.fa MR.fasta
[#################################---] 92% 0d 00:10:04
Edit: After some more hours the step completed, but in the following step no SNV matrix is created (the step finishes, but the resulting snp.dgrp is empty apart from the header):
root@cccebff7163a:/strainest/representative_strains# strainest map2snp SR.fa MR.fasta snp.dgrp
Find the core...
Find SNPs [####################################] 100%
Build the SNP matrix [####################################] 100%
Write the snp matrix...
The empty SNP matrix is probably due to the fact that no core regions shared across all the genomes could be determined. A shared core is a requirement, and nothing is reported if none is found. Checking the script api/_map2snp.py, I noticed that the coverage is rather low for a few of the genomes. I have now removed these and reran the generation of MR.fasta (also making sure MR.fasta itself is not included as a genome, because it would also have coverage 0).
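For anyone hitting the same problem, here is a rough Python sketch of the kind of coverage check I mean. It is not part of StrainEst and does not reproduce what api/_map2snp.py does internally. It assumes that MR.fasta is a multi-FASTA alignment in which every record has the same length and unaligned positions are written as '-' or 'N'; that is my assumption about the format. The function names read_fasta and coverage_report are made up for this example.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the genome name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def coverage_report(records, gap_chars="-Nn"):
    """Return (per-genome covered fraction, fraction of columns covered in ALL genomes).

    Assumption: a position counts as "covered" if it is not one of gap_chars.
    """
    records = list(records)
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    core = [True] * length
    fractions = {}
    for name, seq in records:
        if len(seq) != length:
            raise ValueError(f"{name} has length {len(seq)}, expected {length}")
        covered = [base not in gap_chars for base in seq]
        fractions[name] = sum(covered) / length
        # A column stays in the core only if every genome covers it.
        core = [a and b for a, b in zip(core, covered)]
    return fractions, sum(core) / length


# Toy alignment: g3 is entirely gaps, so it wipes out the shared core.
toy = [">g1", "ACGTAC", ">g2", "AC--AC", ">g3", "------"]
fractions, core_fraction = coverage_report(read_fasta(toy))

# Dropping the uncovered genome restores a non-empty core.
kept = [(n, s) for n, s in read_fasta(toy) if fractions[n] > 0.5]
_, core_after_filter = coverage_report(kept)
```

On a real file one would pass an open file handle to read_fasta instead of the toy list. The 0.5 cut-off is arbitrary and only for illustration; a single genome with near-zero coverage is enough to make the shared core (and hence the SNP matrix) empty.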
Your prompt says:
You should not perform bioinformatics analyses as root, you should create and use a regular user for that.
This is inside Docker.