Dear all,
The available online classification model for enterotypes (http://enterotypes.org) states:
"An enterotype classification model, fit on 278 MetaHIT samples (E. Le Chatelier et al., 2013)"
In the Costea et al., 2018 publication associated with the online classification model, the text also refers to
"278 MetaHIT samples 3", where ref [6] = E. Le Chatelier et al., 2013.
I'm looking for the FASTQ files corresponding to those 278 samples.
Searching for these, I stumbled upon an associated resource on MGnify, where the indicated sample count is 292.
This MGnify resource itself points to ENA project PRJEB4336 for the actual raw data.
In this project, there are 595 accessions, each with paired-end FASTQ files.
=> I cannot figure out how to identify the 278 samples I'm looking for among these 595 accessions.
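One thing that might help narrow this down is a per-sample view of the project. Below is a minimal sketch, assuming the 595 entries are run accessions (so several runs may belong to one sample) and using the ENA Portal API `filereport` endpoint. The function names `filereport_url` and `runs_per_sample` are hypothetical helpers written for this post; they are not part of any published pipeline.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(project="PRJEB4336"):
    """Build the ENA Portal API URL that lists the runs of a project as TSV."""
    params = {
        "accession": project,
        "result": "read_run",
        "fields": "run_accession,sample_accession,sample_alias,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def runs_per_sample(tsv_text):
    """Group run accessions by sample accession from a filereport TSV string."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped[row["sample_accession"]].append(row["run_accession"])
    return dict(grouped)


# Usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# tsv_text = urlopen(filereport_url()).read().decode()
# samples = runs_per_sample(tsv_text)
# print(len(samples), "distinct samples")
```

Counting distinct samples this way would at least show whether the 595 entries collapse to something close to 292. It does not by itself say which 278 were used for the model.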
I therefore read E. Le Chatelier et al., 2013:
"292 non-diabetic individuals were included in the protocol"
Ok, I understand the 292. Now trying to figure out the 278:
"As the number of genes detected had some dependence on the number of matched reads (Supplementary Fig. 1), we downsized the data set to 11 million reads, thus excluding 15 individuals and the bimodal distribution was again observed (Fig. 1b)."
But 292 - 15 = 277, close but not quite the same.
I then read the Costea et al., 2018 publication and its Supplementary material (Data source):
"Projection onto a set of 278 Danish samples (MetaHIT samples)"
"Additional metagenomic shotgun sequencing data originate from samples (368 Chinese samples and 278 samples from the MetaHIT project) described in Qin et al. and Le Chatelier et al."
"Data, together with code for generating the main figures can be found at: https://hub.docker.com/r/costeapaul/enterotype_figures/. Instructions for pulling and running the docker can also be found there."
But it does not point directly to resources for downloading the 278 FASTQ files.
I did find https://www.sanger.ac.uk/resources/downloads/bacteria/metahit, but the links there are clearly outdated.
Then, I tried checking what's within the Docker image:
But it's not FASTQ data (as expected)
Lastly, I tried to understand the 278 using MGnifyR
Again, without success:
Any tips to help me identify the right subset of 278 FASTQ samples would be welcome.