Metagenomics/amplicon sequencing: Is most publicly available data already host-removed, or not?
19 days ago
Maxwell ▴ 70

I'm wondering whether publicly available microbiome datasets have already had host sequences removed before being uploaded to the SRA or other public databases.

Is this standard practice, or does it vary between submissions?

In my experience, most seem to be host-removed. Is this correct?

Thanks for any help

amplicon decontamination QC Microbiome metagenomics • 489 views
Bioinformatician's creed #1: Never trust (or rather, always be skeptical of) others' data.

Yeah, well, that's why I'm asking: you can't exactly check 300,000 accessions if that's the scale of your analysis.

I'm wondering what is actually done with publicly available data. So you're saying it varies?
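
If the aim is only to estimate how common host-removed submissions are, checking every accession may not be necessary: a random subsample gives an estimate with a known margin of error. The sketch below is a minimal illustration, assuming simple random sampling of accessions, a yes/no "host reads removed" outcome per accession, and the normal approximation with a finite population correction; the function name `sample_size` is hypothetical.

```python
import math


def sample_size(population: int, margin: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Number of randomly chosen accessions to inspect so that the estimated
    proportion of host-removed runs is within +/- `margin` of the true value
    at roughly 95% confidence (z = 1.96).

    Assumption: normal approximation to the binomial, with a finite population
    correction. p = 0.5 is the worst case (largest required sample).
    """
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction for a fixed pool of accessions.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# About 384 accessions out of 300,000 for a +/- 5 percentage point margin.
print(sample_size(300_000))
```

The required sample barely depends on the pool size once it is large, so spot-checking a few hundred runs is usually enough for a rough answer, though it says nothing about any individual accession you then reuse.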

For example, the SRA website says the following, but it's not clear whether this applies to shotgun data, amplicon data, or both. I think both, but I'm not sure:

Metagenomic data Human metagenomic studies may contain human sequences and require that the donor provide consent to archive their data in an unprotected database. If you would like to archive human metagenomic sequences in the public SRA database please contact the SRA and we will screen and remove human sequence contaminants from your submission

So maybe it's only human host reads that are removed, sure. But what about other host species? Those reads are probably not removed, since there are no privacy concerns?

Exactly, most institutes' ethics committees will not allow submission of human data to public repositories without broad consent for very valid privacy reasons.

I don't think it is such a problem for mice or other hosts, but the researchers might have screened the host reads out anyway, reasoning that they are not useful for other researchers.

I'm trying to get at a rigorous answer: what is typically done? Is host removal required for submission? How could a submission include both host and microbial reads and still be labeled as a metagenome? Is there a way to tell whether the uploaded data includes both?

Saying that researchers might have screened the host reads out anyway doesn't really answer my question, unfortunately. I get that part!
