Hi all,
I've got sequencing data from the microbiome of a eukaryote that does not have a reference genome. I performed several pre-sequencing steps to exclude as much eukaryotic DNA as possible; however, I would still like to determine whether any made it through to sequencing and assembly. What could I do to at least classify the reads as eukaryotic vs. prokaryotic?
Thanks.
Can you elaborate a little on what you have done already?
Off the top of my head, I don't think there is much you can do.
I extracted the guts of the organism, then placed them in a digestion cocktail to create a single-cell suspension, which I then filtered to help break up any clumps. I stained the sample to prepare it for fluorescence-activated cell sorting, and we size-separated the cells to exclude anything larger than 5 µm. Ideally, this should get rid of the eukaryotic cells and thus most if not all of the host DNA; however, there could be free-floating DNA from cells that burst. So we used qPCR to quantify the level of host DNA before and after sorting. We did see a decrease, so we proceeded with sequencing and assembly. This is the first time we've gone through the entire process as a whole, so once we received the assembly stats, my PI wanted one final check after the metagenome assembly to see if there were any eukaryotic reads still present. The problem is that there isn't a reference genome for the eukaryotic organism we're working with. When we run this again in the future, we will likely add a DNase treatment after cell sorting to degrade any free-floating DNA.
Just run all the reads you have through something like Centrifuge or Kraken, and it'll fairly quickly identify what's what to a reasonable resolution. It may even let you segregate just the ones you want too, but I'm not 100% sure.
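For the eukaryotic vs. prokaryotic split, here is a minimal sketch of how the domain-level totals could be tallied from a Kraken-style report. It assumes the usual six tab-separated columns (percent, clade reads, direct reads, rank code, taxid, name) and that domains carry the rank code D; check this against the report your version actually produces. The function name and the example rows are made up for illustration and are not real results.

```python
def reads_per_domain(report_lines):
    """Sum clade-level read counts for the unclassified row and each domain row.

    Assumes a six-column, tab-separated Kraken-style report:
    percent, clade reads, direct reads, rank code, taxid, name.
    """
    counts = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        name = fields[5].strip()  # names are indented by taxonomic depth
        if rank in ("U", "D"):  # U = unclassified, D = domain (assumed codes)
            counts[name] = counts.get(name, 0) + clade_reads
    return counts


# Hypothetical report rows, for illustration only.
example = [
    " 75.00\t750\t750\tU\t0\tunclassified",
    " 25.00\t250\t0\tR\t1\troot",
    " 24.00\t240\t0\tD\t2\t    Bacteria",
    "  1.00\t10\t0\tD\t2759\t    Eukaryota",
]

for domain, n in reads_per_domain(example).items():
    print(domain, n)
```

With a real report you would pass the open file handle instead of the example list. Bear in mind that a eukaryote with no close relative in the database will mostly land in the unclassified row rather than under Eukaryota.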
We did run a Kraken analysis and got around 25% characterized, but we're not sure how much of the uncharacterized remainder is host and how much is just bacteria that don't exist in the database. Given that our qPCR results suggested we had little to no host DNA in the sample right before we sent it off for sequencing, we were a little stumped.
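One way to look at the uncharacterized fraction on its own would be to pull the unclassified IDs out of the per-sequence Kraken output. The sketch below assumes the standard tab-separated layout, where the first column is C or U and the second is the sequence ID. The function name and the example lines are hypothetical.

```python
def split_read_ids(kraken_output_lines):
    """Return (classified_ids, unclassified_ids) from per-sequence Kraken output.

    Assumes tab-separated lines whose first field is 'C' or 'U'
    and whose second field is the sequence ID.
    """
    classified, unclassified = [], []
    for line in kraken_output_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, read_id = fields[0], fields[1]
        if status == "C":
            classified.append(read_id)
        elif status == "U":
            unclassified.append(read_id)
    return classified, unclassified


# Hypothetical output lines, for illustration only.
example_output = [
    "C\tread_1\t562\t150\t562:116",
    "U\tread_2\t0\t150\t0:116",
    "U\tread_3\t0\t150\t0:116",
    "C\tread_4\t1280\t150\t1280:116",
]

classified_ids, unclassified_ids = split_read_ids(example_output)
total = len(classified_ids) + len(unclassified_ids)
print(f"classified fraction: {len(classified_ids) / total:.2f}")
print("unclassified IDs:", unclassified_ids)
```

The list of unclassified IDs could then be used to subset the original reads or contigs so that they can be examined separately from the ones Kraken already placed.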