Question

Distinguishing sequencing reads as prokaryotic or eukaryotic without a reference genome

5.9 years ago

darinshrewsberry1994 • 0

Hi all,

I've got sequencing data from the microbiome of a eukaryote that does not have a reference genome. I performed several pre-sequencing steps to exclude as much eukaryotic DNA as possible; however, I still want to determine whether any made it through after sequencing and assembly. What could I do to at least classify the reads as eukaryotic vs. prokaryotic?

Thanks.

assembly sequencing genome • 2.8k views

ADD COMMENT • link updated 5.9 years ago by h.mon 35k • written 5.9 years ago by darinshrewsberry1994 • 0

Can you elaborate a little on what you have done already?

Off the top of my head, I don't think there is much you can do.

ADD REPLY • link 5.9 years ago by lieven.sterck 15k

I extracted the guts of the organism, placed them in a digestion cocktail to create a single-cell suspension, and then filtered it to help break up any clumps. I stained the sample to prepare it for fluorescent cell sorting, and we size-separated the cells to exclude anything larger than 5 µm. Ideally, this should get rid of the eukaryotic cells and thus most, if not all, of their DNA; however, there could be free-floating DNA from cells that may have burst. So we used qPCR to quantify the levels of host DNA before and after sorting, and we did see a decrease. We then proceeded with sequencing and assembly. This is the first time we've gone through the entire process as a whole, so once we received the assembly stats, my PI wanted one final check after the metagenome assembly to see if any eukaryotic reads were still present. The problem is that there isn't a reference genome for the eukaryotic organism we're working with. When we run this again in the future, we're likely to add a DNase treatment after cell sorting to degrade any free-floating DNA.

ADD REPLY • link 5.9 years ago by darinshrewsberry1994 • 0

Just run all the reads you have through something like Centrifuge or Kraken and it'll fairly quickly identify what's what at a reasonable resolution.

It may even let you segregate just the ones you want too, but I'm not 100% sure.
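
As a rough sketch of the segregation step: Kraken's standard per-read output is tab-delimited, with C or U (classified/unclassified) in the first column, the read ID in the second and the assigned taxid in the third. Assuming that layout, something like the following would split the read IDs into the two groups. The function name and the file name in the usage comment are made up for illustration.

```python
def split_kraken_reads(lines):
    """Split read IDs from Kraken per-read output into classified/unclassified.

    Assumes the standard tab-delimited layout:
    status (C/U), read ID, taxid, length, k-mer mapping.
    """
    classified = {}    # read ID -> assigned taxid (kept as a string)
    unclassified = []  # read IDs with no assignment
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status == "C":
            classified[read_id] = taxid
        elif status == "U":
            unclassified.append(read_id)
    return classified, unclassified


# Usage (hypothetical file name):
# with open("sample.kraken") as handle:
#     classified, unclassified = split_kraken_reads(handle)
# print(len(classified), "classified;", len(unclassified), "unclassified")
```

The taxids of the classified reads can then be looked up in the NCBI taxonomy to decide which side of the prokaryote/eukaryote split they fall on.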

ADD REPLY • link 5.9 years ago by Joe 21k

We did run a Kraken analysis and around 25% of the reads were classified, but we're not sure how much of the unclassified fraction is host and how much is bacteria that don't exist in the database. Given that our qPCR results suggested we had little to no host DNA in the sample right before we sent it off for sequencing, we were a little stumped.

ADD REPLY • link 5.9 years ago by darinshrewsberry1994 • 0

score 1 · Answer 1 · 2019-01-15

5.9 years ago

h.mon 35k

There are several tools for this task; I personally like BlobTools for assembled draft genomes. Here is what you get:

BlobTools

I am plagiarizing myself (Interpreting mapping contaminants):

I like to use BlobTools (blasting against NCBI NT) to explore the taxonomic assignment of an assembly, and detect possible contamination - that is, I check for contaminants post-assembly.
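
A blob plot places each contig by GC content and coverage and colours it by its best taxonomic hit, so contigs from different organisms tend to fall into separate clusters. As a minimal sketch of the GC axis only (this is not BlobTools' own code, and the function name is hypothetical), per-contig GC can be computed from the assembly FASTA like this:

```python
def gc_per_contig(fasta_lines):
    """Return {contig_id: GC fraction} from an iterable of FASTA lines.

    Only A/C/G/T are counted, so N and other ambiguity codes are ignored.
    """
    counts = {}     # contig_id -> [gc_bases, acgt_bases]
    current = None  # ID of the contig currently being read
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            counts[current] = [0, 0]
        elif current is not None:
            seq = line.upper()
            gc = seq.count("G") + seq.count("C")
            at = seq.count("A") + seq.count("T")
            counts[current][0] += gc
            counts[current][1] += gc + at
    return {cid: (gc / total if total else 0.0)
            for cid, (gc, total) in counts.items()}


# Usage (hypothetical file name):
# with open("assembly.fasta") as handle:
#     for contig, gc in gc_per_contig(handle).items():
#         print(contig, round(gc, 3))
```

Contigs whose GC content sits well away from the main cluster are the ones worth checking against their BLAST hits.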

You can also use sketches to analyse contamination either on your raw data (pre-assembly) or on assemblies, see:

Mash Screen: what's in my sequencing run?

What’s in my metagenome?

Tool: BBSketch - A Tool for Rapid Sequence Comparison

Finally, you can also use k-mer screening tools like Kraken or Centrifuge to screen and filter out contaminants.
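
For the filtering part, a minimal sketch: given a set of read IDs that a classifier flagged as contaminant, drop those records from a FASTQ file. This assumes plain four-line FASTQ records and that the IDs match the header text up to the first whitespace (paired-end suffixes such as /1 may need extra handling). The function name and file names are hypothetical.

```python
def filter_fastq(fastq_lines, drop_ids):
    """Yield 4-line FASTQ records (as tuples) whose read ID is not in drop_ids."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like "@read_id optional description"
            header_parts = record[0][1:].split()
            read_id = header_parts[0] if header_parts else ""
            if read_id not in drop_ids:
                yield tuple(record)
            record = []  # start collecting the next record


# Usage (hypothetical file names):
# with open("reads.fastq") as fin, open("reads.filtered.fastq", "w") as fout:
#     for rec in filter_fastq(fin, drop_ids={"read1", "read7"}):
#         fout.write("\n".join(rec) + "\n")
```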

ADD COMMENT • link 5.9 years ago by h.mon 35k

Thanks, I'll give this a look.

ADD REPLY • link 5.9 years ago by darinshrewsberry1994 • 0

So then you just remove the bacterial contigs from the assembly (green in the blob plot)?

ADD REPLY • link 4.2 years ago by kristina.mahan ▴ 170