Question

Removing bacterial sequences from assembled eukaryotic draft genomes?

2.5 years ago

sunnykevin97 ▴ 990

Hi,

Are there any good, straightforward pipelines/tools that can robustly remove bacterial sequences from eukaryotic genomes?

Suggestions please!

protein genome gene • 1.3k views

updated 2.5 years ago by Mensur Dlakic ★ 28k • written 2.5 years ago by sunnykevin97 ▴ 990

Unless they are present as separate contigs/pieces, there is no way you are going to be able to detect or remove the sequences.

2.5 years ago by GenoMax 147k

Then should I remove the contamination from the raw reads and redo the assembly after cleaning?

Starting again from step 1 is very tedious.

Any suggestions please!

2.5 years ago by sunnykevin97 ▴ 990

score 4 · Answer 1 · 2022-06-07

2.5 years ago

Mensur Dlakic ★ 28k

It depends somewhat on whether your main assembly is from a higher or a lower eukaryote, but either way it is not a deal-breaker. Higher eukaryotes differ so much from prokaryotes in tetranucleotide usage that you can do simple tetranucleotide (4n) frequency binning with t-SNE. Prokaryotic contigs should be easy to separate - see an example here.

In the image below, three bacterial genomes are in the lower right corner and at 9 o'clock. Other small clusters are likely repetitive regions in eukaryotic DNA.

[Image: t-SNE plot of contig tetranucleotide frequencies]
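
The 4n-frequency step described above could be sketched roughly as follows. This is only an illustration, not the exact procedure used for the plot: it assumes the assembly is a plain FASTA file, and the function names, the 2000 bp minimum contig length, and the t-SNE settings are all hypothetical choices. The t-SNE step needs numpy and scikit-learn; the frequency calculation uses only the standard library.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so every contig
# gets a frequency vector with the same layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def tetra_freqs(seq):
    """Return 256 tetranucleotide frequencies, counted on both strands.

    Windows containing N or other ambiguity codes are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - 3):
            idx = INDEX.get(strand[i:i + 4])
            if idx is not None:
                counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(KMERS)


def profile_contigs(fasta_path, min_len=2000):
    """Compute profiles for contigs of at least min_len bases.

    The cutoff is an assumption: very short contigs give noisy profiles.
    """
    names, profiles = [], []
    for name, seq in read_fasta(fasta_path):
        if len(seq) >= min_len:
            names.append(name)
            profiles.append(tetra_freqs(seq))
    return names, profiles


def embed_tsne(profiles, perplexity=30.0, seed=0):
    """Project the profiles to 2-D with t-SNE (needs numpy and scikit-learn)."""
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(profiles, dtype=float)
    # scikit-learn requires perplexity to be smaller than the sample count.
    perplexity = min(perplexity, max(1.0, (len(X) - 1) / 3))
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(X)
```

Calling `names, profiles = profile_contigs("assembly.fasta")` (a placeholder filename) followed by `coords = embed_tsne(profiles)` gives one 2-D point per contig. Plotting those points and pulling out the contigs in the small, well-separated clusters is then a manual step, and the candidates are worth confirming with a sequence search before removing them.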

2.5 years ago by Mensur Dlakic ★ 28k

"Prokaryotic contigs should be easy to separate"

As long as they are present as separate contigs in the data, correct? Nothing can be done about reads that may have been misincorporated, short of starting over.

2.5 years ago by GenoMax 147k

You're right.

2.5 years ago by sunnykevin97 ▴ 990

That is correct. I don't think there is any reason to expect "chimeric" contigs because it is unlikely that a eukaryote and a prokaryote have long stretches of near-identical DNA. This is especially the case for higher eukaryotes where coding density is very low. Even in complex environmental communities with many members that are considerably closer to each other than any eukaryote and any prokaryote could be, metagenomic assembly and binning are fairly successful.

2.5 years ago by Mensur Dlakic ★ 28k