
[Solved] Differentiating between chromosomal and plasmid DNA


7.9 years ago

Harry ▴ 10

Hello all,

Introduction

I am a recent graduate working for a Public Health Laboratory. I'm relatively new to bioinformatics, and most of what I know centers on NGS analysis. My lab director loves to challenge me. He wants to know how NGS (we have a MiSeq) could be used in our lab as an outbreak investigation tool.

I am aiming to do a study of Carbapenem-Resistant Enterobacteriaceae (CRE). The main goal is to be able to receive a CRE sample and use NGS to detect the genes (beta-lactamases) responsible for the resistance. The idea is to run a quick analysis while also compiling genetic information that could be used to connect the dots in an outbreak investigation (phylogeny).

What I Already Know

The genes I am looking for can be found within the bacterial chromosome or within its plasmids.
For each gene, primers need to be designed.

The Actual Questions

First: If I take the whole-genome-sequencing approach, how can I tell which parts of my output (FASTQ) come from plasmids and which come from the chromosome?

Second: If I didn't want to do whole-genome sequencing, would it be possible to sequence only the genes I'm looking for (if they are present)? If so, how would I do it?

Open for Discussion

If anyone has suggestions or solutions, or can point me in a direction where I can learn more, please let me know. Any help is deeply appreciated.

Edit: Solution Found

Thanks to those who commented, I now have a better understanding of how this all works. The comments also put me on a path to finding an example of how this type of experiment is done in a clinical laboratory. You can find the study here.



One of my co-workers tried using machine learning to distinguish plasmid from chromosomal sequence based on the genes present after annotation, with some success. This is much easier on assembled contigs than on raw reads, which are usually too short to annotate. In principle, it should also be possible to analyze the graph structure during assembly to determine which contigs are co-located and the size of the chromosome they are located on. This can also be done after the fact using the graph file that some assemblers produce.
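The graph idea can be sketched roughly as follows, assuming the assembler writes a GFA (version 1) graph with `S` (segment) and `L` (link) lines. The code groups contigs into connected components with union-find and sums each component's length; a small, isolated component is a plasmid candidate, while the largest component is probably chromosomal. The 500 kb cutoff, the function names, and the toy graph are hypothetical, and real graphs are messier: repeats shared between a plasmid and the chromosome can merge them into one component.

```python
from collections import defaultdict


def parse_gfa(lines):
    """Return ({segment_name: length_bp}, [(seg_a, seg_b), ...]) from GFA 1 lines."""
    lengths, links = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                tags = {f[:5]: f[5:] for f in fields[3:]}
                lengths[name] = int(tags.get("LN:i:", 0))
            else:
                lengths[name] = len(seq)
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.append((fields[1], fields[3]))
    return lengths, links


def components(lengths, links):
    """Group segments into connected components; return [(members, total_bp)], largest first."""
    parent = {name: name for name in lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = defaultdict(list)
    for name in lengths:
        groups[find(name)].append(name)
    comps = [(sorted(m), sum(lengths[x] for x in m)) for m in groups.values()]
    return sorted(comps, key=lambda c: c[1], reverse=True)


def flag_candidates(comps, max_plasmid_bp=500_000):
    """Hypothetical rule: components no larger than max_plasmid_bp are plasmid candidates."""
    return [(members, total, total <= max_plasmid_bp) for members, total in comps]


# Toy graph: two linked chromosomal pieces plus one small self-linked (circular) contig.
toy_gfa = [
    "S\tc1\t*\tLN:i:3000000",
    "S\tc2\t*\tLN:i:2000000",
    "S\tc3\t*\tLN:i:45000",
    "L\tc1\t+\tc2\t+\t0M",
    "L\tc3\t+\tc3\t+\t0M",
]

for members, total, candidate in flag_candidates(components(*parse_gfa(toy_gfa))):
    print(members, total, "plasmid candidate" if candidate else "likely chromosomal")
```

Size and connectivity are only one signal; combining them with read depth and gene content (as the co-worker's classifier did) would be more reliable.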

You can certainly try selectively amplifying the genes in question with the appropriate primers, but I think WGS is probably simpler and more robust. You can assemble the reads and then compare the contigs to the genes of interest, or simply map the raw reads to those genes; either works. Depending on the run mode, the MiSeq has enough capacity to sequence 30+ bacteria per run at 40x coverage (that's a 24-hour run at 2x150 bp).
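For the "compare the contigs to your genes" route, the usual tools are BLAST or a read mapper. As a dependency-free stand-in, this sketch uses a simpler technique, exact k-mer containment: for each reference gene, it reports the fraction of the gene's 21-mers found on either strand of the contigs. The gene names, the random toy sequences, k=21, and the 0.9 cutoff are all assumptions; in practice the references would be curated beta-lactamase sequences, and an aligner tolerates sequence variants far better than exact k-mers do.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen(genes, contigs, k=21, min_containment=0.9):
    """Report genes whose k-mers are (mostly) present in the contigs, on either strand."""
    index = set()
    for contig in contigs:
        index |= kmers(contig, k)
        index |= kmers(revcomp(contig.upper()), k)
    hits = {}
    for name, seq in genes.items():
        gene_kmers = kmers(seq, k)
        if not gene_kmers:
            continue  # gene shorter than k
        containment = len(gene_kmers & index) / len(gene_kmers)
        if containment >= min_containment:
            hits[name] = round(containment, 3)
    return hits


# Toy data: random sequences stand in for real gene and contig sequences.
rng = random.Random(0)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


gene_a, gene_b = random_dna(300), random_dna(300)
contig = random_dna(1000) + revcomp(gene_a) + random_dna(1000)  # gene_a on the minus strand
print(screen({"toy_gene_a": gene_a, "toy_gene_b": gene_b}, [contig]))  # {'toy_gene_a': 1.0}
```

Checking both strands matters because assemblers output contigs in arbitrary orientation.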

Reply by Brian Bushnell, 7.9 years ago


I'm starting my master's program soon, and machine learning is something I am really interested in learning. Are there any resources on the subject you would recommend? Also, based on your response (as well as Harold's), it seems like WGS is what makes the most sense. Selectively amplifying genes is something I might try later down the line, but I know I'm not there yet.

Thanks for the response; it has given me a clear direction for further learning.

Reply by Harry, 7.9 years ago


@Brian, do you have a link to the tool? I'm interested in tackling a similar classification problem, so I would like to see the approach.

Reply by Joe, 7.9 years ago


No, sorry, the tool was never finished or made public. I will ask my co-worker about its status and results, though, and report back if there's anything interesting to note.

Reply by Brian Bushnell, 7.9 years ago

Accepted Answer · 2017-08-02

7.9 years ago

harold.smith.tarheel ★ 5.0k

1) From WGS data, chromosome and episome (plasmid) can be distinguished by copy number, which is reflected in differences in read depth. With appropriate data, you can also assemble the genomes and distinguish the two at the contig level.

2) Search for 'amplicon sequencing'. Note that the MiSeq, at 10M+ reads/run, is overkill unless you're barcoding 1000s of samples.
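Both points can be roughed out in a few lines of Python. The first part is a minimal sketch of the copy-number idea in point 1, assuming you already have per-contig lengths and mean read depths (e.g. from mapping the reads back to the assembly): it takes the length-weighted median depth as the chromosomal baseline and flags contigs well above it as candidate multi-copy plasmids. The second part is the back-of-envelope multiplexing arithmetic behind point 2, using the 10M reads/run figure from the answer. The contig names and depths, the 1.5x cutoff, the 10-target panel, 500 reads per target, and the 0.8 usable fraction are all hypothetical values for illustration.

```python
def weighted_median_depth(contigs):
    """Length-weighted median of mean depth; contigs is [(name, length_bp, depth), ...]."""
    ordered = sorted(contigs, key=lambda c: c[2])
    half = sum(length for _, length, _ in ordered) / 2
    running = 0
    for _, length, depth in ordered:
        running += length
        if running >= half:
            return depth
    raise ValueError("no contigs given")


def classify_by_depth(contigs, ratio_cutoff=1.5):
    """Label each contig by its depth relative to the (assumed chromosomal) baseline."""
    baseline = weighted_median_depth(contigs)
    calls = []
    for name, length, depth in contigs:
        ratio = round(depth / baseline, 2)
        label = "candidate plasmid" if ratio >= ratio_cutoff else "chromosome-like"
        calls.append((name, length, ratio, label))
    return calls


def samples_per_run(reads_per_run, amplicons_per_sample, reads_per_amplicon, usable_fraction=1.0):
    """Barcoded samples that fit in one run at the requested depth per amplicon."""
    if min(reads_per_run, amplicons_per_sample, reads_per_amplicon) <= 0:
        raise ValueError("counts must be positive")
    return int(reads_per_run * usable_fraction // (amplicons_per_sample * reads_per_amplicon))


# Point 1: hypothetical contigs as (name, length in bp, mean depth).
contigs = [
    ("contig_1", 3_100_000, 42.0),
    ("contig_2", 1_900_000, 39.5),
    ("contig_3", 8_000, 210.0),
]
for call in classify_by_depth(contigs):
    print(call)

# Point 2: 10M reads/run; 10 targets x 500 reads each are hypothetical panel settings.
print(samples_per_run(10_000_000, 10, 500))                        # 2000
print(samples_per_run(10_000_000, 10, 500, usable_fraction=0.8))   # 1600
```

Depth alone is only suggestive: large low-copy plasmids can sit near 1x, and repeats such as IS elements or rRNA operons can push chromosomal contigs above it, which is why the assembly-based evidence in point 1 is worth combining with it.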


Thank you for your response!

It definitely seems that doing WGS would make the most sense.

Reply by Harry, 7.9 years ago