Convert Opgen To Paired End Reads
13.3 years ago
Lee Katz ★ 3.2k

Hi all, I am using OpGen optical map data, which I find a fantastic aid for genome assembly. Below are a few of the data points (334 sites in total). Using these data points, I was able to discover misassemblies produced by my automated assembly tools (e.g. Newbler). My overall question is: how can I use these high-quality data points to automate my assembly?

My immediate idea is to convert these sites artificially into paired-end reads of 6 bp, one read per restriction site. For example, the first data point below describes a restriction fragment of 14867 bp; in other words, there are two NheI sites 14867 bp apart. So my immediate questions are: how can I convert these sites into paired-end reads, and what paired-end read file format would Newbler accept? The restriction site is G^CTAGC.
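A minimal Python sketch of the conversion described above, assuming each optical fragment becomes one pair of 6-bp "reads" (the NheI site GCTAGC) whose expected insert size is the fragment length. The read names (`nheI_frag0/1`, `/2`), the `/1`/`/2` suffix convention, the inward-facing orientation and the FASTA layout are all illustrative assumptions; none of this is a format Newbler is confirmed to accept. Note that GCTAGC is its own reverse complement and every NheI site has the same sequence, so reads like these carry no information about where they belong, and an assembler could not place them uniquely.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence; the enzyme cuts G^CTAGC

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def pseudo_pair(frag_index, frag_size, site=NHEI_SITE):
    """Turn one optical-map fragment into a hypothetical pair of site 'reads'.

    Read 1 is the site at the left end of the fragment (forward strand).
    Read 2 is the site at the right end, written on the reverse strand as in
    an inward-facing paired-end library (an assumed convention).
    Returns the FASTA text and the expected insert size in bp.
    """
    name = f"nheI_frag{frag_index}"  # hypothetical naming scheme
    read1 = site
    read2 = revcomp(site)
    fasta = f">{name}/1\n{read1}\n>{name}/2\n{read2}\n"
    return fasta, frag_size


fasta, insert = pseudo_pair(0, 14867)
print(fasta, end="")
print("expected insert size:", insert)
```

Both mates come out as the identical string GCTAGC, which illustrates why this route is unlikely to help an assembler resolve repeats or false joins.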

Thank you for your help.

  <RESTRICTION_MAP ID="XYZ" ENZYME="NheI" INSILICO="false">
    <MAP_DISPLAY DBID="13" EDITABLE="false" STICK="false" X="10000" Y="149" TRANS="255" ORDER="1320" ORIENTATION="1" CIRCULAR="true" GROUPID="-1" />
      <FRAGMENTS SHIFT="0" OFFSET="1">
        <F I="0" S="14867" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="1" S="7731" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="2" S="9070" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="3" S="2016" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="4" S="3175" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
        <F I="5" S="5418" STDDEV="0.000" HIGHLIGHT="false" HIDE="false" GAP="false" />
      </FRAGMENTS>

      <MAP_METRICS STRETCH="0" RECT_AVE="0.00" RECT_ALL="0.00" MID_STDDEV="0.00" R="0.00" WIGGLE="0.00" GAP_STDDEV="0.00" GAP_MAX="0" />
      <FEATURES>

      </FEATURES>

  </RESTRICTION_MAP>
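Before any conversion, the fragment list has to be turned into site coordinates. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming `S` is the fragment size in bp and `I` its order along the map; the `SHIFT` and `OFFSET` attributes are ignored because their exact meaning is not confirmed here, and site 0 is placed at an arbitrary origin of 0. With only the six fragments copied from the post, the "map length" is just their sum (42277 bp), not the size of the genome covered by all 334 sites.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the map above: only the attributes read below are kept.
MAP_XML = """
<RESTRICTION_MAP ID="XYZ" ENZYME="NheI" INSILICO="false">
  <MAP_DISPLAY CIRCULAR="true" />
  <FRAGMENTS SHIFT="0" OFFSET="1">
    <F I="0" S="14867" />
    <F I="1" S="7731" />
    <F I="2" S="9070" />
    <F I="3" S="2016" />
    <F I="4" S="3175" />
    <F I="5" S="5418" />
  </FRAGMENTS>
</RESTRICTION_MAP>
"""


def read_map(xml_text):
    """Return (enzyme, is_circular, fragment sizes in bp ordered by I)."""
    root = ET.fromstring(xml_text.strip())
    display = root.find("MAP_DISPLAY")
    circular = display is not None and display.get("CIRCULAR") == "true"
    frags = sorted(root.iter("F"), key=lambda f: int(f.get("I")))
    return root.get("ENZYME"), circular, [int(f.get("S")) for f in frags]


def site_coordinates(sizes):
    """Coordinate of the cut site at the left end of each fragment, plus map length."""
    coords, pos = [], 0
    for size in sizes:
        coords.append(pos)
        pos += size
    return coords, pos


enzyme, circular, sizes = read_map(MAP_XML)
coords, length = site_coordinates(sizes)
print(enzyme, "circular" if circular else "linear", length, "bp")
for i, (left, size) in enumerate(zip(coords, sizes)):
    right = left + size
    if circular:
        right %= length  # on a circular map the last fragment wraps back to site 0
    print(f"fragment {i}: {size} bp between sites at {left} and {right}")
```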
13.3 years ago

SOMA and MapSolver give you contig ordering. However, based on your response to Jeremy's answer, it sounds like you want to fix the false contig joins made by mate pairs. So my suggestion is to identify those false joins with MapSolver (use its GUI program to locate them) and break the pairing of the bad mates that caused the false joins in the first place.

For example, your scaffolds might have contig1 and contig2 adjacent, but on examination in MapSolver contig2 belongs somewhere else (this appears as crossing lines in its plot). You can then identify the reads with one end on contig1 and the other end on contig2, and move them to the unpaired set.
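The bookkeeping in that suggestion can be sketched in Python. This assumes the read-to-contig placements have already been extracted from the assembly (for example from an ACE file; that parsing is not shown) and that the false joins were read off MapSolver by hand. The function `split_bad_mates` and its data layout are hypothetical, not part of any tool mentioned in this thread.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = Tuple[str, str, str]  # (pair_id, read1, read2)


def split_bad_mates(
    pairs: Iterable[Pair],
    read_contig: Dict[str, str],
    false_joins: Set[FrozenSet[str]],
) -> Tuple[List[Pair], List[str]]:
    """Separate mate pairs that support a join flagged as false.

    read_contig maps each read name to the contig it was placed on;
    false_joins holds unordered contig pairs judged wrong from the optical map.
    Returns (pairs to keep, reads to move to the unpaired set).
    """
    kept: List[Pair] = []
    unpaired: List[str] = []
    for pair_id, r1, r2 in pairs:
        c1, c2 = read_contig.get(r1), read_contig.get(r2)
        spans_bad_join = (
            c1 is not None
            and c2 is not None
            and c1 != c2
            and frozenset((c1, c2)) in false_joins
        )
        if spans_bad_join:
            unpaired.extend([r1, r2])
        else:
            kept.append((pair_id, r1, r2))
    return kept, unpaired


# Toy data: contig1 and contig2 were joined, but the optical map disagrees.
pairs = [("p1", "a/1", "a/2"), ("p2", "b/1", "b/2"), ("p3", "c/1", "c/2")]
read_contig = {"a/1": "contig1", "a/2": "contig2",
               "b/1": "contig1", "b/2": "contig1",
               "c/1": "contig2", "c/2": "contig3"}
false_joins = {frozenset(("contig1", "contig2"))}
kept, unpaired = split_bad_mates(pairs, read_contig, false_joins)
print(kept)      # [('p2', 'b/1', 'b/2'), ('p3', 'c/1', 'c/2')]
print(unpaired)  # ['a/1', 'a/2']
```

Using a `frozenset` for each join makes the check independent of which mate landed on which contig.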

If you don't have many contigs, reordering them manually using the optical map (OM) as a guide is also a viable solution.

It is also worth mentioning Bambus. In principle, it accepts an XML format that can describe many different data types, such as synteny and genetic/physical maps, but it is not straightforward to use.


I wish I had mate pairs to correct! My assembly is based on single-end reads. Your solution is a really good one for a misassembly that already involves paired-end reads. Bambus looks good too (one more tool for my toolbox!), but I would not know where to break my misassembled contig in order to use it.


So you are saying that you have chimeric contigs? You can try mapping your reads to your contigs and looking for regions with low read coverage. Remove those reads and reassemble. Bear in mind that OMs can also contain errors.
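The coverage scan suggested above can be sketched in a few lines of Python. It assumes the read alignments have already been reduced to 0-based, half-open `(start, end)` intervals on one contig (e.g. exported from a mapper's BAM output; not shown), and the `min_depth` threshold is an arbitrary illustrative choice. A low-coverage window is only a candidate for a chimeric join and should be checked against the optical map.

```python
from typing import List, Tuple


def coverage(contig_len: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth from 0-based, half-open (start, end) alignments."""
    diff = [0] * (contig_len + 1)  # difference array: +1 at start, -1 at end
    for start, end in alignments:
        diff[max(start, 0)] += 1
        diff[min(end, contig_len)] -= 1
    depth, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        depth.append(running)
    return depth


def low_coverage_regions(depth: List[int], min_depth: int, min_len: int = 1):
    """Runs of consecutive positions with depth < min_depth, as (start, end)."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))
    return regions


# Toy contig of 100 bp: two read stacks joined by a thinly covered stretch.
reads = [(0, 48), (0, 45), (5, 50), (50, 95), (52, 100), (55, 100)]
depth = coverage(100, reads)
print(low_coverage_regions(depth, min_depth=2))  # [(48, 52)]
```

The difference-array approach keeps the scan linear in contig length plus the number of reads, which matters for bacterial-sized contigs with deep coverage.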


That's a very good point that OpGen maps can contain errors. In a recent seminar at my institution, they discussed how they may eventually provide confidence scores (or something approximating them), but for now they do not, so I am treating the maps as high confidence. I do have chimeric contigs, but I do not have an assembly file (ACE, AFG, etc.) because of my comprehensive assembly process. However, I may choose to just use Newbler so that I have an ACE file, and then use your method. That is a good idea.

13.3 years ago

I take it you've tried SOMA? ftp://ftp.cbcb.umd.edu/pub/software/soma/

Niranjan Nagarajan, Timothy D. Read and Mihai Pop. Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24(10):1229. http://bioinformatics.oxfordjournals.org/content/24/10/1229.abstract
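Tools of this kind work, roughly speaking, by comparing the optical map against an in silico restriction map of each contig. A minimal Python sketch of such an NheI digest follows; it is not SOMA's or MapSolver's code, and the function names are hypothetical. The cut offset of 1 encodes G^CTAGC.

```python
import re
from typing import List

NHEI_SITE = "GCTAGC"
CUT_OFFSET = 1  # NheI cuts G^CTAGC, i.e. one base into the site


def cut_positions(seq: str, site: str = NHEI_SITE, offset: int = CUT_OFFSET) -> List[int]:
    """0-based cut coordinates on the forward strand."""
    seq = seq.upper()
    # Zero-width lookahead so overlapping occurrences are also reported.
    return [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]


def insilico_fragments(seq: str, circular: bool = False) -> List[int]:
    """Fragment sizes (bp) from cutting seq at every NheI site."""
    cuts = cut_positions(seq)
    n = len(seq)
    if not cuts:
        return [n]
    if circular:
        # Each fragment runs from one cut to the next; the last one wraps around.
        # With a single cut, (c - c) % n == 0, so the whole circle (n) is returned.
        return [(cuts[(i + 1) % len(cuts)] - c) % n or n for i, c in enumerate(cuts)]
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]


contig = "AAAA" + "GCTAGC" + "T" * 10 + "GCTAGC" + "CC"  # toy 28-bp sequence
print(insilico_fragments(contig))                 # [5, 16, 7]
print(insilico_fragments(contig, circular=True))  # [16, 12]
```

Because GCTAGC is palindromic, scanning the forward strand finds every site; a non-palindromic recognition sequence would also need a reverse-strand scan. In the linear case the first and last values are partial fragments bounded by the contig ends, not true restriction fragments, so they should not be expected to match optical fragments exactly.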


The article is starting to look good to me! I will try it out.


OK... it's good, but not exactly what I am looking for. I want to avoid misassemblies by using the OpGen data at assembly time. MapSolver and SOMA both look at an existing assembly and suggest a contig ordering, but they do not fix a misassembly.
