5.1 years ago
peggyw
I am working on a paper to better understand and theorize a way to mine taxonomy information, particularly as it relates to the Kraken software. At my current point in the process, I downloaded a compressed (gzip) file from NCBI, gi_taxid_nucl.dmp.gz. The download has a good chunk of what I am looking for, such as names.dmp, merged.dmp, and division.dmp, but one file, gi_taxid_nucl.dmp, is 11.3 GB and Notepad++ is not going to open it. What tool or database would I load this into to view its contents? Am I looking at SQL, Oracle, MySQL?
Googling 'gi_taxid_nucl.dmp.gz' turns up a 2013 article, https://www.polarmicrobes.org/some-things-should-be-easy/, which suggests SQLite or grep. Are you trying to view this file on Windows? Is your primary interest just to see the contents, or to build a database from it and/or link it to other data tables?
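A minimal sketch of the SQLite route from that article, assuming each line of gi_taxid_nucl.dmp is a GI number and a taxonomy ID separated by a tab (check the first few lines before relying on that). It reads the gzip file as a stream and inserts in batches, so the 11.3 GB never has to fit in memory. The table name `gi_taxid` and all helper names are made up for illustration.

```python
import gzip
import sqlite3
from typing import IO, Iterable, Iterator, Optional, Tuple


def open_dmp(path: str) -> IO[str]:
    """Open a plain or .gz text file lazily; nothing is read until you iterate."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def parse_gi_taxid(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Assumed format: '<gi>\t<taxid>' per line; blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gi, taxid = line.split("\t")[:2]
        yield int(gi), int(taxid)


def load_into_sqlite(conn: sqlite3.Connection, lines: Iterable[str],
                     batch_size: int = 100_000) -> int:
    """Stream (gi, taxid) pairs into a hypothetical gi_taxid table; returns rows written."""
    conn.execute("CREATE TABLE IF NOT EXISTS gi_taxid (gi INTEGER PRIMARY KEY, taxid INTEGER)")
    batch, total = [], 0
    for pair in parse_gi_taxid(lines):
        batch.append(pair)
        if len(batch) >= batch_size:
            conn.executemany("INSERT OR REPLACE INTO gi_taxid VALUES (?, ?)", batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT OR REPLACE INTO gi_taxid VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


def taxid_for_gi(conn: sqlite3.Connection, gi: int) -> Optional[int]:
    """Look up one GI; the PRIMARY KEY makes this an indexed lookup."""
    row = conn.execute("SELECT taxid FROM gi_taxid WHERE gi = ?", (gi,)).fetchone()
    return row[0] if row else None
```

Usage would be something like `conn = sqlite3.connect("gi_taxid.sqlite")` followed by `with open_dmp("gi_taxid_nucl.dmp.gz") as fh: load_into_sqlite(conn, fh)`. The resulting .sqlite file can then be browsed with any SQLite viewer on Windows.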
SQLite: interesting, and I had not heard of grep. Windows is my primary tool; yes, I know we are all supposed to love Linux, sorry, I just can't. Right now the primary interest is not only to see the file but to convert the information, including names.dmp and nodes.dmp, into a graph database. I will look for an SQLite tool to install and see if it works.
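For the graph database idea, names.dmp and nodes.dmp already describe a tree: in the NCBI taxdump format, fields are separated by `\t|\t` and each line ends with `\t|`; the first three columns of nodes.dmp are tax_id, parent tax_id, and rank, and names.dmp has tax_id, name, unique name, and name class. A minimal sketch that turns them into a vertex list and a child-to-parent edge list is below. The two-CSV layout and the function names are hypothetical, not any particular graph database's import format.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Vertex = Tuple[int, str, str]  # (taxid, rank, scientific name)
Edge = Tuple[int, int]         # (child taxid, parent taxid)


def split_dmp(line: str) -> List[str]:
    """Split one taxdump line: fields separated by '\\t|\\t', line terminated by '\\t|'."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def build_taxonomy_graph(node_lines: Iterable[str],
                         name_lines: Iterable[str]) -> Tuple[List[Vertex], List[Edge]]:
    """Return (vertices, edges) built from nodes.dmp and names.dmp lines."""
    # Keep only the 'scientific name' entry for each taxid.
    sci_name: Dict[int, str] = {}
    for line in name_lines:
        fields = split_dmp(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            sci_name[int(fields[0])] = fields[1]

    vertices: List[Vertex] = []
    edges: List[Edge] = []
    for line in node_lines:
        fields = split_dmp(line)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        taxid, parent, rank = int(fields[0]), int(fields[1]), fields[2]
        vertices.append((taxid, rank, sci_name.get(taxid, "")))
        if taxid != parent:  # the root (taxid 1) is its own parent; drop that self-loop
            edges.append((taxid, parent))
    return vertices, edges


def write_graph_csv(vertices: List[Vertex], edges: List[Edge],
                    vertex_path: str, edge_path: str) -> None:
    """Write two generic CSV files (hypothetical layout) for a bulk import."""
    with open(vertex_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["taxid", "rank", "name"])
        writer.writerows(vertices)
    with open(edge_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["child_taxid", "parent_taxid"])
        writer.writerows(edges)
```

Both files are small compared with gi_taxid_nucl.dmp, so reading them line by line with `open(..., encoding="utf-8", errors="replace")` and passing the file objects in should be fine; most graph databases have some CSV bulk loader that could take the output, with column headers adjusted to that tool's conventions.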
I took a look at the link and I might be chasing the wrong information. If gi_taxid_nucl.dmp is just two values per row, a GI ID and a taxonomy ID, then the one key item I need, the taxonomy sequence, is missing.
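What the two-column file does give you is a bridge: a GI resolves to a taxid, and the parent column of nodes.dmp resolves that taxid to a full lineage up to the root. There is no sequence or k-mer information anywhere in that chain. A minimal sketch of the join is below; plain dicts stand in for whatever store holds the two mappings, and the function names are hypothetical.

```python
from typing import List, Mapping


def lineage(taxid: int, parent: Mapping[int, int], max_depth: int = 200) -> List[int]:
    """Walk from taxid to the root using a child -> parent map (columns 1-2 of nodes.dmp)."""
    path = [taxid]
    # The root points to itself; an unknown taxid is treated as its own root.
    while parent.get(taxid, taxid) != taxid:
        taxid = parent[taxid]
        path.append(taxid)
        if len(path) > max_depth:
            raise ValueError("lineage too deep; the parent map may contain a cycle")
    return path


def lineage_for_gi(gi: int, gi_to_taxid: Mapping[int, int],
                   parent: Mapping[int, int]) -> List[int]:
    """GI -> taxid (gi_taxid_nucl.dmp), then taxid -> lineage (nodes.dmp); [] if GI unknown."""
    taxid = gi_to_taxid.get(gi)
    return [] if taxid is None else lineage(taxid, parent)
```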
Here is where I am trying to get to. I have been working a little with Kraken, and though it is very interesting software, there is a major hurdle to using it if you don't have a supercomputer. It comes with four required database files, and I use the term "database files" lightly: database.idx, database.kdb, names.dmp, and nodes.dmp. In my case, some of the work I have been supporting requires a terabyte of RAM to run. What I really want to get to, and I thought gi_taxid_nucl.dmp would get me there, is a taxonomy ID and k-mer database. It looks like I may have to back up into the Kraken code to figure out how the k-mer database is created and convert that code to create a graph database instead of a flat-file database.
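As far as I understand the Kraken paper, each k-mer in its database is labelled with the lowest common ancestor (LCA) of all the reference genomes that contain it. The toy sketch below builds that kind of k-mer to taxid map in memory from (taxid, sequence) pairs plus a child-to-parent map. It only illustrates the idea; it is not Kraken's database.kdb/database.idx format or its memory-efficient structures. Using canonical k-mers (the smaller of a k-mer and its reverse complement) and k=31 is meant to mirror Kraken's defaults; the function names, and the assumption that each sequence's taxid is already known (for example from a GI to taxid lookup), are mine.

```python
from typing import Dict, Iterable, Iterator, List, Mapping, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Smaller of the k-mer and its reverse complement, so both strands share one key."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield canonical k-mers, skipping windows containing N or other non-ACGT symbols."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if _VALID.issuperset(window):
            yield canonical(window)


def path_to_root(taxid: int, parent: Mapping[int, int]) -> List[int]:
    """Taxid followed by its ancestors; the root is its own parent."""
    path = [taxid]
    while parent.get(taxid, taxid) != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a: int, b: int, parent: Mapping[int, int]) -> int:
    """Lowest common ancestor: first taxon on b's path that is also on a's path."""
    ancestors_of_a = set(path_to_root(a, parent))
    for taxid in path_to_root(b, parent):
        if taxid in ancestors_of_a:
            return taxid
    raise ValueError(f"taxa {a} and {b} share no ancestor in the parent map")


def build_kmer_lca_map(genomes: Iterable[Tuple[int, str]],
                       parent: Mapping[int, int], k: int = 31) -> Dict[str, int]:
    """Map each canonical k-mer to the LCA of every taxid whose sequence contains it."""
    table: Dict[str, int] = {}
    for taxid, seq in genomes:
        for kmer in kmers(seq, k):
            previous = table.get(kmer)
            table[kmer] = taxid if previous is None else lca(previous, taxid, parent)
    return table
```

A Python dict like this needs far more memory per k-mer than a compact on-disk format, so for real genomes it would only be useful on small subsets. The (k-mer, LCA taxid) pairs it produces are the thing that would have to be stored in a graph or other database, with each taxid linking back to the nodes.dmp tree.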
On Windows, presuming the .dmp files are delimited text, you can view them with the more command, and the equivalent of grep for searching text in a file is findstr. Neither tool requires loading the full file into memory. They won't help you directly with your larger goal, but they will let you view the file(s) to get started.
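If Python is available on the Windows machine, a rough cross-platform stand-in for more/findstr that reads the .gz file directly, without unpacking it first, might look like the sketch below. The helper names are hypothetical, and `pattern` is a Python regular expression, not findstr syntax.

```python
import gzip
import itertools
import re
from typing import IO, Iterable, Iterator, List, Optional, Tuple


def open_text(path: str) -> IO[str]:
    """Open plain or gzip-compressed text lazily; lines are read only as you iterate."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def head(lines: Iterable[str], n: int = 10) -> List[str]:
    """First n lines, like paging the start of a file with more."""
    return [line.rstrip("\n") for line in itertools.islice(lines, n)]


def search(lines: Iterable[str], pattern: str,
           max_hits: Optional[int] = None) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for lines matching a regex, roughly like findstr."""
    regex = re.compile(pattern)
    hits = 0
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line):
            yield lineno, line.rstrip("\n")
            hits += 1
            if max_hits is not None and hits >= max_hits:
                return
```

For example, `with open_text("gi_taxid_nucl.dmp.gz") as fh: print(head(fh, 5))` shows the first five rows, and `list(search(fh, r"\t9606$", max_hits=10))` would list up to ten rows whose second column is 9606, assuming the two-column tab-separated layout.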
Wrong place to post.