I'm a sysadmin working for some bioinformaticians at Manchester University.
Today's chore would be trying to fix up some XML and compressed XML from a mildly customised Biomart-style mysql database we're moving off hardware soon to be 'retired'.
Essentially the problem is that the database contains some now invalid embedded absolute URLs that need altering -- naturally I could export and sed, but what to do about the compressed XML?
Does anyone know if there is a relatively painless way of doing this -- perhaps using the java utilities: 'marteditor' and friends?
Please do not edit the XML manually, but have a look at the MartEditor tool that is part of BioMart 0.7 (http://www.biomart.org/user-docs.pdf). Having said that, I am not quite sure what you mean by compressed XML in this context. You can either get the configuration of the BioMart from the database itself, or you load a plain-vanilla XML-file that you previously saved with MartEditor. Perhaps you are referring to the XML-files needed in MartBuilder, but those are only required during the creation of a mart.
Have a look under "Meta-Data Transfer" in http://bergmanlab.smith.man.ac.uk/?p=35. The blog-post is perhaps interesting for you anyway, because it describes the complete workflow of setting up a mart, which includes the use of MartBuilder and MartEditor.
ADD COMMENT
• link
updated 5.2 years ago by
Ram
44k
•
written 13.7 years ago by
Joachim
★
2.9k
1
Entering edit mode
Okay, I understand what you mean by compressed XML now. Please do not alter the meta_* tables yourself, but use MartEditor to load the data from these tables instead. You can just connect with MartEditor to that mart database and it will automatically retrieve the data from those tables. MartEditor also writes your modified config back into those tables.
The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'.
But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL.
Hi Joachim: thanks for responding. The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'. The fourth field is actually a hash of either field two or three (or both?). But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL. – jfsebastian 0 secs ago
I'd agree that it's not real clear what you mean by compressed Xml. The BioMart User Doc referenced by Jochim, references a compression module for Apache which shouldn't impact the data in the actual xml column.
Consider doing this with just SQL.
Here's some pseudo code.
[?][?]
Create table FixedXml
Clustered Index - Link back to the table that originally hosted the xml.
FixedXml
while records affected > 0
begin
insert into FixedXml ([Clustered Index], [Fixed Xml])
select top 1000
[Clustered Index], Replace(SourceXml, 'BadUrl', 'GoodUrl')
from SourceTable A
left join FixedXml B
on A.[ClusteredIndex] = B.[ClusteredIndex]
where B.[ClusteredIndex] is null
end
[?][?]
Once you've got a fixed version of the xml, update the original table and drop this fixedxml table.
here compressed_xml is just gzip of xml and message_digest is the (binary) md5: this can be verified by dumping the blobs to files with select ... into dumpfile ... and examining those files.
So after suitably sed-ing my dumped out xml, I could recreate the compressed_xml and message_digest with gzip -c and openssl dgst -md5 -binary respectively and then slurp back in with an update ... LOAD_FILE('') -- for some reason I had to temporarily set SElinux to permissive before that last step would work.
A similar surgery is possible upon meta_template__xml__dm.
After wiping the mod_perl cached registry and giving httpd a kick, amazingly this fixed my problem. I'm not suggesting anyone out there attempt this -- but sometimes you need to nuke from orbit.
Okay, I understand what you mean by compressed XML now. Please do not alter the meta_* tables yourself, but use MartEditor to load the data from these tables instead. You can just connect with MartEditor to that mart database and it will automatically retrieve the data from those tables. MartEditor also writes your modified config back into those tables.
Hi Joachim: thanks for responding.
The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'.
But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL.
Hi Joachim: thanks for responding. The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'. The fourth field is actually a hash of either field two or three (or both?). But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL. – jfsebastian 0 secs ago