I need to convert some big SOLiD XSQ files to FASTQ files. Is there a tool that can convert directly from XSQ to FASTQ?
So far I have only found tools that convert first to CSFASTA and .qual, and subsequently from those to FASTQ.
The XSQ format is HDF5-based, and the only additional requirement is a routine to expand the bit-packed DNA and quality strings. There is at least one alternative to the conversion tools provided by Life Technologies: ngs_plumbing.xsq (although it has probably not been tested in all situations).
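To give an idea of what such an expansion routine looks like, here is a minimal sketch in plain Python. It assumes the layout I understand from the XSQ spec: one byte per call, with the color call in the 2 least significant bits and the quality value in the 6 most significant bits. The function name is made up for illustration, missing calls are not handled, and the layout should be checked against the spec before relying on it.

```python
def unpack_color_call_qv(packed):
    """Split packed call bytes into color calls and quality values.

    ASSUMPTION: each byte holds the color call (0-3) in its 2 lowest
    bits and the quality value (0-63) in its 6 highest bits.
    """
    colors = []
    quals = []
    for value in bytearray(packed):
        colors.append(value & 0b11)  # keep the 2 lowest bits
        quals.append(value >> 2)     # keep the 6 highest bits
    return colors, quals


if __name__ == "__main__":
    # Three hand-packed calls: (quality << 2) | color
    example = bytes(bytearray([(30 << 2) | 2, (5 << 2) | 0, (40 << 2) | 3]))
    colors, quals = unpack_color_call_qv(example)
    print("".join(str(c) for c in colors))
    print(quals)
```

On real data the same bit operations would be applied to whole arrays read from the HDF5 file rather than byte by byte.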
István's page on color-space formats (link in his post on this page) is remarkably comprehensive yet clear. Do have a look at it in any case if working with SOLiD data.
The XSQ is a proprietary format. Your options are limited to the conversion tool provided by ABI.
Given that converting a color-space representation to FASTQ can mean several different types of conversion, I think it is safe to assume that no tool offers this directly.
If you're still looking for something, I have code that works using PyTables, an alternative to h5py. I could use some extra eyes on it before I release it into the wild. I wrote it based on the XSQ spec released by ABI.
My tool converts XSQ files directly to FASTQ (optionally gzipped).
PyTables uses its own customizations on top of HDF5 (or at least it did the last time I tried). This is not bad in itself, but it is a potential issue when you want to create or edit XSQ files.
The utilities in ngs_plumbing can convert XSQ data to FASTA-like (FASTA if ECC, or CSFASTA + QUAL) and FASTQ-like (FASTQ if SOLiD's ECC, or CSFASTQ).
In addition, one can also generate FastQC-like reports (perhaps nicer: this is all HTML and JavaScript) directly from the XSQ.
I also wrote a new XSQ to FASTQ converter myself in Java, using the HDF5 Java API: http://www.hdfgroup.org/hdf-java-html/
An XSQ file is an HDF5 file, a format that is also used in other "big data" sciences. This converter is faster than all the others I tried, I guess because I use the native C++ HDF5 libraries, which are wrapped in an object-oriented way by the HDF Java API. All I did was write the minimal amount of Java code needed to traverse the file, unpack the byte array into color-space and quality values, and sink the result into a FASTQ file.
With this converter we can convert a 63 GB SOLiD WildFire run with 1.2 billion reads (50 color-space bases each) to FASTQ in 50 minutes. The converter can output both normal Sanger CS FASTQ and the BWA 0.5.9 CS FASTQ dialect. By default it chunks the output into FASTQ files of 1,000,000 reads to support mapping on clusters.
To make sure the output is correct, I used a couple of Linux commands to diff the output against the output of our old converter. The diff says the output is the same.
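For anyone wondering what the chunking amounts to, here is a small Python sketch of the idea. It is not the converter's actual Java code; the function name, the file naming scheme and the parameters are all made up for illustration. It takes any iterable of ready-made FASTQ record strings and starts a new numbered file every chunk_size records.

```python
import os


def write_chunked(records, out_dir, prefix, chunk_size=1000000):
    """Write FASTQ record strings to numbered files of at most
    chunk_size records each and return the list of file paths.

    Hypothetical helper: names and file pattern are illustrative only.
    """
    paths = []
    handle = None
    try:
        for count, record in enumerate(records):
            if count % chunk_size == 0:
                # Close the previous chunk and open the next one.
                if handle is not None:
                    handle.close()
                name = "%s_%04d.fastq" % (prefix, count // chunk_size)
                path = os.path.join(out_dir, name)
                handle = open(path, "w")
                paths.append(path)
            handle.write(record)
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Because records are written as they arrive, memory use stays flat regardless of the run size, and each output file can be handed to a separate mapping job.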
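The same kind of check can also be done without shell tools. The sketch below is not what I ran, just a hypothetical Python equivalent: it walks two FASTQ files in parallel and reports the first line number where they differ. It assumes both converters write the reads in the same order.

```python
from itertools import zip_longest


def first_difference(path_a, path_b):
    """Return the 1-based number of the first differing line,
    or None if the two files are identical.

    Hypothetical helper; assumes reads appear in the same order.
    """
    with open(path_a) as file_a, open(path_b) as file_b:
        pairs = zip_longest(file_a, file_b)
        for number, (line_a, line_b) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, so a
            # length mismatch is also reported as a difference.
            if line_a != line_b:
                return number
    return None
```

Since the files are read line by line, this works on outputs that are too large to load into memory.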
The project has moved to GitHub: https://github.com/WimS83/XSQConverter
We and other people are still working on and with this converter.
That's fast, and indeed probably the fastest tool. Good to have one more around.
I noticed that my tools had a problem with CSFASTQ (they produced invalid CSFASTQ because of a copy/paste issue; since we use either FASTQ or CSFASTA here, this went unnoticed). After the correction, the Python utility in ngs_plumbing clocks in a little under 3 times slower (~2.5 hours for 63 GB) for an XSQ -> CSFASTQ conversion. Going faster would require moving a block of a few lines of Python down to C in order to compete with Java's runtime (with no certainty of beating it without spending more time on it than it is worth).
Hi William,
I am getting errors while trying to install your XSQConverter tool. I asked about it in the issues section of the GitHub repository. Could you please have a look at it?
Jan 13: Hi William, I updated the issue on GitHub. Could you please check it? I could not find your contact details.
D.
Neat library. Had I known about it, I would not have spent the time on my own implementation. I will link to it from the tutorial.
I had a similar thought when finally finding your page on the color space (and there is already a link to it from the doc).
Did anyone get ngsplumbing.xsq working? First I had to rename the string class in the ngsplumbing package because it conflicts with the Python string class. See http://stackoverflow.com/questions/5889466/attributeerror-module-object-has-no-attribute-maketrans-while-running-cprof
After that, the script xsqconvert.py now exits with the message 'Error: the Python package "h5py" is required but could not be imported. Bye.' But if I look in the script where the exception is thrown, it tries to import ngsplumbing.xsq, which I can't find anywhere in the ngsplumbing package or on the system. I would really like to use this tool.
It might be best to share exactly what you are doing. I have been using it to convert SR and PE data and it appeared to work. Regarding the dependency on h5py, it is really needed: XSQ is built on HDF5.
OK, it now works well and fast!
The only issue is the clash with the Python string class. I fixed it by renaming the NGSplumbing string class to stringNGSUtil. The second error is the result of a cascaded import of the NGSplumbing string class in the dna class, which now fails. Just edit the import in that class so that it imports the renamed stringNGSUtil, and the xsqconvert script works.
Fixes are now in the Bitbucket repository and will be included in the next release (any time between now and the end of the summer).