I am trying to find a parser for the metadata generated by the Illumina NovaSeq (the .bin files written to the InterOp directory). Ultimately, I want to extract parameters such as the percentage of clusters passing filter, the percentage of reads with a Q score >= 30, the error rate, etc.
I have found an R package called savR, but I am afraid it does not handle the output from NovaSeq runs correctly.
Other than that, I have found an InterOp Python library. They claim it supports NovaSeq output as well, but I haven't tried that yet; I have only tried it with their own MiSeq example files.
This library appears to do the job, but I cannot find proper documentation for it. They do show a few examples of its functionality on their GitHub page, yet I cannot find full documentation covering everything the library can do.
Other than these two libraries/packages, would anyone recommend any other parser for Illumina NovaSeq output data?
You can use Sequencing Analysis Viewer (SAV) from Illumina (note: Windows only) if you have access to the InterOp folder and the .xml files from the original NovaSeq data folder. This is a view-only option.
If you are looking for a programmatic way to parse this information, Illumina has a set of C++ libraries on their GitHub site. Note: Illumina does not provide technical support for their open source software.
The library I linked above is C++. It specifically notes that it supports NovaSeq and all other Illumina sequencers (except the oldest GA).
You could also parse the summary files found in a processed NovaSeq flowcell under FCID/Unaligned/Stats if you want to populate this information in a user-facing or LIMS-like application.
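As a rough illustration of the Stats route, here is a minimal Python sketch that computes the percentage of clusters passing filter per lane. It assumes the Stats folder contains a bcl2fastq-style Stats.json with a ConversionResults list whose entries carry LaneNumber, TotalClustersRaw and TotalClustersPF. The file name and those key names are assumptions, not something stated in this thread, so check them against your own files first. The function names are made up for the example.

```python
import json


def load_stats(path):
    """Load a Stats.json file (assumed bcl2fastq-style layout) into a dict."""
    with open(path) as handle:
        return json.load(handle)


def percent_pf_by_lane(stats):
    """Return {lane number: percent of clusters passing filter}.

    The key names below are assumptions about the JSON layout.
    """
    result = {}
    for lane in stats.get("ConversionResults", []):
        raw = lane["TotalClustersRaw"]
        passed = lane["TotalClustersPF"]
        # Guard against lanes that report no raw clusters
        result[lane["LaneNumber"]] = 100.0 * passed / raw if raw else float("nan")
    return result
```

Usage would look like percent_pf_by_lane(load_stats("FCID/Unaligned/Stats/Stats.json")), with the path adjusted to wherever your demultiplexing output lives.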
In fact those commands work, though they are shell commands. Or do they have anything to do with the Python module that I installed (interop)? Also, the syntax for calling interop_summary, for instance, is:
interop_summary run_folder > path/to/my/output
(source: interop_summary -help)
Just fixed the typo in my example, thanks. Yes, those are shell commands, offered as an alternative to parsing the SAV data.
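If you would rather drive this from a script, the shell form quoted above (interop_summary run_folder > path/to/my/output) can be wrapped in Python with the standard subprocess module. This is only a sketch: it assumes interop_summary is already on your PATH, and the function names are invented for the example.

```python
import subprocess


def build_summary_command(run_folder):
    """Build the argument list for: interop_summary run_folder"""
    return ["interop_summary", str(run_folder)]


def write_summary(run_folder, output_path):
    """Run interop_summary and save its standard output to output_path.

    Equivalent to the shell redirection: interop_summary run_folder > output_path
    """
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError if the tool exits non-zero
        subprocess.run(build_summary_command(run_folder), stdout=out, check=True)
```

Calling write_summary("run_folder", "path/to/my/output") should then produce the same file as the shell command.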
Great, thanks very much, that does exactly what I wanted.
Glad to help you (I was working on the same task last week :-))!!!
At work we have another server on which Anaconda is not installed, so we installed the interop library with pip install interop. Although we managed to import the interop module within Python, the shell commands that you showed (and that did the job for me) could not be called from the shell. Any idea how, or whether, I can get the shell commands (e.g. interop_summary) without conda install -c bioconda illumina-interop, i.e. with pip install only? Thanks again
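To see which of the two pieces a given server actually has, a small check like the one below may help. It uses only the Python standard library to test whether a command is on PATH and whether a module can be imported. The function names are made up for this sketch, and it only reports what is installed; it does not install anything.

```python
import importlib.util
import shutil


def cli_available(command="interop_summary"):
    """True if the given command can be found on PATH."""
    return shutil.which(command) is not None


def module_available(name="interop"):
    """True if the given top-level Python module can be imported."""
    return importlib.util.find_spec(name) is not None
```

On the server described above, one would expect module_available() to return True and cli_available() to return False.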