Hi,
I'm currently trying to create a dataset from FASTA reads for Kover using the following command:
kover dataset create from-reads --genomic-data where_read.tsv --phenotype-description "Rifampicin resistance" --phenotype-metadata pheno_read.tsv --output test --progress
where:
- where_read.tsv --> a TSV file containing each FASTA file name (ending in .fa) and the path to the folder in which the file is stored
- pheno_read.tsv --> a TSV file containing each FASTA file name (ending in .fa) followed by either "sensitive" or "resistant".
This seems to trigger an IndexError in from_reads. The output of the command reads:
Traceback (most recent call last):
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1192, in <module>
    CommandLineInterface()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1150, in __init__
    getattr(self, args.command)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1170, in dataset
    getattr(dataset_tool, args.command)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 250, in create
    getattr(creation_tool, args.datasource)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 224, in from_reads
    progress=args.progress)
  File "/home/yusuf/.local/lib/python2.7/site-packages/kover/dataset/create.py", line 488, in from_reads
    list_reads_dsk_output.append(join(temp_dir, basename(splitext(files[-1])[0]) + ".h5"))
IndexError: list index out of range
Any thoughts on what's causing it?
Thanks,
Yusuf
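The traceback fails at files[-1], which is consistent with an empty list of read files for at least one genome, although that is only a guess from the error message. A minimal sketch of a pre-flight check, assuming the genomic data TSV has two tab-separated columns (genome ID, then a directory of reads); the function name find_empty_read_dirs is made up for illustration and is not part of Kover:

```python
import csv
import os


def find_empty_read_dirs(tsv_path):
    """Return (genome_id, path, reason) for every row whose path looks unusable.

    Assumes a two-column, tab-separated file: genome ID, then a directory
    that should contain that genome's read files.
    """
    problems = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or not any(cell.strip() for cell in row):
                continue  # skip blank lines
            if len(row) < 2:
                problems.append((row[0].strip(), "", "missing path column"))
                continue
            genome_id, path = row[0].strip(), row[1].strip()
            if not os.path.isdir(path):
                problems.append((genome_id, path, "not a directory"))
            elif not any(
                os.path.isfile(os.path.join(path, name)) for name in os.listdir(path)
            ):
                problems.append((genome_id, path, "no files in directory"))
    return problems
```

Calling find_empty_read_dirs("where_read.tsv") before kover dataset create would list any genome whose path is missing, is not a directory, or contains no files.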
Hi Alex,
Thanks for your previous post - I've managed to produce a "test" dataset by using "contigs" instead of "reads" and correcting the where_read file! However, I've now run into a different set of problems. The creation of the dataset produces an HDF5 output error (apologies for the image; if I included the text of the error message, I'd go over the Biostars comment word limit): https://imgur.com/a/oFHKxWE
The test file is capable of retrieving information (using kover dataset info) like genome-count and genome-ids. However, when asked for a kmer-count it produces the following error
Any ideas on how to fix it? I really do appreciate you taking the time out to help :)
Thanks,
Yusuf
Hi, I'm having the same problem. Did you find a solution?
Hi Yusuf, sorry for the delayed response.
It looks like the dataset file was not created correctly or is corrupted. I've seen this happen before when the DSK (the tool used for k-mer counting) was not compiled properly during the installation process.
Can you show me the output of
h5ls test
? This will list the HDF5 datasets contained in the file. If there is no kmer_sequences dataset in that HDF5 file, then something went wrong in the dataset creation. If that dataset is not there, can you try reinstalling kover and sending me the install log (displayed in the terminal during installation)? Also, if you're familiar with Docker, you can try using our prebuilt image that contains a working installation of kover (https://hub.docker.com/r/aldro61/kover).
Another way to figure out what's going on is to download the example dataset provided here and try running the
kover dataset info
commands. If it works on that dataset and not on yours, then there is an issue with your installation of Kover. Keep me posted!
Cheers, Alex
Thank you for your response. My problem is solved now. I had two genomes for which there was no data in the generated file. As soon as I removed those blank files, the program ran well.
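For anyone wanting to look for blank input files before building a dataset, here is a rough sketch. It assumes a two-column, tab-separated file (genome ID, then the path to a FASTA file); the helper names has_sequence and find_blank_genomes are hypothetical and not part of Kover:

```python
import csv
import os


def has_sequence(fasta_path):
    """True if the file has at least one non-blank line that is not a header."""
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                return True
    return False


def find_blank_genomes(tsv_path):
    """Return genome IDs whose FASTA file is missing or holds no sequence data.

    Assumes tab-separated rows of: genome ID, path to a FASTA file.
    """
    blank = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            genome_id, path = row[0].strip(), row[1].strip()
            if not os.path.isfile(path) or not has_sequence(path):
                blank.append(genome_id)
    return blank
```

Any genome ID returned by find_blank_genomes is a candidate for removal from both the genomic data file and the phenotype metadata file.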
@yusuf.sgs, did this resolve your issue? You can also reach out by email if you need further assistance.
Cheers, Alex
Hi Alex,
Sorry for the late reply - I've been bogged down with exams. I've decided to use the docker version of Kover instead, though this is still presenting problems. I still get the same HDF5 errors when I create and run dataset info kmer-count on my dataset. The h5ls test gives me this output:
genome_identifiers Dataset {3}
kmer_by_matrix_column Dataset {986}
kmer_matrix Dataset {1, 986}
phenotype Dataset {3}
phenotype_tags Dataset {2}
with kmer_sequences absent from the output. The kover dataset info commands seem to work on the example dataset you provided, though, and since I'm using the Docker installation, it can't be an issue with the installation. I've sent a link to a Google Drive folder containing the exact files I'm using to create the dataset. It also contains a .txt file called 'kover command', which details the exact command I'm running in the terminal: https://drive.google.com/open?id=1-Dc1SkaSq_YypS1F8JEjlvsa_hTFlreE
Thanks again for helping out!
Yusuf
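The check on the h5ls listing can also be scripted. A small sketch, assuming h5ls prints one "name Dataset {shape}" entry per line; the function missing_datasets is hypothetical, and kmer_sequences is the only dataset name this thread confirms as required:

```python
def missing_datasets(h5ls_output, required=("kmer_sequences",)):
    """Return the required dataset names that do not appear in h5ls output."""
    present = set()
    for line in h5ls_output.splitlines():
        parts = line.split()
        # Each entry is expected to look like: <name> Dataset {<shape>}
        if len(parts) >= 2 and parts[1] == "Dataset":
            present.add(parts[0])
    return [name for name in required if name not in present]


# Listing reported above for the "test" file
listing = """\
genome_identifiers Dataset {3}
kmer_by_matrix_column Dataset {986}
kmer_matrix Dataset {1, 986}
phenotype Dataset {3}
phenotype_tags Dataset {2}
"""

print(missing_datasets(listing))  # prints ['kmer_sequences']
```

An empty list would mean every required dataset is present; here the result matches what the listing shows by eye.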
Hi, I have created a dataset using contigs but am having the same problem as mentioned above. Can you please tell us where we went wrong?