I'm trying to run the example here: https://github.com/aldro61/kover2_paper/tree/master/data to reproduce the paper's results.
I'm stuck at step 3, which uses this command:
kover dataset create from-tsv --genomic-data genome_paths.tsv --phenotype-description "Kanamycin resistance" --phenotype-metadata 'mycobacterium tuberculosis'/kanamycin/metadata.tsv --output example2.kover --progress
When I run it with from-contigs instead, I get this error:
(kover) T00057442@radius:~/RA/kover/kover2_paper/data$ \kover dataset create from-contigs --genomic-data genome_paths.tsv --phenotype-description "Kanamycin resistance" --phenotype-metadata 'mycobacterium tuberculosis'/kanamycin/metadata.tsv --output example2.kover --progress
Traceback (most recent call last):
File "/home/T00057442/RA/kover/kover/bin/kover", line 1192, in <module>
CommandLineInterface()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1150, in __init__
getattr(self, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1170, in dataset
getattr(dataset_tool, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 250, in create
getattr(creation_tool, args.datasource)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 159, in from_contigs
progress=args.progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/create.py", line 371, in from_contigs
progress=progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/tools/kmer_count.py", line 37, in contigs_count_kmers
"-progress", str(progress)])
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I'm not sure which file the error is referring to. In step 2, genome_paths.tsv was not created in genomes_dir but in the directory above it, so that is the path I pass as an argument in the command above.
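Since the traceback ends inside subprocess.py, one possibility is that the missing file is the program being launched rather than one of the input files. The sketch below only shows that Python raises the same OSError with Errno 2 when an executable cannot be found; the program name is deliberately made up, and try_call is a hypothetical helper, not part of Kover.

```python
import errno
import subprocess
import sys


def try_call(argv):
    """Launch argv; return None if it started, or the errno if the launch failed."""
    try:
        subprocess.call(argv)
    except OSError as exc:
        # Raised when the executable itself cannot be started,
        # for example because it is not installed or not on PATH.
        return exc.errno
    return None


# A program that does not exist (made-up name): the launch fails with ENOENT.
missing = try_call(["definitely-not-a-real-program-xyz", "--help"])
print(missing == errno.ENOENT)

# A program that does exist (the current Python interpreter): no error.
print(try_call([sys.executable, "-c", "pass"]) is None)
```

If this is what is happening, the input paths would not matter, which would be consistent with absolute paths making no difference.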
I used from-contigs because this is what I get with the other options.
from-tsv shows this error:
Traceback (most recent call last):
File "/home/T00057442/RA/kover/kover/bin/kover", line 1192, in <module>
CommandLineInterface()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1150, in __init__
getattr(self, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1170, in dataset
getattr(dataset_tool, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 250, in create
getattr(creation_tool, args.datasource)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 97, in from_tsv
progress_callback=progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/create.py", line 155, in from_tsv
kmer_count = get_kmer_count(tsv_path)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/create.py", line 136, in get_kmer_count
raise Exception()
Exception
There are no k-mer counts in genome_paths.tsv, unlike the kmer_matrix.tsv used in the tutorial here: https://aldro61.github.io/kover/doc_tut_scm.html so getting this error makes sense.
I also tried from-reads:
Traceback (most recent call last):
File "/home/T00057442/RA/kover/kover/bin/kover", line 1192, in <module>
CommandLineInterface()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1150, in __init__
getattr(self, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1170, in dataset
getattr(dataset_tool, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 250, in create
getattr(creation_tool, args.datasource)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 224, in from_reads
progress=args.progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/create.py", line 485, in from_reads
files = [join(reads_folder_by_genome_id[id], file) for file in listdir(reads_folder_by_genome_id[id])
OSError: [Errno 20] Not a directory: '/home/T00057442/RA/kover/kover2_paper/data/genomes_dir/1773.204.fna'
1773.204.fna looks like a FASTA file, not a directory, so this error also makes sense:
>JKXM01000001 Mycobacterium tuberculosis strain MAL020165 adOWA-supercont1.1.C1, whole genome shotgun sequence. [Mycobacterium tuberculosis strain MAL020165 | 1773.204]
GCCGTCGCCGCCCTGGCCGCCGGCCCCGCCGTTTCCGCCGCCGCCGCCATCGCCGATGAT
GTTTTCCCCGCCCTTGCCGCCAGCCCCAGCGTTCCCGCCGGCTCCGCCACTGGCGCCGGT
GCCGCCGGGTGCAACGGCGTTGGCGCCGTTACCGCCGTTGCCGCCTTTGCCCCCGGTGTC
TGCAAAGTCGGGGGTCGCACCCTGCGCGGCGCGGGTCACGCCGTCACCGCTGAGCCCCCC
GAGCCCGCCAGCGCCGCTGAAGCCAGGATTGCCGCCGTTGCCGCCATGGCCGCCGTTGGC
ACCGGGTGCGACGGCGTTGCCGCCGGTCCCGCCGACCCCACCGTTGCCGCCTTTACCACC
GTCCTGGCCACGCTCGCCCGCGGTGGTGGCATTGGCACCCTCGGCACCACTACCACCGAG
CCCGCCGTCTGCGCCGCGGCCGCCAGTCCCACCGGCCCCGCCATTGCCGGCGAGAGTTCC
GCCGTCGCCGCCGGCGCCGCCCTGGCCGCCGTTGCCGCCGCTATTGCCTTTGCCACCGAC
TGCGCCCGAATCGCTCGCGTTCGTCCCTGCGGCGCCGTTGGCGCCGTTGCCGCCGGCGCC
GCCGTTGCCGACCAGCCCGCCATGGCCGCCGGGTCCGCCGTTGGCGCCGTTGGTGCCCGC
GGTGGTGGCGTTGGCGCCGTTGCCGCCGGCACCGCCGTTGCCGCCGCTGGTGGGGGTGGC
GCCGATGGCGCCCTGAGCGCCGGTGATGGAGCCGGCTCCGCCGGTGCCTCCGGCCCCGCC
GGTGCCGGGGTTGCCGCCGTTGCCGCCGTGACCGCCGGCACCTGCGTTGAAGGCCTGGTT
GCCGTTGGCGCCGGCTCCGCGGTCACCGCCGACGCCACCAGCGCCGCCGGTCCCGCCGGC
CCCGCCGGCGCCTTGGCCGCCCAGCAGGCTGATCAGGCCGCCGGCCCCGCCGACCCCGCC
GACCCCACCGGCACCGCCGCTACCACCGGCACCGCCGGCCTGTCCGGTGGCAATCACCAG
AGAATGGCCGCCCCCGCCGGCCCCACCGGCCCCGCCGATACCGCCGTCCCCGCCGGCCCC
GCCGGCACCGGCCAGCCAGCCACCCCGGCCCCCGGCCCCGCCATCGCCGGCGTCGCCGCC
GACCCCACCGGACGTACCGTGCGGGGACAAGTCCTCACCGGCTGCGCCGGCCACACCCTC
.
.
.
That leaves from-contigs as the last option, which, as shown at the top, did not work.
genome_paths.tsv looks like:
1448490.3 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1448490.3.fna
1773.5071 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5071.fna
1773.5072 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5072.fna
1773.5073 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5073.fna
1773.5074 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5074.fna
1773.5075 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5075.fna
1773.5076 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5076.fna
1773.5077 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5077.fna
1773.5078 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5078.fna
1773.5080 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5080.fna
1773.5081 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5081.fna
1773.5082 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5082.fna
1773.5086 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5086.fna
1773.5087 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5087.fna
1773.5089 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5089.fna
1773.5090 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5090.fna
1773.5091 /home/nobu/Desktop/BioInformatics/ResearchAssistant/kover2_paper/data/genomes_dir/1773.5091.fna
.
.
.
with index numbers starting from 1 on the far left.
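Note that the paths listed above start with /home/nobu/..., while the commands are run under /home/T00057442/.... A minimal sketch for checking whether each listed path exists on the current machine, and whether it is a file or a directory, assuming each row holds a genome ID and a path separated by whitespace (classify_paths is a hypothetical helper, not part of Kover):

```python
import os


def classify_paths(tsv_path):
    """Return a list of (genome_id, path, status) for each two-column row.

    status is "file", "directory", or "missing".
    """
    results = []
    with open(tsv_path) as handle:
        for line in handle:
            # Split on the first run of whitespace only, so paths
            # containing spaces stay intact.
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed rows
            genome_id, path = parts
            if os.path.isfile(path):
                status = "file"
            elif os.path.isdir(path):
                status = "directory"
            else:
                status = "missing"
            results.append((genome_id, path, status))
    return results


# Only runs if the file is present in the current directory.
if os.path.exists("genome_paths.tsv"):
    for genome_id, path, status in classify_paths("genome_paths.tsv"):
        if status != "file":
            print(genome_id, status, path)
```

Any row reported as "missing" would point to a path that needs regenerating for this machine.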
Reading the documentation here: https://aldro61.github.io/kover/doc_dataset.html#creating-a-dataset also makes me think from-contigs is the correct choice.
Does anyone know what I should be doing?
I am going to use the PyCharm debugger to see what is happening in create.py, but I am new to bioinformatics, so any advice is appreciated.
I moved genome_paths.tsv into genomes_dir and tried using absolute paths, but I am still getting errors.
The command:
(kover) T00057442@radius:~/RA/kover/kover2_paper/data/genomes_dir$ kover dataset create from-contigs --genomic-data /home/T00057442/RA/kover/kover2_paper/data/genomes_dir/genome_paths.tsv --phenotype-description "Kanamycin resistance" --phenotype-metadata /home/T00057442/RA/kover/kover2_paper/data/'mycobacterium tuberculosis'/kanamycin/metadata.tsv --output example2.kover --progress
and the error:
Traceback (most recent call last):
File "/home/T00057442/RA/kover/kover/bin/kover", line 1192, in <module>
CommandLineInterface()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1150, in __init__
getattr(self, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 1170, in dataset
getattr(dataset_tool, args.command)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 250, in create
getattr(creation_tool, args.datasource)()
File "/home/T00057442/RA/kover/kover/bin/kover", line 159, in from_contigs
progress=args.progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/create.py", line 371, in from_contigs
progress=progress)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/site-packages/kover/dataset/tools/kmer_count.py", line 37, in contigs_count_kmers
"-progress", str(progress)])
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/home/T00057442/anaconda3/envs/kover/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Edit: I ran it in PyCharm and I don't think it is a Python issue/bug, so I am stuck again. Any help or suggestion is welcome.