Question

No could not fit >15 [Assemblied by SOAPdenovo]

0

Entering edit mode

10.9 years ago

margxenscienculo ▴ 50

Hi folks. I am trying to use your software KmerGenie. I have some problems. Because it does not draw the histograms at >15 value. It is hilarious when i put without --diploid command. It could tell me the best k but without doing at >15. I tried from 15 to 55 because at more k (to121K) it is more slower (1h30min) at the result is the same "could not fit".

sudo ./kmergenie /home/soba/out_WGS_k25_quitado.fasta --diploid  -l 15 -k 55  -o prueba
running histogram estimation
Linear estimation: ~15 M distinct 38-mers are in the reads
K-mer sampling: 1/2
|

processing                                                                                         |
[going to estimate histograms for values of k: 55 45 35 25 15
---------------------------------------------------------------------------------Total time Wallclock  34.736 s
fitting model to histograms to estimate best k
could not fit prueba-k15.histo
could not fit prueba-k35.histo
could not fit prueba-k25.histo
could not fit prueba-k55.histo
could not fit prueba-k45.histo
could not predict a best k value
Execution of decide failed (return code 0)

sudo ./kmergenie /home/soba/Documentos/out_WGS_k25_quitado.fasta   -l 15 -k 55  -o prueba
running histogram estimation
Linear estimation: ~15 M distinct 38-mers are in the reads
K-mer sampling: 1/2
| processing                                                                                         |
[going to estimate histograms for values of k: 55 45 35 25 15
---------------------------------------------------------------------------------Total time Wallclock  35.6302 s
fitting model to histograms to estimate best k
could not fit prueba-k25.histo
could not fit prueba-k35.histo
could not fit prueba-k45.histo
could not fit prueba-k55.histo
estimation of the best k so far: 15
refining estimation around [15; 21], with a step of 2
running histogram estimation
Linear estimation: ~19 M distinct 21-mers are in the reads
K-mer sampling: 1/3
| processing                                                                                         |
[going to estimate histograms for values of k: 21 19 17 15
---------------------------------------------------------------------------------Total time Wallclock  22.5893 s
fitting model to histograms to estimate best k
could not fit prueba-k19.histo
could not fit prueba-k17.histo
could not fit prueba-k21.histo
could not fit prueba-k25.histo
could not fit prueba-k45.histo
could not fit prueba-k35.histo
could not fit prueba-k55.histo
table of predicted num. of genomic k-mers: prueba.dat
recommended coverage cut-off for best k: 18
best k: 15

KmerGenie • 5.4k views

ADD COMMENT • link updated 2.2 years ago by Ram 45k • written 10.9 years ago by margxenscienculo ▴ 50

3

Entering edit mode

Just a note: try not to run programs as root (e.g. do not use sudo).

ADD REPLY • link 10.9 years ago by Giovanni M Dall'Olio 28k

0

Entering edit mode

Ok. Now it showed the line was failed.

./kmergenie /home/soba/Documentos/out_WGS_k25_quitado.fasta   -l 15 -k 55  -o prueba
running histogram estimation
Linear estimation: ~15 M distinct 38-mers are in the reads
K-mer sampling: 1/2
| processing                                                                                         |
[going to estimate histograms for values of k: 55 45 35 25 15
---------------------------------------------------------------------------------Total time Wallclock  33.6264 s
fitting model to histograms to estimate best k
could not fit prueba-k25.histo
could not fit prueba-k35.histo
could not fit prueba-k45.histo
could not fit prueba-k55.histo
Traceback (most recent call last):
  File "/home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/decide", line 115, in <module>
    generate_report(histograms_prefix, best_k, max_genomic_kmers, None, is_diploid)
  File "/home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/generate_report.py", line 170, in generate_report
    with open(output,'w') as f:
IOError: [Errno 13] Permission denied: 'prueba_report.html'
Execution of decide failed (return code 1)

./kmergenie /home/soba/Documentos/out_WGS_k25_quitado.fasta --diploid  -l 15 -k 55  -o prueba
running histogram estimation
Linear estimation: ~15 M distinct 38-mers are in the reads
K-mer sampling: 1/2
| processing                                                                                         |
[going to estimate histograms for values of k: 55 45 35 25 15
---------------------------------------------------------------------------------Total time Wallclock  35.2915 s
fitting model to histograms to estimate best k
could not fit prueba-k25.histocould not fit prueba-k35.histo

could not fit prueba-k15.histo
could not fit prueba-k55.histo
could not fit prueba-k45.histo
cannot generate histogram for file prueba-k15.histo stdout:  ; stderr: Error en if (zp.copy <= 1 | shape.e <= 0 | scale.e <= 0 | sd.v <= 0 |  :
  valor ausente donde TRUE/FALSE es necesario
Calls: source -> withVisible -> eval -> eval -> model
Ejecución interrumpida

Traceback (most recent call last):
  File "/home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/decide", line 115, in <module>
    generate_report(histograms_prefix, best_k, max_genomic_kmers, None, is_diploid)
  File "/home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/generate_report.py", line 170, in generate_report
    with open(output,'w') as f:
IOError: [Errno 13] Permission denied: 'prueba_report.html'
Execution of decide failed (return code 1)

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by margxenscienculo ▴ 50

0

Entering edit mode

this one is due to a failure to overwrite the (possibly root) file prueba_report.html

ADD REPLY • link 10.9 years ago by Rayan Chikhi ★ 1.6k

0

Entering edit mode

It is me, the same. So, May it be a problem with R program? What are the correct installed packages?

$ uname -a
Linux soba 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux

$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by margxenscienculo ▴ 50

0

Entering edit mode

Thanks for reporting this issue. Could you please send the prueba_report.html file? I would like to see what the histograms that didn't fit look like. It seems to be a tiny dataset.

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by Rayan Chikhi ★ 1.6k

0

Entering edit mode

Yes, that is. It is a small dataset 25,770,300pb. I have sent email. Regards.

ADD REPLY • link 10.9 years ago by margxenscienculo ▴ 50

Ram · Answer 1 · 2014-06-16

Thanks much for sending me the histograms by email.

It appears that you data set is very low coverage. If you do not mind me posting one of the histograms here:

I wrote a paragraph below but I now realize that your problem might have a more simple explanation: did you give the reads, or the SOAPdenovo assembly, as input to Kmergenie? Only the former is correct, and I suspect that you did the latter.

Kmergenie expects to see a mixture of distributions in a k-mer histogram constructed from the reads, including one (or two in diploid mode) Gaussian distribution for genomic k-mers. In this histogram, we clearly see a large amount of low-coverage k-mers (with more or less an exponentially decreasing distribution), that Kmergenie assumes to be sequencing errors, but no clear Gaussian distribution that should have appeared at higher abundances. Hence Kmergenie was not able to fit its model to this histogram. The effect is even more significant for the larger k values you tested (25, etc..).

Ram · Answer 2 · 2014-06-17

1

Entering edit mode

10.9 years ago

5heikki 11k

I would try to address:

File "/home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/generate_report.py", line 170, in generate_report
    with open(output,'w') as f:
IOError: [Errno 13] Permission denied: 'prueba_report.html'

By:

chown and/or chmod +x /home/soba/Documentos/Ensambladores/kmergenie-1.6663/scripts/generate_report.py

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by 5heikki 11k

Ram · Answer 3 · 2014-06-17

0

Entering edit mode

10.9 years ago

margxenscienculo ▴ 50

Ok. I failed. Now I am running with one of the two fastaq without assembly which I deleted the bad quality reads<30. After I m going to talk you how the analysis worked.

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by margxenscienculo ▴ 50

Ram · Answer 4 · 2014-06-17

Hi. I guess that the k-best at 17 it is an artifact. Also there are two peaks one at 42 and other at 72. What do you think about?

Besides I wonder why don't work with the --diploid option.

Other question. I have the genome in two files. Have I to do the analysis with every file? Or I have to collapse both in a single file.

I appreciate much your advice for a newcomer.

./kmergenie /home/soba/Documentos/Ensambladores/mirdeep2/tutorial_dir/SRR653456_2_e.fastq --diploid  -l 15 -k 55  -o prueba2
running histogram estimation
Linear estimation: ~240 M distinct 38-mers are in the reads
K-mer sampling: 1/45
| processing                                                                                         |
[going to estimate histograms for values of k: 55 45 35 25 15
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  289.346 s
fitting model to histograms to estimate best k
could not fit prueba2-k15.histo
could not fit prueba2-k25.histo
could not fit prueba2-k35.histo
could not fit prueba2-k55.histo
could not fit prueba2-k45.histo
could not predict a best k value
Execution of decide failed (return code 0)

Here the success trial.

./kmergenie /home/soba/Documentos/Ensambladores/mirdeep2/tutorial_dir/SRR653456_2_e.fastq   -l 15   -o prueba2

running histogram estimation
Linear estimation: ~191 M distinct 51-mers are in the reads
K-mer sampling: 1/36
| processing                                                                                         |
[going to estimate histograms for values of k: 81 71 61 51 41 31 21
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  396.958 s
fitting model to histograms to estimate best k
estimation of the best k so far: 21
refining estimation around [15; 27], with a step of 2
running histogram estimation
Linear estimation: ~296 M distinct 24-mers are in the reads
K-mer sampling: 1/56
| processing                                                                                         |
[going to estimate histograms for values of k: 27 25 23 21 19 17 15
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  444.631 s
fitting model to histograms to estimate best k
table of predicted num. of genomic k-mers: prueba2.dat
recommended coverage cut-off for best k: 6
best k: 17

Ram · Answer 5 · 2014-06-19

You are right. The size is around 25.000kpb. I run the test with -l 15 and -l 29 with a unique merged file. And the second trial is said that the best k is 35.

TRIAL 1

./kmergenie /home/soba/Documentos/Ensambladores/mirdeep2/tutorial_dir/list_filesTY.fastq  -l 15   -o trial_l15
running histogram estimation
Linear estimation: ~379 M distinct 51-mers are in the reads
K-mer sampling: 1/72
| processing                                                                                         |
[going to estimate histograms for values of k: 81 71 61 51 41 31 21
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  815.1 s
fitting model to histograms to estimate best k
estimation of the best k so far: 21
refining estimation around [15; 27], with a step of 2
running histogram estimation
Linear estimation: ~587 M distinct 24-mers are in the reads
K-mer sampling: 1/112
| processing                                                                                         |
[going to estimate histograms for values of k: 27 25 23 21 19 17 15
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  895.178 s
fitting model to histograms to estimate best k
table of predicted num. of genomic k-mers: trial_l15.dat
recommended coverage cut-off for best k: 14
best k: 15

-l 15

-l 29 TRIAL 2

./kmergenie /home/soba/Documentos/Ensambladores/mirdeep2/tutorial_dir/list_filesTY.fastq  -l 29   -o prueba5
running histogram estimation
Linear estimation: ~379 M distinct 51-mers are in the reads
K-mer sampling: 1/72
| processing                                                                                         |
[going to estimate histograms for values of k: 81 71 61 51 41 31
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  504.639 s
fitting model to histograms to estimate best k
estimation of the best k so far: 31
refining estimation around [29; 37], with a step of 2
running histogram estimation
Linear estimation: ~546 M distinct 29-mers are in the reads
K-mer sampling: 1/104
| processing                                                                                         |
[going to estimate histograms for values of k: 37 35 33 31 29
-----------------------------------------------------------------------------------------------------------------------Total time Wallclock  527.335 s
fitting model to histograms to estimate best k
table of predicted num. of genomic k-mers: prueba5.dat
recommended coverage cut-off for best k: 12
best k: 35