Hi,
I have run abyss-pe (ABySS v2.2.4) with different k-mer sizes (31-69) on Illumina reads (2×150 bp). But all of them have identical contigs.fa and scaffolds.fa files. Is this normal?
Thanks for your help.
Joy
abyss-pe command:
nohup abyss-pe k=51 v=-v name=x32 in='../trimmed/Xylem32_R1_trimmed.fq ../trimmed/Xylem32_R2_trimmed.fq' &> a32.51.oe &
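When sweeping k, one common way to keep runs from sharing intermediate files is to give each k its own directory, along the lines of the k-optimization loop in the ABySS README. The sketch below only prints the commands (a dry run) so they can be reviewed first. The `k<k>` directory names, the step of 10, the helper name `print_kmer_sweep`, and the extra `../` on the read paths (needed because `-C` changes into the subdirectory) are assumptions, not taken from the original run.

```shell
#!/bin/sh
# Dry run: print one mkdir and one abyss-pe command per k value.
# Assumptions: per-k directory names, step size, and read paths relative
# to the per-k directory are illustrative, not from the original post.
print_kmer_sweep() {
    start=$1; stop=$2; step=$3
    k=$start
    while [ "$k" -le "$stop" ]; do
        echo "mkdir -p k$k"
        echo "abyss-pe -C k$k k=$k v=-v name=x32 in='../../trimmed/Xylem32_R1_trimmed.fq ../../trimmed/Xylem32_R2_trimmed.fq'"
        k=$((k + step))
    done
}

print_kmer_sweep 31 69 10
```

Once the printed commands look right, they can be run by piping the output to `sh`.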
abyss-fac:
abyss-fac x32-unitigs.fa x32-contigs.fa x32-scaffolds.fa |tee x32-stats.tab
n        n:500  L50   min  N75  N50  N25  E-size  max   sum      name
2404872  9439   3962  500  536  585  693  683     6526  5884968  x32-unitigs.fa
2404832  9431   3957  500  536  585  694  687     6526  5888336  x32-contigs.fa
2404832  9431   3957  500  536  585  694  687     6526  5888336  x32-scaffolds.fa
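The contigs and scaffolds rows above match in every column. To confirm that two FASTA files are byte-for-byte identical, rather than merely sharing the same summary statistics, they can be compared directly with `cmp`. A minimal sketch; `report_identical` is a hypothetical helper name, not part of ABySS.

```shell
#!/bin/sh
# Report whether two files are byte-for-byte identical.
# report_identical is a hypothetical helper, not part of ABySS.
# Note: a missing or unreadable file is also reported as "different".
report_identical() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "different: $1 $2"
    fi
}

# Example (file names from the abyss-fac command above):
# report_identical x32-contigs.fa x32-scaffolds.fa
```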
abyss-map :
abyss-map -v -j2 -l40 ../trimmed/Xylem32_R1_trimmed.fq ../trimmed/Xylem32_R2_trimmed.fq x32-6.fa \
|abyss-fixmate -v -l40 -h x32-6.hist \
|sort -snk3 -k4 \
|DistanceEst -v --dot --median -j2 -k51 -l40 -s1000 -n10 -o x32-6.dist.dot x32-6.hist
Reading from standard input...
Reading `x32-6.fa'...
Using 202 MB of memory and 83.9 B/sequence.
Reading `x32-6.fa'...
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Read 286 MB in 2404832 contigs.
Using 2.71 GB of memory and 9.49 B/bp.
Read 1000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 2000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 3000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 4000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 5000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 6000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 7000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 8000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 9000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 10000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 11000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 12000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 13000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 14000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 15000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 16000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 17000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Mapped 15254206 of 17394198 reads (87.7%)
Mapped 13611151 of 17394198 reads uniquely (78.3%)
Read 17394198 alignments
Mateless 0
Unaligned 683759 7.86%
Singleton 772474 8.88%
FR 2758829 31.7%
RF 61 0.000701%
FF 21 0.000241%
Different 4481955 51.5%
Total 8697099
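Each percentage in the summary above is that category's pair count divided by the total number of read pairs (8697099). For instance, FR is 2758829 / 8697099, about 31.7%, and "Different" (as far as I understand, pairs whose mates aligned to different contigs) is 4481955 / 8697099, about 51.5%. A small sketch that recomputes these from the counts; `pair_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# Percentage of read pairs in one category, to three significant digits.
# pair_percent is a hypothetical helper; counts are copied from the log above.
pair_percent() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.3g%%\n", 100 * n / total }'
}

pair_percent 2758829 8697099   # FR
pair_percent 4481955 8697099   # Different
```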
Indeed, this is usually not normal behaviour (though it can happen).
Before I can give a conclusive answer: could you post the complete run log of the ABySS pipeline?
And do you mean that every k-mer size gave the same result, or that for each k-mer size the contigs and scaffolds were the same?
Can we correctly assume you're working with genome data, by the way (and thus not transcriptome)?
Thanks for your reply. The complete run log is too long to post here, and I don't know how to attach it. But I also posted the question on Google Groups, which can carry files (https://groups.google.com/forum/#!topic/abyss-users/SyTgYAj_iDU). Different k-mer sizes gave different results, but within each result the contigs and scaffolds were the same. The data is genomic data captured with probes designed from a transcriptome.
I see, and I had a look at the Google Groups post as well.
What Lauren mentioned there is exactly what I was referring to as well (and would also have been my suggestion).
Concerning your data: so this is not a full-genome WGS dataset, but captured data? If so, it's not surprising to have such low stats. For an average conifer genome (and I do have quite some experience with those) the assembly result is very, very small, roughly 1000-5000 times too small.
Can you confirm again that you are doing genome assembly and not transcriptome assembly?
I'm not sure what you mean by genome assembly. Does it mean that the sequences are used to assemble a whole genome? I didn't have the budget for whole-genome assembly. The data were sequenced from reduced-representation libraries. The libraries are gDNA captured by probes. I am assembling the sequences for variant calling. Thank you for your advice and help.
So you have a reference genome?