Hello everyone!
I am totally new to RNA-seq de novo analysis.
My question is about choosing the k-mer size: should I prefer a shorter or a longer k-mer? What are the merits and demerits of each?
The "optimal" k-mer size has been widely discussed across several posts. I would recommend doing a quick search to get familiar with de novo transcriptome assembly issues.
As for the k-mer size, a good general choice is between half and two thirds of the read length. Too small a k-mer will produce a large number of short contigs, most of them partial-length assemblies, while a longer k-mer will produce fewer, longer contigs.
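To make the rule of thumb concrete, here is a minimal Python sketch that lists a few candidate k values between 1/2 and 2/3 of the read length. The function name `candidate_kmer_sizes` and the `n_candidates` parameter are made up for illustration. Keeping only odd values is an extra assumption on my part (a common convention so that a k-mer cannot be its own reverse complement), not something stated above.

```python
def candidate_kmer_sizes(read_length, n_candidates=4):
    """Return up to n_candidates odd k values between 1/2 and 2/3 of read_length.

    Hypothetical helper for illustration only, not part of any assembler.
    """
    low = read_length // 2           # lower bound: half the read length
    high = (2 * read_length) // 3    # upper bound: two thirds of the read length

    # Assumption: keep odd k only, so no k-mer equals its own reverse complement.
    odd = [k for k in range(low, high + 1) if k % 2 == 1]

    if n_candidates < 2:
        return odd[:1]
    if len(odd) <= n_candidates:
        return odd

    # Pick evenly spaced values, always including the smallest and the largest.
    step = (len(odd) - 1) / (n_candidates - 1)
    return [odd[round(i * step)] for i in range(n_candidates)]


if __name__ == "__main__":
    for read_length in (75, 100, 150):
        print(read_length, candidate_kmer_sizes(read_length))
```

Keep in mind that some assemblers cap k, so the upper end of the range may not be usable in every tool.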
You may need to perform several trial runs with different k-mer lengths and select the best one according to statistics such as N50, total scaffold length, etc.
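For comparing trial runs, N50 and total length are easy to compute from the contig lengths of each assembly. A minimal sketch, assuming you already have the contig lengths as a list of integers (the function names and the numbers in the usage part are toy values, not real assembly output):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(contig_lengths):
    """Collect a few basic statistics for one assembly."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "longest": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Toy contig lengths keyed by k; replace with lengths parsed from your own FASTA files.
    toy_assemblies = {
        21: [200, 250, 300, 310, 400, 450],
        31: [300, 600, 900, 1200],
    }
    for k, lengths in toy_assemblies.items():
        print(k, summarize(lengths))
```

For a transcriptome, N50 alone can be misleading, so it is worth looking at it together with the other numbers rather than picking the run with the highest N50.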
Thanks for the help, iraun!
I agree, but some papers say that a shorter k-mer results in more ambiguities in repeats. How is this possible? I need a basic answer to understand this. Basic in the sense of: what exactly happens to the contigs when I use a shorter or a longer k-mer?
As iraun pointed out, the k-mer size should be 1/2 to 2/3 of the read length. Why not try a dry test with 3-4 different k-mer sizes and see for yourself? It is important to do what iraun said. You can assemble with Trinity and then use the same k-mer sizes with Salmon/Sailfish to index the assembly and see how the quantification (transcript abundance) changes. I recommend Salmon since it is quite fast and lightweight, and it can run without a separate alignment step, so you can do this assessment pretty quickly.
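On the question of why shorter k-mers give more ambiguity in repeats, a toy de Bruijn graph makes it visible. The sketch below is a simplification under stated assumptions: two invented sequences that share the 7-base repeat GATTACA are used directly as "reads", reverse complements are ignored, and only nodes with more than one outgoing edge are counted. Real assemblers do much more than this.

```python
from collections import defaultdict


def branching_nodes(reads, k):
    """Count (k-1)-mer nodes that have more than one distinct successor."""
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
            successors[kmer[:-1]].add(kmer[1:])
    return sum(1 for nxt in successors.values() if len(nxt) > 1)


# Two invented "transcripts" sharing the 7-base repeat GATTACA.
TRANSCRIPT_A = "ACGTTGCA" + "GATTACA" + "CCGGTTAA"
TRANSCRIPT_B = "TTGACCGT" + "GATTACA" + "AGGCTTCA"

if __name__ == "__main__":
    for k in (5, 7, 8, 9):
        count = branching_nodes([TRANSCRIPT_A, TRANSCRIPT_B], k)
        print(f"k={k}: {count} branching node(s)")
```

With this toy input, the graph still branches at the end of the repeat for k up to 8, because a node of k-1 bases fits entirely inside the repeat and cannot tell the two transcripts apart. At k=9 each node spans the repeat plus one flanking base, and the branch count drops to zero. Every branch is a place where the assembler may have to stop a contig, which is why small k tends to give many short contigs.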
I think the k-mer size for Jellyfish is set at 25 during Trinity. I don't think you can change this, unless the parameter has been added in newer versions of the program.
Okay, thanks vchris_ngs.
About the Trinity k-mer size: now you can change it, but only up to a maximum of k=32 (the limit is there to save memory, as I read elsewhere).