What websites or programs could I use to calculate genome side from the reads obtained from NGS systems like Illumina or Ion Torrent? And what sequencing coverage is necessary for an RNA-Seq experiment?
You can use BBTools's kmercountexact program for this purpose, as outlined here.
As for the minimum coverage in RNA-seq, that depends on the goal of your experiment.
Look at papers that explain the coverage vs. replicates trade-off. They might give you an idea. For example: http://www.nature.com/nrg/journal/v15/n2/full/nrg3642.html
This post could also help you.
About this approach, what if the genome is a heterozygous tetraploid, and the k-mer graph shows 3 peaks in a 1:2:4 ratio? Assuming the 3rd peak is the homozygous regions in single copy, would it be correct to calculate genome size as:
3rd_peak(nr of kmers) + 0.5*2nd_peak(nr of kmers) + 0.25*1st_peak(nr of kmers)?
Does BBTools handle these genomes?
If the genome is tetraploid with 1:2:4-ratio primary peaks, then yes, the 3rd peak should be single-copy het content, so your math is correct. I have not looked at a tetraploid with this program but I would expect that if there was a 1-copy peak, there should also be a 3-copy peak of similar magnitude, so you'd probably end up with 4 peaks, with the 3rd needing a 0.75 multiplier.
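To make the weighting concrete, here is a minimal Python sketch of the arithmetic discussed above. It assumes each peak is labelled by how many of the genome copies carry its k-mers, and weights the peak by copies/ploidy (0.25, 0.5, 0.75 and 1 for a tetraploid). The function name and the example counts are hypothetical and are not part of BBTools; it also ignores repeats that occur at more than `ploidy` copies.

```python
def estimate_genome_size(peak_kmer_counts, ploidy):
    """Estimate the single-copy (monoploid) genome size in k-mers.

    peak_kmer_counts maps the number of genome copies carrying a peak's
    k-mers (1..ploidy) to the number of distinct k-mers in that peak.
    Each peak is weighted by copies / ploidy.
    """
    total = 0.0
    for copies, n_kmers in peak_kmer_counts.items():
        if not 1 <= copies <= ploidy:
            raise ValueError(f"copy number {copies} is outside 1..{ploidy}")
        total += n_kmers * copies / ploidy
    return total


# Made-up tetraploid example with peaks at 1, 2 and 4 copies.
example_peaks = {1: 40_000_000, 2: 30_000_000, 4: 200_000_000}
size = estimate_genome_size(example_peaks, ploidy=4)
print(f"Estimated genome size: {size:,.0f} k-mers")
```

If a 3-copy peak is also present, adding a key `3` to the dictionary gives it the 0.75 multiplier mentioned above.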
BBTools does handle this, with one caveat: the peak detection is not very sophisticated, so it may lump smaller peaks together. It generally identifies the two to four most prominent peaks correctly, but if there are more than that you may need to do some manual work on the kmer frequency histogram for a precise estimate. I developed it primarily for microbes and fungi, which are haploid and haploid or diploid, respectively; I may put in more robust peak modeling later.
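If you do end up inspecting the histogram by hand, a naive local-maximum scan can serve as a starting point. The sketch below is only an illustration and is not the peak-calling method used by BBTools; it assumes the histogram has already been loaded as (depth, count) pairs sorted by depth, and the `min_depth` and `window` parameters are hypothetical knobs for skipping the low-depth error k-mers and smoothing out noise.

```python
def find_peaks(hist, min_depth=3, window=2):
    """Return depths whose count exceeds every neighbour within `window` bins.

    hist is a list of (depth, count) pairs sorted by depth.
    Depths below min_depth are skipped, since they are usually
    dominated by k-mers that come from sequencing errors.
    """
    peaks = []
    for i, (depth, count) in enumerate(hist):
        if depth < min_depth:
            continue
        lo = max(0, i - window)
        hi = min(len(hist), i + window + 1)
        neighbours = [c for j, (_, c) in enumerate(hist[lo:hi], start=lo) if j != i]
        if neighbours and all(count > c for c in neighbours):
            peaks.append(depth)
    return peaks


# Made-up histogram: an error spike at depth 1 and two real peaks.
counts = [1000, 400, 100, 50, 80, 150, 90, 60, 70, 120, 200, 130, 60, 20]
example_hist = list(enumerate(counts, start=1))
print(find_peaks(example_hist))
```

On real data the peaks are broad and noisy, so you would likely need a larger window or some smoothing before a scan like this is useful.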
Also note that if you are working on a large dataset for a large genome, you can reduce memory consumption by enabling a bloom filter for error kmers with the "prefilter" flag. That's not necessary for microbes or most fungi which tend to be much smaller than plants and animals.
'side' ? do you mean 'strand' ?
Maybe it's "size".
That's probably the best bet.
Sorry!! It is a mistake. 'Genome Size'. Thank you so much!