Hi!
I just want to know what the differences are between the two output files, scaffolds.fasta and contigs.fasta, produced by the assembler spades.py.
The manual doesn't provide any useful information on this. Please, I don't want the definition of a contig or a scaffold; what I am trying to understand is the technical part behind the SPAdes algorithm. For example, maybe spades.py assumes that I am using an Illumina 1.9 sequencing machine, but that's not enough information to determine the scaffolds, or is it? How is a scaffold determined in the de Bruijn graph, as opposed to a contig? How does spades.py know the insert size of the specific protocol used for sequencing, or does it just assume one?
Help, I am a little lost.
Any information will be useful.
Here is the manual's link http://spades.bioinf.spbau.ru/release3.6.2/manual.html#sec3.5
That is probably because scaffolds and contigs are terms used by most assemblers. A scaffold is a construct of multiple contigs bridged by runs of N characters. This means the assembler could not resolve the sequence of the region between the contigs, even though it knows the orientation of the two contigs relative to each other.
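To make the "contigs bridged by N characters" idea concrete, here is a minimal Python sketch that cuts a scaffold sequence back into its contig pieces at the runs of N. The function names and the `min_gap` parameter are made up for illustration, and this only shows how the gaps appear in the FASTA sequence; it says nothing about how SPAdes decides where to place a gap or how long to make it.

```python
import re


def split_scaffold(scaffold_seq, min_gap=1):
    """Split a scaffold into contig pieces at runs of N (hypothetical helper).

    min_gap is the shortest run of N treated as a gap; shorter runs are
    left inside the piece as ambiguous bases.
    """
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty strings that appear if the scaffold starts or ends with N.
    return [piece for piece in gap_pattern.split(scaffold_seq) if piece]


def gap_lengths(scaffold_seq, min_gap=1):
    """Return the length of every run of N of at least min_gap bases."""
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    return [len(match.group()) for match in gap_pattern.finditer(scaffold_seq)]


if __name__ == "__main__":
    toy_scaffold = "ACGTNNNNNGGCCNNTTAA"  # toy sequence, not real data
    print(split_scaffold(toy_scaffold))
    print(gap_lengths(toy_scaffold))
```

On the toy sequence, the pieces between the N runs are the contigs and the lengths of the N runs are the gap sizes recorded in the scaffold.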
Can you explain contig orientation?
Do you have an example, please?
Thanks
Try this http://genome.jgi.doe.gov/help/scaffolds.html
thanks.
How do we know the distance between the two paired reads?
SPAdes uses a k-bimer approach to estimate distances (a k-bimer is a pair of k-mers together with an estimated distance between them).
https://en.wikipedia.org/wiki/SPAdes_(software)
There is a range of papers describing various parts of the SPAdes algorithm (PMID: 22506599, PMID: 24931996, PMID: 26040456).
You can map your paired reads to a reference, such as an assembled genome, and create a BAM file. Then you can use CollectInsertSizeMetrics from Picard tools to estimate the distance between the two paired ends.
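If you just want a rough idea of what such a tool measures, here is a small Python sketch that does a similar calculation by hand. It is not Picard's implementation. It assumes the alignments are available as SAM text lines (for example the output of `samtools view` on the BAM file) and reads the template length from column 9 (TLEN); the function name is hypothetical.

```python
import statistics


def insert_sizes_from_sam(lines):
    """Collect template lengths (SAM column 9) from properly paired reads.

    Illustrative helper: 'lines' is any iterable of SAM text lines.
    """
    sizes = []
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        is_proper_pair = bool(flag & 0x2)
        is_secondary_or_supplementary = bool(flag & 0x900)
        # tlen > 0 keeps only the leftmost mate, so each pair counts once.
        if is_proper_pair and not is_secondary_or_supplementary and tlen > 0:
            sizes.append(tlen)
    return sizes


if __name__ == "__main__":
    import sys

    sizes = insert_sizes_from_sam(sys.stdin)
    if sizes:
        print("pairs:", len(sizes))
        print("median insert size:", statistics.median(sizes))
        print("mean insert size:", statistics.mean(sizes))
```

Saved under a name of your choice, it could be run as `samtools view mapped.bam | python your_script.py`, where `mapped.bam` stands for your own alignment file.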