Hello, I have a bacterial genome sequenced on an Illumina platform, with an average read length of 100 bp. I want to assemble this genome. It would be really nice if you could help me with a few questions...
Will SPAdes be a useful assembler for such a short read length?
How can I select a reference genome for this bacterial genome?
How can I calculate the coverage of the genome?
SPAdes doesn't take a reference, as it's a de novo assembler, though you could provide a file of trusted contigs if you wanted to. You might, however, want to check read quality and coverage, in which case you may well need a reference genome. As for how you choose it, you simply download the genome sequence you expect to be closest to your strain. E.g. if you had sequenced the common lab E. coli strain DH5a, you could just download its genome from NCBI and align your reads against it to find out where your E. coli sequence differs. If your bacterium has never been sequenced before, though, you can't get a reference genome for it (obviously). Qualimap is my favourite tool for estimating genome coverage, assembly stats, etc., but it requires a .bam file, so you'll first need to align the reads with a read aligner such as bwa or bowtie2.
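As a rough illustration of the align-then-Qualimap route described above, here is a minimal Python sketch that only builds and prints the shell commands, without running them. The file names (ref.fa, reads_1.fastq, reads_2.fastq), the "sample" prefix, the thread count, and the helper name build_coverage_commands are all placeholders I made up; check each tool's own help output before relying on the exact options.

```python
import shlex


def build_coverage_commands(reference, reads_1, reads_2, prefix, threads=4):
    """Return the shell commands for: index reference, align, sort, index BAM, run Qualimap.

    Nothing is executed here; the commands are returned as strings so they
    can be inspected first. All paths are hypothetical placeholders.
    """
    q = shlex.quote  # protects paths that contain spaces or shell characters
    bam = f"{prefix}.sorted.bam"
    return [
        # 1. Index the reference genome for bwa
        f"bwa index {q(reference)}",
        # 2. Align the paired reads and write a coordinate-sorted BAM
        f"bwa mem -t {threads} {q(reference)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -o {q(bam)} -",
        # 3. Index the sorted BAM
        f"samtools index {q(bam)}",
        # 4. Coverage and mapping statistics with Qualimap
        f"qualimap bamqc -bam {q(bam)} -outdir {q(prefix + '_qualimap')}",
    ]


for command in build_coverage_commands("ref.fa", "reads_1.fastq", "reads_2.fastq", "sample"):
    print(command)
```

Printing the commands first makes it easy to paste them into a terminal one at a time and see which step fails, if any.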
I have also assembled it de novo with MIRA, but I am having a problem there.
After running the command "mira manifest.conf >&log_assembly.txt", I do not get any result/contig files in projectname_d_results.
With MiSeq data (2x250 or 2x300), probably the best assembly you will get is from A5_MiSeq. I believe, though I have not tested it, that it will also do a good job with shorter reads. Its log output is really rich in information, including the average coverage of the final assembly.
SPAdes will do a fine job as well.
A Google search for "genome coverage calculator" would lead you to this page...
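For a quick estimate without any alignment, the usual back-of-the-envelope formula is coverage = (number of reads x read length) / genome size. A minimal sketch in Python, where the read count and genome size in the example are made-up numbers and not taken from this dataset:

```python
def estimate_coverage(num_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases divided by genome size.

    For paired-end data, num_reads should count both mates
    (i.e. 2 x the number of read pairs).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical example: 5 million 100 bp reads, 5 Mb genome
print(estimate_coverage(5_000_000, 100, 5_000_000))  # 100.0
```

This gives the expected depth only; the depth actually achieved after mapping (what Qualimap reports) is usually somewhat lower, because not every read maps.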
My data was sequenced on a HiSeq (2x100). I tried SPAdes, but the number of contigs I am getting is beyond 4000 (with the default k-mer settings). Hence, I am trying out MIRA.
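A raw contig count can be inflated by many very short contigs, so before switching assemblers it may be worth looking at the length distribution. Below is a small Python sketch that computes the contig count, total length, and N50 from FASTA text after dropping contigs under a minimum length. The 500 bp cutoff and the tiny inline FASTA are assumptions for illustration only; in practice you would read the assembler's contig FASTA file instead.

```python
def contig_lengths(fasta_text):
    """Return the sequence length of every record in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(lengths, min_length=500):
    """Contig count, total bases and N50, ignoring contigs shorter than min_length."""
    kept = [n for n in lengths if n >= min_length]
    return {"contigs": len(kept), "total_bp": sum(kept), "n50": n50(kept)}


# Toy example with a deliberately small cutoff
example = ">a\nACGTACGT\nACGT\n>b\nACG\n>c\nACGTA\n"
print(summarize(contig_lengths(example), min_length=5))
```

If most of the 4000+ contigs turn out to be very short, the assembly may be better than the raw count suggests.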
Please have a look here first: Best software to assemble bacterial genomes
Where am I going wrong with MIRA? Please help...
Is MIRA giving you any error messages? Those can usually help in figuring out what may be going wrong.
It is not giving any error message.