Hello, I would like to know the simplest way to add the D. melanogaster chromosome name to each contig in a FASTA file of a draft assembly produced with Canu. What tools should I use to map the contigs to the reference genome and obtain a list of the chromosomes associated with each contig? And how could I then add the chromosome name to the headers of the contig FASTA file? Thank you!
Thank you for your answer! Would it be wrong to simply BLAST all the contigs from the draft assembly against each reference chromosome of D. melanogaster and assign a chromosome to each contig that shows significant hits?
How many contigs do you have? If the number of contigs is ~2x the number of Drosophila chromosomes (meaning you have long/large contigs), then using a long-read aligner like minimap2 or even BLAT may be better than BLAST.
If the number of contigs were that small, I would manually check each alignment and edit the header of each contig, inserting the corresponding chromosome name, but my draft assembly is pretty fragmented, containing around 3K contigs with a reference genome coverage of ~95%.
I would still recommend using minimap2 and aligning against the reference. It should go pretty fast. You can either work with the PAF (default) format or make a BAM file to work with other tools or to visualize.
For a pure contig-to-Dmel-chromosome assignment, BLAST should be OK.
I am a fan of LAST because it is fairly easy to convert MAF to SAM/BAM and then view the alignments in IGV. This is not always needed, but it is nice to have if you want to verify something, such as checking scaffolding or resolving bad joins between your contigs.
edit:
See GenoMax's answer; minimap2 should be better than BLAST.
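For the minimap2 route, here is a rough Python sketch of how the PAF output could be turned into a contig-to-chromosome list and then written into the FASTA headers. It assumes the alignment was produced with something like `minimap2 -x asm5 dmel_ref.fa contigs.fa > contigs_vs_dmel.paf` (the file names are made up for illustration). Picking the chromosome with the most matching bases per contig is just one simple heuristic, not the only reasonable one, and the function names, the `chr=` tag and the `unplaced` label are my own choices, not anything minimap2 or Canu defines.

```python
from collections import defaultdict


def assign_chromosomes(paf_lines, min_mapq=0):
    """Map each contig to the reference sequence with the most matching bases.

    PAF columns used (1-based): 1 = query name, 6 = target name,
    10 = number of matching bases, 12 = mapping quality.
    """
    matched = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        contig, chrom = fields[0], fields[5]
        n_match, mapq = int(fields[9]), int(fields[11])
        if mapq >= min_mapq:
            matched[contig][chrom] += n_match
    # Heuristic: the chromosome with the largest total of matching bases wins.
    return {contig: max(hits, key=hits.get) for contig, hits in matched.items()}


def rename_headers(fasta_lines, contig_to_chrom, unplaced="unplaced"):
    """Yield FASTA lines with ' chr=<name>' appended to every header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The contig ID is the first whitespace-delimited token of the header.
            contig = header.split()[0] if header.strip() else ""
            chrom = contig_to_chrom.get(contig, unplaced)
            yield f">{header} chr={chrom}\n"
        else:
            yield line


def annotate(paf_path, fasta_path, out_path, min_mapq=0):
    """Read a PAF file and a contig FASTA, write the annotated FASTA."""
    with open(paf_path) as paf:
        mapping = assign_chromosomes(paf, min_mapq=min_mapq)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(rename_headers(fasta, mapping))
    return mapping
```

Calling `annotate("contigs_vs_dmel.paf", "contigs.fa", "contigs.chr.fa")` would write the renamed FASTA and return the contig-to-chromosome dictionary, which can be printed as the list asked for in the question. Contigs without any alignment are tagged `unplaced`. With a fragmented assembly it is probably worth raising `min_mapq` to ignore repeat-driven hits, and spot-checking a few contigs that hit more than one chromosome.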