Contigs to chromosomes annotation
18 months ago

Hello, I would like to know the simplest way to add the D. melanogaster chromosome name to each contig in the FASTA file of a draft assembly obtained with the Canu assembler. Which tools should I use to map the contigs to the reference genome and obtain a list of the chromosomes associated with each contig? Then, how could I print the chromosome name and add it to the headers in the contig FASTA file? Thank you!

17 months ago
Darked89

Not sure if it is the easiest, but you can map your new assembly to, e.g., the Dmel genome using LAST.

With a draft assembly you will get several contigs mapping to the same chromosome. More importantly, unless synteny is almost perfect, expect some of your draft assembly contigs to map to more than one chromosome. It is up to you to decide whether to assign each contig to just the top-hit Dmel chromosome, to keep the 2-3 top hits, etc.
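A minimal Python sketch of the "top hit vs. 2-3 top hits" choice, assuming the alignments have already been reduced to hypothetical `(contig, chromosome, aligned_bases)` tuples from whichever aligner you used. It sums aligned bases per chromosome for each contig and keeps the `top_n` chromosomes together with their share of that contig's aligned bases; a contig whose second hit still carries a large share is a candidate for a closer look.

```python
from collections import defaultdict


def rank_chromosomes(hits, top_n=1):
    """Rank reference chromosomes per contig by total aligned bases.

    hits: iterable of (contig, chromosome, aligned_bases) tuples; this input
    shape is an assumption, not the output format of any particular tool.
    Returns {contig: [(chromosome, aligned_bases, fraction), ...]} with at most
    top_n entries, where fraction is the chromosome's share of the contig's
    total aligned bases.
    """
    # contig -> chromosome -> summed aligned bases
    totals = defaultdict(lambda: defaultdict(int))
    for contig, chrom, aligned in hits:
        totals[contig][chrom] += aligned

    ranked = {}
    for contig, per_chrom in totals.items():
        overall = sum(per_chrom.values())
        # Largest aligned-base total first
        order = sorted(per_chrom.items(), key=lambda kv: kv[1], reverse=True)
        ranked[contig] = [(chrom, bp, bp / overall) for chrom, bp in order[:top_n]]
    return ranked
```

For example, `rank_chromosomes([("tig1", "2L", 90000), ("tig1", "3R", 10000)], top_n=2)` should return `{"tig1": [("2L", 90000, 0.9), ("3R", 10000, 0.1)]}`. Using aligned bases rather than hit counts keeps many short spurious hits from outvoting one long alignment.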

It would certainly make sense to look for assembly artefacts or broken synteny.

Thank you for your answer! Would it be wrong to just BLAST all the contigs from the draft assembly against each D. melanogaster reference chromosome and assign chromosomes to the contigs that show significant hits?
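If you do go the BLAST route, here is a minimal Python sketch of turning tabular output (`-outfmt 6`, default 12 columns: query ID, subject ID, ..., e-value, bitscore) into one chromosome per contig. It keeps hits at or below an e-value cut-off and sums bitscores per chromosome; the `max_evalue=1e-10` default is an arbitrary illustrative threshold, not a recommendation.

```python
from collections import defaultdict


def best_chromosome_from_blast(lines, max_evalue=1e-10):
    """Assign each contig to the chromosome with the highest summed bitscore.

    lines: BLAST tabular output (-outfmt 6 with the default 12 columns), so
    column 0 is the contig, column 1 the chromosome, column 10 the e-value
    and column 11 the bitscore.
    Contigs with no hit passing max_evalue are absent from the result.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        contig, chrom = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            scores[contig][chrom] += bitscore
    # Pick the chromosome with the largest summed bitscore per contig
    return {contig: max(per_chrom, key=per_chrom.get)
            for contig, per_chrom in scores.items()}
```

Summing bitscores over all HSPs is only a rough heuristic: overlapping or repetitive HSPs get counted more than once, which is one reason a whole-contig aligner tends to give cleaner assignments.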

How many contigs do you have? If there are only ~2x as many contigs as Drosophila chromosomes (meaning you have long/large contigs), then a long-read aligner like minimap2, or even BLAT, may be better than BLAST.

If the number of contigs were that small, I would manually check each alignment and edit the header of each contig, inserting the corresponding chromosome name, but my draft assembly is quite fragmented, with around 3K contigs and ~95% coverage of the reference genome.
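With ~3K contigs, the header editing itself can be automated once you have a contig-to-chromosome mapping from any of the approaches in this thread. A minimal Python sketch, assuming the contig ID is the first word after `>` and that the mapping is a plain dict; the ` chrom=` tag and the `unplaced` label are illustrative choices, not a standard.

```python
def annotate_fasta_headers(fasta_lines, contig_to_chrom, missing="unplaced"):
    """Yield FASTA lines with ' chrom=<name>' appended to every header line.

    contig_to_chrom: dict mapping contig IDs (first word after '>') to
    chromosome names. Contigs absent from the dict get the `missing` label.
    Sequence lines are passed through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            words = header[1:].split()
            contig_id = words[0] if words else ""
            chrom = contig_to_chrom.get(contig_id, missing)
            # Keep the original header text and append the chromosome tag
            yield f"{header} chrom={chrom}\n"
        else:
            yield line
```

Because it is a generator, it can stream a large file: inside `with open("contigs.fasta") as src, open("contigs.chrom.fasta", "w") as dst:` (hypothetical file names), `dst.writelines(annotate_fasta_headers(src, mapping))` writes the annotated copy without loading the whole assembly into memory.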

I would still recommend using minimap2 and aligning against the reference. It should be pretty fast. You can either work with the PAF output (the default) or make a BAM file to use with other tools or for visualization.
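A minimal Python sketch of working directly with the PAF output, e.g. from something like `minimap2 -x asm5 dmel.fa contigs.fa > contigs.paf` (file names are hypothetical, and `asm5` vs. `asm10`/`asm20` depends on how divergent your strain is). It uses the standard PAF columns (query name, query length, target name, number of matching bases, mapping quality), skips alignments minimap2 tags as secondary (`tp:A:S`), and picks the chromosome with the most matching bases per contig. The `min_mapq` threshold is an assumption you would tune.

```python
from collections import defaultdict


def best_chromosome_from_paf(lines, min_mapq=0, skip_secondary=True):
    """Pick, for each contig, the target chromosome with the most matching bases.

    lines: minimap2 PAF records. Columns used (0-based): 0 query name,
    1 query length, 5 target name, 9 number of matching bases, 11 MAPQ.
    Returns {contig: (chromosome, matched_bases, matched_bases / contig_length)}.
    The last value is a rough indicator only; overlapping alignments are not merged.
    """
    matches = defaultdict(lambda: defaultdict(int))
    contig_len = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if skip_secondary and "tp:A:S" in fields[12:]:
            continue  # minimap2 marks secondary alignments with tp:A:S
        if int(fields[11]) < min_mapq:
            continue
        contig, chrom = fields[0], fields[5]
        contig_len[contig] = int(fields[1])
        matches[contig][chrom] += int(fields[9])

    best = {}
    for contig, per_chrom in matches.items():
        chrom = max(per_chrom, key=per_chrom.get)
        bp = per_chrom[chrom]
        best[contig] = (chrom, bp, bp / contig_len[contig])
    return best
```

Contigs whose best chromosome explains only a small fraction of their length are worth inspecting before you trust the label.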

For a pure contig-to-Dmel-chromosome assignment, BLAST should be OK.

I am a fan of LAST because it is fairly easy to convert MAF to SAM/BAM and then view the alignments in IGV. This is not always needed, but it is nice to have if you want to verify something, like checking scaffolding or resolving bad joins between your contigs.

edit:

See GenoMax's answer. minimap2 should be better than BLAST.
