Question

Visualizing contig alignment

0

6.9 years ago

arunprasanna83 ▴ 60

I have two genome assemblies of the same species, created using different platforms. I am doing some post-assembly analysis to check the performance of the assemblers. For this, I have the following:

Contig files (multi-fasta) for reference (910 MB) and query (510 MB)
Blast hits (tabular format) for reference vs query.

I would like to draw a circular track for the reference genome and see how the hits from the query assembly are distributed along it. Please suggest possible ways to do this. I looked at the UCSC Genome Browser, but it only displays alignments for a predefined set of organisms, and my organism is not on the list.

Kindly help.

Thanks in Advance.

Assembly alignment genome blast • 4.8k views

ADD COMMENT • link 6.9 years ago by arunprasanna83 ▴ 60

1

Circos might be an option or D-genies

ADD REPLY • link 6.9 years ago by lieven.sterck 15k

0

D-genies crashes while I upload the data in .gz format.

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

That's a pity.

I'm not on the development team myself ;-) but I know some of them, so I'll pass on the message. But don't let that hold you back from getting in touch with them (i.e. sending a bug report) yourself.

ADD REPLY • link 6.9 years ago by lieven.sterck 15k

1

Thanks @lieven.sterck. I did send them an email 3 weeks ago but did not get any response, so I moved on to explore other options.

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

There is a bigger problem here. Most of these programs crash because of the large file sizes: File 1 has 44k contigs (510 MB) and File 2 has 5900 contigs (910 MB). To make it easier, I tried merging each file into a single long sequence, but the programs still crash! Any solutions for handling data this big?

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

Are you sure the problem is not the hardware you have access to?

ADD REPLY • link 6.9 years ago by GenoMax 152k

0

Sure! I am running it on my Linux workstation with 150 GB RAM and 40 cores.

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

6.9 years ago

harish ▴ 470

If you have MUMmer alignments, you can probably use Circlize or AliTV.

Alternatively, you can use Gepard, minidot, or similar tools to view the alignments as dot plots.

ADD COMMENT • link 6.9 years ago by harish ▴ 470

0

I will try Circlize or AliTV, because I have already tried both Gepard and minidot, and both failed due to the large file size!

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

As I mentioned in the comments above, AliTV also works fine for smaller files, but for bigger files it takes forever. I guess there is no parallelization option.

ADD REPLY • link 6.8 years ago by arunprasanna83 ▴ 60

score 2 · Accepted Answer · 2018-09-02

2

6.9 years ago

Philipp Bayer 8.8k

Are dotplots not an option?

Minidot is very fast, but in my experience doesn't work that great with 'bad' assemblies: https://github.com/thackl/minidot

Symap is a bit older and slower, but works better with low-quality assemblies (again, in my experience), and can also make a circular plot: http://www.agcol.arizona.edu/software/symap/

Circos should be able to take both output files with some fiddling. You should be able to take minidot's or symap's alignment output files, write a tiny parser to turn them into the tabular format circos wants, and then run bundlelinks on that.
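As a rough sketch of what such a "tiny parser" could look like: the example below does not parse minidot or symap output, but the BLAST tabular hits the question already has, assuming the default 12-column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `blast_to_circos_links` and the two thresholds are made up for illustration. Each output line follows the Circos link layout `name1 start1 end1 name2 start2 end2`, and the sequence names have to match the IDs used in your karyotype file.

```python
def blast_to_circos_links(lines, min_length=1000, min_identity=90.0):
    """Convert 12-column BLAST tabular hits into Circos link lines.

    min_length and min_identity are illustrative cut-offs (assumptions),
    used to thin out short or weak hits before plotting.
    """
    links = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column hit line
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        length = int(fields[3])
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        if length < min_length or pident < min_identity:
            continue
        # BLAST reports reverse-strand hits with start > end;
        # sort each pair so that start <= end in the link file.
        q_lo, q_hi = sorted((qstart, qend))
        s_lo, s_hi = sorted((sstart, send))
        links.append(f"{qseqid} {q_lo} {q_hi} {sseqid} {s_lo} {s_hi}")
    return links
```

To use it, open the hits file, pass the file handle as `lines`, and write the returned strings to a links file, one per line. Filtering before plotting also keeps the link file small, which matters with assemblies of this size.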

ADD COMMENT • link 6.9 years ago by Philipp Bayer 8.8k

0

Thanks! Of these, Symap is doing a neat job.

ADD REPLY • link 6.9 years ago by arunprasanna83 ▴ 60

0

Symap worked for comparing the small vs big assembly. But for the other case, a 'big vs big' self-comparison (910 Mb, 5931 contigs), it has been running for over 5 days! Any tips?

ADD REPLY • link 6.8 years ago by arunprasanna83 ▴ 60

0

I usually remove small contigs (<10kb) because these will only add chaos to the graph anyway.
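A minimal sketch of that filtering step, in plain Python with no extra libraries. The names `read_fasta` and `filter_contigs` are made up for illustration, and the 10 kb default simply mirrors the cut-off mentioned above; adjust it to your data. It streams the multi-FASTA one record at a time, so only a single contig is held in memory.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(in_handle, out_handle, min_length=10_000, width=60):
    """Write only contigs with at least min_length bases.

    Returns (kept, dropped) counts. Sequences are re-wrapped at
    `width` characters per line.
    """
    kept = dropped = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) < min_length:
            dropped += 1
            continue
        out_handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out_handle.write(seq[i:i + width] + "\n")
        kept += 1
    return kept, dropped
```

Typical use would be to open the assembly for reading and a new file for writing, then call `filter_contigs(infile, outfile)` and run the comparison on the filtered file.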

ADD REPLY • link 6.8 years ago by Philipp Bayer 8.8k