Artemis Comparison Tool "error while reading: cannot understand comparison file format"
5.1 years ago
jleehan ▴ 120

I am trying to visualize the differences in the genome of an evolved strain of Bacillus subtilis and have tried several tools, including Mauve, BRIG and ACT. I haven't been able to get any of them to work, but ACT is the closest I've come.

To generate the comparison file, I followed the guide included as an additional file with this article and used BLAST with the following commands in PowerShell:

PS C:\[path to folder containing fasta genomes]> makeblastdb -in Bsubtilis168.fasta -dbtype nucl

PS C:\[path to folder containing fasta genomes]> blastn -query 1663_S2.fa -db Bsubtilis168.fasta -evalue 1 -task megablast -outfmt 6 > 1663vWT.crunch

I then run ACT and load the reference genome as sequence 1, the evolved genome assembly as sequence 2, and the crunch file as the comparison file. When I click Apply, the program immediately uses more than 90% of my CPU, often making the computer unusable, so I walk away and let it run; if I try to do anything else on the computer, the program stops responding. When I come back, it shows the error in the title of this post.

I'm at a loss of what to do here and any advice on this situation would be greatly appreciated.

ACT Artemis Comparative genomics • 2.1k views

Try running the blast as just a straightforward query versus subject (do away with the database).
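A minimal sketch of what such a query-versus-subject run might look like, written as a small Python helper that only assembles the command line. The file names are the ones from the question; the helper name `blastn_subject_args` is made up for illustration, and dropping `-evalue` and `-task` is an assumption about what "straightforward" means here. `-subject` replaces `-db` (so no `makeblastdb` step is needed) and `-out` lets blastn write the tabular file itself.

```python
from typing import List


def blastn_subject_args(query: str, subject: str, out: str) -> List[str]:
    """Build a blastn command that compares two FASTA files directly.

    Hypothetical helper: it uses -subject instead of -db, and -out so that
    blastn writes the tabular file itself rather than relying on redirection.
    """
    return [
        "blastn",
        "-query", query,
        "-subject", subject,
        "-outfmt", "6",  # 12-column tabular output, as in the original command
        "-out", out,
    ]


# File names reused from the question above.
args = blastn_subject_args("1663_S2.fa", "Bsubtilis168.fasta", "1663vWT.crunch")
print(" ".join(args))

# To actually run it (requires BLAST+ on the PATH):
#   import subprocess
#   subprocess.run(args, check=True)
```

The printed line can also be pasted into PowerShell as-is.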

Can you post the first and last dozen or so lines from your crunch file too?


Here are the first 12 lines:

FS10000238:8:BPA73012-1025:1:1101:12160:1970/1  NC_000964.3 100.000 145 0   0   7   151 1   145 6.31e-73    268
FS10000238:8:BPA73012-1025:1:1102:0:0/1 NC_000964.3 100.000 145 0   0   6   150 1   145 6.26e-73    268
FS10000238:8:BPA73012-1025:1:1106:0:0/1 NC_000964.3 100.000 144 0   0   7   150 1   144 2.25e-72    267
FS10000238:8:BPA73012-1025:1:1112:0:0/1 NC_000964.3 100.000 140 0   0   12  151 1   140 3.80e-70    259
FS10000238:8:BPA73012-1025:1:1115:0:0/1 NC_000964.3 100.000 130 0   0   21  150 1   130 1.36e-64    241
FS10000238:8:BPA73012-1025:1:1115:0:0/2 NC_000964.3 100.000 135 0   0   17  151 1   135 2.28e-67    250
FS10000238:8:BPA73012-1025:1:1116:0:0/2 NC_000964.3 100.000 129 0   0   23  151 1   129 4.95e-64    239
FS10000238:8:BPA73012-1025:1:1101:14660:2930/2  NC_000964.3 100.000 130 0   0   1   130 130 1   1.36e-64    241
FS10000238:8:BPA73012-1025:1:1102:0:0/2 NC_000964.3 100.000 127 0   0   1   127 127 1   6.35e-63    235
FS10000238:8:BPA73012-1025:1:1105:0:0/1 NC_000964.3 100.000 129 0   0   1   129 129 1   4.95e-64    239
FS10000238:8:BPA73012-1025:1:1106:0:0/2 NC_000964.3 100.000 128 0   0   1   128 128 1   1.78e-63    237
FS10000238:8:BPA73012-1025:1:1107:0:0/1 NC_000964.3 100.000 129 0   0   1   129 129 1   4.95e-64    239

and the last 12 lines:

FS10000238:8:BPA73012-1025:1:1101:8270:3970/1   NC_000964.3 100.000 46  0   0   1   46  3131268 3131313 6.82e-18    86.1
FS10000238:8:BPA73012-1025:1:1101:9720:4020/1   NC_000964.3 100.000 92  0   0   4   95  4148162 4148071 1.83e-43    171
FS10000238:8:BPA73012-1025:1:1101:9720:4020/2   NC_000964.3 100.000 92  0   0   1   92  4148071 4148162 1.82e-43    171
FS10000238:8:BPA73012-1025:1:1101:13670:4020/2  NC_000964.3 100.000 107 0   0   1   107 128917  128811  8.39e-52    198
FS10000238:8:BPA73012-1025:1:1102:0:0/1 NC_000964.3 86.000  150 21  0   2   151 274013  273864  1.10e-40    161
FS10000238:8:BPA73012-1025:1:1102:0:0/2 NC_000964.3 98.675  151 2   0   1   151 273845  273995  6.31e-73    268
FS10000238:8:BPA73012-1025:1:1103:0:0/1 NC_000964.3 100.000 68  0   0   1   68  598956  598889  4.02e-30    126
FS10000238:8:BPA73012-1025:1:1103:0:0/1 NC_000964.3 100.000 46  0   0   62  107 598844  598889  6.82e-18    86.1
FS10000238:8:BPA73012-1025:1:1103:0:0/2 NC_000964.3 100.000 68  0   0   40  107 598889  598956  2.67e-30    126
FS10000238:8:BPA73012-1025:1:1103:0:0/2 NC_000964.3 100.000 46  0   0   1   46  598889  598844  4.53e-18    86.1
FS10000238:8:BPA73012-1025:1:1106:0:0/2 NC_000964.3 94.667  150 8   0   1   150 3965    4114    2.30e-62    233 
FS10000238:8:BPA73012-1025:1:1116:0:0/1 NC_000964.3 94.444  90  5   0   2   91  3397504 3397593 5.16e-34    139
FS10000238:8:BPA73012-1025:1:1116:0:0/2 NC_000964.3 100.000 89  0   0   1   89  3397592 3397504 8.51e-42    165

I'll do the blast without the database and see what happens.


Are you BLASTing reads against the reference? That's what your sequence IDs look like.
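One rough way to check this from the file itself is to count distinct query IDs and look at the largest query coordinate. The sketch below assumes the standard 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `summarise_tabular_hits` is made up for illustration, and the sample rows are copied from the lines posted above. Many distinct query IDs whose coordinates never exceed a read length suggest reads, whereas an assembly would be expected to show a few query IDs with much larger coordinates.

```python
import io
from collections import Counter
from typing import Dict, TextIO


def summarise_tabular_hits(handle: TextIO) -> Dict[str, int]:
    """Summarise a BLAST -outfmt 6 file: hits, distinct IDs, largest query coordinate."""
    queries: Counter = Counter()
    subjects: Counter = Counter()
    max_query_coord = 0
    for line in handle:
        # Split on any whitespace so both tab-separated files and
        # space-aligned pasted lines parse (IDs here contain no spaces).
        fields = line.split()
        if len(fields) != 12:
            continue  # skip blank or malformed lines
        queries[fields[0]] += 1
        subjects[fields[1]] += 1
        # Columns 7 and 8 (1-based) are qstart and qend.
        max_query_coord = max(max_query_coord, int(fields[6]), int(fields[7]))
    return {
        "hits": sum(queries.values()),
        "distinct_queries": len(queries),
        "distinct_subjects": len(subjects),
        "max_query_coord": max_query_coord,
    }


# Three rows copied from the crunch file excerpt posted in this thread.
sample = io.StringIO(
    "FS10000238:8:BPA73012-1025:1:1101:12160:1970/1 NC_000964.3 100.000 145 0 0 7 151 1 145 6.31e-73 268\n"
    "FS10000238:8:BPA73012-1025:1:1102:0:0/1 NC_000964.3 100.000 145 0 0 6 150 1 145 6.26e-73 268\n"
    "FS10000238:8:BPA73012-1025:1:1101:14660:2930/2 NC_000964.3 100.000 130 0 0 1 130 130 1 1.36e-64 241\n"
)
summary = summarise_tabular_hits(sample)
print(summary)
```

For a real file, pass `open("1663vWT.crunch")` instead of the in-memory sample.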


I am BLASTing my reads against the reference genome.


This isn't really the best use case for ACT. ACT only really works for comparing whole genomes, and even then, one of them must be complete.

If you want to look at how reads align to a reference, create a BAM file and then examine it in IGV or Tablet or similar.


Okay, thanks! I already have the BAMs.
