Question

How should I interpret these MUMmer output files?

3


10.5 years ago

madkitty ▴ 690

I have this output file from MUMmer, which shows multiple reference/query hits at 100% identity. My reference is the gene DNMT1 (species A) and my query is a set of scaffolds from a closely related species (species B) that has not yet been mapped to the chromosome level. For other genes, only one scaffold (query) had 100% identity, and a multiple alignment confirmed that only a few SNPs are present across the whole gene. For certain genes, however, MUMmer returns many scaffolds at 100% identity that do not align to the reference at all (or align only with millions of SNPs).

I already thought of looking at the scaffolds with the highest COV Q and COV R, but these values vary with sequence length and do not tell me the number of SNPs in the sequence.

Is there a way to filter more effectively for the scaffold that truly matches my reference?

Here is a sample of what my show-coords output file looks like for 100% Identity:

[S1]    [E1]    [S2]     [E2]     [LEN 1]   [LEN 2]   [% IDY]   [LEN R]   [LEN Q]   [COV R]   [COV Q]   [TAGS]
13898   13988   57004    57094    91        91        100       35452     107079    0.26      0.08      gi|194246388:49748443-49783894   scaffold11691
13898   13964   949      1015     67        67        100       35452     36637     0.19      0.18      gi|194246388:49748443-49783894   scaffold15913
13898   13964   1040     974      67        67        100       35452     2729      0.19      2.46      gi|194246388:49748443-49783894   scaffold47627
13900   13983   78963    79046    84        84        100       35452     161922    0.24      0.05      gi|194246388:49748443-49783894   scaffold12557
13900   13989   11716    11627    90        90        100       35452     30624     0.25      0.29      gi|194246388:49748443-49783894   scaffold25396
13900   13970   1089     1019     71        71        100       35452     1502      0.2       4.73      gi|194246388:49748443-49783894   scaffold27987
19908   19994   10242    10328    87        87        100       35452     17701     0.25      0.49      gi|194246388:49748443-49783894   scaffold52071
19909   19994   278272   278357   86        86        100       35452     353920    0.24      0.02      gi|194246388:49748443-49783894   scaffold1991
19910   19996   19486    19400    87        87        100       35452     81941     0.25      0.11      gi|194246388:49748443-49783894   scaffold14370
19910   19994   1805     1889     85        85        100       35452     2036      0.24      4.17      gi|194246388:49748443-49783894   scaffold46791
19911   19986   510      585      76        76        100       35452     1364      0.21      5.57      gi|194246388:49748443-49783894   scaffold84138
19912   19997   9499     9414     86        86        100       35452     61074     0.24      0.14      gi|194246388:49748443-49783894   scaffold44157
19912   19997   8587     8502     86        86        100       35452     15813     0.24      0.54      gi|194246388:49748443-49783894   scaffold9318
19922   19998   939      863      77        77        100       35452     1465      0.22      5.26      gi|194246388:49748443-49783894   scaffold35518
19928   19999   1018     1089     72        72        100       35452     1502      0.2       4.79      gi|194246388:49748443-49783894   scaffold27987
19929   19995   23559    23493    67        67        100       35452     28327     0.19      0.24      gi|194246388:49748443-49783894   scaffold27519
19932   19997   344      409      66        66        100       35452     5264      0.19      1.25      gi|194246388:49748443-49783894   scaffold13914
19935   20000   974      1039     66        66        100       35452     2729      0.19      2.42      gi|194246388:49748443-49783894   scaffold47627

Here is the code I use to retrieve this information:

(Species A: reference genome is known // Species B: reference genome currently unknown, only scaffolds are available).

$ nucmer --prefix=ref_qry Gene_Specie_A.fasta Specie_B.fa
$ show-coords -rclT ref_qry.delta > ref_qry.coords
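
One possible way to act on the table above, sketched in Python (this is not part of MUMmer): merge the reference intervals hit by each scaffold and report how much of the gene each scaffold covers in total, since a single 100% identity row of 60 to 90 bp says little on its own. The function names and the `SAMPLE` rows (copied from the output above) are illustrative choices, and the column positions assume the 13-column `show-coords -rclT` layout shown above with no whitespace inside the tag fields.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Length of the union of closed, 1-based (start, end) intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping hit: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def summarize_coords(lines):
    """Per-scaffold summary of show-coords -rclT rows (assumed 13 columns)."""
    hits = defaultdict(list)
    ref_len = {}
    for line in lines:
        fields = line.split()
        # Skip headers and anything that is not a 13-column data row.
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        s1, e1 = int(fields[0]), int(fields[1])   # [S1], [E1] on the reference
        scaffold = fields[12]                     # second tag = query name
        hits[scaffold].append((min(s1, e1), max(s1, e1)))
        ref_len[scaffold] = int(fields[7])        # [LEN R]
    summary = {}
    for scaffold, intervals in hits.items():
        bases = covered_bases(intervals)
        summary[scaffold] = {
            "hits": len(intervals),
            "ref_bases": bases,
            "ref_fraction": bases / ref_len[scaffold],
        }
    return summary


# Two scaffolds taken from the sample output above.
SAMPLE = """\
13898\t13964\t1040\t974\t67\t67\t100\t35452\t2729\t0.19\t2.46\tgi|194246388:49748443-49783894\tscaffold47627
13900\t13970\t1089\t1019\t71\t71\t100\t35452\t1502\t0.2\t4.73\tgi|194246388:49748443-49783894\tscaffold27987
19928\t19999\t1018\t1089\t72\t72\t100\t35452\t1502\t0.2\t4.79\tgi|194246388:49748443-49783894\tscaffold27987
19935\t20000\t974\t1039\t66\t66\t100\t35452\t2729\t0.19\t2.42\tgi|194246388:49748443-49783894\tscaffold47627
"""

summary = summarize_coords(SAMPLE.splitlines())
# Rank scaffolds by how much of the reference gene they cover.
for name, row in sorted(summary.items(), key=lambda kv: -kv[1]["ref_bases"]):
    print(name, row["hits"], row["ref_bases"], round(row["ref_fraction"], 4))
```

To run it on the real file, something like `with open("ref_qry.coords") as fh: summary = summarize_coords(fh)` should work. A scaffold that carries the gene would be expected to cover a large fraction of the 35452 bp reference, whereas many scaffolds hitting the same short windows may point to a repeated element rather than the gene itself. MUMmer's own `delta-filter` (for example its `-l` minimum-length and `-i` minimum-identity options) can apply a similar cut to the delta file before `show-coords` is run.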

mapping mummer show-coords • 5.9k views

ADD COMMENT • link updated 2.4 years ago by carlosgonzalezcruz327 ▴ 20 • written 10.5 years ago by madkitty ▴ 690

score 0 · Answer 1 · 2022-06-17

0


2.4 years ago

carlosgonzalezcruz327 ▴ 20

Hi, did you ever find an interpretation of your output? I'm in the same situation.


2


I have found a solution: https://biohpc.cornell.edu/doc/alignment_exercise2.html

I hope it is useful for somebody.

ADD REPLY • link 2.4 years ago by carlosgonzalezcruz327 ▴ 20