How should I interpret this MUMmer output file?
10.5 years ago
madkitty ▴ 690

I have this output file from MUMmer, which shows multiple reference/query hits with 100% identity. My reference is the gene DNMT1 (species A), and my query is a set of scaffolds from a closely related species (species B) that has not yet been assembled to the chromosome level. For other genes, only one scaffold (query) had 100% identity, and a multiple alignment confirmed that only a few SNPs are present across the whole gene. For certain genes, though, MUMmer returns many 100% identity scaffolds that do not align to the reference at all (or align only with millions of SNPs).

I already thought of keeping the scaffolds with the highest COV Q and COV R, but these values vary with sequence length and do not tell me the number of SNPs in the sequence.

Is there a way for me to filter more effectively which scaffold is truly a match to my reference?

Here is a sample of what my show-coords output file looks like for 100% Identity:

[S1]    [E1]    [S2]     [E2]     [LEN 1]   [LEN 2]   [% IDY]   [LEN R]   [LEN Q]   [COV R]   [COV Q]   [TAGS]
13898   13988   57004    57094    91        91        100       35452     107079    0.26      0.08      gi|194246388:49748443-49783894   scaffold11691
13898   13964   949      1015     67        67        100       35452     36637     0.19      0.18      gi|194246388:49748443-49783894   scaffold15913
13898   13964   1040     974      67        67        100       35452     2729      0.19      2.46      gi|194246388:49748443-49783894   scaffold47627
13900   13983   78963    79046    84        84        100       35452     161922    0.24      0.05      gi|194246388:49748443-49783894   scaffold12557
13900   13989   11716    11627    90        90        100       35452     30624     0.25      0.29      gi|194246388:49748443-49783894   scaffold25396
13900   13970   1089     1019     71        71        100       35452     1502      0.2       4.73      gi|194246388:49748443-49783894   scaffold27987
19908   19994   10242    10328    87        87        100       35452     17701     0.25      0.49      gi|194246388:49748443-49783894   scaffold52071
19909   19994   278272   278357   86        86        100       35452     353920    0.24      0.02      gi|194246388:49748443-49783894   scaffold1991
19910   19996   19486    19400    87        87        100       35452     81941     0.25      0.11      gi|194246388:49748443-49783894   scaffold14370
19910   19994   1805     1889     85        85        100       35452     2036      0.24      4.17      gi|194246388:49748443-49783894   scaffold46791
19911   19986   510      585      76        76        100       35452     1364      0.21      5.57      gi|194246388:49748443-49783894   scaffold84138
19912   19997   9499     9414     86        86        100       35452     61074     0.24      0.14      gi|194246388:49748443-49783894   scaffold44157
19912   19997   8587     8502     86        86        100       35452     15813     0.24      0.54      gi|194246388:49748443-49783894   scaffold9318
19922   19998   939      863      77        77        100       35452     1465      0.22      5.26      gi|194246388:49748443-49783894   scaffold35518
19928   19999   1018     1089     72        72        100       35452     1502      0.2       4.79      gi|194246388:49748443-49783894   scaffold27987
19929   19995   23559    23493    67        67        100       35452     28327     0.19      0.24      gi|194246388:49748443-49783894   scaffold27519
19932   19997   344      409      66        66        100       35452     5264      0.19      1.25      gi|194246388:49748443-49783894   scaffold13914
19935   20000   974      1039     66        66        100       35452     2729      0.19      2.42      gi|194246388:49748443-49783894   scaffold47627
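For reading the table above, the two coverage columns appear to be the aligned length expressed as a percentage of the full sequence length. This reading is inferred from the sample rows themselves (it reproduces the printed values), so treat it as a sketch rather than the tool's formal definition:

```latex
\mathrm{COV\,R} = 100 \times \frac{\mathrm{LEN\,1}}{\mathrm{LEN\,R}},
\qquad
\mathrm{COV\,Q} = 100 \times \frac{\mathrm{LEN\,2}}{\mathrm{LEN\,Q}}
% First row: 100 * 91 / 35452  \approx 0.26  (COV R)
%            100 * 91 / 107079 \approx 0.08  (COV Q)
```

Under that reading, every hit in the sample covers less than 0.3% of the 35,452 bp reference, so the 100% identity applies only to a 66 to 91 bp stretch and not to the gene as a whole.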

Here are the commands I use to retrieve this information:

(Species A: reference genome is known // Species B: reference genome currently unknown, only scaffolds are available.)

$ nucmer --prefix=ref_qry Gene_Specie_A.fasta Specie_B.fa
$ show-coords -rclT ref_qry.delta > ref_qry.coords
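One possible way to compare scaffolds is to total the aligned reference bases per scaffold instead of looking at individual hits. The sketch below does this for the tab-delimited table written by the `show-coords -rclT` command above. The function name `summarize_coords` is hypothetical (it is not part of MUMmer), and the 13-column layout is an assumption based on the sample rows:

```shell
# Sketch: summarize a tab-delimited show-coords table per query scaffold.
# Assumed column order (as in the sample rows):
#   1 S1, 2 E1, 3 S2, 4 E2, 5 LEN1, 6 LEN2, 7 %IDY, 8 LENR, 9 LENQ,
#   10 COVR, 11 COVQ, 12 reference tag, 13 query tag
# summarize_coords is a hypothetical helper name, not a MUMmer program.
# Usage: summarize_coords ref_qry.coords
summarize_coords() {
    awk -F'\t' '
        # Keep only data rows: at least 13 fields and a numeric start coordinate,
        # which skips the header lines at the top of the file.
        NF >= 13 && $1 ~ /^[0-9]+$/ {
            hits[$13]++                               # number of hits for this scaffold
            total[$13] += $5                          # summed LEN1 (overlapping hits count twice)
            if ($5 > longest[$13]) longest[$13] = $5  # longest single hit
            reflen[$13] = $8                          # reference length, for the percentage
        }
        END {
            # Output: scaffold, hit count, summed LEN1, longest hit, % of reference
            for (s in hits)
                printf "%s\t%d\t%d\t%d\t%.2f\n",
                       s, hits[s], total[s], longest[s], 100 * total[s] / reflen[s]
        }
    ' "$1" | sort -t "$(printf '\t')" -k3,3nr         # largest summed length first
}
```

A scaffold that truly carries the gene would be expected to rank far above the rest on summed length, whereas many scaffolds that each contribute under 100 bp at the same reference coordinates look more like a repeated element. `show-coords` also accepts `-L` (minimum alignment length) and `-I` (minimum percent identity), which can drop the short hits before they reach the table.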
2.4 years ago

Hi, did you ever work out how to interpret this output? I'm in the same situation.


I have found a solution: https://biohpc.cornell.edu/doc/alignment_exercise2.html

I hope it is useful for somebody.
