MUMmer alignment result interpretation
4.7 years ago
AP ▴ 80

Hello Biostars,

I am doing whole-genome alignment using NUCmer (a program in the MUMmer package), and I want to use this alignment to separate core and accessory chromosomes. From the NUCmer alignment I generated delta files, which I filtered using the -r and -g options, and then generated the coordinate file. The coordinate file looks like this:

[S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
===============================================================================================================================
       3     1062  |     2882     3943  |     1060     1062  |    87.29  |    47164    22944  |     2.25     4.63  | sca_100_unmapped  scaffold_479
       3     1046  |     2196     3240  |     1044     1045  |    88.52  |    47164   201231  |     2.21     0.52  | sca_100_unmapped   scaffold_68
    2091     2303  |    24338    24550  |      213      213  |    88.02  |    47164    27763  |     0.45     0.77  | sca_100_unmapped   scaffold_442
    9756    11454  |   108083   106395  |     1699     1689  |    93.47  |    47164   181231  |     3.60     0.93  | sca_100_unmapped   scaffold_81
   13817    15198  |    54353    55731  |     1382     1379  |    87.49  |    47164   146674  |     2.93     0.94  | sca_100_unmapped   scaffold_110
   46400    46664  |     7992     7731  |      265      262  |    84.27  |    47164    30552  |     0.56     0.86  | sca_100_unmapped   scaffold_418
    2236     3032  |    64822    65618  |      797      797  |    83.71  |    46409    72978  |     1.72     1.09  | sca_101_unmapped   scaffold_232
    2239     3578  |    21278    19939  |     1340     1340  |    79.63  |    46409    28656  |     2.89     4.68  | sca_101_unmapped   scaffold_438
   11309    11945  |    41233    40596  |      637      638  |    85.76  |    46409    48260  |     1.37     1.32  | sca_101_unmapped   scaffold_316
   12138    12918  |    40117    39337  |      781      781  |    86.04  |    46409    48260  |     1.68     1.62  | sca_101_unmapped   scaffold_316
   12840    16991  |   198620   202766  |     4152     4147  |    85.95  |    46409   284610  |     8.95     1.46  | sca_101_unmapped   scaffold_48
   24138    24287  |    48814    48963  |      150      150  |    96.67  |    46409   178768  |     0.32     0.08  | sca_101_unmapped   scaffold_84

As you can see from the table, a single scaffold in my reference genome matches many scaffolds in the query genome. Another problem is that both my reference and query genomes consist of a large number of scaffolds. I am having trouble working out how to filter my results further and separate the core and accessory regions in my query genome. I have been stuck at this step for quite some time and could not find any resource that explains what to do. I would really appreciate any suggestions.

Thank you, Ambika

MUMmer Nucmer genome alignment • 1.3k views

Not directly answering your question, but I suggest trying Anvi'o. I don't use the program, but I know someone who dealt with the same issue as you and Anvi'o was his solution. It has a nice graphical interface and many tutorials are available.

If this is the complete table, there isn't much overlap between the two genomes. Aside from a single ~4 kb match, every alignment is shorter than 2 kb. It may be more informative to translate/annotate the genomes and compare them at the protein level.
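To put a number on "how much overlap", one option is to total the aligned bases per query scaffold. Below is a minimal Python sketch, assuming the coordinate file has exactly the 13-column layout shown in the question (query start/end in columns 3 and 4, query length in column 9, query name last). The function names `parse_coords` and `covered_fraction` are made up for illustration and are not part of MUMmer. Overlapping alignments on the same query scaffold are merged so that bases are not counted twice.

```python
from collections import defaultdict


def parse_coords(lines):
    """Yield (q_start, q_end, q_len, q_name) for each alignment row.

    Assumes the 13-column layout from the question; header, separator
    and blank lines are skipped.
    """
    for line in lines:
        if "|" not in line or line.lstrip().startswith("["):
            continue  # header row, "====" separator, or blank line
        fields = line.replace("|", " ").split()
        if len(fields) != 13:
            continue  # not an alignment row in the assumed layout
        s2, e2 = int(fields[2]), int(fields[3])
        # Reverse-strand hits have S2 > E2, so normalise the interval.
        yield min(s2, e2), max(s2, e2), int(fields[8]), fields[12]


def covered_fraction(lines):
    """Return {query_scaffold: fraction of its length covered by alignments}."""
    intervals = defaultdict(list)
    lengths = {}
    for start, end, q_len, name in parse_coords(lines):
        intervals[name].append((start, end))
        lengths[name] = q_len

    result = {}
    for name, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:      # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                          # gap: close the current block
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        result[name] = covered / lengths[name]
    return result
```

Calling `covered_fraction(open("out.coords"))` (file name hypothetical) gives one fraction per query scaffold. Scaffolds with a low fraction are only candidates for accessory sequence; any cutoff is a judgment call, and query scaffolds with no alignment at all do not appear in the coordinate file, so they would have to be added from the FASTA headers.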


It is not the complete table; I also have some overlapping regions longer than 20 kb. I do have the gene annotation file. Do you think comparing the protein sequences will help in distinguishing the accessory regions of the genome? And thank you for suggesting that program; I will look into it.
