Tool: Converting a MUMmer snps file to a real VCF file
5.3 years ago

Hi all,

UPDATE:

The tool is now part of a growing collection of tools that I have called all2vcf. You can clone the repository from here:

https://github.com/MatteoSchiavinato/all2vcf

ORIGINAL:

I recently wrote this tool for my own needs. Unlike the original mummer-2-vcf.pl script, it converts the snps output of the MUMmer show-snps tool to a real VCF file (with a header and the like).

https://github.com/MatteoSchiavinato/Utilities/blob/master/my-mummer-2-vcf.py

EDIT: please use the --input-header option if your input *.snps file has a header. Otherwise you will get an IndexError somewhere in the code.
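
That IndexError is consistent with header lines being parsed as if they were data rows. Here is a minimal Python sketch of telling the two apart. It assumes tab-delimited `show-snps -T` output in which a data row has 12 fields and starts with an integer position. `read_snps_rows` is a hypothetical helper for illustration, not a function from my-mummer-2-vcf.py, and the sequence names in the example are placeholders.

```python
def read_snps_rows(lines, min_fields=12):
    """Yield data rows (as lists of fields) from show-snps -T output.

    Assumption: a data row has at least `min_fields` tab-separated fields
    and its first field is an integer position. Anything else is treated
    as a header line and skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header lines (file paths, "NUCMER", blank line, column names)
        # have too few fields or do not start with a number.
        if len(fields) < min_fields or not fields[0].isdigit():
            continue
        yield fields


# Placeholder input: four header lines followed by one data row.
example = [
    "/path/to/ref.fasta /path/to/query.fasta\n",
    "NUCMER\n",
    "\n",
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]\t[R]\t[Q]\t[FRM]\t[TAGS]\n",
    "8782\tC\tT\t8383\t8369\t8383\t0\t0\t1\t1\tref_seq\tquery_seq\n",
]
rows = list(read_snps_rows(example))
```

With a filter like this, only the last line of `example` reaches the conversion step, so an index such as `fields[10]` is always present.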

It worked with my data, and it should work with yours. Anyone interested in giving it a test run?

VCF mummer SNP • 7.8k views

I'll try it right now! Thank you so much for your contribution! I was looking for exactly this: a way to convert the output of my mummer snps to vcf :D


This would be very useful but it's not quite working for me.

I'm trying to compare 2 viral (30k) genomes.

I used MUMmer's nucmer and show-snps commands:

nucmer --prefix=1000-X70047 nc_045512_wg.fasta 1000-X70047.fasta
show-snps -T 1000-X70047.delta > 1000-X70047.snps

The snps file looks like this:

/Users/moduff/Desktop/SARS2/nc_045512_wg.fasta /Users/moduff/Desktop/SARS2/1000-X70047.fasta
NUCMER

[P1]    [SUB]   [SUB]   [P2]    [BUFF]  [DIST]  [R]     [Q]     [FRM]   [TAGS]
8782    C       T       8383    8369    8383    0       0       1       1       NC_045512        X70047_consensus
17747   C       T       17348   111     12155   0       0       1       1       NC_045512       X70047_consensus
17858   A       G       17459   111     12044   0       0       1       1       NC_045512       X70047_consensus
18060   C       T       17661   202     11842   0       0       1       1       NC_045512       X70047_consensus
27525   A       G       27126   187     2377    0       0       1       1       NC_045512        X70047_consensus
28144   T       C       27745   619     1758    0       0       1       1       NC_045512        X70047_consensus

When I fed this to my-mummer-2-vcf.py via

my-mummer-2-vcf.py -s 1000-X70047.snps -n --output-header --reference nc_045512_wg.fasta

I get this error message:

Traceback (most recent call last):
  File "my-mummer-2-vcf.py", line 313, in <module>
    Vcf_lines = [ convert_snps_to_vcf(line) for line in Lines ]
  File "my-mummer-2-vcf.py", line 313, in <listcomp>
    Vcf_lines = [ convert_snps_to_vcf(line) for line in Lines ]
  File "my-mummer-2-vcf.py", line 56, in convert_snps_to_vcf
    scaffold = lst[10]
IndexError: list index out of range

Any suggestions?


Yes, please read through this GitHub issue that I closed some weeks ago. I believe it's the same issue :)

https://github.com/MatteoSchiavinato/Utilities/issues/1

(to improve readability, could you wrap the code parts of your answer inside a code block?)


Hi, I seem to be getting the same error even though I used -T with my show-snps. Was this error ever resolved/confirmed?


Thank you for your practical code!


I converted my .snps file to .vcf without problems. Later I tried to filter using the .vcf as input, but the QUAL column appears empty. Do I have to do anything before filtering? Many thanks for your code.

##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
#CHROM POS ID REF ALT QUAL FILTER INFO
NW_003020038.1 240 . A AG . . INDEL scaffold2:2512040
NW_003020038.1 240 . A AA . . INDEL scaffold2:2512041
NW_003020038.1 240 . A AC . . INDEL scaffold2:2512042
NW_003020038.1 240 . A AC . . INDEL scaffold2:2512043
NW_003020038.1 240 . A AG . . INDEL scaffold2:2512044
NW_003020038.1 240 . A AA . . INDEL scaffold2:2512045
NW_003020038.1 281 . a aA . . INDEL scaffold2:2511996
NW_003020038.1 281 . a aA . . INDEL scaffold2:2511997
NW_003020038.1 281 . a aG . . INDEL scaffold2:2511998
NW_003020038.1 391 . aTT a . . INDEL scaffold2:2511886
NW_003020038.1 394 . T A . . . scaffold2:2511886
NW_003020038.1 493 . C T . . . scaffold2:2511787
NW_003020038.1 1393 . G A . . . scaffold2:2510887
NW_003020038.1 1395 . G T . . . scaffold2:2510885
NW_003020038.1 1875 . C T . . . scaffold2:2510405
NW_003020038.1 1881 . G A . . . scaffold2:2510399
NW_003020038.1 2336 . G C . . . scaffold2:2509944
NW_003020038.1 2882 . t tT . . INDEL scaffold2:2509394

Hi, I am not sure you can carry over any kind of quality from mummer snps to a VCF file. Do you have any quality indication in your snps file?
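
For context, the show-snps columns posted in this thread contain no per-variant quality score, and `.` is how VCF marks a missing QUAL value. A filter that compares QUAL numerically therefore has to decide what to do with missing values. A minimal Python sketch of that decision, assuming tab-delimited VCF records; `passes_qual` is a hypothetical helper (not part of all2vcf), and the records and QUAL values in the example are made up for illustration.

```python
def passes_qual(vcf_line, min_qual=30.0, keep_missing=True):
    """Return True if a VCF record passes a QUAL threshold.

    A QUAL of "." (missing) is kept or dropped according to
    `keep_missing` rather than being compared as a number.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # QUAL is the sixth VCF column
    if qual == ".":
        return keep_missing
    return float(qual) >= min_qual


# Made-up records: one with a missing QUAL, one with an invented QUAL of 45.0.
records = [
    "seq1\t394\t.\tT\tA\t.\t.\t.\n",
    "seq1\t493\t.\tC\tT\t45.0\t.\t.\n",
]
kept = [r for r in records if passes_qual(r)]
```

Setting `keep_missing=False` would drop every record converted from a snps file, since all of them have QUAL set to `.`.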


Hi Matteo

This is my .snps file:

[P1]    [SUB]   [SUB]   [P2]    [BUFF]  [DIST]  [R]     [Q]     [FRM]   [TAGS]
241 .   G   2512040 0   241 0   0   1   -1  NW_003020038.1  scaffold2
241 .   A   2512041 0   241 0   0   1   -1  NW_003020038.1  scaffold2
241 .   C   2512042 0   241 0   0   1   -1  NW_003020038.1  scaffold2
241 .   C   2512043 0   241 0   0   1   -1  NW_003020038.1  scaffold2
241 .   G   2512044 0   241 0   0   1   -1  NW_003020038.1  scaffold2
241 .   A   2512045 0   241 0   0   1   -1  NW_003020038.1  scaffold2
282 .   A   2511996 0   282 0   0   1   -1  NW_003020038.1  scaffold2
282 .   A   2511997 0   282 0   0   1   -1  NW_003020038.1  scaffold2
282 .   G   2511998 0   282 0   0   1   -1  NW_003020038.1  scaffold2
392 T   .   2511886 1   392 0   0   1   -1  NW_003020038.1  scaffold2
393 T   .   2511886 1   393 0   0   1   -1  NW_003020038.1  scaffold2
394 T   A   2511886 1   394 0   0   1   -1  NW_003020038.1  scaffold2
493 C   T   2511787 99  493 0   0   1   -1  NW_003020038.1  scaffold2
1393    G   A   2510887 2   1393    0   0   1   -1  NW_003020038.1  scaffold2