Converting ms or msms output to PLINK or VCF files
7.8 years ago

There are two widely used coalescent simulation tools: Hudson's ms (Hudson 2002) and Ewing's msms (Ewing and Hermisson 2010), which incorporates selection.

I was wondering if anyone knows of, or has, a script that can convert the output of these two programs (the format is the same for both) to either PLINK or VCF format.

Example output from ms or msms:

segsites: 3
positions: 0.05509 0.21466 0.70900
000
110
100
100
100
100
100
100
001
001
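
For readers who want to handle this format themselves, here is a minimal sketch of reading one replicate like the example above into a list of positions and a list of haplotype strings. The function name `parse_ms_replicate` is hypothetical, and the sketch assumes a single replicate with at least one segregating site; it is not taken from ms, msms, or any of the tools mentioned in this thread.

```python
def parse_ms_replicate(text):
    """Read one ms/msms replicate and return (positions, haplotypes)."""
    segsites = None
    positions = None
    haplotypes = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("segsites:"):
            segsites = int(line.split(":", 1)[1])
        elif line.startswith("positions:"):
            positions = [float(tok) for tok in line.split()[1:]]
        elif positions is not None and line and set(line) <= {"0", "1"}:
            # Haplotype rows follow the positions line: one row per sampled
            # chromosome, one 0/1 character per segregating site.
            haplotypes.append(line)
    if segsites is None or positions is None:
        raise ValueError("no 'segsites:' / 'positions:' lines found")
    if len(positions) != segsites:
        raise ValueError("number of positions does not match segsites")
    if any(len(h) != segsites for h in haplotypes):
        raise ValueError("haplotype length does not match segsites")
    return positions, haplotypes


# The example output from the question.
EXAMPLE = """segsites: 3
positions: 0.05509 0.21466 0.70900
000
110
100
100
100
100
100
100
001
001
"""

positions, haplotypes = parse_ms_replicate(EXAMPLE)
print(len(positions), "sites,", len(haplotypes), "haplotypes")
```

Each column of the 0/1 matrix is one site and each row is one sampled chromosome, so a VCF or PLINK writer mostly needs to transpose this matrix and attach coordinates.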

Hello, I have the same question. Have you resolved this? Thanks.


This is a tangent but I really like msprime as it can give you VCF format directly.


Another comment: I do not know what you are trying to do, but ms produces this output under an infinite-sites model, and the columns represent mutations occurring on different branches of the tree. To quote the manual: "An infinite sites model of mutation is assumed, and thus multiple-hits and back mutations do not occur. However, when used in conjunction with other programs, finite site mutation models or micro-satellite models can be studied. For example, the gene trees themselves can be output, and these gene trees can be used as input to other programs which will evolve the sequences under a variety of finite-site models." I have used seq-gen in the past to do this, using the gene trees as the manual describes.


Hello, I would also like to do this. Is there a more straightforward way to do this than using msprime?

Many thanks


Hello! I'm facing the same issue as well... I've tried to use ms2geno but haven't managed to make it work properly. Has anyone found a better option?

4.5 years ago

I've written a C++ utility, 'ms2vcf', which converts ms output into VCF. For simplicity I added the code to a small package with other ms-related tools: see https://sourceforge.net/projects/coatli/ and the wiki there (last entry).


I downloaded the coatli package and am trying the ms2vcf function, but how do I specify the input file? There is no option for the input. Could you please give some guidance or a full command line?


The function is intended for piping only and thus has no options to specify input and output files, sorry. A full command line is given in the wiki there:

ms 10 2 -t 10 -precision 16 | ms2vcf -length 1000 > msout.vcf

However, note that converting ms output to VCF entails a violation of the infinite-sites model, since a VCF can accommodate only a finite number of integer positions. This obviously limits its usefulness.
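
To make the caveat concrete, here is a minimal sketch of mapping ms positions (real numbers in the unit interval) onto integer coordinates of a locus of a given length. The function name `to_integer_positions` and the collision rule (shift a clashing site to the next free base pair) are assumptions chosen for illustration; this is not a description of how ms2vcf resolves such cases.

```python
def to_integer_positions(positions, length):
    """Map ms positions in [0, 1] to distinct 1-based integer coordinates."""
    n = len(positions)
    if n > length:
        raise ValueError("more segregating sites than base pairs available")
    coords = []
    previous = 0
    # Forward pass: scale each position, and push it to the next free
    # coordinate if it would collide with the previous site.
    for p in sorted(positions):
        coord = max(int(p * length) + 1, previous + 1)
        coords.append(coord)
        previous = coord
    # Backward pass: if sites were pushed past the end of the locus,
    # pull them back so that the last site sits at `length` at most.
    for i in range(n - 1, -1, -1):
        coords[i] = min(coords[i], length - (n - 1 - i))
    return coords


# Positions from the example in the question, on a 1000 bp locus.
print(to_integer_positions([0.05509, 0.21466, 0.70900], 1000))

# Two sites that fall on the same base pair are forced apart.
print(to_integer_positions([0.0011, 0.0012, 0.5], 1000))
```

The shorter the locus relative to the number of segregating sites, the more often sites have to be shifted, which is exactly where the finite-coordinate representation departs from the infinite-sites model.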

2.3 years ago

I know this post is old, but I want to add another option for anyone who is still trying to convert ms output to VCF.

I wrote a program in R that takes ms output as input (restricted to an even number of sequences per repetition) and converts it to vcf.gz.

See: https://github.com/thehung92/shareUtils/tree/main/ms2vcf

Example command: Rscript ms2vcf.R -r 2:3E7-4E7 -p example/my_sim_pop example/output.ms

Limitation: this is not a standalone program, as it depends on several other R libraries and bash binaries that you have to install yourself.
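
A restriction to an even number of sequences is what one would expect if consecutive haplotypes are paired into diploid individuals. The following is a minimal sketch of that pairing, written in Python purely for illustration; the function name `pair_haplotypes` is hypothetical, and the assumption that rows 1 and 2 form the first individual, rows 3 and 4 the second, and so on is mine, not something confirmed by the R script linked above.

```python
def pair_haplotypes(haplotypes):
    """Turn 0/1 haplotype rows into per-site phased diploid genotypes.

    Returns one list per site, each holding one 'a|b' string per individual,
    where a and b are the alleles on the individual's two haplotypes.
    """
    if not haplotypes:
        return []
    if len(haplotypes) % 2 != 0:
        raise ValueError("need an even number of haplotypes to form diploids")
    n_sites = len(haplotypes[0])
    genotypes = []
    for site in range(n_sites):
        row = [
            f"{haplotypes[i][site]}|{haplotypes[i + 1][site]}"
            for i in range(0, len(haplotypes), 2)
        ]
        genotypes.append(row)
    return genotypes


# The ten haplotypes from the example in the question give five individuals.
haps = ["000", "110", "100", "100", "100", "100", "100", "100", "001", "001"]
for site_genotypes in pair_haplotypes(haps):
    # Tab-separated, as the genotype columns of a VCF record would be.
    print("\t".join(site_genotypes))
```

Each printed line corresponds to the sample columns of one VCF record; the fixed columns (CHROM, POS, REF, ALT and so on) still have to be supplied, and because ms only reports ancestral (0) and derived (1) states, any nucleotides written as REF and ALT are placeholders.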
