I quickly wrote something for a CLUSTALW alignment (not fasta). See: http://lindenb.github.io/jvarkit/MsaToVcf.html
It seems to work with the following clustalw input:
$ curl https://raw.github.com/biopython/biopython/master/Tests/Clustalw/opuntia.aln
CLUSTAL W (1.81) multiple sequence alignment
gi|6273285|gb|AF191659.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273284|gb|AF191658.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273287|gb|AF191661.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273286|gb|AF191660.1|AF191 TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273290|gb|AF191664.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273289|gb|AF191663.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273291|gb|AF191665.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
******* **** *************************************
gi|6273285|gb|AF191659.1|AF191 TATATA----------ATATATTTCAAATTTCCTTATATACCCAAATATA
gi|6273284|gb|AF191658.1|AF191 TATATATA--------ATATATTTCAAATTTCCTTATATACCCAAATATA
gi|6273287|gb|AF191661.1|AF191 TATATA----------ATATATTTCAAATTTCCTTATATATCCAAATATA
gi|6273286|gb|AF191660.1|AF191 TATATA----------ATATATTTATAATTTCCTTATATATCCAAATATA
gi|6273290|gb|AF191664.1|AF191 TATATATATA------ATATATTTCAAATTCCCTTATATATCCAAATATA
gi|6273289|gb|AF191663.1|AF191 TATATATATA------ATATATTTCAAATTCCCTTATATATCCAAATATA
gi|6273291|gb|AF191665.1|AF191 TATATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATA
****** ******** **** ********* *********
gi|6273285|gb|AF191659.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCCATTGATTTAGTGT
gi|6273284|gb|AF191658.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGT
gi|6273287|gb|AF191661.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGT
gi|6273286|gb|AF191660.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGT
gi|6273290|gb|AF191664.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGT
gi|6273289|gb|AF191663.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTAT
gi|6273291|gb|AF191665.1|AF191 AAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGT
************************************ *********** *
gi|6273285|gb|AF191659.1|AF191 ACCAGA
gi|6273284|gb|AF191658.1|AF191 ACCAGA
gi|6273287|gb|AF191661.1|AF191 ACCAGA
gi|6273286|gb|AF191660.1|AF191 ACCAGA
gi|6273290|gb|AF191664.1|AF191 ACCAGA
gi|6273289|gb|AF191663.1|AF191 ACCAGA
gi|6273291|gb|AF191665.1|AF191 ACCAGA
******
$ curl https://raw.github.com/biopython/biopython/master/Tests/Clustalw/opuntia.aln |\
java -jar dist/biostar94573.jar
##fileformat=VCFv4.1
##Biostar94573CmdLine=
##Biostar94573Version=ca765415946f3ed0827af0773128178bc6aa2f62
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth.">
##contig=<ID=chrUn,length=156>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT gi|6273284|gb|AF191658.1|AF191 gi|6273285|gb|AF191659.1|AF191 gi|6273286|gb|AF191660.1|AF191 gi|6273287|gb|AF191661.1|AF191 gi|6273289|gb|AF191663.1|AF191 gi|6273290|gb|AF191664.1|AF191 gi|6273291|gb|AF191665.1|AF191
chrUn 8 . T A . . DP=7 GT:DP 0:1 0:1 1:1 0:1 0:1 0:1 0:1
chrUn 13 . A G . . DP=7 GT:DP 0:1 0:1 0:1 0:1 1:1 1:1 1:1
chrUn 56 . ATATATATATA ATA,A,ATATA . . DP=7 GT:DP 1:1 2:1 2:1 2:1 3:1 3:1 0:1
chrUn 74 . TCA TAT . . DP=7 GT:DP 0:1 0:1 1:1 0:1 0:1 0:1 0:1
chrUn 81 . T C . . DP=7 GT:DP 0:1 0:1 0:1 0:1 1:1 1:1 1:1
chrUn 91 . T C . . DP=7 GT:DP 1:1 1:1 0:1 0:1 0:1 0:1 0:1
chrUn 137 . T C . . DP=7 GT:DP 0:1 1:1 0:1 0:1 0:1 0:1 0:1
chrUn 149 . G A . . DP=7 GT:DP 0:1 0:1 0:1 0:1 1:1 0:1 0:1
Edit
I added support for FASTA/MSA input:
$ cat input.msa
>Ind1
ACGTGGCTAGATCA
>Ind2
ACGTGGCTAGATCA
>Ind3
ACGTGCCTAGATCA
Get the output:
$ cat input.msa | java -jar dist/biostar94573.jar
##fileformat=VCFv4.1
##Biostar94573CmdLine=
##Biostar94573Version=4094eaed3dd2c9364309c20f88c6f79ad54a7450
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth.">
##contig=<ID=chrUn,length=14>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ind1 Ind2 Ind3
chrUn 6 . G C . . DP=3 GT:DP 0:1 0:1 1
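For readers who want to see the idea without the Java tool, here is a minimal Python sketch of calling substitutions from an aligned FASTA. It is not the jvarkit implementation. The function names `read_fasta` and `snv_rows` are hypothetical, and choosing the most frequent base as REF is an assumption that happens to agree with the rows shown above. Columns containing gaps are skipped.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Return [(name, sequence)] from an aligned FASTA stream."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def snv_rows(records, chrom="chrUn"):
    """Return one tab-separated VCF-like row per variable, gap-free column."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must all have the same aligned length")
    rows = []
    for i in range(length):
        column = [seq[i] for _, seq in records]
        if "-" in column or len(set(column)) == 1:
            continue  # skip gap columns and invariant columns
        counts = Counter(column)
        # Assumption: the most frequent base is REF; ties broken alphabetically.
        alleles = sorted(counts, key=lambda base: (-counts[base], base))
        genotypes = [f"{alleles.index(base)}:1" for base in column]
        rows.append("\t".join([
            chrom, str(i + 1), ".", alleles[0], ",".join(alleles[1:]),
            ".", ".", f"DP={len(column)}", "GT:DP", *genotypes,
        ]))
    return rows


if __name__ == "__main__":
    demo = ">Ind1\nACGTGGCTAGATCA\n>Ind2\nACGTGGCTAGATCA\n>Ind3\nACGTGCCTAGATCA\n"
    for row in snv_rows(read_fasta(StringIO(demo))):
        print(row)
```

Run on the three-sequence input above, this yields a single row at position 6 with REF G and ALT C. The VCF header lines are left out to keep the sketch short.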
What's your input? A CLUSTALW .aln file?
The input file is a text file (.fas, for example) that contains only the aligned sequences, two lines per individual: a comment (header) line, then the sequence itself. Here is an example:
How are indels handled?
It would be nice to handle indels too, to make the script more general, but in my case I do not need to consider these markers.
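On the indel question, one possible approach (a sketch, not necessarily what the tool does) is to merge each run of gap-containing columns into a single block, prepend the preceding gap-free column as an anchor base as VCF expects for indels, and strip the gaps from each sample's slice to get its allele. The function name `indel_blocks` and the sample names are hypothetical. Which allele becomes REF is left open here.

```python
def indel_blocks(records):
    """Return [(pos, alleles)] for each run of gap-containing columns.

    pos is the 1-based position of the anchor base before the run;
    alleles holds one gap-stripped string per sample, in input order.
    """
    seqs = [seq for _, seq in records]
    length = len(seqs[0])
    has_gap = [any(s[i] == "-" for s in seqs) for i in range(length)]
    blocks = []
    i = 0
    while i < length:
        if not has_gap[i]:
            i += 1
            continue
        start = i
        while i < length and has_gap[i]:
            i += 1
        if start == 0:
            # No preceding base to anchor on; this sketch simply skips the run.
            continue
        anchor = start - 1
        alleles = [s[anchor:i].replace("-", "") for s in seqs]
        blocks.append((anchor + 1, alleles))
    return blocks


if __name__ == "__main__":
    # Fragment taken from the start of the second block of the alignment above.
    fragment = [
        ("AF191659", "TATATA----------ATAT"),
        ("AF191658", "TATATATA--------ATAT"),
        ("AF191665", "TATATATATATATATAATAT"),
    ]
    print(indel_blocks(fragment))
```

On this fragment the three alleles come out as A, ATA and ATATATATATA, anchored at the sixth column, which corresponds to the alleles in the position-56 record of the CLUSTALW output. If the anchor column itself varies between samples, the alleles will differ in their first base, so a fuller implementation would need to handle that case.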
I added support for MSA/FASTA. See the edit in my answer.