Hi everyone,
I'm working on a CentOS based computer cluster and wanted to run vcfnulldotslashdot on some qualityfiltered and normalized vcf-files but got an Error message regarding line 18 of the vcfnulldotslashdot-script:
Line 18: flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
TypeError: can't multiply sequence by non-int of type 'float'
This is the nulldotslashdotscript:
#!/usr/bin/env python3
2. # Rewrite null genotypes to ./.
3. #
4. import sys
5. import math
6.
7. def bincoeff(n,k): return math.factorial(n) / (math.factorial(n-k)*math.factorial(k))
8. def multcoeff(n,k): return bincoeff(n+k-1,k)
9.
10. for line in sys.stdin:
11. if line.startswith("#"):
12. print(line.strip())
13. continue
14. fields = line.strip().split("\t")
15. alleles = len(fields[4].split(","))+1
16. # assume that we have GT:GL
17. # how many genotypes? assume diploid
18. flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
19. for i in range(9, len(fields)):
20. if fields[i] == ".":
21. fields[i] = "./.:" + flatgls
22. print("\t".join(fields))
I figured the problem was likely that for some reason the multcoeff got formatted as a float number which then caused some trouble when applying to the [0]*multcoeff command. As the binomialcoefficient should be a whole number in theory I tried to resolve this by changing line 8 of the code a little and just converting multcoeff to an integer.
Like this:
1. #!/usr/bin/env python3
2. # Rewrite null genotypes to ./.
3. #
4. import sys
5. import math
6.
7. def bincoeff(n,k): return math.factorial(n) / (math.factorial(n-k)*math.factorial(k))
8. def multcoeff(n,k): return int(bincoeff(n+k-1,k))
9.
10. for line in sys.stdin:
11. if line.startswith("#"):
12. print(line.strip())
13. continue
14. fields = line.strip().split("\t")
15. alleles = len(fields[4].split(","))+1
16. # assume that we have GT:GL
17. # how many genotypes? assume diploid
18. flatgls = ",".join([str(x) for x in [0]*multcoeff(alleles,2)])
19. for i in range(9, len(fields)):
20. if fields[i] == ".":
21. fields[i] = "./.:" + flatgls
22. print("\t".join(fields))
It seemed to work, at least I didn't get the error anymore. But I'm quite new to programming and bioinformatics and am a bit insecure now. I'm wondering if I can really trust the resulting vcf-file and the change of the code didn't change anything about the function. Also I'm puzzled why nobody else seems to have had this issue before, at least I couldn't find anything on google about it.
Would be very grateful if someone with more expertise could comment on this.