Hi, your question motivated me to add a new Python API method called pyvcf.plot_af_correlation (documentation) to the fuc package I wrote. Take a look at the example below and see if it achieves your goal.
>>> from fuc import pyvcf, common
>>> import matplotlib.pyplot as plt
>>> data1 = {
... 'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
... 'POS': [100, 101, 102, 103, 104, 105],
... 'ID': ['.', '.', '.', '.', '.', '.'],
... 'REF': ['G', 'T', 'G', 'T', 'A', 'C'],
... 'ALT': ['A', 'C', 'C', 'G,A', 'C', 'T'],
... 'QUAL': ['.', '.', '.', '.', '.', '.'],
... 'FILTER': ['.', '.', '.', '.', '.', '.'],
... 'INFO': ['.', '.', '.', '.', '.', '.'],
... 'FORMAT': ['GT:DP', 'GT', 'GT', 'GT', 'GT', 'GT'],
... 'A': ['0/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
... 'B': ['0/0:30', '0/0', '0/1', '0/1', '1/1', '0/1'],
... 'C': ['1/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
... 'D': ['0/0:30', '0/0', '0/0', '0/0', '1/1', '0/1'],
... 'E': ['0/0:30', '0/0', '0/0', '1/2', '1/1', '0/1'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> data2 = {
... 'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
... 'POS': [101, 102, 103, 104, 105],
... 'ID': ['.', '.', '.', '.', '.'],
... 'REF': ['T', 'G', 'T', 'A', 'C'],
... 'ALT': ['C', 'C', 'G,A', 'C', 'T'],
... 'QUAL': ['.', '.', '.', '.', '.'],
... 'FILTER': ['.', '.', '.', '.', '.'],
... 'INFO': ['.', '.', '.', '.', '.'],
... 'FORMAT': ['GT', 'GT', 'GT', 'GT', 'GT'],
... 'F': ['0/0', '0/1', '0/1', '1/1', '0/0'],
... 'G': ['0/0', '0/1', '0/1', '1/1', './.'],
... 'H': ['0/0', '0/1', '0/1', '1/1', '1/1'],
... 'I': ['0/0', '0/1', '0/0', '1/1', '1/1'],
... 'J': ['0/0', '0/1', '1/2', '1/1', '0/1'],
... }
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> pyvcf.plot_af_correlation(vf1, vf2)
>>> plt.tight_layout()
Please note that this method is still being implemented in a development branch (0.28.0-dev) of fuc, so in order to use it you need to install the fuc package locally:
$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ git checkout 0.28.0-dev
$ pip install .
Please let me know if you have any questions and/or suggestions regarding this.
Hi, I tried to do the same thing with another dataset. It seems that the join command introduces zeroes even though all values are known. After running the bcftools query command, my dataset1.tsv looks like this:
My dataset2.tsv looks like this:
After running
join -a 1 -a 2 -o '1.2,2.2' -t $'\t' -1 1 -2 1 -e 0.0 dataset1.tsv dataset2.tsv > merged.tsv
I get this when I view merged.tsv with less:
Is there a reason for introducing the zeroes here? Both files have the same number of SNPs.
Does anyone know maybe?
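For reference, with `-a 1 -a 2 -e 0.0`, join prints lines that have no partner in the other file and fills the missing field with 0.0, so a zero usually means a key in column 1 was not paired. Two files with the same number of SNPs can still have keys that differ, and join also expects both inputs to be sorted on the join field. Here is a minimal Python sketch for checking which keys fail to pair. It assumes column 1 is the key and column 2 is the value (as in the `-1 1 -2 1 -o '1.2,2.2'` options above) and that keys are unique; the function names and the sample values are hypothetical, not taken from the datasets in this thread.

```python
import csv


def read_key_value(path):
    """Read a tab-separated file into {column 1: column 2}."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def outer_join(left, right, fill="0.0"):
    """Pair values by key, filling unpaired sides with `fill`.

    Roughly what `join -a 1 -a 2 -e 0.0` produces for unique keys,
    except that no pre-sorting of the inputs is needed here.
    """
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key, fill), right.get(key, fill)) for key in keys]


def unmatched_keys(left, right):
    """Return keys found only in the left input and only in the right input."""
    return sorted(set(left) - set(right)), sorted(set(right) - set(left))


# Hypothetical in-memory data standing in for dataset1.tsv and dataset2.tsv.
left = {"chr1:100": "0.25", "chr1:101": "0.50"}
right = {"chr1:100": "0.30", "chr1:102": "0.10"}

for key, value1, value2 in outer_join(left, right):
    print(key, value1, value2, sep="\t")

only_left, only_right = unmatched_keys(left, right)
print("only in first input:", only_left)
print("only in second input:", only_right)
```

To run it on the real files, replace the two dictionaries with `read_key_value("dataset1.tsv")` and `read_key_value("dataset2.tsv")`. If both unmatched lists come back empty, the keys agree and the zeroes more likely come from the sort order that join expects.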