Question

Alelle frequency plot

1

Entering edit mode

3.7 years ago

am29 ▴ 60

Hi,

I have to plot allele frequencies of two different SNP chip datasets. I have two VCF files and would like to make a scatterplot in which these 2 datasets are plotted one against each other. What is the easiest way to do this? I apologize if this question was already posted.

allelefrequency plot • 2.5k views

ADD COMMENT • link 3.6 years ago by am29 ▴ 60

0

Entering edit mode

Hi, I tried to do the same thing with another dataset. It seems that the join command introduces zeroes, although all values are known. After running bcftools query command my dataset1.tsv looks like this:

SNP1 0.520344
SNP2 0.951487
SNP3 0.467136

My dataset2.tsv looks like this:

SNP1 0.549
SNP2 0.034
SNP3 0.557

After running join -a 1 -a 2 -o '1.2,2.2' -t $'\t' -1 1 -2 1 -e 0.0 dataset1.tsv dataset2.tsv > merged.tsv

I get this when less merged.tsv:

0.520344        0.0
0.0     0.549
0.0     0.034
0.951487        0.0
0.0     0.557

Is there are a reason for introducing the zeroes here? Both files have the same number of SNPs.

ADD REPLY • link 3.7 years ago by am29 ▴ 60

0

Entering edit mode

Does anyone know maybe?

ADD REPLY • link 3.7 years ago by am29 ▴ 60

score 4 · Accepted Answer · 2021-11-05

bcftools query -i 'N_ALT=1' -f '%CHROM\_%POS\_%REF\_%ALT\t%INFO/AF\n' in1.vcf.gz | sort -t $'\t' -k1,1 > a.tsv
bcftools query -i 'N_ALT=1' -f '%CHROM\_%POS\_%REF\_%ALT\t%INFO/AF\n' in2.vcf.gz | sort -t $'\t' -k1,1 > b.tsv
join -a 1 -a 2  -o '1.2,2.2'  -t $'\t' -1 1 -2 1 -e 0.0 a.tsv b.tsv > compare.tsv

plot with R : https://stackoverflow.com/questions/42232340/how-to-read-tab-separated-files-in-r-to-generate-scatterplot

score 2 · Accepted Answer · 2021-11-20

Hi, your question motivated me to add a new Python API method called pyvcf.plot_af_correlation (documentation) to the fuc package I wrote. Take a look at the example below and see if this achieves your goal.

enter image description here

>>> from fuc import pyvcf, common
>>> import matplotlib.pyplot as plt
>>> data1 = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102, 103, 104, 105],
...     'ID': ['.', '.', '.', '.', '.', '.'],
...     'REF': ['G', 'T', 'G', 'T', 'A', 'C'],
...     'ALT': ['A', 'C', 'C', 'G,A', 'C', 'T'],
...     'QUAL': ['.', '.', '.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.', '.', '.'],
...     'FORMAT': ['GT:DP', 'GT', 'GT', 'GT', 'GT', 'GT'],
...     'A': ['0/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
...     'B': ['0/0:30', '0/0', '0/1', '0/1', '1/1', '0/1'],
...     'C': ['1/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
...     'D': ['0/0:30', '0/0', '0/0', '0/0', '1/1', '0/1'],
...     'E': ['0/0:30', '0/0', '0/0', '1/2', '1/1', '0/1'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> data2 = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [101, 102, 103, 104, 105],
...     'ID': ['.', '.', '.', '.', '.'],
...     'REF': ['T', 'G', 'T', 'A', 'C'],
...     'ALT': ['C', 'C', 'G,A', 'C', 'T'],
...     'QUAL': ['.', '.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT', 'GT', 'GT'],
...     'F': ['0/0', '0/1', '0/1', '1/1', '0/0'],
...     'G': ['0/0', '0/1', '0/1', '1/1', './.'],
...     'H': ['0/0', '0/1', '0/1', '1/1', '1/1'],
...     'I': ['0/0', '0/1', '0/0', '1/1', '1/1'],
...     'J': ['0/0', '0/1', '1/2', '1/1', '0/1'],
... }
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> pyvcf.plot_af_correlation(vf1, vf2)
>>> plt.tight_layout()

Please note that since this method is still being implemented in a development branch (0.28.0-dev) of fuc, in order to use it you need to install the fuc package locally:

$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ git checkout 0.28.0-dev
$ pip install .

Please let me know if you have any questions and/or suggestions regarding this.