Alelle frequency plot
2
1
Entering edit mode
3.1 years ago
am29 ▴ 40

Hi,

I have to plot allele frequencies of two different SNP chip datasets. I have two VCF files and would like to make a scatterplot in which these 2 datasets are plotted one against each other. What is the easiest way to do this? I apologize if this question was already posted.

allelefrequency plot • 2.0k views
ADD COMMENT
0
Entering edit mode

Hi, I tried to do the same thing with another dataset. It seems that the join command introduces zeroes, although all values are known. After running bcftools query command my dataset1.tsv looks like this:

SNP1 0.520344
SNP2 0.951487
SNP3 0.467136

My dataset2.tsv looks like this:

SNP1 0.549
SNP2 0.034
SNP3 0.557

After running join -a 1 -a 2 -o '1.2,2.2' -t $'\t' -1 1 -2 1 -e 0.0 dataset1.tsv dataset2.tsv > merged.tsv

I get this when less merged.tsv:

0.520344        0.0
0.0     0.549
0.0     0.034
0.951487        0.0
0.0     0.557

Is there are a reason for introducing the zeroes here? Both files have the same number of SNPs.

ADD REPLY
0
Entering edit mode

Does anyone know maybe?

ADD REPLY
4
Entering edit mode
3.1 years ago
bcftools query -i 'N_ALT=1' -f '%CHROM\_%POS\_%REF\_%ALT\t%INFO/AF\n' in1.vcf.gz | sort -t $'\t' -k1,1 > a.tsv
bcftools query -i 'N_ALT=1' -f '%CHROM\_%POS\_%REF\_%ALT\t%INFO/AF\n' in2.vcf.gz | sort -t $'\t' -k1,1 > b.tsv
join -a 1 -a 2  -o '1.2,2.2'  -t $'\t' -1 1 -2 1 -e 0.0 a.tsv b.tsv > compare.tsv

plot with R : https://stackoverflow.com/questions/42232340/how-to-read-tab-separated-files-in-r-to-generate-scatterplot

ADD COMMENT
2
Entering edit mode
3.0 years ago
sbstevenlee ▴ 480

Hi, your question motivated me to add a new Python API method called pyvcf.plot_af_correlation (documentation) to the fuc package I wrote. Take a look at the example below and see if this achieves your goal.

enter image description here

>>> from fuc import pyvcf, common
>>> import matplotlib.pyplot as plt
>>> data1 = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102, 103, 104, 105],
...     'ID': ['.', '.', '.', '.', '.', '.'],
...     'REF': ['G', 'T', 'G', 'T', 'A', 'C'],
...     'ALT': ['A', 'C', 'C', 'G,A', 'C', 'T'],
...     'QUAL': ['.', '.', '.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.', '.', '.'],
...     'FORMAT': ['GT:DP', 'GT', 'GT', 'GT', 'GT', 'GT'],
...     'A': ['0/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
...     'B': ['0/0:30', '0/0', '0/1', '0/1', '1/1', '0/1'],
...     'C': ['1/1:30', '0/0', '1/1', '0/1', '1/1', '0/1'],
...     'D': ['0/0:30', '0/0', '0/0', '0/0', '1/1', '0/1'],
...     'E': ['0/0:30', '0/0', '0/0', '1/2', '1/1', '0/1'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> data2 = {
...     'CHROM': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1'],
...     'POS': [101, 102, 103, 104, 105],
...     'ID': ['.', '.', '.', '.', '.'],
...     'REF': ['T', 'G', 'T', 'A', 'C'],
...     'ALT': ['C', 'C', 'G,A', 'C', 'T'],
...     'QUAL': ['.', '.', '.', '.', '.'],
...     'FILTER': ['.', '.', '.', '.', '.'],
...     'INFO': ['.', '.', '.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT', 'GT', 'GT'],
...     'F': ['0/0', '0/1', '0/1', '1/1', '0/0'],
...     'G': ['0/0', '0/1', '0/1', '1/1', './.'],
...     'H': ['0/0', '0/1', '0/1', '1/1', '1/1'],
...     'I': ['0/0', '0/1', '0/0', '1/1', '1/1'],
...     'J': ['0/0', '0/1', '1/2', '1/1', '0/1'],
... }
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> pyvcf.plot_af_correlation(vf1, vf2)
>>> plt.tight_layout()

Please note that since this method is still being implemented in a development branch (0.28.0-dev) of fuc, in order to use it you need to install the fuc package locally:

$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ git checkout 0.28.0-dev
$ pip install .

Please let me know if you have any questions and/or suggestions regarding this.

ADD COMMENT
0
Entering edit mode

Thank you very much for this!

ADD REPLY

Login before adding your answer.

Traffic: 2117 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6