Hi,
I am using VCFtools to parse VCF files.
I can use it to generate a 012 matrix. This matrix is 2D, with shape (number of individuals, number of SNPs).
Each cell in the matrix holds the number of occurrences of the alternative allele for a specific SNP in a specific individual.
This is great for biallelic data, meaning that every SNP has a single alternative allele.
In my case there are at most n alleles per SNP, and I would like to have n matrices, one per "allele index", where each cell specifies how many occurrences of that allele there are in the specific individual at the specific SNP.
Is anyone familiar with a tool that can provide that?
Thanks
I want to ask: what do 1, 0, and 2 mean in the last df? Also, how does the GT change after splitting the alt alleles? For example, what does 1/2 become after the split?
The numbers in the last df represent allele counts. For instance, since the A sample is heterozygous for chr1-100-G-A, the sample has 1, and so on.
I'm not sure if I understand your second question. If you are asking about multiallelic loci, they will be split into multiple rows. For instance, the position chr1-101 has two alternative alleles, but in the last df they were split into three rows (two alt + one ref).
Does the last df represent allele counts for the reference? As for the second question: yes, I know that the multiallelic sites are split, but what does 1/2 change to? Does it become 0/1 before the alleles are counted? Also, what do GG or TT mean in the last df? And why not split only the multiallelic sites and keep the biallelic ones as they are?
No, they represent the counts for both reference and alternative alleles. Continuing with the A sample as an example, it is heterozygous for chr1-100-G-A. Therefore, it has a count of 1 for chr1-100-G-A (alt) and 1 for chr1-100-G-G (ref).
Yes.
That depends on what your end goal is. When I first developed this function, I wanted to quickly get allele counts for both alt and ref alleles. You can easily modify the above code to extract allele counts only for alt if that's what you want.
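To make the counting concrete, here is a minimal plain-Python sketch of the idea. It is not the fuc implementation; the helper names allele_counts and split_site, the bases T/C/A, and the genotypes are all made up for illustration. It only shows how a genotype such as 1/2 could translate into one count per allele row, with the reference row first.

```python
import re


def allele_counts(gt, n_alleles):
    """Count how often each allele index occurs in one GT string.

    Index 0 is the reference allele, 1..n_alleles-1 are the alternates.
    Missing alleles (".") are skipped. Hypothetical helper, not part of fuc.
    """
    counts = [0] * n_alleles
    for allele in re.split(r"[/|]", gt):  # handles unphased "/" and phased "|"
        if allele != ".":
            counts[int(allele)] += 1
    return counts


def split_site(chrom, pos, ref, alts, genotypes):
    """Return one row per allele (ref first), keyed like 'chr1-101-T-C'.

    `genotypes` maps sample name -> GT string. Hypothetical helper.
    """
    alleles = [ref] + list(alts)
    rows = {}
    for idx, allele in enumerate(alleles):
        key = f"{chrom}-{pos}-{ref}-{allele}"
        rows[key] = {
            sample: allele_counts(gt, len(alleles))[idx]
            for sample, gt in genotypes.items()
        }
    return rows


# Made-up multiallelic site: ref T, alts C and A; sample A is 1/2, sample B is 0/1.
rows = split_site("chr1", 101, "T", ["C", "A"], {"A": "1/2", "B": "0/1"})
for key, counts in rows.items():
    print(key, counts)
```

Under these assumptions, sample A (1/2) gets a count of 0 in the ref row and 1 in each of the two alt rows. To keep only the alt counts, you would drop the first (ref) row of each site.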
Also, I want to ask: does the fuc package work on Windows with Python? I can't download it.
It should work on Windows too. It's just a Python library. If you can use popular libraries like pandas and numpy, you should be able to use fuc as well. I recommend using conda to install fuc.
I'm using conda, but I think the problem is that it needs pysam, and I haven't been able to download that so far.
Try this: