9.1 years ago
bingnas
Hi all
I called SNPs separately for six individuals, which gave me six files. I want to merge them by chromosome and position, so that I can then convert the genotypes to the integers 0, 1, 2.
Could anyone please help me merge them as follows?
REF is hg19, ALT1 is the first patient, ALT2 the second patient, and so on up to ALT6 for the sixth patient.
#CHROM POS REF ALT1 ALT2 ALT3 ALT4 ALT5 ALT6
chrM 3 T C G A C T C
chrM 4 C A C T A G C
chrM 150 T C T C C G A
chrM 195 C T C T C A T
chrM 410 A T T C C T C
chrM 711 G A C T T G G
chrM 1890 G . C T C A C
chrM 2354 C T T C A G C
chrM 2485 C T A G G A C,T
chrM 3457 T C G A G A C
chrM 4162 C T T A T C,A A
chrM 4217 T C G T A G T
chrM 4918 A G C . G A A
chrM 5581 C T G A A G .
chrM 8698 G A G A A C A
chrM 8702 G A G C G C A
chrM 9378 G A C T G A C
chrM 9541 C T C T C T C
chrM 10284 A G G A A C C
chrM 10399 G A G A A G T
chrM 10464 T C C G T C G
chrM 10820 G A G T . C A
chrM 10874 C T G T G C,T G
chrM 11018 C T C T A C C
chrM 11252 A G . C G A T
chrM 11723 C T . A C T T
chrM 11813 A G G A C A C
Is that possible? I wrote the periods because someone told me they should be there when a position is present but a patient has no call for it.
Thank you in advance
Bing
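A merge like the one in the table above can be sketched in plain Python. This is only a minimal illustration, not a replacement for a dedicated VCF tool: it assumes each patient's calls have already been read into a dictionary keyed by (chromosome, position), and the names `merge_calls`, `ref_by_site` and `per_patient_calls` are made up for this example. A period is written wherever a patient has no call at a position.

```python
def merge_calls(ref_by_site, per_patient_calls):
    """Merge per-patient ALT calls into one row per (chrom, pos).

    ref_by_site: dict mapping (chrom, pos) -> reference allele
    per_patient_calls: list of dicts, one per patient,
                       each mapping (chrom, pos) -> ALT allele string
    """
    # Collect every site seen in at least one patient.
    sites = set()
    for calls in per_patient_calls:
        sites.update(calls)

    rows = []
    # Sort by chromosome name, then by numeric position.
    for chrom, pos in sorted(sites):
        ref = ref_by_site.get((chrom, pos), ".")
        # "." marks a patient with no call at this site.
        alts = [calls.get((chrom, pos), ".") for calls in per_patient_calls]
        rows.append((chrom, pos, ref, *alts))
    return rows


# Toy input: two sites and two patients, taken from the table above.
ref = {("chrM", 3): "T", ("chrM", 1890): "G"}
patient1 = {("chrM", 3): "C"}
patient2 = {("chrM", 3): "G", ("chrM", 1890): "C"}

rows = merge_calls(ref, [patient1, patient2])
for row in rows:
    print("\t".join(str(field) for field in row))
```

With six patients, the list passed as `per_patient_calls` would simply hold six dictionaries, one per file.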
If I understand correctly, you want to recode the SNPs from ACTG to 0, 1, 2?
You can use plink. First convert the VCF into plink format, then run plink --recode12. If you are more comfortable working with VCF, you can convert the result back to VCF afterwards.
Thank you stolarek for your answer. Yes, that is what I want. I will try it.
Bing
Hi ebrown1955,
Thank you very much for your great answer. I would like to show you what I got from the first command (CombineVariants):
and from the second command (VariantsToTable):
Could you please tell me what I should do now? I would like to code dominant homozygous as 2, recessive homozygous as 0, and heterozygous as 1.
Thank you
Bing
You could write a Python program to do this for you. You'll have to parse each line one by one, split each genotype on "/", and check whether it is homozygous or heterozygous. I have a script that tells whether a genotype includes the alternative allele, and it can be modified to do what you'd like.
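The parsing step described here could look roughly like the sketch below. It is not the script mentioned in this reply, only a minimal illustration under some assumptions: genotypes are written as two allele letters separated by "/" (or "|"), the reference allele for the site is known, and the code counts non-reference alleles, so 0 is homozygous reference, 1 is heterozygous and 2 is homozygous non-reference. Whether that matches the intended "dominant" and "recessive" coding has to be checked against the data. The function name `recode_genotype` is made up for this example.

```python
def recode_genotype(genotype, ref):
    """Return the number of non-reference alleles (0, 1 or 2).

    Returns None when the genotype is missing or is not diploid.
    """
    # Treat phased ("|") and unphased ("/") separators the same way.
    alleles = genotype.strip().replace("|", "/").split("/")
    if len(alleles) != 2 or any(a in (".", "") for a in alleles):
        return None
    # Count the alleles that differ from the reference allele.
    # Note: a genotype with two different non-reference alleles also gives 2.
    return sum(1 for a in alleles if a != ref)


# Toy genotypes for a site whose reference allele is T.
codes = [recode_genotype(gt, "T") for gt in ["T/T", "T/C", "C/C", "./."]]
print(codes)
```

Missing genotypes come back as `None`, so they can be mapped to whatever missing-value code the downstream analysis expects.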
Thank you ebrown1955 for your help.
Yes please, I would like to see that code if you do not mind!
To be honest, I am not familiar with bioinformatics. This is my first time dealing with SNP data, and I would like to convert the data to 0, 1, 2 and 5 so that I can use regression analysis.
Bing