Hi there,
I am looking to hire someone at a reasonable price.
I have BAM files for 22 human subjects, mapped with Bowtie2 to hg19.
1- I want the SNPs in these samples relative to the reference genome (hg19).
2- Convert each SNP genotype to 0, 1, 2 or 5, where 0 is recessive homozygous, 2 is dominant homozygous, 1 is heterozygous and 5 is missing.
3- Merge the 22 subjects into one matrix, as follows:

    Chromosome  Position  Reference  Subject1  Subject2  ...  Subject22
    chr1        335453    A          0         2         ...  0
    chr1        336565    G          1         5         ...  2
    ...
    chr22       3546372   C          1         0         ...  1
Thanks
I assume you mean 0: homozygous reference, 1: heterozygous and 2: homozygous variant. Dominant and recessive don't make sense at the variant level: a variant can have a dominant or recessive effect on a phenotype, but that is not a variant state.
The job you are asking for is quite easy.
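The 0/1/2/5 encoding described above amounts to counting non-reference alleles per genotype, with 5 for a missing call. A minimal sketch, assuming the 22 subjects have already been jointly called into one VCF and flattened to tab-separated text with `bcftools query` (bcftools and the helper name `encode_gt` are not from this thread; treat both as assumptions):

```shell
#!/usr/bin/env bash
# encode_gt (hypothetical helper name)
# stdin : CHROM  POS  REF  GT_1 ... GT_N   (tab-separated, one site per line)
# stdout: a header row, then the same sites with each GT replaced by
#         0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant,
#         5 = missing (any "." allele).
encode_gt() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR == 1 {
    # Header in the requested layout; subjects are numbered by column order.
    printf "Chromosome\tPosition\tReference"
    for (i = 4; i <= NF; i++) printf "\tSubject%d", i - 3
    printf "\n"
  }
  {
    for (i = 4; i <= NF; i++) {
      n = split($i, allele, "[/|]")  # "0/1" or "0|1" -> two alleles
      code = 0
      for (j = 1; j <= n; j++) {
        if (allele[j] == ".") { code = 5; break }  # missing call -> 5
        if (allele[j] != "0") code++               # count non-reference alleles
      }
      $i = code
    }
    print
  }'
}
```

Usage, assuming a joint VCF named `cohort.vcf.gz`: `bcftools query -f '%CHROM\t%POS\t%REF[\t%GT]\n' cohort.vcf.gz | encode_gt > genotype_matrix.tsv`. At multi-allelic sites a genotype such as `1/2` is counted as 2 non-reference alleles, so you may want to split those sites first (for example with `bcftools norm -m -any`).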
Thank you for your answer, WouterDeCoster! Could you explain how to do it, or should I send you the data?
Bing
I assume this is whole-exome or whole-genome sequencing data. The GATK Best Practices are well documented and a commonly accepted way to do data processing and variant calling. Variant calling gives you VCF files, which you can convert with VCFtools to the numerical output you ask for (PLINK format, right?):
./vcftools --vcf input_data.vcf --plink --chr 1 --out output_in_plink
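For the variant-calling step, a minimal sketch of the GATK4 per-sample GVCF plus joint-genotyping route. It assumes the BAMs are coordinate-sorted, indexed and carry read groups with an `SM` tag (HaplotypeCaller needs them, and Bowtie2 only writes read groups if they were requested at alignment time), and that the hg19 FASTA has `.fai` and `.dict` files. It skips duplicate marking, base recalibration and variant filtering from the full Best Practices. The `run` wrapper, `DRY_RUN` variable, `joint_call` function and file names are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Print each command; execute it unless DRY_RUN is set (hypothetical convenience).
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# joint_call REF.fa OUT.vcf.gz SAMPLE1.bam [SAMPLE2.bam ...]   (hypothetical)
joint_call() {
  local ref=$1 out=$2 bam gvcf
  shift 2
  local combine_args=()
  for bam in "$@"; do
    gvcf="${bam%.bam}.g.vcf.gz"
    # GVCF mode keeps reference-confidence blocks, so joint genotyping can
    # tell a homozygous-reference subject (0) from one with no coverage (missing).
    run gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$gvcf" -ERC GVCF
    combine_args+=(-V "$gvcf")
  done
  # Merge the per-sample GVCFs, then genotype all subjects together.
  run gatk CombineGVCFs -R "$ref" "${combine_args[@]}" -O cohort.g.vcf.gz
  run gatk GenotypeGVCFs -R "$ref" -V cohort.g.vcf.gz -O "$out"
}
```

For example, `joint_call hg19.fa cohort.vcf.gz subject*.bam` would produce one multi-sample VCF covering all 22 subjects; prefixing it with `DRY_RUN=1` only prints the commands.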
Yes, I want it in PLINK format. I see you used --chr 1; do you mean I should convert each chromosome separately, or can I convert all chromosomes at once?
I will try it and let you know how it goes!
Thank you for your help
According to https://vcftools.github.io/man_latest.html (see SITE FILTERING OPTIONS), --chr is just a site filter that includes (or, with --not-chr, excludes) a given chromosome, and the command I posted is an example copied from the documentation. It is not a required argument: leave it out and all chromosomes are converted in one run.
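Both options from this exchange, sketched as two small functions: one VCFtools run over the whole file, or one run per autosome. The function names and the `run`/`DRY_RUN` wrapper are hypothetical. The `chr1`..`chr22` naming assumes UCSC-style hg19 contigs; drop the `chr` prefix if your reference names them `1`..`22`.

```shell
#!/usr/bin/env bash
# Print each command; execute it unless DRY_RUN is set (hypothetical convenience).
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# plink_all IN.vcf OUT_PREFIX (hypothetical): without --chr, every chromosome is kept.
plink_all() {
  run vcftools --vcf "$1" --plink --out "$2"
}

# plink_per_chrom IN.vcf OUT_PREFIX (hypothetical): one PLINK file set per autosome.
plink_per_chrom() {
  local n
  for n in {1..22}; do
    run vcftools --vcf "$1" --plink --chr "chr${n}" --out "${2}_chr${n}"
  done
}
```

For the single matrix asked for at the top of the thread, `plink_all input_data.vcf output_in_plink` is enough; the per-chromosome loop only helps if the files become too large to handle in one piece.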