Question

simulation of chromosome

0

19 months ago

Eliza ▴ 30

Hi, I'm using the sim1000G package to simulate genotype data: https://adimitromanolakis.github.io/sim1000G/inst/doc/SimulatingFamilyData.html

In their example, they use data from a CHR4 region: [image]

But it contains only 567 SNPs from 95 patients. Is there a way to get data for more regions, so that I would have more SNPs?

I tried following their manual: [image]

I downloaded the data for chrY, just as an example because it is a small one, and it looks like this:

[image]

The problem is that the ID column is empty, and that prevents me from using sim1000G, because the code uses the IDs of the variants:

    vcf_file = file.path(examples_dir, "region.vcf.gz")
    vcf = readVCF(vcf_file, maxNumberOfVariants = 400, min_maf = 0.01, max_maf = 1)
    # @param subset A subset of individual IDs to use for simulation

So I was also wondering where I can get this data but with the variant IDs. They used the IGSR database, so the variant IDs must be there (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/).
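One possible workaround for the empty ID column is to fill it in before loading the file. The sketch below is in Python rather than R, and it assumes that a synthetic ID of the form CHROM:POS:REF:ALT is acceptable downstream. Whether sim1000G's `readVCF` accepts the rewritten file is not verified here. The function names `fill_missing_ids` and `fill_vcf_ids` are hypothetical helpers, not part of any package.

```python
import gzip


def fill_missing_ids(lines):
    """Yield VCF lines, replacing an empty or '.' ID with CHROM:POS:REF:ALT.

    Header lines (starting with '#') are passed through unchanged.
    The synthetic ID format is an assumption, not a 1000 Genomes convention.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        if len(fields) >= 5 and fields[2] in (".", ""):
            chrom, pos, _, ref, alt = fields[:5]
            fields[2] = f"{chrom}:{pos}:{ref}:{alt}"
        yield "\t".join(fields) + "\n"


def fill_vcf_ids(in_path, out_path):
    """Read a gzip-compressed VCF and write a copy with missing IDs filled in."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(fill_missing_ids(src))


# Usage (paths are placeholders):
# fill_vcf_ids("chrY.vcf.gz", "chrY.with_ids.vcf.gz")
```

Note that the output is plain gzip, not bgzip, so tools that need a tabix index would require recompression. `bcftools annotate` with its `--set-id` option is a common alternative for the same task.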

snps simulation chromosome • 1.1k views

ADD COMMENT • link 19 months ago by Eliza ▴ 30

score 1 · Answer 1 · 2023-08-21

1

19 months ago

bk11 ★ 3.0k

You can use more complete genotype data from the 1000 Genomes Project, for example:

https://www.cog-genomics.org/plink/1.9/resources

If you want more SNPs, you can impute them on the Michigan Imputation Server or the TOPMed Imputation Server.

https://imputationserver.sph.umich.edu/index.html#!

https://imputation.biodatacatalyst.nhlbi.nih.gov/#!

ADD COMMENT • link 19 months ago by bk11 ★ 3.0k

1

Note that the plink 1.9 resource is from 1000 Genomes phase 1. The 1000 Genomes phase 3 resource (https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg) is more complete.

ADD REPLY • link 19 months ago by chrchang523 11k

0

bk11, thank you. Just to clarify: sim1000G can simulate SNPs that are in linkage disequilibrium based on the input VCF file, which is what I want. Does the plink link you provided give SNPs for multiple individuals, including SNPs that are in linkage disequilibrium?

ADD REPLY • link 19 months ago by Eliza ▴ 30

1

Data from 1KG include ~2500 subjects from 26 populations. These data contain all genotyped SNPs across the genome. And yes, you will find SNPs that are in linkage disequilibrium for sure. Just test them in the region that you are interested in.
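As a rough illustration of what testing for LD in a region can mean, here is a minimal Python sketch that computes r² between two SNPs as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of the alternate allele). This genotype-based r² is only an approximation of the haplotype-based measure, and `ld_r2` is a hypothetical helper name. In practice a dedicated tool such as plink would be used on the full data.

```python
import math


def ld_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors.

    g1 and g2 hold one dosage (0, 1 or 2) per individual, in the same order.
    Returns NaN if either SNP is monomorphic in the sample.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(g1)
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return math.nan  # no variation, so correlation is undefined
    return (cov * cov) / (var1 * var2)
```

Values near 1 indicate that the two SNPs are in strong LD in the sample, and values near 0 indicate little or none.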

ADD REPLY • link 19 months ago by bk11 ★ 3.0k