simulation of chromosome
16 months ago
Eliza ▴ 30

Hi, I'm using the sim1000G package to simulate genotype data: https://adimitromanolakis.github.io/sim1000G/inst/doc/SimulatingFamilyData.html

In their example, they use data from a region of chromosome 4: [image]

But it only contains 567 SNPs from 95 patients. Is there a way to get data on more regions so that I would have more SNPs?

I tried following their manual: [image]

I downloaded the data for chrY, just as an example because it is a small one, and it looks like this: [image]

The problem is that the ID column is empty, which prevents me from using sim1000G, because its code uses the IDs of the variants:

    vcf_file = file.path(examples_dir, "region.vcf.gz")
    vcf = readVCF(vcf_file, maxNumberOfVariants = 400, min_maf = 0.01, max_maf = 1)
    #@param subset A subset of individual IDs to use for simulation

So I was also wondering where I can get this data, but with the variant IDs. They used the IGSR database, so the variant IDs must be there (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/).
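If the empty ID column really is what blocks the workflow, one possible workaround is to fill the missing IDs before loading the file. The following is only a minimal sketch in Python (standard library only), assuming the file is a gzipped VCF whose missing IDs are written as "." and that a synthetic CHROM:POS:REF:ALT identifier is acceptable downstream. The function name fill_missing_ids and the file names are hypothetical; nothing here is part of sim1000G.

```python
import gzip


def fill_missing_ids(in_path, out_path):
    """Copy a gzipped VCF, replacing missing IDs ('.' or empty) with
    CHROM:POS:REF:ALT. Returns the number of records that were changed.

    Hypothetical helper for illustration; not part of sim1000G.
    """
    filled = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            # Header and column-name lines are copied unchanged.
            if line.startswith("#"):
                dst.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            # VCF columns: CHROM, POS, ID, REF, ALT, ...
            if len(fields) >= 5 and fields[2] in (".", ""):
                fields[2] = f"{fields[0]}:{fields[1]}:{fields[3]}:{fields[4]}"
                filled += 1
            dst.write("\t".join(fields) + "\n")
    return filled


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python fill_ids.py in.vcf.gz out.vcf.gz
    if len(sys.argv) == 3:
        changed = fill_missing_ids(sys.argv[1], sys.argv[2])
        print(f"Filled {changed} missing IDs")
```

Note that the output is written with plain gzip rather than bgzip, so a tool that needs a tabix index would require recompressing the file first.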

snps simulation chromosome • 965 views
16 months ago
bk11 ★ 3.0k

For example, you can use the more complete genotype data from the 1000 Genomes Project:

https://www.cog-genomics.org/plink/1.9/resources

If you want more SNPs, you can impute them with the Michigan Imputation Server or the TOPMed Imputation Server.

https://imputationserver.sph.umich.edu/index.html#!

https://imputation.biodatacatalyst.nhlbi.nih.gov/#!


Note that the plink 1.9 resource is from 1000 Genomes phase 1. 1000 Genomes phase 3 (https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg) is more complete.


bk11, thank you. Just to clarify: sim1000G can simulate SNPs that are in linkage disequilibrium based on the input VCF file, which is what I want. Does the PLINK link you provided give genotypes for multiple individuals, including SNPs that are in linkage disequilibrium?


Data from 1KG include ~2500 subjects from 26 populations. These data have all genotyped SNPs across the genome. And yes, you will find SNPs that are in linkage disequilibrium for sure. Just test them in the region that you are interested in.
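As a rough illustration of what "testing" two SNPs for linkage disequilibrium can look like, here is a small Python sketch that computes r² as the squared Pearson correlation between two vectors of allele counts (0, 1 or 2 copies of the alternate allele per individual). This is a common proxy when genotypes are unphased, not necessarily what any particular tool reports; the function name r_squared and the example genotypes are made up for illustration.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two allele-count vectors.

    g1 and g2 hold one value per individual (0, 1 or 2 alternate alleles),
    in the same individual order. Returns 0.0 if either SNP is monomorphic,
    because the correlation is undefined in that case.
    """
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equal length")
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0
    return (cov * cov) / (var1 * var2)


if __name__ == "__main__":
    # Made-up genotypes for six individuals at two nearby SNPs.
    snp_a = [0, 1, 2, 1, 0, 2]
    snp_b = [0, 1, 2, 1, 1, 2]
    print(round(r_squared(snp_a, snp_b), 3))
```

Values close to 1 suggest the two SNPs are in strong LD in the sample, and values near 0 suggest they are close to independent.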
