How To Get Non-SNP Sites Or Invariant Sites Based On 1000 Genomes Project
11.4 years ago
Gangcai ▴ 230

Hi everyone, I need to find the human genomic sites that are invariant across all individuals, i.e. sites that were sequenced in every individual and carry no SNP. Sites not included in the 38 million SNPs identified by the 1000 Genomes Project are possible candidates. However, not all of them were necessarily sequenced or covered in every individual. One possible way to get such invariant sites is to first find the sites that are covered in all individuals and then subtract the SNP sites. Does anybody know how to get this information, or where to find it, without extracting it from the raw mapping data? Thanks very much.
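The two-step idea in the question (sites covered in all individuals, minus SNP sites) can be sketched on toy data. This is only an illustration, assuming per-sample coverage is already available as sets of positions on a single chromosome. The function name `invariant_candidates` and all positions are hypothetical, and a genome-scale version would likely work on interval files rather than Python sets.

```python
def invariant_candidates(covered_by_sample, snp_positions):
    """Return positions covered in every sample that are not known SNP sites."""
    if not covered_by_sample:
        return set()
    # Step 1: positions covered in all individuals (set intersection).
    covered_in_all = set.intersection(*covered_by_sample.values())
    # Step 2: subtract the known SNP sites.
    return covered_in_all - set(snp_positions)


# Toy data (hypothetical): positions on one chromosome.
covered_by_sample = {
    "SAMPLE1": {100, 101, 102, 103, 104},
    "SAMPLE2": {101, 102, 103, 104, 105},
    "SAMPLE3": {100, 101, 102, 104},
}
snp_positions = {102}

result = sorted(invariant_candidates(covered_by_sample, snp_positions))
print(result)
```

Note that "covered" still has to be defined upstream (for example a minimum depth per sample); the sketch does not address that choice.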

1000genomes snp • 5.3k views

By definition, a VCF file can record monomorphic sites (sites without alternate alleles). However, the VCF files from the 1000 Genomes release (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/) seem to contain only SNPs, indels and SVs, but no monomorphic or invariant sites.
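For reference, a monomorphic record in a VCF file is one whose ALT column holds the missing value `.`. A minimal sketch of that check follows, assuming plain tab-separated VCF data lines. The helper name `is_monomorphic` and the two records are made up for illustration and are not taken from the 1000 Genomes files.

```python
def is_monomorphic(vcf_line):
    """True if a VCF data line has no alternate allele (ALT is '.')."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]  # fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return alt == "."


# Made-up records: one SNP and one site with no alternate allele.
records = [
    "1\t1000\t.\tA\tG\t100\tPASS\t.",
    "1\t1001\t.\tC\t.\t100\tPASS\t.",
]
flags = [is_monomorphic(r) for r in records]
print(flags)
```

Running such a check over a release file would be one way to confirm whether any monomorphic records are present.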

11.4 years ago

Use GATK SelectVariants to keep the variants where all the samples have been covered; something like (not tested):

-select 'vc.getGenotype("SAMPLE1").isCalled() && vc.getGenotype("SAMPLE2").isCalled() && vc.getGenotype("SAMPLE3").isCalled()  .... etc...'

See also GATK multi-sample VCF VariantFiltration.

Then use vcftools to remove the SNPs from the VCF of the 1000 Genomes Project: http://vcftools.sourceforge.net/docs.html#isec
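With many samples, the `-select` expression from the first step is tedious to type by hand. A small sketch that assembles it from a list of sample names is given here. The function name `build_all_called_expression` is hypothetical, the sample names are placeholders, and, like the expression itself, the result has not been tested against GATK.

```python
def build_all_called_expression(sample_names):
    """Join one isCalled() test per sample with '&&' for the -select option."""
    clauses = [
        'vc.getGenotype("{}").isCalled()'.format(name) for name in sample_names
    ]
    return " && ".join(clauses)


# Placeholder sample names; real ones would come from the VCF header line.
samples = ["SAMPLE1", "SAMPLE2", "SAMPLE3"]
expression = build_all_called_expression(samples)
print("-select '{}'".format(expression))
```

For a large cohort the expression becomes very long, so it may be worth checking whether the command line still accepts it.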


Hi Pierre, thanks for your reply. For the first step, what should the input file be? I have checked the 1000 Genomes release files, and they seem to contain sequencing information only for variant sites (e.g. SNPs, indels, SVs), not for invariant or monomorphic sites.

11.1 years ago

Hello Gangcai, did you find a way to get non-SNP or invariant sites based on the 1000 Genomes Project? I am trying the UnifiedGenotyper of GATK-2.7-2 for the same purpose. Will it help with finding invariant sites in exomes?
