Tools To Identify Compound Heterozygous Variants?
12.7 years ago
Ahdf-Lell-Kocks ★ 1.6k

Are there any tools to identify compound heterozygous variants in resequencing data? By compound heterozygosity I mean multiple deleterious mutations (as predicted by, e.g., PolyPhen or SIFT) that occur as heterozygous calls at different positions within the same gene.
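To make that definition concrete, here is a minimal Python sketch of the naive "candidate" search: keep heterozygous calls predicted to be deleterious and report every (sample, gene) pair with two or more of them. The `Call` record, the SIFT cutoff of 0.05 and the set of PolyPhen labels treated as damaging are assumptions for illustration only. Without phase or parental genotypes this finds candidates, not confirmed compound hets, because both hits could sit on the same chromosome copy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Call:
    """One genotype call for one sample (hypothetical pre-annotated record)."""
    sample: str
    gene: str
    pos: int
    genotype: str                   # VCF-style GT, e.g. "0/1" or "0|1"
    sift: Optional[float] = None    # SIFT score; lower means more damaging
    polyphen: Optional[str] = None  # e.g. "probably_damaging"


def is_het(gt: str) -> bool:
    """True for a called diploid GT whose two alleles differ."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def is_deleterious(call: Call, sift_cutoff: float = 0.05) -> bool:
    """ASSUMED rule: SIFT below the cutoff, or PolyPhen calls it damaging."""
    if call.sift is not None and call.sift < sift_cutoff:
        return True
    return call.polyphen in {"probably_damaging", "possibly_damaging"}


def candidate_comp_hets(calls, require_deleterious=True):
    """Map (sample, gene) -> sorted positions, for genes with >= 2 het sites.

    Phase is ignored, so these are candidates only (could be cis or trans).
    """
    sites = defaultdict(set)
    for c in calls:
        if is_het(c.genotype) and (not require_deleterious or is_deleterious(c)):
            sites[(c.sample, c.gene)].add(c.pos)
    return {key: sorted(p) for key, p in sites.items() if len(p) >= 2}


# Usage with made-up calls: only S1/GENE_A has two deleterious het sites.
calls = [
    Call("S1", "GENE_A", 100, "0/1", sift=0.01),
    Call("S1", "GENE_A", 250, "0|1", polyphen="probably_damaging"),
    Call("S1", "GENE_B", 300, "0/1", sift=0.01),
    Call("S1", "GENE_B", 400, "1/1", sift=0.0),   # hom-alt, not a het
    Call("S2", "GENE_A", 100, "0/1", sift=0.40),  # predicted tolerated
    Call("S2", "GENE_A", 250, "0/1", sift=0.01),
]
print(candidate_comp_hets(calls))  # {('S1', 'GENE_A'): [100, 250]}
```

Setting `require_deleterious=False` gives the unfiltered list of every gene with multiple het calls.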

To call a variant pair 'compound heterozygous' I think you need the parents; otherwise it might just be a double het, with both mutations possibly on the same chromosome copy. What files do you have: BAM, VCF, etc.?
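Why the parents help can be sketched in Python: at each heterozygous site in the child, if exactly one parent carries the alternate allele, that parent must have transmitted it. If the two sites trace back to different parents, the pair is in trans (a true compound het); if both trace to the same parent, it is in cis. The function names and the (child, father, mother) tuple layout are hypothetical, and the rule is deliberately conservative: sites where both or neither parent carries the allele are reported as "unknown".

```python
from typing import Optional, Tuple

Site = Tuple[str, str, str]  # (child GT, father GT, mother GT), VCF style


def alt_count(gt: str) -> Optional[int]:
    """Number of non-reference alleles in a diploid GT ('0/1' -> 1); None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def alt_origin(child: str, father: str, mother: str) -> Optional[str]:
    """Which parent transmitted the alt allele to a het child, if unambiguous.

    Conservative rule: only when exactly one parent carries the alt allele.
    Both parents carrying it, neither (possible de novo) or missing data -> None.
    """
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if c != 1 or f is None or m is None:
        return None
    if f > 0 and m == 0:
        return "father"
    if m > 0 and f == 0:
        return "mother"
    return None


def classify_pair(site1: Site, site2: Site) -> str:
    """'trans' = alt alleles from different parents (compound het);
    'cis' = both from the same parent (a double het); otherwise 'unknown'."""
    o1, o2 = alt_origin(*site1), alt_origin(*site2)
    if o1 is None or o2 is None:
        return "unknown"
    return "trans" if o1 != o2 else "cis"


print(classify_pair(("0/1", "0/1", "0/0"), ("0/1", "0/0", "0/1")))  # trans
print(classify_pair(("0/1", "0/1", "0/0"), ("0/1", "1/1", "0/0")))  # cis
print(classify_pair(("0/1", "0/1", "0/1"), ("0/1", "0/0", "0/1")))  # unknown
```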

12.3 years ago

My lab is working on a project called gemini for exploring genomic variation in the context of genome annotations. Briefly, you load a VCF (eventually BCF2) into a local SQLite database, and from there you can explore the variation in several ways. Take a look through the README for details. The project is still in an alpha state, but we do have a tool for detecting compound heterozygotes from VCF files that have phased genotypes. This section of the README covers it, but the workflow would be as follows:

# load a VCF that has been annotated by snpEff
> gemini load -v my.vcf -t snpEff my.vcf.db

# extract candidate compound hets
> gemini comp_hets my.vcf.db
sample  gene    het1    het2
NA19002 GTSE1   chr22,46722400,46722401,G,A,G|A,stop_gain,exon_22,0.005,1   chr22,46704499,46704500,C,A,A|C,stop_gain,exon_22,0.005,0

# restrict to compound hets that are clearly loss-of-function
> gemini comp_hets --only_lof my.vcf.db
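In the example output the phase is visible in the genotype column: for het1 (ref G, alt A, genotype G|A) the alternate allele is on the second haplotype, while for het2 (ref C, alt A, genotype A|C) it is on the first, so the two hits sit on opposite chromosome copies. A small Python sketch of that check follows. It assumes, based only on the two example fields shown, that the 4th, 5th and 6th comma-separated fields are ref, alt and genotype; this is not a documented GEMINI output format.

```python
def alt_haplotype(ref: str, alt: str, phased_gt: str) -> int:
    """Haplotype (0 = left of '|', 1 = right) carrying alt in a het like 'G|A'."""
    if "|" not in phased_gt:
        raise ValueError(f"genotype {phased_gt!r} is not phased")
    left, right = phased_gt.split("|")
    if {left, right} != {ref, alt}:
        raise ValueError(f"{phased_gt!r} is not a {ref}/{alt} heterozygote")
    return 0 if left == alt else 1


def is_trans(het1: str, het2: str) -> bool:
    """True if the two hets put their alt alleles on different haplotypes.

    ASSUMPTION: fields 4-6 (1-based) of each comma-separated record are
    ref, alt and genotype, as in the GTSE1 example line.
    """
    ref1, alt1, gt1 = het1.split(",")[3:6]
    ref2, alt2, gt2 = het2.split(",")[3:6]
    return alt_haplotype(ref1, alt1, gt1) != alt_haplotype(ref2, alt2, gt2)


het1 = "chr22,46722400,46722401,G,A,G|A,stop_gain,exon_22,0.005,1"
het2 = "chr22,46704499,46704500,C,A,A|C,stop_gain,exon_22,0.005,0"
print(is_trans(het1, het2))  # True: alt on haplotype 1 vs haplotype 0
```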

A big plus one to GEMINI for this. Also, if you don't have phased genotypes or a trio you can get all of the potential compound hets, which is just all of the genes with multiple heterozygous calls in them.


Hi Aaronquinlan,

I am trying to use the gemini comp_hets tool to filter for compound heterozygous mutations in whole-exome sequencing data. It worked perfectly on one of my datasets, but it seems to fail on the other. So I am wondering whether there are any prerequisites for the VCF file, such as a specific tag in the FORMAT field that your phasing step requires. My guess is that the problem may be due to the absence of phased genotypes in that VCF file. Could you tell me what proportion of genotypes in the VCF must be phased for the comp_hets tool to work?

More specifically, I was able to load the VCF (after normalization and snpEff annotation) and the PED file. The comp_hets tool does not produce any error, but it has been running for more than 20 hours and the output file is still 0 bytes. Could the VCF be too large (it has more than 240 samples)? Even so, the output file never grows at all. I also tried limiting the analysis to a single family; the program then finished, but the output was still empty.

Could you help me figure out possible reasons for this? Many thanks!
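One way to test the guess that the second VCF simply lacks phased genotypes is to count how many called GT values use the phased separator `|` versus the unphased `/`. This is a minimal Python sketch that reads VCF data lines as plain text (per the VCF spec, FORMAT is the 9th column and sample columns follow); it only summarizes the input and says nothing about what GEMINI itself requires.

```python
def phasing_summary(vcf_lines):
    """Count called GT values that are phased ('|') vs unphased ('/').

    Expects VCF text lines: 8 fixed columns, FORMAT as column 9, then samples.
    Genotypes containing a missing allele ('.') are skipped.
    """
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no sample columns
        keys = cols[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for sample in cols[9:]:
            values = sample.split(":")
            if gt_idx >= len(values) or "." in values[gt_idx]:
                continue
            gt = values[gt_idx]
            if "|" in gt:
                counts["phased"] += 1
            elif "/" in gt:
                counts["unphased"] += 1
    return counts


# Made-up two-sample example; a real file could be passed as an open handle,
# e.g. gzip.open(path, "rt") for a bgzipped VCF.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0|1:20\t0/1:15",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t./.\t1|1",
]
print(phasing_summary(example))  # {'phased': 2, 'unphased': 1}
```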


comp_hets doesn't require a phased VCF; it does the phasing for you based on the PED file.
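Since the phasing comes from the pedigree, one plausible thing to check when comp_hets returns nothing (not a confirmed diagnosis) is whether the PED file actually defines trios whose sample IDs match the VCF header. This is a hedged Python sketch, not GEMINI's own logic: it reads the standard PED columns (family, individual, father, mother, sex, phenotype, with `0` meaning an unknown parent) and returns the complete trios present in a given list of VCF sample names. The function name and the sample names in the usage lines are made up.

```python
def find_trios(ped_lines, vcf_samples):
    """Return (child, father, mother) for every PED row whose two parents are
    known and all three IDs appear among the VCF sample names.

    PED columns: family ID, individual ID, father ID, mother ID, sex,
    phenotype; '0' marks an unknown parent.
    """
    samples = set(vcf_samples)
    trios = []
    for line in ped_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # malformed row
        _family, child, father, mother = fields[:4]
        if father == "0" or mother == "0":
            continue  # founder or single known parent: no full trio
        if {child, father, mother} <= samples:
            trios.append((child, father, mother))
    return trios


# Made-up PED rows and VCF sample names.
ped = [
    "FAM1 kid dad mom 1 2",
    "FAM1 dad 0 0 1 1",
    "FAM1 mom 0 0 2 1",
    "FAM2 solo 0 0 2 2",
]
print(find_trios(ped, ["kid", "dad", "mom", "solo"]))  # [('kid', 'dad', 'mom')]
print(find_trios(ped, ["kid", "dad", "MOM"]))          # [] (ID mismatch)
```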

10.6 years ago

Hello,

I have written a tool called genmod that annotates VCF files with the common genetic inheritance models, including compound heterozygosity.

It works with arbitrary pedigrees, including single individuals, and can also take phasing into account.

Hope this can help out!


Neither of the methods described here for downloading and installing the software works.


Hmm, at which step do you run into trouble?

You can email me or post an issue on GitHub and I will try to help you out.

It works for several users...


When I try to install via pip, I get these errors:

creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/pysam
creating build/temp.linux-x86_64-2.7/samtools
creating build/temp.linux-x86_64-2.7/samtools/misc
creating build/temp.linux-x86_64-2.7/samtools/bcftools
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/usr/include/python2.7 -c pysam/csamtools.c -o build/temp.linux-x86_64-2.7/pysam/csamtools.o -Wno-error=declaration-after-statement
pysam/csamtools.c:8:22: fatal error: pyconfig.h: No such file or directory
 #include "pyconfig.h"
                      ^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status

and I get this error when using git, too:

git clone git@github.com:moonso/genmod.git
Cloning into 'genmod'...
Warning: Permanently added the RSA host key for IP address '192.30.252.128' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
12.7 years ago

The VAAST paper describes how to identify compound heterozygous mutations. VAAST can also handle locus heterogeneity. Sorry for the loose language.

Yandell et al. Genome Research 2011


Are you sure this addresses the question? Locus heterogeneity != compound heterozygote. Maybe you could supply an example of how to do this with VAAST?

10.8 years ago
ff.cc.cc ★ 1.3k

...just to enrich the discussion: I recently faced the same task and also found this paper: Filtering for Compound Heterozygous Sequence Variants in Non-Consanguineous Pedigrees.

Their tool is available online.


Just like "yet another short-read mapper", I think we need a "yet another web-only tool" category. Don't get me wrong: making web tools is great for getting good tools used by more than just bioinformaticians, but web-only services are getting tiring. Many of us, for legal reasons, should not be uploading our samples to outside servers.
