Paternity Testing from WGS Trio
6.2 years ago
ClkElf ▴ 50

Hello all,

I am a newbie in bioinformatics and would like to process a Trio (Parents and Index) in order to understand the concept. Here is my question: How can I analyse a WGS Trio in order to conduct paternity testing? Or is it possible to do this?

The question may sound stupid but I am very curious about it.

Many thanks!

WGS trio paternity test DNA-Seq • 3.5k views
6.2 years ago

It is definitely possible to assess paternity from whole genome sequence (WGS) data. Paternity can probably be established with as few as a few dozen, or perhaps a few hundred, well-chosen single nucleotide polymorphisms (SNPs). With decent WGS data you can expect to genotype millions of SNPs, so a paternity assessment from such data should be very confident. What data do you have available? Assuming that you have raw sequence data (e.g., FASTQ or unaligned BAM files), you will first need to align it to an appropriate reference genome.

There are several online tutorials to give you the general idea:

Note. Both the above tutorials are a little out of date. Current best practice would be to use bwa mem (available with current bwa installations). See http://bio-bwa.sourceforge.net/bwa.shtml

Once you have aligned your data you will probably want to mark duplicate reads and perform base quality score recalibration (BQSR). For some sample commands taking you through bwa mem alignment, duplicate marking, and BQSR see here: http://pmbio.org/module%202/0002/01/31/Alignment/
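As a rough sketch of the steps just described (alignment, duplicate marking, BQSR), something like the following could be run once per sample. It assumes bwa, samtools and GATK4 are on the PATH and that the reference is already indexed. The function name, the output file names, the thread count and the read group values are placeholders chosen for illustration, not taken from the linked tutorial.

```shell
#!/usr/bin/env bash
# Sketch only: align one sample, mark duplicates, and apply BQSR.
# Assumes the reference was indexed beforehand (bwa index, samtools faidx,
# gatk CreateSequenceDictionary) and that known_sites is an indexed VCF.
preprocess_sample() (
    set -euo pipefail
    sample=$1 ref=$2 fq1=$3 fq2=$4 known_sites=$5
    threads=${6:-4}

    # Align with bwa mem; GATK needs a read group with a sample name (SM).
    bwa mem -t "$threads" -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" \
        "$ref" "$fq1" "$fq2" |
        samtools sort -o "${sample}.sorted.bam" -

    # Mark duplicate reads, then index the result.
    gatk MarkDuplicates -I "${sample}.sorted.bam" -O "${sample}.dedup.bam" \
        -M "${sample}.dup_metrics.txt"
    samtools index "${sample}.dedup.bam"

    # Build the recalibration table, then apply it.
    gatk BaseRecalibrator -R "$ref" -I "${sample}.dedup.bam" \
        --known-sites "$known_sites" -O "${sample}.recal.table"
    gatk ApplyBQSR -R "$ref" -I "${sample}.dedup.bam" \
        --bqsr-recal-file "${sample}.recal.table" -O "${sample}.recal.bam"
)
```

A call might look like `preprocess_sample child ref.fa child_R1.fastq.gz child_R2.fastq.gz known_sites.vcf.gz`, repeated for the father and the mother.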

Next, you will want to run a GATK variant caller. For a trio analysis I suggest you try running GATK HaplotypeCaller in GVCF mode and then performing joint genotyping. See here for a tutorial on this topic: https://gatkforums.broadinstitute.org/gatk/discussion/7869/howto-discover-variants-with-gatk-a-gatk-workshop-tutorial
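A minimal sketch of the GVCF workflow with GATK4, assuming each sample already has a preprocessed BAM named `<sample>.recal.bam` (a naming convention made up for this example). The function name and the `trio.*` output names are likewise placeholders. For a whole genome you would normally also split the work by interval, which is left out here.

```shell
#!/usr/bin/env bash
# Sketch only: per-sample GVCFs, then joint genotyping of the trio.
joint_call_trio() (
    set -euo pipefail
    ref=$1 child=$2 father=$3 mother=$4

    # One GVCF per sample (-ERC GVCF switches HaplotypeCaller to GVCF mode).
    for sample in "$child" "$father" "$mother"; do
        gatk HaplotypeCaller -R "$ref" -I "${sample}.recal.bam" \
            -O "${sample}.g.vcf.gz" -ERC GVCF
    done

    # Merge the three GVCFs into one multi-sample GVCF.
    gatk CombineGVCFs -R "$ref" \
        -V "${child}.g.vcf.gz" -V "${father}.g.vcf.gz" -V "${mother}.g.vcf.gz" \
        -O trio.g.vcf.gz

    # Joint genotyping produces the final multi-sample VCF.
    gatk GenotypeGVCFs -R "$ref" -V trio.g.vcf.gz -O trio.vcf.gz
)
```

For example: `joint_call_trio ref.fa child father mother`.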

This is all explained in great detail in the excellent GATK Best Practices for Variant Discovery workshops organized by the Broad. See https://drive.google.com/drive/folders/1U6Zm_tYn_3yeEgrD1bdxye4SXf5OseIt

Finally, assuming you get through the above, you should have a VCF with genotype calls for millions of SNPs for your trio. You then need to look at SNP genotype concordance between the individuals in your trio to estimate kinship. This is itself a complicated area of research that I am not very familiar with, but paternity should be one of the simpler relationships to prove. I believe the KING tool is popular for this and could take the above VCF as a starting point.

http://people.virginia.edu/~wc9c/KING/manual.html
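To make the idea of genotype concordance concrete, here is a small sketch of one simple check, not a replacement for KING. A true parent and child share at least one allele at every site, so at biallelic SNPs they should almost never be opposite homozygotes (0/0 vs 1/1), apart from genotyping errors and rare de novo mutations. The function name and the two-column input layout (child genotype, alleged father genotype, tab-separated) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: count opposite-homozygote (IBS0) sites between two samples.
# Input on stdin: child GT <TAB> alleged-father GT, e.g. "0/1<TAB>1/1".
# Output: ibs0_sites <TAB> called_sites <TAB> rate
ibs0_rate() {
    awk -F'\t' '
        {
            # Split each genotype into two alleles ("/" or "|" separator).
            if (split($1, c, "[/|]") != 2 || split($2, p, "[/|]") != 2) next
            # Skip sites where either genotype is missing.
            if (c[1] == "." || c[2] == "." || p[1] == "." || p[2] == ".") next
            called++
            # Both homozygous for different alleles: no allele shared.
            if (c[1] == c[2] && p[1] == p[2] && c[1] != p[1]) ibs0++
        }
        END {
            if (called == 0) { print "NA"; exit }
            printf "%d\t%d\t%.4f\n", ibs0, called, ibs0 / called
        }'
}
```

The genotype columns could be extracted with something like `bcftools query -s child,father -f '[%GT\t]\n' trio.vcf.gz | ibs0_rate`, where the sample names are hypothetical and the column order is worth checking on a few lines first. A rate close to zero is consistent with a parent-child pair; a clearly higher rate argues against it.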


Or use the implementation of the KING algorithm in VCFtools, via the --relatedness2 argument: https://vcftools.github.io/man_latest.html
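As a sketch, the VCFtools run and a rough reading of its output might look like this. The `trio` file names and both function names are placeholders. The cut-offs are the kinship ranges given in the KING manual, and the parser assumes the kinship coefficient is the last column of the `.relatedness2` table (RELATEDNESS_PHI in the versions I have seen).

```shell
#!/usr/bin/env bash
# Sketch only: run vcftools --relatedness2, then label each pair.
run_relatedness2() {
    # Writes <prefix>.relatedness2
    vcftools --gzvcf "$1" --relatedness2 --out "$2"
}

classify_kinship() {
    # Skip the header line and self-comparisons; kinship is the last column.
    awk 'NR > 1 && $1 != $2 {
        phi = $NF
        if (phi > 0.354)       rel = "duplicate/MZ twin"
        else if (phi > 0.177)  rel = "first degree"
        else if (phi > 0.0884) rel = "second degree"
        else if (phi > 0.0442) rel = "third degree"
        else                   rel = "unrelated"
        print $1, $2, phi, rel
    }' OFS='\t' "$1"
}
```

For example: `run_relatedness2 trio.vcf.gz trio && classify_kinship trio.relatedness2`. Note that "first degree" covers both parent-offspring and full siblings; the kinship coefficient alone does not separate the two.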

6.2 years ago

Take a look at Peddy


Using Somalier, Brent's updated successor to Peddy, I followed the recommendations in its README.md:

$ for bam in *.bam; do
    echo "$bam";
    docker run -v $PWD:$PWD -w $PWD somalier:v0.2.16 somalier extract \
        -d extracted/ --sites sites.hg38.vcf.gz -f hs38DH.fa "$bam";
done

$ docker run -v $PWD:$PWD -w $PWD somalier:v0.2.16 somalier relate --infer extracted/*.somalier

In my case, I was not sure about the pedigree (PED) for the trio, which is why I used the --infer option. To create the Docker image, you can use the Dockerfile in the repository:

docker build -t somalier:v0.2.16 .
docker run somalier:v0.2.16 somalier
