Question

Whole Genome Sequencing Data Annotation

0

5.2 years ago

ashley_hertzog • 0

Hello experts,

My centre is working on an analysis pipeline for whole genome sequencing data. The sequencing and alignment are being performed off-site and my centre will be receiving VCF files for annotation and curation of variants.

There is little to no literature on validating pipelines for WGS. If anyone has any, would you kindly share?

Does anyone have a proposed pipeline for annotation? We will be using Alissa 5.3 Interpret and were thinking of first filtering out variants by read depth and then sorting the rest by variant type (SNV, CNV, and SV). Or would it be better to have two separate annotation pipelines, one for CNVs and one for SNVs?
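
For anyone reading who does have scripting access, the triage described above (depth filter first, then sort by variant type) might be sketched roughly as follows in plain Python. Everything here is an assumption made for illustration and not a validated method: the cutoff `MIN_DEPTH = 10` is hypothetical (the thread gives no threshold), depth is read from the `DP` key of the INFO column (many callers report per-sample depth in the FORMAT column instead), records without `DP` are kept, small indels get their own bin, and any record whose `SVTYPE` or symbolic ALT is `DEL`, `DUP`, or `CNV` is counted as a CNV.

```python
# Hypothetical cutoff: the thread does not state a read-depth threshold.
MIN_DEPTH = 10

# Assumption: these structural-variant types are treated as copy-number changes.
CNV_TYPES = {"DEL", "DUP", "CNV"}


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=35;SVTYPE=DEL' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def classify(ref, alt, info):
    """Return a rough variant class: 'SNV', 'INDEL', 'CNV', or 'SV'."""
    svtype = info.get("SVTYPE")
    if svtype is None and alt.startswith("<"):
        # Symbolic allele such as <DUP> or <DUP:TANDEM>
        svtype = alt.strip("<>").split(":")[0]
    if svtype in CNV_TYPES:
        return "CNV"
    if svtype is not None:
        return "SV"
    if len(ref) == 1 and all(len(a) == 1 for a in alt.split(",")):
        return "SNV"
    return "INDEL"


def triage(vcf_lines, min_depth=MIN_DEPTH):
    """Drop low-depth records, then sort the remainder by variant class."""
    bins = {"SNV": [], "INDEL": [], "CNV": [], "SV": []}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info_field = fields[:8]
        info = parse_info(info_field)
        depth = info.get("DP")
        # Records without a numeric INFO/DP are kept rather than discarded.
        if isinstance(depth, str) and depth.isdigit() and int(depth) < min_depth:
            continue
        bins[classify(ref, alt, info)].append((chrom, int(pos), ref, alt))
    return bins
```

Calling `triage(open("sample.vcf"))` on an uncompressed VCF (the file name is a placeholder) returns one list per class, which could then be reviewed or annotated separately.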

Following the variant type filter for CNVs, would it then make sense to sort them by size (larger or smaller than 5 kb)?
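
As a rough illustration of the size split, here is a minimal Python sketch that bins CNV records at the 5 kb boundary mentioned above. It assumes each CNV is a dict with a `pos` integer and an `info` dict holding the standard VCF `SVLEN` or `END` keys as strings; that record layout and the function names are hypothetical. Putting a CNV of exactly 5 kb into the "large" bin is an arbitrary choice, and records with neither key are set aside rather than guessed at.

```python
SIZE_CUTOFF_BP = 5_000  # the 5 kb boundary proposed in the question


def cnv_length(pos, info):
    """Length in bp from SVLEN if present, otherwise END - POS; None if unknown."""
    if "SVLEN" in info:
        # SVLEN is negative for deletions, so take the absolute value.
        return abs(int(info["SVLEN"]))
    if "END" in info:
        return int(info["END"]) - pos
    return None


def split_by_size(cnvs, cutoff=SIZE_CUTOFF_BP):
    """Return (small, large, unknown) lists of CNV records."""
    small, large, unknown = [], [], []
    for cnv in cnvs:
        length = cnv_length(cnv["pos"], cnv["info"])
        if length is None:
            unknown.append(cnv)
        elif length < cutoff:
            small.append(cnv)
        else:
            large.append(cnv)
    return small, large, unknown
```

For example, a record at position 1000 with `END=9000` has a length of 8000 bp and lands in the large bin, while one with `SVLEN=-1200` lands in the small bin.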

I was just hoping to bounce some ideas back and forth as this is a first for our centre and we currently do not have access to a bioinformatician.

Thank you for any and all help!

whole-genome-sequencing CNV • 2.0k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 5.2 years ago by ashley_hertzog • 0

1

https://github.com/imgag/megSAP - this is how we do it at our clinics.

ADD REPLY • link 5.2 years ago by German.M.Demidov ★ 3.0k

0

Have you looked at the GATK toolkit?

https://gatk.broadinstitute.org/hc/en-us

ADD REPLY • link 5.2 years ago by Mehmet ▴ 820

0

Sarek, which is a Nextflow pipeline, is quite nice for WGS analysis.

ADD REPLY • link 5.2 years ago by husensofteng ▴ 410

0

We will be using Alissa 5.3 I

It looks like you are planning to use a commercial tool to annotate the VCF files. If you have no command-line expertise or access to Unix servers, then this may be the way to go. All the tools mentioned in this thread require access to, and some expertise with, the command line.

There is little to no literature on validating pipelines for WGS.

Since you are not going to do the primary analysis of the data, there is no validation of that part of the pipeline for you to do. There is literature available on pipeline validation (paper1, paper2 etc).

GDC has a defined DNAseq analysis pipeline. GATK best practices workflows are a good place to start as well.

ADD REPLY • link 5.2 years ago by GenoMax 153k

0

Hi everyone,

You've given me a bit more direction and I'm feeling less lost now. I definitely do not have command-line expertise or access to Unix servers. The purpose of this exercise is to first implement variant annotation in a research setting and then transition it into a diagnostic workflow for rapid WGS for acute care patients.

Really appreciate the feedback!

ADD REPLY • link 5.2 years ago by ashley_hertzog • 0

1

I can also advertise the tool I wrote during my PhD studies: https://github.com/imgag/ClinCNV . It detects CNVs in clinical settings (at about 1 kb resolution for 30x coverage; higher coverage allows higher resolution, but I would not go finer than 500 bp, because the files become huge). It is currently running in maybe 4 hospitals. It may not be the best tool in terms of precision/recall (though it is certainly decent), but I received a lot of feedback from clinicians and implemented everything they asked for. Several hundred patients have been diagnosed with it in our clinics alone.

Here is the presentation: https://github.com/imgag/ClinCNV/blob/master/doc/ClinCNV_thesis_presentation.pdf

ADD REPLY • link 5.2 years ago by German.M.Demidov ★ 3.0k

score 3 · Answer 1 · 2020-07-20

3

5.2 years ago

Shalu Jhanwar ▴ 540

SnpEff (http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf), ANNOVAR (https://www.nature.com/articles/nprot.2015.105), and Variant Effect Predictor (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4) are some of the most commonly used tools for variant annotation.

ADD COMMENT • link 5.2 years ago by Shalu Jhanwar ▴ 540