Question

Find Pathogenic Variants

0

19 months ago

davidmaimoun ▴ 50

Hi dear community,

I don't have any experience in variant calling, and I have to solve this problem:

Using the most recent VCF file describing ClinVar variants and a BED/GFF file of the coding sequence of curated RefSeq genes, write a script that outputs all the pathogenic and likely pathogenic variants that are found inside genes and have coverage of less than 10x in the BAM file. The script should output a table with each ClinVar variant's chromosome, genomic position, reference and alternate alleles, coverage in the BAM file, and all the RefSeq transcripts that are affected by the variant.

I have access to a BaseSpace project that contains analyses (VCF, BAM...) of some biosamples (s01-NFE-CEX-NA12878-demo...).

I really don't know how to start solving this problem.

I would be glad to get some help.

Thank you very much

vcf variant-calling • 1.8k views

ADD COMMENT • link 19 months ago by davidmaimoun ▴ 50

0

Is there a reason why you have not tried it yourself first?

ADD REPLY • link 19 months ago by Sej Modha 5.3k

0

I don't know what to do. For instance, I don't know how to get the most recent VCF file describing ClinVar variants or a BED/GFF file of the coding sequence of curated RefSeq genes. I am new to the field.

ADD REPLY • link 19 months ago by davidmaimoun ▴ 50

2

You can find the ClinVar VCF data here (you probably want the GRCh38 build): https://ftp.ncbi.nlm.nih.gov/pub/clinvar/

The GFF file for the GRCh38 genome build is here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.26_GRCh38/GCF_000001405.26_GRCh38_genomic.gff.gz

You can convert the coding regions from the GRCh38 GFF to BED format as described in the thread "all coding regions .bed file hg38 Whole Genome Sequencing".

ADD REPLY • link 19 months ago by GenoMax 147k
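As a starting point for the first step of the task, here is a minimal sketch of how the ClinVar VCF could be filtered for pathogenic and likely pathogenic records. It assumes the clinical significance is stored in the `CLNSIG` key of the INFO column, with combined values such as `Pathogenic/Likely_pathogenic`; splitting on `|` and `,` as well is a precaution. The function names and the file path passed to `read_clinvar` are hypothetical.

```python
import gzip
import re

# Classes to keep; conflicting or uncertain classifications are ignored.
WANTED = {"Pathogenic", "Likely_pathogenic"}


def is_pathogenic(info: str) -> bool:
    """True if the CLNSIG key of a VCF INFO string names a wanted class."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "CLNSIG":
            # Combined values (e.g. "Pathogenic/Likely_pathogenic") are split
            # into tokens; stray underscores around a token are removed.
            tokens = re.split(r"[/|,]", value)
            return any(tok.strip("_") in WANTED for tok in tokens)
    return False


def read_pathogenic(lines):
    """Yield (chrom, pos, ref, alt) for wanted records from VCF text lines."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        if is_pathogenic(info):
            yield chrom, int(pos), ref, alt


def read_clinvar(path):
    """Same as read_pathogenic, but reads a gzip-compressed VCF from disk."""
    with gzip.open(path, "rt") as handle:
        yield from read_pathogenic(handle)
```

For example, `list(read_clinvar("clinvar.vcf.gz"))` would give the candidate variants as tuples, ready to be intersected with the coding regions.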

0

Thank you very much! I have my VCF and GFF files and have converted them to dataframes. I also have a BAM file from an analysis, but I don't know how to link them together in order to find the pathogenic variants. I need to find pathogenic variants that are found inside genes and have coverage of less than 10x in the BAM file.

Do you have any idea how to do this?

Thank you

ADD REPLY • link 19 months ago by davidmaimoun ▴ 50
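One way to link the three inputs is to treat the problem as a join on genomic position: keep a variant if its position falls inside a coding interval, then look up the read depth at that position. The sketch below assumes the coding regions are in a BED file whose fourth column holds the RefSeq transcript ID, and that the VCF, BED and BAM all use the same chromosome names (the ClinVar VCF uses names like `1`, while the RefSeq GFF uses `NC_` accessions, so a renaming step is usually needed first). `depth_at` is a hypothetical callable that returns the coverage at a position; all other names are illustrative as well.

```python
from collections import defaultdict


def load_cds_bed(lines):
    """Index BED lines (chrom, start, end, transcript) by chromosome."""
    index = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        index[chrom].append((int(start), int(end), name))
    return index


def transcripts_at(index, chrom, pos):
    """Transcripts whose coding interval covers a 1-based VCF position.

    BED intervals are 0-based and half-open, so a 1-based position is
    inside an interval when start < pos <= end.
    """
    return sorted(
        {name for start, end, name in index.get(chrom, []) if start < pos <= end}
    )


def build_table(variants, index, depth_at, max_depth=10):
    """Rows of (chrom, pos, ref, alt, depth, transcripts) with depth < max_depth."""
    rows = []
    for chrom, pos, ref, alt in variants:
        hits = transcripts_at(index, chrom, pos)
        if not hits:
            continue  # variant is outside every coding interval
        depth = depth_at(chrom, pos)
        if depth < max_depth:
            rows.append((chrom, pos, ref, alt, depth, ",".join(hits)))
    return rows
```

This only tests the variant's start position, so a long deletion that begins just outside a coding interval would be missed. The linear scan over intervals is also slow on the full ClinVar file; sorting the intervals and using binary search, or running `bedtools intersect` beforehand, would be faster.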

0

Thank you very much for the help!

ADD REPLY • link 19 months ago by davidmaimoun ▴ 50

score 1 · Answer 1 · 2023-04-13

1

19 months ago

Gabriel ▴ 10

I would recommend the book "Bioinformatics and Functional Genomics", which has a chapter (chapter 9) dedicated to this topic. There are several tools available to perform this analysis, e.g. GATK (which includes HaplotypeCaller) and SAMtools. You can choose one and try it out for yourself.

ADD COMMENT • link 19 months ago by Gabriel ▴ 10
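For the coverage part of the task, SAMtools on its own may be enough, since the variants come from ClinVar and do not need to be called. The sketch below is one possible way to get the read depth at a single position by calling `samtools depth` from Python. It assumes `samtools` is installed and on the `PATH`, that the BAM file is sorted and indexed, and that the chromosome name matches the names used in the BAM header. The function names are hypothetical.

```python
import subprocess


def parse_depth(output: str) -> int:
    """Parse `samtools depth` output (chrom, pos, depth per line).

    Returns the depth on the first data line, or 0 if there is none.
    """
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            return int(fields[2])
    return 0


def samtools_depth_at(bam_path: str, chrom: str, pos: int) -> int:
    """Read depth at one 1-based position of an indexed BAM file."""
    region = f"{chrom}:{pos}-{pos}"
    result = subprocess.run(
        # -a reports positions with zero depth, -r restricts to the region
        ["samtools", "depth", "-a", "-r", region, bam_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_depth(result.stdout)
```

A variant would then be reported when `samtools_depth_at(bam, chrom, pos) < 10`. Starting one process per variant is slow for many positions; passing all positions at once through the `-b` option of `samtools depth` (a BED file of regions) is likely a better fit for a full run.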