Find Pathogenic Variants
19 months ago
davidmaimoun ▴ 50

Hi dear community,

I don't have any experience in variant calling, and I have to solve this problem:

Using the most recent VCF file describing ClinVar variants and a BED/GFF file of the coding sequence of curated RefSeq genes, write a script that outputs all the pathogenic and likely pathogenic variants that are found inside genes and have coverage of less than 10x in the BAM file. The script should output a table with each ClinVar variant’s chromosome, genomic position, reference and alternate alleles, coverage in the BAM file, and all the RefSeq transcripts that are affected by the variant.

I have access to a BaseSpace project which contains analysis outputs (VCF, BAM...) for some biosamples (s01-NFE-CEX-NA12878-demo...).

I really don't know where to start with this problem.

I would be glad to get some help.

Thank you very much

vcf variant-calling • 1.8k views

Is there a reason why you are not willing to give it a try in the first instance?


I don't know what to do. For instance, I don't know how to get the most recent VCF file describing ClinVar variants, or a BED/GFF file of the coding sequence of curated RefSeq genes. I am new to the field.


You can find the ClinVar VCF data here (you probably want the GRCh38 build): https://ftp.ncbi.nlm.nih.gov/pub/clinvar/
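Once the VCF is downloaded, the pathogenic and likely pathogenic records can be picked out by their clinical significance. The following is only a minimal sketch using the standard library: it assumes the significance is stored in the `CLNSIG` INFO field with terms such as `Pathogenic` and `Likely_pathogenic`, and that combined terms are separated by `/` or `|`. The function names are made up for illustration, and the exact set of terms to accept should be checked against the header of the file you download.

```python
import gzip

# Terms treated as "pathogenic or likely pathogenic" (assumption: check your file).
WANTED = {"Pathogenic", "Likely_pathogenic"}

def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def is_pathogenic(clnsig):
    """True if any term of a (possibly combined) CLNSIG value is wanted."""
    terms = clnsig.replace("|", "/").split("/")
    return any(term.strip() in WANTED for term in terms)

def pathogenic_variants(lines):
    """Yield (chrom, pos, ref, alt) for records whose CLNSIG is wanted."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        clnsig = parse_info(info).get("CLNSIG", "")
        if isinstance(clnsig, str) and is_pathogenic(clnsig):
            yield chrom, int(pos), ref, alt

def read_clinvar(path):
    """Stream wanted variants from a gzip-compressed ClinVar VCF."""
    with gzip.open(path, "rt") as handle:
        yield from pathogenic_variants(handle)
```

Exact term matching is used on purpose, so that values such as `Conflicting_interpretations_of_pathogenicity` are not picked up by accident.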

The GFF file for the GRCh38 genome build is here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.26_GRCh38/GCF_000001405.26_GRCh38_genomic.gff.gz

You can convert the coding regions from the GRCh38 GFF to BED format using the approach in this thread: all coding regions .bed file hg38 Whole Genome Sequencing
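If you would rather do the conversion yourself, a rough sketch is shown here. It keeps only the `CDS` features of a GFF3 file and converts the 1-based inclusive GFF coordinates to 0-based half-open BED coordinates. Treating the `Parent` attribute as the transcript identifier is an assumption (check how your GFF labels transcripts), and `cds_to_bed` is just an illustrative name.

```python
def parse_attributes(field):
    """Parse the 9th GFF3 column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        key, sep, value = item.partition("=")
        if sep:
            attrs[key] = value
    return attrs

def cds_to_bed(gff_lines):
    """Yield (seqid, start0, end, parent) for every CDS feature.

    GFF3 is 1-based and inclusive; BED is 0-based and half-open,
    so only the start coordinate is shifted by one.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        # Assumption: the Parent attribute names the transcript of this CDS.
        yield cols[0], int(cols[3]) - 1, int(cols[4]), attrs.get("Parent", ".")
```

Note that the sequence names in the RefSeq GFF are accessions (for example `NC_000001.11`), so they need to be mapped to whatever chromosome names your VCF and BAM use before the files can be compared.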


Thank you very much! I have my VCF and GFF files and have converted them to dataframes. I also have a BAM file from an analysis, but I don't know how to link them together in order to find the pathogenic variants. I need to find pathogenic variants that are located inside genes and have coverage of less than 10x in the BAM file.

Do you have an idea?

Thank you


Thank you very much for the help!

19 months ago
Gabriel ▴ 10

I would recommend the book "Bioinformatics and Functional Genomics", which has a chapter (chapter 9) dedicated to this topic. There are several packages available to perform this kind of analysis (e.g. GATK, which includes HaplotypeCaller, and SAMtools). You can choose one and try it out for yourself.
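With SAMtools, for example, per-base coverage can be written out with `samtools depth` as tab-separated lines of chromosome, position and depth. The sketch below shows one way the pieces of the original question could then be joined in plain Python. It assumes variants come as `(chrom, pos, ref, alt)` tuples, coding regions as `(chrom, start0, end, transcript)` tuples in BED-style coordinates, and that chromosome names have already been harmonized across all three inputs. The function names are hypothetical, only the base at the variant position is checked (not the full span of an indel), and positions missing from the depth output are treated as zero coverage.

```python
from collections import defaultdict

def load_depth(depth_lines):
    """Read 'chrom<TAB>pos<TAB>depth' lines into a {(chrom, pos): depth} dict."""
    depth = {}
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, value = line.rstrip("\n").split("\t")[:3]
        depth[(chrom, int(pos))] = int(value)
    return depth

def index_cds(cds_records):
    """Group (chrom, start0, end, transcript) records by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start0, end, transcript in cds_records:
        by_chrom[chrom].append((start0, end, transcript))
    return by_chrom

def low_coverage_hits(variants, cds_index, depth, max_depth=10):
    """Yield table rows for variants inside a CDS with depth below max_depth."""
    for chrom, pos, ref, alt in variants:
        # A 1-based position lies in a 0-based half-open interval if start0 < pos <= end.
        transcripts = sorted(
            {t for start0, end, t in cds_index.get(chrom, []) if start0 < pos <= end}
        )
        if not transcripts:
            continue  # the variant is outside every coding region
        coverage = depth.get((chrom, pos), 0)  # positions not reported count as 0
        if coverage < max_depth:
            yield chrom, pos, ref, alt, coverage, ",".join(transcripts)
```

The linear scan over coding regions is fine for a first attempt; for the full ClinVar file an interval tree or a dedicated tool for intersecting intervals would be considerably faster.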
