How to choose the correct VCF for the human reference genome?

6.9 years ago

marongiu.luigi ▴ 750

Dear all,

based on this post, it is clear that the human reference sequence provided by NCBI is the best choice, which I also found in my own computational tests (see this post). My question is now about the downstream application. Which VCF is the correct one to use with the GRCh38 file GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz?

Is clinvar.vcf.gz the right one?

And what about Homo_sapiens.GRCh38.dna.toplevel.fa.gz? Would it use the same VCF, or is there another specific one?

Finally, how can I check beforehand whether the sequence headers of the reference FASTA file match the chromosome names in the VCF file? I want to avoid problems such as this or this.

Thank you.

alignment genome VCF • 1.8k views

Hello marongiu.luigi ,

the only things you have to take care of are:

- The VCF is based on hg38 if you aligned to one of the hg38 reference genomes (or on hg19 if you aligned to hg19).
- The naming convention for the chromosomes is the same as in the reference you aligned to.
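
One way to check the naming convention beforehand is to compare the sequence names in the FASTA headers with the chromosome names in the first column of the VCF. Below is a minimal Python sketch, not a definitive tool. It assumes both files are plain text or gzip-compressed (bgzip output can be read the same way), and the function names (`open_text`, `fasta_names`, `vcf_names`, `compare_names`) are made up for illustration.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:  # skip a bare '>' line with no name
                    names.add(tokens[0])
    return names


def vcf_names(path):
    """Collect chromosome names from the first column of the VCF records."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and header lines
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, vcf_path):
    """Report which names are shared and which occur in only one file."""
    fasta = fasta_names(fasta_path)
    vcf = vcf_names(vcf_path)
    return {
        "shared": sorted(fasta & vcf),
        "vcf_only": sorted(vcf - fasta),
        "fasta_only": sorted(fasta - vcf),
    }
```

Calling `compare_names` with the paths to your reference FASTA and your VCF returns three sorted lists. Any name under `vcf_only` (for example `1` where the reference uses `chr1`) points to a naming mismatch that needs fixing before you use the two files together. Names under `fasta_only` are often harmless, since a VCF does not have to contain variants on every sequence. Reading a whole human genome FASTA this way takes a little while, but it only has to be done once.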

fin swimmer

ADD REPLY • link 6.9 years ago by finswimmer 16k

"Is clinvar.vcf.gz the right one?"

This depends entirely on the question you want to answer. Any VCF that is based on the same reference genome is "technically correct".
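
To see which reference genome a VCF claims to be based on, you can look at its meta-information lines. This is a rough Python sketch that reads the `##reference=` line and any `assembly=` fields in `##contig=` lines. Both are optional in a VCF header, so an empty result only means the file does not say. The function name `header_reference_info` is hypothetical, and the comma split is a simplification that assumes no commas inside quoted values.

```python
import gzip


def header_reference_info(vcf_path):
    """Return the ##reference value and any contig assembly tags from a VCF header."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    info = {"reference": None, "assemblies": set()}
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-information lines end at the #CHROM line
            line = line.strip()
            if line.startswith("##reference="):
                info["reference"] = line.split("=", 1)[1]
            elif line.startswith("##contig=<") and line.endswith(">"):
                # Simplified parsing: split the fields on commas.
                for field in line[len("##contig=<"):-1].split(","):
                    key, _, value = field.partition("=")
                    if key == "assembly":
                        info["assemblies"].add(value.strip('"'))
    return info
```

If the header names an assembly, compare it with the assembly of the FASTA you aligned to. If it names none, fall back on the documentation of the VCF's source.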

ADD REPLY • link 6.9 years ago by ATpoint 89k
