Dear all,
Based on this post, it is clear that the human reference sequence provided by NCBI is the best, as I also found computationally (see this post).
My question is now about the downstream application. What would be the correct VCF to use with the GRCh38 file GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz?
Is clinvar.vcf.gz the right one?
And what about Homo_sapiens.GRCh38.dna.toplevel.fa.gz? Would it use the same VCF, or is there another specific one?
Finally, how can I check beforehand whether the sequence names (headers) in the reference FASTA file match the contig names in the VCF file? This is to avoid problems such as this or this.
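One way to do this check is sketched below in Python, under some assumptions: the FASTA sequence name is taken to be the first word after each `>`, and the VCF names are collected both from `##contig=<ID=...>` header lines and from the CHROM column of the records. All function names and the script name `check_names.py` are hypothetical, written only for illustration. It also flags VCF names that would match after adding or removing a `chr` prefix (e.g. `1` vs `chr1`), which is a common source of "contig not found" errors.

```python
import gzip
import re
import sys

# Matches the ID field of a VCF "##contig=<ID=...,length=...>" header line.
CONTIG_ID_RE = re.compile(r"[<,]ID=([^,>]+)")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for line-by-line reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_names(lines):
    """Sequence names in a FASTA: the first word after each '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def vcf_contigs(lines):
    """Return (contigs declared in ##contig lines, contigs used in the CHROM column)."""
    declared, used = set(), set()
    for line in lines:
        if line.startswith("##contig="):
            m = CONTIG_ID_RE.search(line)
            if m:
                declared.add(m.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other meta lines, the #CHROM header line, blank lines
        else:
            used.add(line.split("\t", 1)[0])
    return declared, used


def toggle_chr(name):
    """'chr1' -> '1' and '1' -> 'chr1' (does NOT handle MT vs chrM)."""
    return name[3:] if name.startswith("chr") else "chr" + name


def compare_names(ref_names, vcf_names):
    """Summarise which names are shared and which exist on only one side."""
    only_in_vcf = sorted(vcf_names - ref_names)
    return {
        "shared": sorted(ref_names & vcf_names),
        "only_in_vcf": only_in_vcf,
        "only_in_fasta": sorted(ref_names - vcf_names),
        # VCF names that would match if the 'chr' prefix were added or removed
        "match_after_chr_toggle": [n for n in only_in_vcf if toggle_chr(n) in ref_names],
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_names.py reference.fna.gz variants.vcf.gz
    with open_text(sys.argv[1]) as fa:
        ref = fasta_names(fa)
    with open_text(sys.argv[2]) as vcf:
        declared, used = vcf_contigs(vcf)
    for key, names in compare_names(ref, declared | used).items():
        print(f"{key}: {len(names)} e.g. {names[:5]}")
```

If `only_in_vcf` is empty, every contig the VCF refers to exists in the reference. A long `match_after_chr_toggle` list suggests the two files differ only in naming style, not necessarily in build.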
Thank you.
Hello marongiu.luigi,
The only thing you have to take care of is that the VCF is based on hg38 if you aligned to one of the hg38 reference genomes, or on hg19 if you aligned to hg19.

fin swimmer
This totally depends on the question you want to answer. "Technically correct" is any VCF that is based on the same reference genome.
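To check that a VCF and a reference really are based on the same genome build, one quick fingerprint is the length of chromosome 1, which is 248,956,422 bp in GRCh38/hg38 and 249,250,621 bp in GRCh37/hg19. The Python sketch below assumes the VCF header declares lengths in `##contig=<ID=...,length=...>` lines (not every VCF does; if they are missing, other header lines such as `##reference` may still hint at the build) and that the FASTA has a `samtools faidx` index (`.fai`), whose first two tab-separated columns are sequence name and length. The function names are hypothetical.

```python
import re

# chr1 length differs between builds, so it works as a quick fingerprint.
CHR1_LENGTH_TO_BUILD = {
    248956422: "GRCh38/hg38",
    249250621: "GRCh37/hg19",
}

ID_RE = re.compile(r"[<,]ID=([^,>]+)")
LENGTH_RE = re.compile(r"[<,]length=(\d+)")


def vcf_contig_lengths(lines):
    """Contig lengths from '##contig=<ID=...,length=...>' lines in a VCF header."""
    lengths = {}
    for line in lines:
        if not line.startswith("##"):
            break  # header ends at the #CHROM line; no need to read the records
        if line.startswith("##contig="):
            id_m, len_m = ID_RE.search(line), LENGTH_RE.search(line)
            if id_m and len_m:
                lengths[id_m.group(1)] = int(len_m.group(1))
    return lengths


def fai_lengths(lines):
    """Sequence lengths from a samtools faidx index (.fai): name<TAB>length<TAB>..."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def guess_build(lengths):
    """Guess the assembly from the length of chr1 (named either 'chr1' or '1')."""
    for name in ("chr1", "1"):
        if name in lengths:
            length = lengths[name]
            return CHR1_LENGTH_TO_BUILD.get(length, f"unknown (chr1 length {length})")
    return "unknown (no chr1 length found)"
```

For example, `guess_build(vcf_contig_lengths(gzip.open("clinvar.vcf.gz", "rt")))` and `guess_build(fai_lengths(open("reference.fna.fai")))` should report the same build if the two files are compatible. Matching builds do not guarantee matching chromosome names (`chr1` vs `1`), so the names still need to be checked separately.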