Dear Biostar community,
I have a targeted resequencing experiment (Illumina) with the goal of detecting mutations in certain genes. To align the reads, I used the GRCh37 genome from NCBI (https://www.ncbi.nlm.nih.gov/genome/guide/human/). I used bcftools to call the variants, and up to this step everything was fine. However, when I reached the annotation step and ran snpEff with a prebuilt database using the command:
java -Xmx32g -jar snpEff.jar GRCh37.75 variants_norm.vcf > annotated.vcf
It does not produce a properly annotated .vcf file. Instead, the .vcf file is full of "ERROR_CHROMOSOME_NOT_FOUND" messages.
As far as I can tell, this is one of the most common problems described in the snpEff documentation: https://pcingola.github.io/SnpEff/se_troubleshooting/
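A quick way to confirm a naming mismatch is to list the chromosome names in the VCF that the database would not recognise. This is a minimal sketch that assumes an uncompressed VCF and assumes the GRCh37.75 database names the primary chromosomes 1 to 22, X, Y and MT (Ensembl style, no "chr" prefix); the function name `unknown_chromosomes` is hypothetical.

```python
from typing import Iterable, Set

# Assumption: Ensembl GRCh37 (release 75) uses 1-22, X, Y and MT as
# primary chromosome names, without a "chr" prefix.
ENSEMBL_GRCH37_NAMES: Set[str] = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def unknown_chromosomes(vcf_lines: Iterable[str],
                        known: Set[str] = ENSEMBL_GRCH37_NAMES) -> Set[str]:
    """Return the CHROM values (first column) that are not in `known`."""
    found: Set[str] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line and blanks
        found.add(line.split("\t", 1)[0])
    return found - known
```

For example, `with open("variants_norm.vcf") as fh: print(sorted(unknown_chromosomes(fh)))` would print names such as `NC_000001.10` if the VCF still carries NCBI accessions.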
The chromosome names in the genome .fasta file look like this:
NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly
It seems to me that the snpEff GRCh37.75 database uses Ensembl chromosome names (1, 2, ..., X, Y, MT) rather than the NCBI accessions used in my pipeline.
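Since the FASTA descriptions already contain the chromosome number ("chromosome 1,"), a rename table can be derived from them. The sketch below is an assumption-laden illustration: it assumes headers shaped like the one above, skips records without a "chromosome N," pattern (unlocalized/unplaced scaffolds), and assumes the mitochondrial record mentions "mitochondrion" and should become MT. The names `refseq_to_ensembl` and `write_rename_map` are hypothetical. The output file uses one "old_name new_name" pair per line, the format that `bcftools annotate --rename-chrs` accepts.

```python
import re
from typing import Dict, Iterable

# Matches e.g. ">NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly";
# the leading ">" is optional so bare header text also works.
_CHROM_HEADER = re.compile(r"^>?(\S+)\s.*\bchromosome (\w+),")


def refseq_to_ensembl(headers: Iterable[str]) -> Dict[str, str]:
    """Map RefSeq accessions to Ensembl-style names using FASTA descriptions."""
    mapping: Dict[str, str] = {}
    for header in headers:
        match = _CHROM_HEADER.match(header)
        if match:
            accession, chrom = match.groups()
            mapping[accession] = chrom
        elif "mitochondrion" in header:
            # Assumption: the mitochondrial record is called MT in Ensembl.
            mapping[header.lstrip(">").split()[0]] = "MT"
    return mapping


def write_rename_map(mapping: Dict[str, str], path: str) -> None:
    """Write 'old_name new_name' pairs, one per line."""
    with open(path, "w") as out:
        for old, new in mapping.items():
            out.write(f"{old} {new}\n")
```

The header lines can be collected with something like `grep ">" genome.fa`; the resulting map file could then be passed to bcftools instead of rebuilding the reference.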
Could you please help me: how can I convert the reference genome to a format that snpEff can process, or where can I find a release that matches the snpEff annotation database? I searched for a solution but have not found one.
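Instead of converting the reference, one option is to rename the chromosomes in the called VCF before running snpEff (`bcftools annotate --rename-chrs` does this with a two-column map file). As a hedged, dependency-free alternative, this sketch rewrites the CHROM column and the `##contig=<ID=...>` lines of an uncompressed VCF; `mapping` is assumed to be a dict such as `{"NC_000001.10": "1"}`, and the function name `rename_vcf_chroms` is hypothetical. Names not in the mapping are left unchanged, so snpEff would still report them.

```python
import re
from typing import Dict, Iterable, Iterator

# Captures the ID value of a ##contig header line.
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_vcf_chroms(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield VCF lines with CHROM and ##contig IDs renamed via `mapping`."""
    for line in lines:
        if line.startswith("##contig="):
            line = _CONTIG_ID.sub(
                lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
                line, count=1)
        elif not line.startswith("#"):
            # Data line: only the first tab-separated field is CHROM.
            chrom, sep, rest = line.partition("\t")
            line = mapping.get(chrom, chrom) + sep + rest
        yield line
```

Writing the result back out is then a loop such as `out.writelines(rename_vcf_chroms(src, mapping))`, after which the renamed file can be passed to the same `snpEff.jar GRCh37.75` command.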
Thank you in advance
Thank you, it worked