These are the three bacterial genomes that come up when I query the snpEff database:
Genome Organism
Bacillus_pacificus_gca_001884025 Bacillus_pacificus_gca_001884025
Bacillus_pacificus_gca_003858675 Bacillus_pacificus_gca_003858675
Bacillus_pacificus_gca_006349595 Bacillus_pacificus_gca_006349595
To test the above, I took the output from the DRAGEN Small Whole Genome Sequencing (MiSeq i100: sWGS (5 GB)) project, downloaded the file Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111.hard-filtered.vcf, and kept only the variants in the PASS category. I then tried to annotate the filtered VCF file using SnpEff:
snpEff Bacillus_pacificus_gca_003858675 Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111.hard-filtered.vcf > Bpacificus-ATCC10987_annot.vcf
I get something like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111
chr 293795 . C CA . PASS DP=54;MQ=250.00;FractionInformativeReads=1.000;SoftClipRatio=0.01;ANN=CA||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 1:66.12:0,54:1.0000:0,32:0,22:54:0,0,28,26:0,0,28,26
chr 394760 . T TG . PASS DP=71;MQ=250.00;FractionInformativeReads=0.986;SoftClipRatio=0.03;ANN=TG||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 1:66.42:0,70:1.0000:0,30:0,40:70:0,0,39,31:0,0,37,33
chr 399776 . A AT . PASS DP=72;MQ=250.00;FractionInformativeReads=1.000;SoftClipRatio=0.00;ANN=AT||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 1:66.44:0,72:1.0000:0,33:0,39:72:0,0,29,43:0,0,38,34
chr 630844 . G GT . PASS DP=56;MQ=250.00;FractionInformativeReads=0.982;SoftClipRatio=0.00;ANN=GT||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 1:66.14:0,55:1.0000:0,26:0,29:55:0,0,30,25:0,0,25,30
Every record carries the ERROR_CHROMOSOME_NOT_FOUND error. I also tried it in Galaxy and got the same result.
Any suggestions on how to match the chromosome name? Or is there some other issue in the VCF file that is causing the error?
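To see which names need matching, it can help to list the distinct values in the CHROM column of the VCF. The snippet below is only a minimal sketch: the function name `contig_names` is made up for illustration, and it assumes the file is tab-delimited as the VCF format requires (the spaces in the post are presumably a rendering artifact).

```python
def contig_names(vcf_lines, pass_only=True):
    """Return the distinct CHROM values of the records, in order of first appearance."""
    names = []
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and the #CHROM row).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) is FILTER; optionally keep only PASS records.
        if pass_only and (len(fields) < 7 or fields[6] != "PASS"):
            continue
        if fields[0] not in names:
            names.append(fields[0])
    return names


# Two records shaped like those in the post (INFO shortened, sample columns dropped).
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr\t293795\t.\tC\tCA\t.\tPASS\tDP=54\n",
    "chr\t394760\t.\tT\tTG\t.\tPASS\tDP=71\n",
]
print(contig_names(example))
```

For a real file, the same function can be fed an open file handle instead of the list. If the only name it reports is "chr", that is the name the snpEff database would have to contain for the annotation to work.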
See https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001884025.1/ and look at the files under the FTP tab: you'll see many different names for each contig, not just "chr".
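Once you know which contig name the snpEff database expects, one option is to rewrite the VCF so its names agree. The following is a rough sketch, not a tested recipe for this dataset: `rename_contigs` is a hypothetical helper, and `NAME_USED_BY_SNPEFF` is a placeholder that has to be replaced by the real contig name, which is not known from this thread. It also assumes a tab-delimited VCF.

```python
import re


def rename_contigs(vcf_lines, mapping):
    """Yield VCF lines with CHROM values and ##contig IDs renamed via `mapping`."""
    for line in vcf_lines:
        if line.startswith("##contig=<ID="):
            # Keep the header consistent with the renamed records.
            m = re.match(r"##contig=<ID=([^,>]+)", line)
            if m and m.group(1) in mapping:
                old = m.group(1)
                line = line.replace("ID=" + old, "ID=" + mapping[old], 1)
            yield line
        elif line.startswith("#") or not line.strip():
            # Other header lines and blank lines pass through unchanged.
            yield line
        else:
            # CHROM is the first tab-separated column of a record.
            chrom, sep, rest = line.partition("\t")
            yield mapping.get(chrom, chrom) + sep + rest


# Placeholder mapping: the right-hand side is hypothetical, not a real accession.
mapping = {"chr": "NAME_USED_BY_SNPEFF"}
example = [
    "##contig=<ID=chr,length=100>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr\t293795\t.\tC\tCA\t.\tPASS\tDP=54\n",
]
renamed = list(rename_contigs(example, mapping))
print("".join(renamed), end="")
```

The generator only touches the first column and the `##contig` header lines, so the INFO and sample fields are left exactly as DRAGEN wrote them. Renaming only makes sense if the reads were aligned to the same assembly that the chosen snpEff database was built from; otherwise the coordinates will not correspond even when the names do.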
Okay, I will explore this and update...
By the way, you could also look in the snpEff data directory, where the data are usually grouped by chromosome name.
I have yet to explore that. I was trying to find it in their site repository, but I will check this.