Bacterial VCF file annotation using snpEff error
1 day ago
1769mkc ★ 1.2k

These are the three bacterial databases I get from the snpEff database list when I query for this organism:

Genome  Organism
    Bacillus_pacificus_gca_001884025    Bacillus_pacificus_gca_001884025        
    Bacillus_pacificus_gca_003858675    Bacillus_pacificus_gca_003858675        
    Bacillus_pacificus_gca_006349595    Bacillus_pacificus_gca_006349595    

To test the above, I took the output from the DRAGEN Small Whole Genome Sequencing project (MiSeq i100: sWGS, 5 GB), downloaded the file Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111.hard-filtered.vcf, and kept only the variants in the PASS category. I then tried to annotate the filtered VCF file using SnpEff:

snpEff Bacillus_pacificus_gca_003858675  Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111.hard-filtered.vcf > Bpacificus-ATCC10987_annot.vcf

I get something like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Bpacificus-ATCC10987-rep3-sWGS-MiSeqi100-241111
chr 293795  .   C   CA  .   PASS    DP=54;MQ=250.00;FractionInformativeReads=1.000;SoftClipRatio=0.01;ANN=CA||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND   GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  1:66.12:0,54:1.0000:0,32:0,22:54:0,0,28,26:0,0,28,26
chr 394760  .   T   TG  .   PASS    DP=71;MQ=250.00;FractionInformativeReads=0.986;SoftClipRatio=0.03;ANN=TG||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND   GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  1:66.42:0,70:1.0000:0,30:0,40:70:0,0,39,31:0,0,37,33
chr 399776  .   A   AT  .   PASS    DP=72;MQ=250.00;FractionInformativeReads=1.000;SoftClipRatio=0.00;ANN=AT||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND   GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  1:66.44:0,72:1.0000:0,33:0,39:72:0,0,29,43:0,0,38,34
chr 630844  .   G   GT  .   PASS    DP=56;MQ=250.00;FractionInformativeReads=0.982;SoftClipRatio=0.00;ANN=GT||MODIFIER|||||||||||||ERROR_CHROMOSOME_NOT_FOUND   GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  1:66.14:0,55:1.0000:0,26:0,29:55:0,0,30,25:0,0,25,30

Here I see ERROR_CHROMOSOME_NOT_FOUND on every record. I also tried it in Galaxy and got the same result.

Any suggestions on how to match the chromosome name? Or is there some other issue in the VCF file that is causing the error?

SnpEff • 234 views

Go to https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001884025.1/ and look at the files under the FTP tab. You'll see that each contig has its own name, and none of them is just "chr". For example, the GTF file contains:

#gtf-version 2.2
#!genome-build ASM188402v1
#!genome-build-accession NCBI_Assembly:GCF_001884025.1
#!annotation-date 04/11/2024 13:00:15
#!annotation-source NCBI RefSeq GCF_001884025.1-RS_2024_04_11
NZ_MACD01000002.1
NZ_MACD01000003.1
NZ_MACD01000004.1
NZ_MACD01000005.1
NZ_MACD01000006.1
NZ_MACD01000007.1
NZ_MACD01000008.1
NZ_MACD01000009.1
NZ_MACD01000010.1
NZ_MACD01000011.1
NZ_MACD01000012.1
NZ_MACD01000013.1
NZ_MACD01000014.1
NZ_MACD01000015.1
NZ_MACD01000016.1
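
To make the comparison concrete, here is a minimal sketch that lists the sequence names in a VCF and reports the ones that do not appear in an annotation file. The function names `contig_names` and `unmatched_contigs` are made up for illustration, and it assumes both files are uncompressed and tab-separated. Note that the names in an NCBI RefSeq GTF may still differ from the names inside the snpEff database, so treat this as a first check rather than a definitive test.

```shell
# Print the distinct sequence names found in column 1 of a tab-separated
# file, skipping '#' header lines. Works for VCF and GTF/GFF alike.
contig_names() {
    awk -F'\t' '!/^#/ && NF > 0 { print $1 }' "$1" | sort -u
}

# Print the names that appear in the VCF ($1) but not in the annotation ($2).
# Names printed here have no exact counterpart in the annotation file.
unmatched_contigs() {
    annotation_names=$(mktemp)
    contig_names "$2" > "$annotation_names"
    # -x: whole-line match, -F: fixed strings, -v: keep the non-matching names
    contig_names "$1" | grep -vxF -f "$annotation_names"
    rm -f "$annotation_names"
}
```

Calling `unmatched_contigs` with the VCF as the first argument and the downloaded GTF as the second would print `chr` if that name is missing from the annotation.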

Okay, I will explore this and update.


By the way, you could also look in the snpEff data directory, where the data are usually grouped by chromosome name.
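
Once the names used by the database are known, one possible fix is to rewrite the names in the VCF before annotating. The sketch below is an assumption-laden example, not a snpEff feature: `rename_vcf_contigs` is a hypothetical helper, and the map file is a two-column, tab-separated list (old name, new name) that you would write by hand. Renaming only makes sense if the VCF was called against the very same assembly the database was built from. If the reference differs (for example a single closed chromosome versus a multi-contig draft), the positions will not correspond even after the names match.

```shell
# Rewrite sequence names in an uncompressed VCF ($2) using a two-column,
# tab-separated map file ($1: old name, new name; must not be empty).
# The CHROM column of data lines and the ID= field of ##contig header
# lines are updated; everything else passes through unchanged.
rename_vcf_contigs() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { map[$1] = $2; next }      # first file: load the name map
        /^##contig=<ID=/ {
            prefix = "##contig=<ID="
            id = substr($0, length(prefix) + 1)
            sub(/[,>].*$/, "", id)            # keep only the ID value
            if (id in map)
                $0 = prefix map[id] substr($0, length(prefix) + length(id) + 1)
            print; next
        }
        /^#/ { print; next }                  # other header lines: unchanged
        ($1 in map) { $1 = map[$1] }          # data lines: rename CHROM
        { print }
    ' "$1" "$2"
}
```

The renamed VCF is written to standard output, so it can be redirected to a new file and passed to snpEff in place of the original.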


I have yet to explore that. I was trying to find it in their site repository, but I will check this.
