Question

building snpEff database

3.1 years ago

aabhordia ▴ 30

Hello everyone,

I am trying to build a new database in snpEff. I have followed all the steps given in the manual, but unfortunately I could not build it.

Every time I try, it shows these warnings:

WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533975.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533983.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533991.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535019.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535460.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Too many 'WARNING_RARE_AA_POSSITION_NOT_FOUND' warnings, no further warnings will be shown.
.
.
.

00:04:10 Checking database using CDS sequences
00:04:10 Reading CDSs from file '/mnt/d/snpEff/data/Genome/cds.fa'...
00:04:12 done (137930 CDSs).
00:04:12 Comparing CDS...
        Labels:
                '+' : OK
                '.' : Missing
                '*' : Error

        ....................................................................................................
        ....................................................................................................

CDS check:      Genome       OK: 0   Warnings: 0     Not found: 98752        Errors: 0       Error percentage: NaN%

FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.

Please suggest how I can resolve this issue.

Thank you in advance

snpEff • 3.7k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 3.1 years ago by aabhordia ▴ 30

Hello!

Please consider adding some formatting to your question to make it more readable.

Having said that, I guess your error is related to the GFF3/GTF you provide when building the database. Do you have CDS annotations for each transcript in the GTF? Do you have all the features that a GTF usually has (such as a transcript line for each transcript, CDS, exons, genes, UTRs...)?
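
To make that check concrete, here is a minimal Python sketch (not part of snpEff) that counts the feature types in a GTF and lists transcripts that have no CDS record. It assumes a tab-separated GTF with `transcript_id "..."` attributes; the function name `summarize_gtf` and the example records are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def summarize_gtf(lines):
    """Count feature types and list transcripts that have no CDS record."""
    feature_counts = Counter()
    features_by_transcript = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        feature = cols[2]
        feature_counts[feature] += 1
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            features_by_transcript[match.group(1)].add(feature)
    without_cds = sorted(
        tid for tid, feats in features_by_transcript.items() if "CDS" not in feats
    )
    return feature_counts, without_cds


# Tiny made-up example (not the poster's data).
example = [
    'chr1\tsrc\texon\t1\t90\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
counts, missing = summarize_gtf(example)
print(dict(counts))
print(missing)
```

On a real annotation you would pass an open file handle instead of the list, e.g. `summarize_gtf(open("genes.gtf"))`, where the file name is an assumption about your setup. Non-coding transcripts legitimately have no CDS, so a non-empty list is a hint rather than proof of a problem.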

ADD REPLY • link 3.1 years ago by iraun 6.2k

I have tried both formats (GFF3 and GTF) together with the CDS file, but got the same error :(

ADD REPLY • link 3.1 years ago by aabhordia ▴ 30

Same problem here: the names of the transcripts / entries in the GFF match, but it won't take it. Wondering if there is a secret sauce to getting this to work :-)

ADD REPLY • link 3.1 years ago by karthi.sivaraman ▴ 40

I had a similar problem in the past, detailed here:

https://github.com/pcingola/SnpEff/issues/388

As it turns out, the "officially" accepted GenBank file had some inconsistencies in it, and in turn SnpEff raised a number of quite confusing errors.

ADD REPLY • link 3.1 years ago by Istvan Albert 102k

Did you check what is written in the last line of the error/warning output?

That would be my first guess as well: for some reason the IDs in your FASTA file do not match the ones in the DB.
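
One way to test that guess outside snpEff is to compare the two ID sets directly. The sketch below is only an illustration under some assumptions: the FASTA ID is taken to be the first whitespace-delimited token of each header, the annotation is a GTF with `transcript_id "..."` attributes, and the helper names and example records are invented here.

```python
import re

# Matches the transcript_id attribute anywhere in a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(lines):
    """Return every transcript_id value found in non-comment GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Split the IDs into shared, FASTA-only, and GTF-only sets."""
    fa = fasta_ids(fasta_lines)
    gtf = gtf_transcript_ids(gtf_lines)
    return {"shared": fa & gtf, "fasta_only": fa - gtf, "gtf_only": gtf - fa}


# Made-up records; only the XP_ identifier is taken from the warnings above.
fasta = [">tx1 some description", "ATGAAA", ">XP_018533975.1", "ATGCCC"]
gtf = [
    'chr1\tsrc\tCDS\t1\t6\t.\t+\t0\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\tCDS\t10\t15\t.\t+\t0\tgene_id "g2"; transcript_id "tx2";',
]
result = compare_ids(fasta, gtf)
for name, ids in result.items():
    print(name, sorted(ids))
```

If `shared` comes back empty on the real files, the "Not found" count in the CDS check is consistent with a pure naming mismatch rather than with missing sequences.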

ADD REPLY • link 3.1 years ago by lieven.sterck 15k

Yes, I checked and they are the same, because I am using the same reference for variant calling and for building the snpEff database, along with the GTF, CDS and protein files (in FASTA format).

And you are right, this is the reason.

But I am unable to sort it out.

ADD REPLY • link 3.1 years ago by aabhordia ▴ 30

Time to backtrack everything then:

take, for instance, that first "Cannot find transcript" ID
grep for it in the CDS/FASTA file
look it up in the DB

(In general, check whether you can find that ID in each step/input of this process.)
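
The steps above can be sketched in Python as a grep-like helper that looks for one ID in several inputs at once. This is a rough illustration only: `find_id`, the file names used as labels, and the example lines are assumptions, and it covers the text inputs, not the built snpEff database itself.

```python
def find_id(query, sources):
    """Map each source name to the (line_number, text) pairs that contain query."""
    hits = {}
    for name, lines in sources.items():
        hits[name] = [
            (number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if query in line
        ]
    return hits


# Made-up file contents; the ID is the first one from the warnings in the question.
sources = {
    "protein.fa": [">XP_018533975.1 made-up description\n", "MKVLA\n"],
    "genes.gtf": [
        'chr1\tsrc\tCDS\t1\t15\t.\t+\t0\tgene_id "g1"; transcript_id "tx1";\n'
    ],
}
hits = find_id("XP_018533975.1", sources)
for name, found in hits.items():
    print(name, found if found else "no match")
```

With real files you would pass open file handles as the dictionary values. An ID that shows up in the FASTA file but nowhere in the annotation points to the step where the names diverge.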

ADD REPLY • link 3.1 years ago by lieven.sterck 15k

Hi, did anyone solve this? I have the same problem. I downloaded the CDS sequences for my reference and added them as cds.fa, but it doesn't work:

FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.

Transcript IDs from database (sample):

'gene-VT05_RS00100'
'TRANSCRIPT_gene-VT05_RS04785'
'gene-VT05_RS04140'
'TRANSCRIPT_gene-VT05_RS04300'
'TRANSCRIPT_gene-VT05_RS06965'
'TRANSCRIPT_gene-VT05_RS02120'
'rna-VT05_RS05630'
'TRANSCRIPT_gene-VT05_RS01275'
'TRANSCRIPT_gene-VT05_RS06960'
'TRANSCRIPT_gene-VT05_RS03455'
'TRANSCRIPT_gene-VT05_RS02125'
'TRANSCRIPT_gene-VT05_RS07810'
'TRANSCRIPT_gene-VT05_RS01270'
'TRANSCRIPT_gene-VT05_RS04780'
'TRANSCRIPT_gene-VT05_RS01285'
'TRANSCRIPT_gene-VT05_RS03465'
'TRANSCRIPT_gene-VT05_RS04795'
'TRANSCRIPT_gene-VT05_RS05640'
'TRANSCRIPT_gene-VT05_RS07825'
'TRANSCRIPT_gene-VT05_RS02135'
'TRANSCRIPT_gene-VT05_RS05645'
'TRANSCRIPT_gene-VT05_RS06970'

Transcript IDs from database (fasta file):

'1_cds_WP_003694219'
'lcl|NZ_CP012026.1_cds_WP_003694035.1_1054'
'1_cds_WP_003694215'
'1_cds_WP_003694217'
'lcl|NZ_CP012026.1_cds_2178'
'lcl|NZ_CP012026.1_cds_WP_020996798.1_208'
'1_cds_WP_003694208'
'lcl|NZ_CP012026.1_cds_WP_003692850.1_224'
'1_cds_WP_003694209'
'1_cds_WP_010951364'
'lcl|NZ_CP012026.1_cds_WP_020996903.1_885'
'lcl|NZ_CP012026.1_cds_2168'
'lcl|NZ_CP012026.1_cds_2169'
'lcl|NZ_CP012026.1_cds_2167'
'lcl|NZ_CP012026.1_cds_WP_003689992.1_1583'
'1_cds_WP_003694232'
'1_cds_WP_003694234'
'lcl|NZ_CP012026.1_cds_2194'
'1_cds_WP_003694238'
'1_cds_WP_003694230'
'lcl|NZ_CP012026.1_cds_WP_003690173.1_1711'
'lcl|NZ_CP012026.1_cds_WP_003693472.1_1400'
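
The two lists share no naming scheme, so one possible workaround is to rename the FASTA headers to the database IDs before building. The sketch below is only a guess at how that could look. It assumes the full FASTA headers carry a `[locus_tag=...]` field, as NCBI "CDS from genomic" files usually do (the headers are not shown in full here), and that the database ID is the locus tag with one of the prefixes seen in the sample above. The function names and the pairing of locus tag and protein in the example header are invented.

```python
import re

# Assumed NCBI-style header field, e.g. "[locus_tag=VT05_RS04785]".
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def match_database_id(header, database_ids, prefixes=("TRANSCRIPT_gene-", "gene-")):
    """Return the database ID built from the header's locus tag, or None."""
    match = LOCUS_TAG.search(header)
    if not match:
        return None
    for prefix in prefixes:
        candidate = prefix + match.group(1)
        if candidate in database_ids:
            return candidate
    return None


def rename_fasta(lines, database_ids):
    """Yield FASTA lines, replacing a header when a matching database ID exists."""
    for line in lines:
        if line.startswith(">"):
            new_id = match_database_id(line, database_ids)
            if new_id:
                ending = "\n" if line.endswith("\n") else ""
                yield ">" + new_id + ending
                continue
        yield line  # sequence lines and unmatched headers pass through unchanged


# Database IDs copied from the sample above; the header pairing is hypothetical.
database_ids = {"TRANSCRIPT_gene-VT05_RS04785", "gene-VT05_RS00100"}
fasta = [
    ">lcl|NZ_CP012026.1_cds_WP_003694035.1_1054 [locus_tag=VT05_RS04785] [gbkey=CDS]\n",
    "ATGAAATAA\n",
    ">lcl|NZ_CP012026.1_cds_2178 [gbkey=CDS]\n",
    "ATGCCCTAA\n",
]
renamed = list(rename_fasta(fasta, database_ids))
print("".join(renamed), end="")
```

Headers without a usable locus tag are left as they are, so it is easy to count how many entries still fail to match before rebuilding the database.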

ADD REPLY • link 2.3 years ago by Eugenia • 0