2.6 years ago
aabhordia
Hello everyone,
I am trying to build a new database in snpEff. I have followed all the steps given in the manual but, unfortunately, could not build it.
Every time I try, it shows these warnings:
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533975.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533983.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018533991.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535019.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Cannot find transcript 'XP_018535460.1'
WARNING_RARE_AA_POSSITION_NOT_FOUND: Too many 'WARNING_RARE_AA_POSSITION_NOT_FOUND' warnings, no further warnings will be shown.
.
.
.
00:04:10 Checking database using CDS sequences
00:04:10 Reading CDSs from file '/mnt/d/snpEff/data/Genome/cds.fa'...
00:04:12 done (137930 CDSs).
00:04:12 Comparing CDS...
Labels:
'+' : OK
'.' : Missing
'*' : Error
....................................................................................................
....................................................................................................
CDS check: Genome OK: 0 Warnings: 0 Not found: 98752 Errors: 0 Error percentage: NaN%
FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.
Please suggest how I can resolve this issue.
Thank you in advance.
Hello!
Please consider adding some formatting to your question to make it more readable. Having said that, I guess your error is related to the gff3/gtf you provide when building the database. Do you have CDS annotations for each transcript in the GTF? Do you have all the fields that a GTF usually has (such as a transcript line for each transcript, CDS, exon, genes, UTRs...)?
I have tried using both formats, gff3 and gtf, along with CDS, but got the same error :(
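One way to answer the question above (does every transcript in the GTF have CDS features?) is to tally the feature types per transcript. The sketch below is only an illustration: it assumes a standard 9-column, tab-separated GTF with a `transcript_id "..."` attribute, and the function names are made up for this example, not part of snpEff.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def features_per_transcript(gtf_lines):
    """Map each transcript_id to the set of feature types (column 3) seen for it."""
    features = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed or empty lines
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            features[match.group(1)].add(cols[2])
    return features


def transcripts_without_cds(gtf_lines):
    """Return a sorted list of transcript IDs that have no CDS feature."""
    per_transcript = features_per_transcript(gtf_lines)
    return sorted(t for t, feats in per_transcript.items() if "CDS" not in feats)
```

To use it, open the annotation file and pass the file object, e.g. `transcripts_without_cds(open("genes.gtf"))` (the file name here is a placeholder). Non-coding transcripts legitimately have no CDS, so a non-empty result is a hint to look closer, not proof of a broken file.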
same problem here -- the names of the transcripts / entries in the gff match. It won't take it. Wondering if there is a secret sauce to getting this to work :-)
I had a similar problem in the past, detailed here:
https://github.com/pcingola/SnpEff/issues/388
As it turns out, the "officially" accepted GenBank file had some inconsistencies in it, and in turn SnpEff raised a number of quite confusing errors.
Did you check what is written in the last line of the error/warning output?
That would be my first guess as well: for some reason the IDs in your FASTA file do not match the ones in the DB.
Yes, I checked and they are the same, because I am using the same reference for variant calling and for building the snpEff database, along with the gtf, CDS and protein files (in FASTA format).
And you are right, this is the reason.
But I am unable to sort it out.
time to backtrack everything then:
(in general check if you can find that ID in each step/input of this process)
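A rough way to do that backtracking is to pull the IDs out of the FASTA headers and out of the GTF and compare the two sets. This is a minimal sketch under assumptions: it takes the first whitespace-delimited token of each FASTA header as the ID and reads `transcript_id "..."` from the GTF attributes. snpEff's own matching rules may differ, and the function names are hypothetical.

```python
import re

# Matches the transcript_id attribute anywhere in a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(lines):
    """Collect every transcript_id value found in non-comment GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Split IDs into those shared by both inputs and those unique to each."""
    fa = fasta_ids(fasta_lines)
    gtf = gtf_transcript_ids(gtf_lines)
    return {
        "shared": fa & gtf,
        "fasta_only": fa - gtf,
        "gtf_only": gtf - fa,
    }
```

Pass open file handles for cds.fa (or protein.fa) and the GTF. If `shared` comes back empty while both other sets are large, printing a few entries from each side usually shows the naming difference, for example protein accessions on one side and transcript accessions on the other.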
Hi, did anyone solve this? I have the same problem: I downloaded the CDS for my reference and added it as cds.fa, but it doesn't work:
FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.
Transcript IDs from database (sample):
Transcript IDs from database (fasta file):