Dear all,
I have recently started working in bioinformatics. I am trying to build a snpEff database for my strain of interest using both the GTF and GFF modes, but I keep getting the same error for both of my strains. Based on the error message, I believe there is a header issue, but I am not sure exactly how or where I need to change it, or what exactly the error is listing.
Error:
FATAL ERROR: No CDS checked. This is might be caused by differences in FASTA file transcript IDs respect to database's transcript's IDs.
Transcript IDs from database (sample):
'TRANSCRIPT_gene-CAJCM15448_47400'
'TRANSCRIPT_gene-CAJCM15448_22100'
'TRANSCRIPT_gene-CAJCM15448_09130'
Transcript IDs from database (fasta file):
'lcl|BGOX01000002.1_cds_GBL49991.1_2265'
'lcl|BGOX01000001.1_cds_GBL47767.1_41'
'lcl|BGOX01000001.1_cds_GBL48203.1_477'
'lcl|BGOX01000001.1_cds_GBL48135.1_409'
'lcl|BGOX01000004.1_cds_GBL51971.1_4245'
My files are: cds.fa, genes.gff, genes.gtf, protein.fa, and sequences.fa.
So now I would like to know which file's headers I need to change, or whether there is anything else I can do to make the database build properly. Any suggestions, please?
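One possible workaround, sketched here only as an illustration: rewrite the cds.fa headers so they carry the same transcript IDs snpEff built from genes.gff. This sketch assumes several things not confirmed above: that each CDS line in genes.gff has `protein_id=` and `Parent=` attributes (as in NCBI-style GFF3), that the protein accession sits between `_cds_` and the trailing `_<number>` in headers like `lcl|BGOX01000002.1_cds_GBL49991.1_2265`, and that snpEff names its transcripts `TRANSCRIPT_` + the CDS parent when there is no mRNA feature (inferred only from the IDs in the error). The helper names `protein_to_parent` and `rename_fasta` are hypothetical, and the example records in the usage section are made up.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches NCBI-style CDS headers such as "lcl|BGOX01000002.1_cds_GBL49991.1_2265";
# group 1 captures the protein accession (here "GBL49991.1").
CDS_HEADER = re.compile(r"_cds_(.+?)_\d+$")


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def protein_to_parent(gff_lines: Iterable[str]) -> Dict[str, str]:
    """Map protein_id -> Parent for every CDS feature in a GFF3 file (assumed layout)."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if "protein_id" in attrs and "Parent" in attrs:
            mapping[attrs["protein_id"]] = attrs["Parent"]
    return mapping


def rename_fasta(fasta_lines: Iterable[str], mapping: Dict[str, str],
                 prefix: str = "TRANSCRIPT_") -> Iterator[str]:
    """Yield FASTA lines, replacing headers whose protein accession is in mapping.

    The "TRANSCRIPT_" prefix is an assumption based on the IDs in the error message.
    Headers that do not match are passed through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            match = CDS_HEADER.search(parts[0]) if parts else None
            if match and match.group(1) in mapping:
                yield ">" + prefix + mapping[match.group(1)] + "\n"
                continue
        yield line


if __name__ == "__main__":
    # Hypothetical in-memory example; with real data you would read genes.gff and cds.fa.
    gff = ["##gff-version 3\n",
           "ctg1\tsrc\tCDS\t1\t9\t.\t+\t0\tID=cds-P1.1;Parent=gene-G1;protein_id=P1.1\n"]
    fasta = [">lcl|ctg1_cds_P1.1_1\n", "ATGAAATAA\n"]
    print("".join(rename_fasta(fasta, protein_to_parent(gff))), end="")
```

If a CDS has several comma-separated parents, this sketch keeps the whole string, so check the rewritten headers against the IDs that snpEff reports before rebuilding.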
Did you find a solution to this? I have the same issue, even after changing the names to match.
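When the names supposedly match but the build still fails, it may help to check which FASTA headers have no counterpart in the annotation. A minimal sketch, assuming snpEff compares the first whitespace-delimited token of each FASTA header with the `transcript_id` values in genes.gtf (an assumption, not confirmed here); `gtf_transcript_ids` and `unmatched_headers` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Set

# GTF column 9 stores attributes as: key "value";
GTF_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect every transcript_id value found in a GTF file."""
    ids = set()
    for line in gtf_lines:
        if not line.startswith("#"):
            ids.update(GTF_TRANSCRIPT_ID.findall(line))
    return ids


def unmatched_headers(fasta_lines: Iterable[str], known_ids: Set[str]) -> List[str]:
    """Return the first token of each FASTA header that is not a known transcript ID."""
    missing = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            token = parts[0] if parts else ""
            if token not in known_ids:
                missing.append(token)
    return missing


if __name__ == "__main__":
    # Replace these paths with your own files.
    with open("genes.gtf") as gtf, open("cds.fa") as fa:
        missing = unmatched_headers(fa, gtf_transcript_ids(gtf))
    print(f"{len(missing)} FASTA headers without a matching transcript_id")
    print(missing[:5])
```

An empty list would suggest the IDs line up exactly; any leftover entries show what still differs (for example, a stray prefix or version suffix).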