Hi all,
I'm trying to annotate a lizard genome in Galaxy. I ran AUGUSTUS in Galaxy with chicken as the training set, softmasking set to TRUE, "predict genes on specific strands" set to both, gene model set to complete, and GFF3 as the GFF output format. The output options were predicted protein sequences (--protein), coding sequence as comment in the output file (--codingseq), and CDS region (--cds). All other parameters were false. I got a GFF3 file, a coding sequence file, and a protein sequence file. But when I ran the first forty protein sequences through blastp, it reported no significant hits. I did the same thing for the green anole genome, and those protein sequences were found by blastp with 100% identity to green anole proteins. Does anyone know why the lizard protein sequences can't be found with blastp? There isn't even a hit with 50% similarity. Is this normal? How can I fix this problem? Thank you!
The 40 sequences are your query proteins, but you don't say which protein database you blasted them against. Unrelated: instead of, or in addition to, describing the options you used, you could just post the command(s) you executed (I don't know if Galaxy gives you that, though...)
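For reference, here is a rough sketch of what the described settings might look like as an AUGUSTUS command line, assembled in Python. This is a guess reconstructed from the option names in the question, not the command Galaxy actually ran: the wrapper may pass different or additional flags, and `lizard_genome.fa` and `build_augustus_command` are hypothetical names used only for illustration.

```python
import shlex


def build_augustus_command(genome_fasta, species="chicken"):
    """Assemble an AUGUSTUS argument list from the options described in the question.

    Assumption: this mirrors the settings as described; the Galaxy
    wrapper may pass different or extra flags.
    """
    return [
        "augustus",
        f"--species={species}",   # training set (chicken in the question)
        "--softmasking=1",        # softmasking TRUE
        "--strand=both",          # predict genes on both strands
        "--genemodel=complete",   # complete gene models only
        "--gff3=on",              # GFF3 output
        "--protein=on",           # predicted protein sequences
        "--codingseq=on",         # coding sequence as comment in the output
        "--cds=on",               # CDS regions
        genome_fasta,
    ]


if __name__ == "__main__":
    # "lizard_genome.fa" is a hypothetical file name.
    print(shlex.join(build_augustus_command("lizard_genome.fa")))
```

Posting something like this together with the exact blastp database makes it much easier for others to reproduce the run.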