How to identify species specific genes?
6.0 years ago
KG ▴ 10

Hi,

I have the genomes (FASTA files) of two species and the corresponding GFF3 files. I need to identify genes that are (a) unique to each species (< 30% similarity), (b) highly conserved, and (c) highly diverged. Can you please suggest how I can do that?

NB: these are eukaryotic genomes of ~15-16 Mb and contain very few introns.

Thank you for your help!

gene BLAST Species specific gene • 1.8k views

I would just add a word of caution: defining "species specific" is tricky business. You certainly need to keep in mind that such a label is only valid given the current context.

Top tip: make the effort to check whether genes have been missed in either annotation (a quick tblastn will tell you).

The answer given by h.mon is likely the best approach to start with; try it and see what you get.

6.0 years ago
h.mon 35k

Translate the genes into proteins, and perform all vs all blast searches with the two predicted proteomes. Then filter the blast output for query coverage and percent identity to get the highly conserved and highly diverged genes.
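The filtering step can be sketched in Python as follows. This is only a minimal illustration, not a definitive method: it assumes the best hit per query has already been reduced to a (percent identity, query coverage) pair, and every threshold except the 30% from the question is a hypothetical placeholder to tune for your data. The function name and category labels are likewise made up for this example.

```python
def classify_genes(query_ids, best_hits,
                   unique_max_ident=30.0,     # from the question (< 30% similarity)
                   conserved_min_ident=80.0,  # hypothetical threshold
                   diverged_max_ident=50.0,   # hypothetical threshold
                   min_qcov=50.0):            # hypothetical threshold
    """Assign each query gene to a category based on its best BLAST hit.

    query_ids: iterable of all protein IDs in the query proteome.
    best_hits: dict mapping query ID -> (percent_identity, query_coverage).
               Queries without any hit are simply absent from the dict.
    """
    classes = {}
    for qid in query_ids:
        hit = best_hits.get(qid)
        # No hit, a hit covering too little of the query, or a hit below the
        # identity cutoff: treat the gene as a species-specific candidate.
        if hit is None or hit[1] < min_qcov or hit[0] < unique_max_ident:
            classes[qid] = "species_specific_candidate"
        elif hit[0] >= conserved_min_ident:
            classes[qid] = "highly_conserved"
        elif hit[0] <= diverged_max_ident:
            classes[qid] = "highly_diverged"
        else:
            classes[qid] = "intermediate"
    return classes
```

Genes with no hit at all only show up if you pass in the full list of query IDs, which is why the function takes that list in addition to the hits.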

Regarding "< 30% homology":

There is no such thing; genes either are or aren't homologous. You probably mean < 30% similarity.


Thanks for correcting me! Could you please elaborate, or point me to a good reference where this is discussed in detail?


How you translate the genes into proteins will depend on your GFF3 file, but if it has "CDS" features (it should), you can extract the CDS and translate them; see the pointers at "gff3 to CDS fasta".
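As a rough sketch of the extract-and-translate step in plain Python: the code below groups "CDS" features by their Parent attribute, joins the segments, reverse-complements minus-strand genes, and translates with the standard genetic code. It assumes the genome is already loaded as a dict of sequence ID to sequence, that each CDS has a single Parent, and that the first CDS segment starts in phase 0. The function names are invented for this example; an established tool is likely a safer choice for real data.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds):
    """Translate a CDS in frame 0; unknown codons become 'X'."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def cds_proteins(gff_lines, genome):
    """Return {parent_id: protein} built from the CDS features of a GFF3.

    gff_lines: iterable of GFF3 lines.
    genome:    dict mapping sequence ID -> DNA string (assumed preloaded).
    """
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        parent = attrs.get("Parent", attrs.get("ID"))
        parts[parent].append((f[0], int(f[3]), int(f[4]), f[6]))

    proteins = {}
    for parent, segs in parts.items():
        segs.sort(key=lambda s: s[1])  # order segments by genomic start
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[sid][start - 1:end] for sid, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = revcomp(seq)
        proteins[parent] = translate(seq).rstrip("*")  # drop terminal stop
    return proteins
```

Writing the resulting dict out as FASTA gives one predicted proteome per species.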

Create a database with makeblastdb for each in silico proteome, and blast one against the other. Use -outfmt 6 and use perl / python / awk / your preferred language to parse the results. See Blast - Formatting Output on how to configure blast output to include fields you may find useful.
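A possible sketch of these steps in Python is shown below. The file names, the 1e-5 e-value cutoff, and the function names are placeholders chosen for illustration. The custom column list is one way to configure -outfmt 6 so that query coverage can be computed; query coverage is approximated here from the best HSP only.

```python
import csv

# Columns requested from BLAST; qlen is added so coverage can be computed.
FIELDS = ["qseqid", "sseqid", "pident", "qstart", "qend",
          "qlen", "evalue", "bitscore"]


def blast_commands(query_faa, subject_faa, db_name, out_tsv):
    """Build argument lists for makeblastdb and blastp (not executed here).

    All paths are placeholders; run each list with
    subprocess.run(cmd, check=True) once BLAST+ is installed.
    """
    makedb = ["makeblastdb", "-in", subject_faa,
              "-dbtype", "prot", "-out", db_name]
    blastp = ["blastp", "-query", query_faa, "-db", db_name,
              "-outfmt", "6 " + " ".join(FIELDS),
              "-evalue", "1e-5",  # hypothetical cutoff
              "-out", out_tsv]
    return makedb, blastp


def best_hits(lines):
    """Keep the highest-bitscore hit per query from tabular BLAST output.

    lines: iterable of tab-separated lines with the columns in FIELDS.
    Returns {qseqid: {"sseqid", "pident", "qcov", "bitscore"}}.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        bits = float(rec["bitscore"])
        qid = rec["qseqid"]
        if qid in best and best[qid]["bitscore"] >= bits:
            continue
        # Percent of the query spanned by this HSP.
        qcov = 100.0 * (int(rec["qend"]) - int(rec["qstart"]) + 1) / int(rec["qlen"])
        best[qid] = {"sseqid": rec["sseqid"], "pident": float(rec["pident"]),
                     "qcov": qcov, "bitscore": bits}
    return best
```

Run the search in both directions (A against B, then B against A) so that each proteome serves once as query; proteins absent from the parsed results had no hit under the chosen e-value.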


I would give a +10 for the homology/similarity remark though ;) (corrected by the OP in the meantime)
