Question

Extracting homologous proteins from a genome (blat or exonerate)


5.0 years ago

ricardoguerreiro2121 ▴ 80

Hi,

I would like to quickly extract proteins from various novel plant genomes by finding homology with documented proteins (e.g. A. thaliana), for the purpose of phylogenetic analysis.

A recent paper works with an old tool, Blat, that does just that. But the output of Blat is a table of hits (with coordinates). How do I transform this into proteins? I have created a script that parses my query DNA sequence based on the hit coordinates, but this doesn't seem ideal: I would still have to translate the DNA, and there are six different reading frames to translate in.
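For reference, the six translations are the three frame offsets on each strand. A minimal stdlib-only sketch, assuming the standard genetic code and a plain A/C/G/T sequence (the function names are illustrative and not part of Blat or exonerate):

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames["+1"])  # forward strand, frame 1
```

Picking the right frame by hand is still guesswork, which is why a tool that aligns at the protein level is the more comfortable route.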

Does anyone here know Blat? Or any nice, easy alternative? Exonerate seems to do the same and also outputs alignments against my putative translated proteins, but I don't know how to extract anything from this format.

EDIT: I'm getting close to it with:

exonerate --model  protein2genome  araport_genes.pep.fasta b_repanda.fasta    --showalignment no --showvulgar no --ryo ">%ti (%tab - %tae)\n%tas\n"
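The --ryo template above prints FASTA-like records (a ">" header followed by sequence), so the output can be collected with a few lines of Python. This is only a sketch: the function name is made up, and it assumes that anything before the first ">" line and any line starting with "-- " is exonerate's own bookkeeping rather than sequence.

```python
def parse_ryo_fasta(lines):
    """Collect (header, sequence) pairs from FASTA-like --ryo output."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record starts; store the previous one first.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line.startswith("-- "):
            # Assumed to be a status line, not sequence.
            continue
        elif header is not None and line:
            # Sequence may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = [
    "Hostname: [localhost]",
    ">contig_1 (10 - 19)",
    "ATGGCC",
    "TAA",
    "-- completed exonerate analysis",
]
for name, seq in parse_ryo_fasta(example):
    print(name, len(seq))
```

With a real run, pass an open file handle instead of the example list, e.g. parse_ryo_fasta(open("exonerate_outfile")).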

Cheers, Ricardo

blat genome proteins phylogeny exonerate • 1.2k views


score 1 · Answer 1 · 2020-01-13

From the paper mentioned:

Contig identity was assigned with Blat v.35 using translated DNA against the respective exon reference sets, selecting the highest scoring hit, and contigs with score > 20 and percentage identity > 75% were retained

The authors didn't align the genomic nucleotides directly; they ran Blat on translated DNA, so the contigs were translated and compared to the reference set at the protein level.
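The filtering rule in that quote (keep the highest-scoring hit per contig, then require score > 20 and identity > 75%) could be sketched roughly like this. It assumes the score and percent identity have already been computed for each hit; the dictionary keys "contig", "score" and "identity" are hypothetical, not Blat column names.

```python
def best_hits(hits, min_score=20, min_identity=75.0):
    """Keep the top-scoring hit per contig if it passes both thresholds."""
    best = {}
    for hit in hits:
        current = best.get(hit["contig"])
        # Remember only the highest-scoring hit seen for this contig.
        if current is None or hit["score"] > current["score"]:
            best[hit["contig"]] = hit
    # Apply the thresholds to the selected hit of each contig.
    return {
        contig: hit
        for contig, hit in best.items()
        if hit["score"] > min_score and hit["identity"] > min_identity
    }


hits = [
    {"contig": "c1", "score": 35, "identity": 82.0, "ref": "exonA"},
    {"contig": "c1", "score": 50, "identity": 90.5, "ref": "exonB"},
    {"contig": "c2", "score": 18, "identity": 99.0, "ref": "exonC"},
]
for contig, hit in best_hits(hits).items():
    print(contig, hit["ref"])
```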

For your analysis, I think you can annotate your sequences using the closest species, then use Ensembl Plants to retrieve the phylogenetic group and add your sequences to extend the phylogeny.

score 0 · Answer 2 · 2020-01-14


5.0 years ago

ricardoguerreiro2121 ▴ 80

I think I have found my ideal answer:

Run exonerate

Then in Python:

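from Bio import SearchIO

# Parse exonerate's plain-text alignment output, one result per query
qresults = SearchIO.parse("exonerate_outfile", "exonerate-text")

for qresult in qresults:
    hsp = qresult[0][0]  # first HSP of the first hit
    # Print the hit sequence of the first fragment of that HSP
    print(str(hsp.hit_all[0].seq))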
