Matching and Converting Transcript/Protein ID to Gene ID (Ensembl Plants) - MCScanX input

5.1 years ago

venura ▴ 70

Hi,

I am trying to generate the BLAST input file for MCScanX. What I have done so far:

Downloaded the latest Solanum genome files (FASTA and GFF3) from Ensembl Plants. The genome of interest is potato.
Created a local BLAST protein database.

However, I am now stuck: the protein sequences are labeled by transcript ID (for example, PGSC0003DMT400087821), but the gene ID (PGSC0003DMG400037392) is completely different from the transcript ID. So I won't be able to match the BLAST output to gene IDs as the program requires.

In the end, what I need is protein sequences labeled by gene ID rather than transcript ID.

Is there a way to overcome this naming mismatch? Thank you,
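For reference, the transcript-to-gene relationship is normally recorded in the GFF3 file itself. The sketch below assumes the Ensembl-style attribute layout, where transcript rows carry `ID=transcript:<id>` and `Parent=gene:<id>`; that layout should be checked against the downloaded file. The function name and the coordinates in the two example rows are made up for illustration; only the two IDs come from the question.

```python
import io


def transcript_to_gene(gff3_lines):
    """Build {transcript_id: gene_id} from GFF3 lines (Ensembl-style attributes assumed)."""
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature row
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        ident = attrs.get("ID", "")
        parent = attrs.get("Parent", "")
        # Assumption: transcripts are 'transcript:<id>' with parent 'gene:<id>'
        if ident.startswith("transcript:") and parent.startswith("gene:"):
            mapping[ident.split(":", 1)[1]] = parent.split(":", 1)[1]
    return mapping


# Tiny illustrative input; coordinates are invented
example = io.StringIO(
    "##gff-version 3\n"
    "1\tPGSC\tgene\t100\t900\t.\t+\t.\tID=gene:PGSC0003DMG400037392\n"
    "1\tPGSC\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=transcript:PGSC0003DMT400087821;Parent=gene:PGSC0003DMG400037392\n"
)
t2g = transcript_to_gene(example)
print(t2g)
```

With a real file, the same function can be given an open file handle instead of the `io.StringIO` example.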

ensembl • 1.5k views

updated 9 months ago by bioinfo223 ▴ 10 • written 5.1 years ago by venura ▴ 70

But genes have multiple transcripts and therefore multiple proteins. If you link your protein sequence to gene IDs, you will get multiple sequences with one ID. How would that work?
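One common way to handle the one-gene-to-many-proteins problem is to keep a single representative protein per gene, for example the longest one. The sketch below is only an illustration of that idea, not a recommendation specific to this dataset. It assumes the peptide FASTA headers contain a `gene:<id>` field, as Ensembl peptide files usually do; the function names and the short demo records (T1, G1, ...) are hypothetical.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_protein_per_gene(records):
    """Keep the longest protein for each gene named in a 'gene:<id>' header field."""
    best = {}
    for header, seq in records:
        match = re.search(r"\bgene:(\S+)", header)  # assumed header layout
        if match is None:
            continue  # header has no gene field; skip it
        gene = match.group(1)
        if len(seq) > len(best.get(gene, "")):
            best[gene] = seq
    return best


# Hypothetical records: two isoforms of G1, one of G2
demo = """>T1 pep gene:G1 transcript:T1
MKT
>T2 pep gene:G1 transcript:T2
MKTAYLL
>T3 pep gene:G2 transcript:T3
MSS
"""
best = longest_protein_per_gene(read_fasta(demo))
for gene, seq in best.items():
    print(f">{gene}\n{seq}")  # FASTA output with gene IDs as sequence names
```

Choosing the longest isoform is a convention rather than a biological rule; a canonical-transcript flag, where the annotation provides one, would be an alternative.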

5.1 years ago by Emily 24k

I agree, but MCScanX accepts only gene names. The workflow includes a blastp search against the genome followed by the synteny analysis. What I want to do is identify segmental and tandem duplications in the potato genome, and MCScanX is the most recommended tool I have found so far.
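If the blastp search has already been run with transcript IDs, the tabular output can be relabeled afterwards instead of renaming the sequences. The sketch below assumes tab-separated BLAST output (`-outfmt 6`, query ID in column 1 and subject ID in column 2) and a transcript-to-gene dictionary built beforehand. The function name and the IDs in the demo are hypothetical. Dropping hits between two transcripts of the same gene is a choice made here, not something MCScanX is known to require.

```python
def blast_to_gene_ids(blast_lines, t2g):
    """Rewrite query/subject columns of tabular BLAST hits from transcript to gene IDs."""
    out = []
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # not a hit row
        query_gene = t2g.get(cols[0])
        subject_gene = t2g.get(cols[1])
        if query_gene is None or subject_gene is None:
            continue  # transcript missing from the mapping
        if query_gene == subject_gene:
            continue  # assumption: isoform-vs-isoform hits of one gene are not wanted
        out.append("\t".join([query_gene, subject_gene] + cols[2:]))
    return out


# Hypothetical mapping and hits (12-column tabular layout)
t2g_demo = {"T1": "G1", "T2": "G1", "T3": "G2"}
hits = [
    "T1\tT3\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n",
    "T1\tT2\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-120\t450\n",
]
converted = blast_to_gene_ids(hits, t2g_demo)
print("\n".join(converted))
```

If several transcripts per gene are kept in the database, the same gene pair can appear more than once after relabeling, so keeping one protein per gene before running blastp may be the cleaner route.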