Matching and Converting Transcript/Protein ID to Gene ID (Ensembl Plants) - MCScanX input
5.1 years ago
venura ▴ 70

Hi,

I am trying to generate the BLAST input file for MCScanX. What I have done so far:

  • Downloaded the latest Solanum genome files (FASTA and GFF3) from Ensembl Plants. The genome of interest is potato.
  • Created a local BLAST (protein) database.

However, I am now stuck: the protein sequences are named by transcript ID (for example, PGSC0003DMT400087821), but the gene ID (PGSC0003DMG400037392) is completely different from the transcript ID. So I won't be able to match the output to gene IDs, as the program requires.

At the end of the day, what I need is protein sequences labelled by gene ID rather than transcript ID.

Is there a way to overcome this naming mismatch? Thank you,
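
One possible way to bridge the two naming schemes is to read the transcript-to-gene relationship out of the GFF3 file itself. The sketch below assumes the Ensembl-style convention in which an mRNA row carries `ID=transcript:...` and `Parent=gene:...` in its attribute column; that layout is an assumption and should be checked against the downloaded file. The function names and the coordinates in the example are made up for illustration; only the two IDs come from the post.

```python
import io

def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs

def transcript_to_gene(gff_lines):
    """Map transcript ID -> gene ID using mRNA/transcript rows of a GFF3 stream."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("mRNA", "transcript"):
            continue
        attrs = parse_attributes(cols[8])
        # Assumed Ensembl-style prefixes 'transcript:' and 'gene:' are stripped.
        tid = attrs.get("ID", "").split(":", 1)[-1]
        gid = attrs.get("Parent", "").split(":", 1)[-1]
        if tid and gid:
            mapping[tid] = gid
    return mapping

# Tiny in-memory example; the coordinates are invented.
example = io.StringIO(
    "##gff-version 3\n"
    "1\tPGSC\tgene\t100\t900\t.\t+\t.\tID=gene:PGSC0003DMG400037392\n"
    "1\tPGSC\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=transcript:PGSC0003DMT400087821;Parent=gene:PGSC0003DMG400037392\n"
)
mapping = transcript_to_gene(example)
print(mapping)
```

For a real file, the same function could be called on an open file handle, for example `transcript_to_gene(open("potato.gff3"))`, where the file name is a placeholder.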

ensembl • 1.5k views

But genes have multiple transcripts and therefore multiple proteins. If you link your protein sequence to gene IDs, you will get multiple sequences with one ID. How would that work?
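
One convention that is sometimes used to get around this is to keep a single representative protein per gene, for instance the longest one. Nothing in this thread confirms that this is the right choice for a given analysis, so the following is only a sketch. It assumes the peptide FASTA headers contain a `gene:<ID>` token, as Ensembl peptide files usually do; the second isoform name and both sequences are invented for the example.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def longest_protein_per_gene(lines):
    """Return {gene_id: sequence}, keeping the longest protein for each gene."""
    best = {}
    for header, seq in read_fasta(lines):
        match = re.search(r"\bgene:(\S+)", header)  # assumed header token
        if match is None:
            continue  # header without a gene ID is skipped
        gene = match.group(1)
        # On ties, the first sequence encountered is kept.
        if gene not in best or len(seq) > len(best[gene]):
            best[gene] = seq
    return best

# Two isoforms of one gene; the second ID and the sequences are made up.
fasta_lines = [
    ">PGSC0003DMT400087821 pep gene:PGSC0003DMG400037392",
    "MKTAYIAKQR",
    ">hypothetical_isoform_2 pep gene:PGSC0003DMG400037392",
    "MKTAY",
]
representatives = longest_protein_per_gene(fasta_lines)
print(representatives)
```

Writing `representatives` back out with `>gene_id` headers would give one sequence per gene ID, which avoids the duplicate-ID problem raised here.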

I agree. But MCScanX accepts only gene names. The workflow includes blastp against the genome followed by the synteny analysis. What I want to do is identify segmental and tandem duplications in the potato genome, and MCScanX is the most recommended tool I have found so far.
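
If the blastp run has already been done with transcript IDs, one option might be to relabel the tabular output afterwards instead of renaming the FASTA. The sketch below assumes tab-separated BLAST output (`-outfmt 6`), where the first two columns are the query and subject IDs, and a transcript-to-gene dictionary built elsewhere. `relabel_blast`, `TX_B` and `GENE_B` are hypothetical names, and the hit values are invented.

```python
def relabel_blast(rows, tx2gene):
    """Replace query/subject transcript IDs with gene IDs in tabular BLAST rows.

    Rows whose query or subject is missing from the mapping are dropped.
    """
    relabelled = []
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # not a tabular hit line
        query = tx2gene.get(cols[0])
        subject = tx2gene.get(cols[1])
        if query is None or subject is None:
            continue
        relabelled.append("\t".join([query, subject] + cols[2:]))
    return relabelled

# Mapping: the first pair is from the post, the second is a placeholder.
tx2gene = {
    "PGSC0003DMT400087821": "PGSC0003DMG400037392",
    "TX_B": "GENE_B",
}
# One invented hit in 12-column tabular format.
blast_rows = [
    "PGSC0003DMT400087821\tTX_B\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n",
]
gene_rows = relabel_blast(blast_rows, tx2gene)
print(gene_rows[0])
```

The gene IDs produced this way would need to match the IDs used in the gene position file that MCScanX reads alongside the BLAST file.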

Hi, have you found a solution for this?
