5.1 years ago
venura
Hi,
I am trying to generate the BLAST input file for MCScanX. What I have done so far:
- Downloaded the latest Solanum genome files (FASTA and GFF3) from Ensembl Plants. The genome of interest is potato.
- Created a local BLAST (protein) database.
However, I am now stuck: the protein sequences are named by transcript ID (for example, PGSC0003DMT400087821), but the gene ID (PGSC0003DMG400037392) is completely different from the transcript ID. So I won't be able to match the BLAST output to gene IDs, as the program requires.
At the end of the day, what I need is protein sequences named by gene ID rather than by transcript ID.
Is there a way to overcome this naming mismatch? Thank you,
But genes have multiple transcripts and therefore multiple proteins. If you link your protein sequence to gene IDs, you will get multiple sequences with one ID. How would that work?
I agree. But MCScanX accepts only gene names. The workflow includes blastp against the genome followed by the synteny analysis. What I want to do is identify segmental and tandem duplications in the potato genome, and MCScanX is the most recommended tool I have found so far.
Hi, have you found a solution for this?
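One possible way to reconcile the two points raised above (MCScanX wants gene names, but a gene can have several transcripts and proteins) is to keep a single representative protein per gene, for example the longest one, and rename it to the gene ID before building the BLAST database. The Python sketch below is only an illustration. It assumes each header in the peptide FASTA carries a `gene:<ID>` field, which should be checked against the actual file; if it does not, the transcript-to-gene mapping would have to come from the GFF3 instead. The function names are made up for this example, and choosing the longest isoform is a convention, not something MCScanX itself prescribes.

```python
import re

# Assumption: headers look like
#   >PGSC0003DMT400087821 pep ... gene:PGSC0003DMG400037392 transcript:...
# Verify this against your own peptide FASTA before relying on it.
GENE_TAG = re.compile(r"\bgene:(\S+)")


def longest_protein_per_gene(fasta_lines):
    """Return {gene_id: sequence}, keeping the longest protein of each gene."""
    best = {}
    gene, chunks = None, []

    def flush():
        # Store the record just read if it beats the current best for its gene.
        if gene is not None:
            seq = "".join(chunks)
            if len(seq) > len(best.get(gene, "")):
                best[gene] = seq

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            match = GENE_TAG.search(line)
            gene = match.group(1) if match else None  # skip records without a gene tag
            chunks = []
        else:
            chunks.append(line)
    flush()
    return best


def write_fasta(records, handle, width=60):
    """Write {gene_id: sequence} as FASTA with the gene ID as the only header."""
    for gene_id, seq in records.items():
        handle.write(f">{gene_id}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python rename_by_gene.py input.pep.fa > by_gene.fa
    with open(sys.argv[1]) as fh:
        write_fasta(longest_protein_per_gene(fh), sys.stdout)
```

The resulting FASTA has one sequence per gene ID, so the blastp output and the gene coordinates taken from the GFF3 would use the same names. If the gene IDs in the header carry a version suffix, it would need to be stripped so they match the IDs used in the MCScanX gff file.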