Hi,
I'm working on the annotation of a fungal genome. My problem is the following: I have a FASTA file with all the predicted genes, and the genes are named according to the prediction program I used, in this format:
>snap_masked-scaffold983-abinit-gene-2.78-mRNA-1
MATPSPLMMLLGALFFFSANVFAAGAVLGVDLGTEYIKAALVKPGIPLEIVLTKDSRRKETSAVAFKPSKSGPTAGQFPERSYGADAMALAARFPGEVYPNLKTLLGLPIDDASVKEYAARHPALQLQAHSSRGTPAFKTKTLTAEEDALLVEELLAMELQSVQKNAEAAAGDGSSV
>snap_masked-scaffold889-abinit-gene-3.50-mRNA-1
MSSLFDSWFGFIFWGVAYFRMRTADKKIGRERNVIGDWFSMGLNVIIILTGFFFLTAGTYASVQGIIDSFNAGEVGGVFSCKSNGV
>snap_masked-scaffold889-abinit-gene-3.43-mRNA-1
MAVYRVPFSWVHFVNLTIQL
I also have a file with the correct names; the sequence names in the two files are different. Is there a script that renames the sequences in the first file using the names in the second one? The idea is to change the names in all the header lines. Thank you very much.
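If the file with the correct names can be turned into a two-column table (old ID, a tab, new name), renaming the headers is a simple lookup. The sketch below assumes that tab-separated layout, which is hypothetical since the second file's actual format isn't shown. Headers whose ID is not in the table are left unchanged, and `parse_mapping`/`rename_fasta` are illustrative names, not an existing tool.

```python
import sys


def parse_mapping(text):
    """Parse 'old_id<TAB>new_name' lines into a dict (assumed file layout)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two tab-separated columns: {line!r}")
        mapping[fields[0]] = fields[1]
    return mapping


def rename_fasta(fasta_text, mapping):
    """Replace the ID of each header found in `mapping`; keep sequence lines as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].strip().split(maxsplit=1)
            old_id = parts[0] if parts else ""
            rest = " " + parts[1] if len(parts) > 1 else ""  # keep any description
            out.append(">" + mapping.get(old_id, old_id) + rest)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python rename_fasta.py predicted.fasta mapping.tsv > renamed.fasta
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as mp:
        sys.stdout.write(rename_fasta(fa.read(), parse_mapping(mp.read())))
```

Leaving unmatched IDs untouched (rather than failing) makes it easy to spot which predictions have no entry in the table.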
I don't understand. Can you show us the content of the file you mention in "Also I have a file with the correct names"?
Are the genes in file 1 in the same order as the names in file 2?
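If the answer is yes, and both files have exactly one entry per gene, the i-th header could simply be replaced by the i-th name. This is a minimal sketch under that assumption; it refuses to run when the counts differ, because then the order cannot match. `rename_by_order` is a hypothetical helper name.

```python
def rename_by_order(fasta_text, new_names):
    """Rename the i-th FASTA record to new_names[i]; only valid if the order matches."""
    lines = fasta_text.splitlines()
    n_headers = sum(1 for line in lines if line.startswith(">"))
    if n_headers != len(new_names):
        raise ValueError(
            f"{n_headers} records but {len(new_names)} names; "
            "order-based renaming is unsafe"
        )
    names = iter(new_names)
    # Each header line consumes the next name; sequence lines pass through.
    out = [">" + next(names) if line.startswith(">") else line for line in lines]
    return "\n".join(out) + "\n"
```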
Gene prediction programs usually make several predictions per sequence (as with your scaffold889, I guess) or no prediction at all. As a result, your original FASTA file and your predicted-peptide file won't contain the same number of sequences. How would you deal with that? A gene predictor should report the name of the reference sequence each prediction comes from (and I think that's the case here). So if you want to change names, you should do it before running the prediction.
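Since the reference sequence name appears to be embedded in each prediction ID, it can be pulled out to see which scaffold each peptide came from and how many predictions each scaffold received. The sketch assumes every header contains a `-scaffoldN-` token, as in the three examples in the question; other naming schemes would need a different pattern.

```python
import re
from collections import defaultdict

# Assumption: IDs look like "snap_masked-scaffold889-abinit-gene-3.50-mRNA-1".
SCAFFOLD_RE = re.compile(r"-(scaffold\d+)-")


def scaffold_of(header):
    """Return the scaffold name embedded in a prediction ID, or None if absent."""
    m = SCAFFOLD_RE.search(header)
    return m.group(1) if m else None


def group_by_scaffold(headers):
    """Map each scaffold name to the list of prediction IDs found on it."""
    groups = defaultdict(list)
    for h in headers:
        groups[scaffold_of(h)].append(h)
    return dict(groups)
```

With the headers from the question, `scaffold889` collects two predictions and `scaffold983` one, which is exactly why a one-to-one renaming by position is not safe.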