What is the correct way to annotate a CDS file whose sequences have no gene names as IDs? The CDS file is from Capsicum annuum and looks like this:
>Id16
ATGCATCATCCCATCTTTCATGCTTCTGGTTCTGTGGAAGGGCATTGGATTAGGATTCCCCCACCTCATAAAACATCATTTTATGCTTCTGACATA
TATGATATGAAAGAAGATGAGTCTTTATTCGCCTCATCAGGCATAGTTTCTTTTCAAGAAAGAGACAGAGGATATGAGCTTGACACCGCAGCTAGG
CATGGTTCCGCAGACTGTATACGTGAGCATCTTAGACAAGATCAAATTGAGGATTTGTCATCGTCCCCTCCAGCTGTTGGCTCCATACAGATTGGT
AGAAGCAATGGCTTTGGCCATAACATAGAGTTCATGTCTCAAGCTTACCTCAGAAACAGAAGCTCAGATATTAATATAGAGGTGAAGATTAGCCAA
GCTTCCTCCAACAATCCTGTCAAGGAAGTTGCATCAAAGGTAGCTTCCCAGTTTGAGCATGACAATTACAAGCTGATACTTAAGGTTCGAACAAGG
AAGGGTGAAATTCTTGCCTTAATGGGGCCTTCTGGCAGTGGGAAAACAACCTTGTTAAAGATATTGGGAGAAAGATTGCAAGAAAATGTCAGAG
CATCCCATATAATACAGCTATCAATAAGAGACATCCAAGCAAGATGAGTCAACGTCAGAAGTATGAAAGAGCTGAAGTGCATATTAAAGAATTAGG
CCTGGAAAGATGTCGTCACACGAGAATAGGTGGAGGACTTATTAAAGGCATATTTGGGGGAGAGAGGGAAAAAACTAGCATAGGGTATGAAATCCT
TGTTGATCCTTTTCTCCTCTTGCTCGACGAACCAACTTCAGGCCTTGATTCGTCCTCTGCAAGGAAGGAACGAGTCGGTTCCCCTTTCCGTCTTTC
GGTAGCATAA
>Id17
ATGCCAGTTTCCAGCTATCCGGTTCAAGTCTTTCGTTTTGCCAGCAAGCTGGTGCTTGCAGCCTATGGGCTTTCAGCTGGTGCATGCGATCGAAGA
CTTTATCTAAGAGGTGGATTTCCCTCGATAGTTGGGCATATGATATATGATGGATACAAGTGGGCCAGGAGTCGTAGAGCAATGTCTTTATTGGCA
GTTGCGCAACCTTCGATTGAAGCTACTTCTACAGATTGCGATAGCACCTGTCCTTGGATCAAGGCTCTCTCTCGCTCAAGACGTCGATGTGCCACC
GGGTTGACCCTTTTCTTACCAGCATGGGGAGTTGCGATGGATGCCAAGATGAAGACTCCTCAGCGCCAATTAGGGGGTGCAAGAGATAGTTGGATC
AAGCCTGGGGATAAAGTGATGAGCCCGAGATGTTATAAAGATTTGGGTTTGACTTTTTTGTCTGCTTTGTATGAGTCGACGTATGGAATGCGCCAA
GACATGACTTTGTATGCCATGGCACTGAGAGAAAGACACAGGAGAATTCCTCTTTTAGGAAGACCTAGTAGCTCAGGATCTCTGACGTTTCATGTC
TGTGCCTTTGATCACATACTTTGTCCGCTCGAAGGCTCATGCTCGATCCTCTTTCATTGGATACAGAGATTTCGTTCACTTAAGGCTTGTTTGCAA
TGGAACTGGGAAAAGAGAAAGAAATGGAGTGAAGAGCTTCGTTGA
>Id18
ATGCCTTCTTGGTCGAAGAGCCCCTTTTATACTAGTAAGGACGTAGGAAGCAAAGAAACTTATGCGAAGGACGTTTTCTTCTCTGCCCTCTCCTCT
CCAAAGGCCAAGGGAGAGACTGCATCCCTTTCCTTCGGTAGCTCTTTTGGTTTCCCAAGGATAGCGGTAGCTGGAGCAAAGCCCGCTTTCTTCTCT
CCGCAAATGAAAGAGAAAGTTAGAGGAAAAAACACATTCTCTCTTTGCGAGATCCAAAAGTGGAGAACGCATAGCATTCTATGGGTACATAGGATC
AAACATAAAGCAGCGCTCTCTTGGCAGAGTTTTAGGTGGCAAGAGACTTTAGGTCTTGTTGGAGCTTCTGAGCGTAACGAATCAAAGTCGAAGATG
GATCAAGGTAGCTTACCTACCAAGCCGATAGGCAAAGTGCTGAAGGATGAAATGTGCAAAGTAGATCGTGCACCTGTCGTGTGA
The IDs run from Id_1 to Id_35884. I ran **BlastX** against the protein database of C. annuum var. Zunla-1, because I want to align fastq files against this CDS file using **kallisto**. However, when I built the index with **kallisto**, I got an error because some IDs are repeated and therefore not unique (only the names are duplicated): more than one Id_xxxxx sequence was matched to the same gene (I kept only the best match from **BlastX**). So, what is the correct way to "annotate" this CDS file?
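One possible way around the duplicate names, sketched here only as an illustration, is to keep the original unique ID in each header and append the BlastX hit to it, while writing the ID-to-hit relationship to a separate table. The function name `annotate_headers`, the `|` separator, and the `best_hit` dictionary below are assumptions made for this example; they are not part of kallisto or BLAST, and the hit assignments in the example are invented for demonstration.

```python
def annotate_headers(fasta_lines, best_hit):
    """Rename FASTA headers to '<original_id>|<hit>' so they stay unique.

    fasta_lines: iterable of lines from the CDS FASTA file.
    best_hit: dict mapping original ID -> best BlastX hit (hypothetical input).
    Returns (new_lines, mapping), where mapping is a list of
    (new_header_name, hit) pairs that can be saved as a lookup table.
    """
    new_lines = []
    mapping = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Use only the first whitespace-delimited token as the ID.
            seq_id = line[1:].split()[0]
            hit = best_hit.get(seq_id)
            if hit is None:
                # No BlastX hit: keep the original (already unique) ID.
                new_lines.append(">" + seq_id)
            else:
                name = seq_id + "|" + hit
                new_lines.append(">" + name)
                mapping.append((name, hit))
        else:
            new_lines.append(line)
    return new_lines, mapping


# Toy input: two sequences share the same (made-up) best hit.
example_fasta = [
    ">Id16", "ATGCATCATCCC",
    ">Id17", "ATGCCAGTTTCC",
    ">Id18", "ATGCCTTCTTGG",
]
example_hits = {"Id16": "YP_009049799.1", "Id17": "YP_009049799.1"}

new_lines, mapping = annotate_headers(example_fasta, example_hits)
headers = [l for l in new_lines if l.startswith(">")]
for h in headers:
    print(h)
```

Because every header still starts with the original Id, two sequences that share a best hit no longer collide, and the `mapping` pairs could later be used to sum the estimates of sequences assigned to the same protein.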
This is how the file looks with the IDs from **BlastX**:
>YP_009049799.1
atggcttcaaacaagcgagaaagtccctttctatcgtcattagtcaagcgcgctagctgc
aataaaaaaagagcgctaacgagcaagaaaagggatgtgctaagaagcaagggctttcgc
gcagctgctgcgcccttgattcttgctttcgacctggagcttgatggggttggtgcttgc
aaaaatatcaagtcgacggggtcaggtaccagtagtgacaatagcaaagaggggttggac
actagttgtgtgagtggaatggcccaactggacctagtcagcccgaactattttgcggtt
ctagaggaacctgaagaagaagaggtaaagatgccagatctggacactgctgaaccgaaa
gagattgctcaggatgagtgtttgggtaacaaagccgaggagggtctattcaaggagaga
actcccaaggagagtgatttggctcatagaagcgaggatctagaagaaagggtcaactat
ggaagtgactga
>YP_006666039.1
atgggcagtcttggtcctattgaaaataccagtgaagatccaaatcaaaaagtgaaaaac
attcccagttgtagtaatgttgattatttattcgacgttaaagacattcagaatttcatc
tctgatgacacttttgtagttagtgataggaatggagacagttattccatctattttgat
attgaaaatcagatttttgagattgacaacgatcattcttttctgagtgaactagaaagt
tctttttatagttatcgaaactcgagttatctgaataatggatttaggggcgaagatccc
tactataattcttacatgtataatactcaatatagtttgaataatcacattaatagttgt
attgataataacttcagtctcaaatctgtatag
>YP_009049789.1
atgatactttccgttttgtcgagccctgctttggtctctggtttcatggttgtacgtgca
aaaaatctaatacattccattttgtttctcatcccagtctttcgcaacacttcaggttta
cttcttttgttaggtctcgacttctttgctatgatcttcccagtagtttatataggagct
atagccatttcatttctattcattgttatgattttccatattcaaatagcggagattcac
aaagaagtattgcgctatttactagtgagtggcattattagacttatcttttggttggag
atattctttattttagataatgaaagcattccattactaccaacccaaagaaatacgacc