I have a FASTA file formatted as follows:
>UPF0471 protein C1orf63 homolog
some sequence
>WD repeat-containing protein 43
some sequence
>transmembrane protein 41A
some sequence
When I print record.id or build dictionaries, Biopython does not keep the spaces in the sequence names: record.id contains only the first word. How can I get Biopython to treat the whole name as the identifier rather than just the first word?
Replace the spaces with "_" or "-"?
Most tools treat spaces in FASTA headers the same way, taking only the text up to the first whitespace as the identifier, so that's a good idea!
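If you go the replacement route, here is a minimal sketch of how it could be done in plain Python before the file ever reaches Biopython. The function name `underscore_fasta_headers` and the short sequences in the example are made up for illustration; only the header lines come from the question.

```python
import io


def underscore_fasta_headers(lines, replacement="_"):
    """Yield FASTA lines, replacing whitespace in header lines.

    Hypothetical helper: only lines starting with ">" are changed,
    sequence lines are passed through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            # Split the header on any run of whitespace and rejoin it,
            # so ">WD repeat-containing protein 43" becomes one token.
            yield ">" + replacement.join(line[1:].split()) + "\n"
        else:
            yield line


# Placeholder sequences (assumed, not from the original file).
example = io.StringIO(
    ">UPF0471 protein C1orf63 homolog\n"
    "MKV\n"
    ">WD repeat-containing protein 43\n"
    "MAA\n"
)
cleaned = list(underscore_fasta_headers(example))
print("".join(cleaned), end="")
```

To use it on a real file, you could pass an open file handle instead of the `io.StringIO` object and write the yielded lines to a new file. If you would rather not modify the file at all, note that Biopython also keeps the full header line in `record.description`, so that attribute may be enough for printing or for building your own dictionary keys.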