2.5 years ago
Winston
I know this question gets asked somewhat frequently, but I've yet to find an answer to my specific issue. I have a multifasta whose headers all differ somewhat, but each contains an identifier shared by related records, and I want to use that identifier to group the sequences into their respective multifasta files.
My original file looks something like this:
seqs.fasta
>cds-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGGA
>rna-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGGAGGAG
>exon-123-1 transcript=rna-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGG
>cds-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGACAG
>rna-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGACAGGACC
>exon-456-1 transcript=rna-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGA
and I want to split this into multiple files based on the name= field while retaining the original header in the new file:
ABC1.fasta
>cds-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGGA
>rna-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGGAGGAG
>exon-123-1 transcript=rna-123 gene=gene-ABC1 name=ABC1 seq_id=123
GATCGG
DEF1.fasta
>cds-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGACAG
>rna-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGACAGGACC
>exon-456-1 transcript=rna-456 gene=gene-DEF1 name=DEF1 seq_id=456
GACCGA
I'm open to any and all solutions.
Thank you for your help!
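One possible approach, sketched in Python with only the standard library. This is not necessarily the solution the thread settled on; the function name split_fasta_by_name, the out_dir parameter, and the "unassigned" fallback for headers without a name= field are all assumptions made for illustration. It also assumes name= values contain no whitespace and are safe to use as file names.

```python
import re
from pathlib import Path

# Matches a whitespace-delimited "name=VALUE" field in a FASTA header.
NAME_FIELD = re.compile(r"(?:^|\s)name=(\S+)")


def split_fasta_by_name(fasta_path, out_dir="."):
    """Write one <name>.fasta per distinct name= value; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = {}    # name -> list of lines (headers and sequence lines)
    current = None  # name of the record currently being read

    with open(fasta_path) as handle:
        for line in handle:
            if not line.endswith("\n"):
                line += "\n"  # last line of the file may lack a newline
            if line.startswith(">"):
                match = NAME_FIELD.search(line)
                # Hypothetical fallback for headers with no name= field.
                current = match.group(1) if match else "unassigned"
            if current is not None:
                # Original header and sequence lines are kept unchanged.
                records.setdefault(current, []).append(line)

    written = []
    for name, lines in records.items():
        path = out_dir / f"{name}.fasta"
        with open(path, "w") as out:
            out.writelines(lines)
        written.append(path)
    return written
```

Calling split_fasta_by_name("seqs.fasta") on the example above should write ABC1.fasta and DEF1.fasta into the current directory. The sketch holds the whole input in memory, which keeps it simple but may not suit very large files; opening the output files in append mode while streaming would be an alternative, at the cost of duplicating records if the script is run twice.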
Worked perfectly. Many thanks!