Hello!
I am trying to split a large FASTA file (19,336 lines) into individual contigs. The file is set up as follows:
>k141_284136 flag=1 multi=3.0000 len=1875
AGCCTACATTGGCAAGGTACTGCTTTTGTCGCCCATCGTTGGCGAATTTGCTAATGAGAACACACGGAT
>k141_407195 flag=1 multi=5.0000 len=1723
GCCAGTAGTTTTCAGATTTTCAATTACTTTCTTTGCTTCTTTTAACGCAGCCGCAAAGTTGTCATCAAGTTCTCCACCCTGTGCAATATGTTTATATAGAATGCTGCTTACTTTGTCAGCAA
>k141_169332 flag=1 multi=3.0000 len=20
ATTATCCATCCTATTCATCGCTTGATGAAATGTTGCAAAATTCCAAAGATTTTCAGCGTCAAATCGTTCGTATATCCTAATTAAACACCGCTAAAAGTTATGTCTAAGCAATCTTTAA
I am able to split the file, but the output file names are meaningless (xaa, xab, xac, etc.).
I would like each contig to end up in its own FASTA file, named after the contig name that follows the ">".
For example, the first file would be named "k141_284136.fa" and contain:
>k141_284136 flag=1 multi=3.0000 len=1875
AGCCTACATTGGCAAGGTACTGCTTTTGTCGCCCATCGTTGGCGAATTTGCTAATGAGAACACACGGAT
My input file is called vDNA-S1S1.fa! Thank you for any help.
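To make the desired behaviour concrete, here is a minimal Python sketch of the split described above. It is only an illustration under some assumptions: the function name split_fasta and the output directory argument are hypothetical, contig names are taken as the first whitespace-delimited token after the ">", and each name is assumed to be unique and safe to use as a file name (a repeated name would overwrite the earlier file).

```python
import os


def split_fasta(fasta_path, out_dir="."):
    """Write each record of fasta_path to out_dir/<contig_name>.fa.

    Hypothetical helper: returns the list of paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None  # handle of the per-contig file currently being written
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A new header closes the previous contig's file.
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("FASTA header without a name")
                    # Assumption: the name is the first token after ">".
                    out_path = os.path.join(out_dir, fields[0] + ".fa")
                    out = open(out_path, "w")
                    written.append(out_path)
                # Copy header and sequence lines unchanged; anything
                # before the first header is skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

With the input file from this post, calling split_fasta("vDNA-S1S1.fa", "contigs") should create contigs/k141_284136.fa and so on, one file per header. Sequences that wrap over several lines stay with their header because a new file is opened only when a line starts with ">".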
Past threads of interest:
Rename FASTA files according to FASTA file header
Using seqkit: https://bioinf.shenwei.me/seqkit/faq/#how-to-split-fasta-sequences-according-to-information-in-header