3.2 years ago
setschmann
I have a huge reference genome with a lot of contigs. It looks something like this:
>aalba5_s00000010
TTGTCTGCTTCACAGTACAGCTAGAAAATTATGAATTCATTTCCCCACATCAAGCAACCCCTGCTTATTC
>aalba5_s00000011
ACTTGGAATGGGATCTTGTTGGGGGGCCAACAGAACCATAAGGGCAATGGCTGCAATCTTTGATAAGATC
>aalba5_s00000012
TGTAGCAAACAGCTACGGAAAAATTTTAAAAATTTTCGAAATTTAAATCTGGGGTTCCCTTTCCTGTGTA
GATGTATTCCCTTTTTAAAGGTTTTCCTAGGACTTGCAGTCATTAATGAGACGTCTTCTCATGATATCCT
AATTTTTGGAAGATGCCTCCTACATCAGGAATCTTTGCTGCCACTTGTCTCTTTCATCAGCCAGATGTCT
How can I split this so that each contig gets its own file, named after the contig (for example aalba5_s00000010.fa) and containing its sequence?
Thanks for the help.
You can try the Python code below on your file.
Replace n with the number of sequences in your file. Read more here: https://www.reneshbedre.com/blog/filereaders.html#split-fasta-file-into-multiple-fasta-files
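If you would rather not depend on a library, the split can also be sketched in plain Python. This is only a minimal sketch, not the code from the linked post: the function name split_fasta_by_contig and the output directory are made up for illustration. It assumes every contig name (the first word after ">") is a valid file name on your system and that names are unique, since a repeated name would overwrite the earlier file.

```python
from pathlib import Path


def split_fasta_by_contig(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <contig_name>.fa in out_dir.

    Hypothetical helper for illustration. Assumes contig names are unique
    and usable as file names. Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None  # handle of the contig file currently being written
    try:
        with open(fasta_path) as handle:
            # Stream line by line so a huge genome is never held in memory.
            for line in handle:
                if line.startswith(">"):
                    # The contig name is the first whitespace-delimited
                    # token of the header line.
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("FASTA header without a name")
                    if out is not None:
                        out.close()
                    path = out_dir / (fields[0] + ".fa")
                    out = open(path, "w")
                    written.append(path)
                # Copy header and sequence lines unchanged; anything that
                # appears before the first header is skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_fasta_by_contig("reference.fa", "contigs") (both names are placeholders) would create contigs/aalba5_s00000010.fa and so on, one file per header. Only one output file is open at a time, so the number of contigs is limited by the file system rather than by open-file limits. With a very large number of contigs, a dedicated output directory is worth using.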