I have a concatenated multiple sequence alignment (MSA) of more than 3000 genes in FASTA format. I would like to split this alignment into 10-20 files, equally divided if possible. My MSA has 30 sequences, and each sequence is millions of bp long.
>1
ATTGCTGAAACGGCTTTGAAAAGGGCGGAAAATCTTTCTGGTGGT [..]
>2
ATTGCTGAAAC----GGCTTTGAAAAGGGC----GGAAAATCTTTCTGGTGG [..]
>3
ATTGCTGAAAC----GGCTTTGAAAAGGGC----GGAAAATCTTTCTGGTGGT [..]
>4
GATGAGCCGATTGCTTCGCTGGATCCGATGAATGCGCAGGTGGTGATGGACGCTCTTAAG [..]
........
Now I would like to split this file into multiple files, each holding a chunk of the MSA. For example, file 1 from position 1 to 300; file 2 from position 301 to 600; file 3 from position 601 to 900; and so on.
I could not find a way of doing that. Any suggestions?
The problem is that most tools only split a multi-FASTA file into separate files, or split single sequences.
PS: I do not have the original individual MSAs available.
How are they concatenated? Is there any example you could provide?
@mxs, it is just a simple MSA, but instead of having around 1000 bp, each entry (>Ids) has millions of bp.
And what is the file format? Multi fasta? Phylip? Stockholm? Something else?
@5heikki, right, it is in fasta.
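Since it is plain FASTA, one possible approach is to cut the alignment by columns, so that every output file keeps all 30 sequences but only a window of positions. Below is a minimal sketch in plain Python, assuming the whole alignment fits in memory and that every sequence has the same aligned length. The function names (`read_fasta`, `split_columns`, `write_chunks`) and the output file naming are made up for illustration; they do not come from any existing tool.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_columns(records, chunk_size):
    """Yield (start, end, sub_records) for consecutive column windows.

    start and end are 1-based and inclusive, as in the question
    (1-300, 301-600, ...). The last window may be shorter.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; not a valid alignment")
    total = lengths.pop()
    for offset in range(0, total, chunk_size):
        stop = min(offset + chunk_size, total)
        yield offset + 1, stop, [(h, s[offset:stop]) for h, s in records]


def write_chunks(records, chunk_size, out_dir, prefix="chunk"):
    """Write one FASTA file per column window and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start, end, sub in split_columns(records, chunk_size):
        path = out_dir / f"{prefix}_{start}-{end}.fasta"
        with open(path, "w") as out:
            for header, seq in sub:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

With a hypothetical input file called `concat.fasta`, something like `write_chunks(read_fasta("concat.fasta"), 300, "chunks")` would produce `chunk_1-300.fasta`, `chunk_301-600.fasta`, and so on. To get a fixed number of files, say 20, the chunk size can be set to the alignment length divided by 20, rounded up. Note that without the original per-gene alignments or a partition file, the windows are cut at fixed positions and will not respect gene boundaries.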