Hi everyone, I'm using some chromosome-level genome assemblies, which have very large sequences. I need to cut these into 1 kb chunks for an analysis.
I have looked at a variety of available tools but I don't see any that do this. Does anyone know of one?
Any advice will be appreciated.
There might be a more straightforward method out there, but you can use split and then fix the sequences that get divided at the file boundaries (the first and last sequences in each file).
Are we sure this works? split is a standard Unix tool which has no understanding of base pairs. You may be confusing kilobases with kilobytes. split doesn't understand anything about FASTA file headers etc., to my knowledge.
Thank you, this works!
The second step sounds complicated. How would you clean up these "divided" sequences and still maintain a file size of 1 kb across the board?
Thank you everyone!
All useful solutions.
Please do not add an answer if you're not answering the principal question.
Try seqkit split.
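If you would rather not depend on a particular tool, here is a minimal sketch in plain Python of what cutting into 1 kb chunks could look like. It assumes "1 kb" means 1,000 bases (not bytes), that each chromosome fits in memory, and that a shorter final chunk per sequence is acceptable. The function names read_fasta and chunk_fasta and the name:start-end header format are my own choices for illustration, not something from any of the tools mentioned above.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each record in a FASTA file handle."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            parts = []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_fasta(handle: TextIO, size: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield (header, chunk) pairs of at most `size` bases each.

    Chunks never span two records. The last chunk of a record may be
    shorter than `size`. Coordinates in the header are 1-based, inclusive.
    """
    for name, seq in read_fasta(handle):
        for start in range(0, len(seq), size):
            piece = seq[start:start + size]
            yield f"{name}:{start + 1}-{start + len(piece)}", piece


# Tiny in-memory demo with a chunk size of 5 instead of 1000.
demo = io.StringIO(">chr1 test\nACGTAC\nGT\n>chr2\nACG\n")
for header, piece in chunk_fasta(demo, size=5):
    print(f">{header}\n{piece}")
```

For a real assembly you would open the FASTA file instead of the StringIO demo and write the printed records to a new file. Because the chunking is done per record on the joined sequence, line wrapping and headers in the input do not affect the chunk lengths, which is the part that a byte-based split cannot guarantee.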