command-line tool to split genome FASTA into equal chunks?
6.3 years ago
gtrwst9 • 0

Say I have the file for accession LS483306.1, which is one big sequence starting with:

>LS483306.1 xyz
AGCT...

and want to have one file with N sequences of size X, looking like this (example X=2000):

>LS483306.1:1-2000 xyz
AGCT...
>LS483306.1:2001-4000 xyz
GCTA...
>LS483306.1:4001-6000 xyz
CTGA...

and so on.

Is there a ready-made command-line tool for this? Which? I could write a BioPython script but I would like something faster.

software conversion fasta • 1.9k views

"BioPython script but I would like something faster"

Then write it in plain Python.
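A minimal sketch of what such a plain-Python splitter might look like, using only the standard library. The function names (`read_fasta`, `split_record`, `split_fasta`) and the 60-column line width are illustrative assumptions, not part of any existing tool. The output headers follow the `ID:start-end description` pattern from the question, with 1-based inclusive coordinates.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_record(header, seq, size):
    """Yield (new_header, chunk) pairs; coordinates are 1-based, inclusive."""
    seq_id, _, desc = header.partition(" ")
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        new_header = f"{seq_id}:{start + 1}-{start + len(chunk)}"
        if desc:
            new_header += f" {desc}"
        yield new_header, chunk


def split_fasta(in_handle, out_handle, size, width=60):
    """Write every record of in_handle to out_handle in chunks of `size` bases."""
    for header, seq in read_fasta(in_handle):
        for new_header, chunk in split_record(header, seq, size):
            out_handle.write(f">{new_header}\n")
            # Wrap sequence lines at `width` characters (assumed convention).
            for i in range(0, len(chunk), width):
                out_handle.write(chunk[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python split_fasta.py in.fasta out.fasta 2000
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        split_fasta(src, dst, int(sys.argv[3]))
```

The last chunk of each record may be shorter than the requested size. The whole sequence is held in memory, which is usually fine for a single bacterial genome but may not suit very large inputs.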

6.3 years ago
GenoMax 148k

Use shred.sh from the BBMap suite.

Usage:  shred.sh in=<file> out=<file> length=<number> minlength=<number> overlap=<number>


in=<file>     Input sequences.
out=<file>    Destination of output shreds.
length=500    Desired length of shreds.
minlength=1   Shortest allowed shred.  The last shred of each input sequence may be shorter than desired length.
overlap=0     Amount of overlap between successive reads.
reads=-1      If nonnegative, stop after this many input sequences.
equal=f       Shred each sequence into subsequences of equal size of at most 'length', instead of a fixed size.
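The difference between fixed-size shreds and the `equal` option can be illustrated with a little arithmetic. The sketch below is only one plausible reading of the usage text, not BBMap's actual implementation: the usage text does not say how `equal` rounds, so the rounding here is an assumption, and `fixed_sizes` and `equal_sizes` are hypothetical helper names.

```python
import math


def fixed_sizes(total, length):
    """Chunk sizes when cutting every `length` bases; the last may be shorter."""
    full, rest = divmod(total, length)
    return [length] * full + ([rest] if rest else [])


def equal_sizes(total, length):
    """Chunk sizes when using the fewest chunks of at most `length` bases,
    balanced so that sizes differ by at most one (assumed rounding rule)."""
    if total <= 0:
        return []
    n = math.ceil(total / length)
    base, extra = divmod(total, n)
    return [base + 1] * extra + [base] * (n - extra)


print(fixed_sizes(10, 4))  # [4, 4, 2]
print(equal_sizes(10, 4))  # [4, 3, 3]
```

For the layout in the question (every chunk exactly X bases, with only the final one shorter), the fixed-size behaviour is the one that matches.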