Hi,
I have a merged FASTA file of 1500 sequences. I want to split it into only two files, one with 1000 FASTA sequences and the other with 500, with the headers intact. Can anyone suggest a way, with the proper command, to do this easily with awk or grep?
I have a merged FASTA file. Can you suggest a script or a way to split my merged FASTA file based on country_name?
I want all the FASTA sequences from one country in one separate FASTA file, and the same for the other countries. Is that possible?
You can also use SEDA. To achieve the desired split, you may use the Split operation (under Choose operation / General) and configure Fixed number of sequences per file with 1000 sequences.
For simple FASTA format (each sequence on one line):
For FASTA with multi-line sequences, using bioawk: https://github.com/lh3/bioawk
Try the command faSplit from the UCSC utilities.
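If the sequences may span several lines, one option is to count header lines instead of raw lines. The sketch below uses plain awk rather than bioawk or faSplit, and the function name split_fasta_by_count and the file names are hypothetical: every line goes to the first output file until the (N+1)-th header appears, and from then on to the second file.

```shell
# Hypothetical helper: copy the first N records of a FASTA file into one file
# and all remaining records into another. Multi-line sequences are handled
# because the record counter only advances on header lines (lines starting with ">").
split_fasta_by_count() {
    input=$1   # merged FASTA, e.g. merged.fasta
    n=$2       # number of records for the first file, e.g. 1000
    first=$3   # receives records 1..n
    rest=$4    # receives records n+1..end
    awk -v n="$n" -v first="$first" -v rest="$rest" '
        BEGIN { n += 0 }                          # force a numeric comparison
        /^>/  { count++ }                         # each header starts a new record
        { print > (count <= n ? first : rest) }   # header and sequence lines follow their record
    ' "$input"
}

# Usage: 1000 sequences in the first file, the remaining 500 in the second.
# split_fasta_by_count merged.fasta 1000 first_1000.fasta last_500.fasta
```

Because only two output files are ever open, there is no need to call close() here.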
If each raw sequence is on exactly one line, then each record, including its header, takes exactly two lines, so you can use:
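A minimal sketch of that head/tail split, wrapped in a hypothetical shell function (split_two_line_fasta is an illustrative name, not an existing tool). It assumes every record is exactly two lines, so the first N records are the first 2N lines; for N = 1000 that gives head -n 2000 and tail -n +2001.

```shell
# Hypothetical helper for FASTA files where every record is exactly two lines:
# a ">header" line followed by a single sequence line.
split_two_line_fasta() {
    input=$1; n=$2; first=$3; rest=$4
    # The first n records occupy lines 1..2n (n = 1000 -> head -n 2000).
    head -n "$((2 * n))" "$input" > "$first"
    # "+K" makes tail start AT line K and print to the end (n = 1000 -> tail -n +2001).
    tail -n "+$((2 * n + 1))" "$input" > "$rest"
}

# Usage, followed by a quick grep check of the number of headers in each file:
# split_two_line_fasta merged.fasta 1000 first_1000.fasta last_500.fasta
# grep -c '^>' first_1000.fasta last_500.fasta
```

If grep does not report 1000 and 500, some sequences probably span more than one line and the two-line assumption does not hold.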
The "+" before 2001 is necessary, as it makes tail output line 2001 and everything after that line.
Hi, this is my FASTA file header:
Try this:
For the last 500 sequences, substitute "head -1000" with "tail -500".
Are all the country names right after the first "/"?
Yes, all country names are after the first "/".
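Given that the country is the text right after the first "/", a minimal plain-awk sketch of the country split could look like this. The function name split_fasta_by_country and the example header ">seq1/India/ABC/2020" are hypothetical, since the actual header is not shown. Characters that are awkward in file names (such as spaces) are replaced with "_", and only one output file is kept open at a time so that many countries do not exhaust open file handles.

```shell
# Hypothetical helper: write each record to <outdir>/<country>.fasta, where the
# country is the text between the first and second "/" of the header, e.g. a
# made-up header ">seq1/India/ABC/2020" goes to India.fasta.
split_fasta_by_country() {
    input=$1
    outdir=${2:-.}   # output directory, current directory by default
    awk -v outdir="$outdir" '
        /^>/ {
            split($0, parts, "/")                  # parts[2] = text after the first "/"
            country = parts[2]
            gsub(/[^A-Za-z0-9_.-]/, "_", country)  # keep the name safe as a file name
            if (country == "") country = "unknown"
            if (out != "") close(out)              # keep only one output file open at a time
            out = outdir "/" country ".fasta"
            if (!(out in seen)) {                  # first record for this country in this run:
                printf "" > out                    # create or truncate the file once
                close(out)
                seen[out] = 1
            }
        }
        out != "" { print >> out }                 # the header and all of its sequence lines
    ' "$input"
}

# Usage:
# mkdir -p by_country
# split_fasta_by_country merged.fasta by_country
```

Each country file is truncated the first time that country is seen in a run, so re-running the command overwrites earlier results instead of appending to them.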
This is a solution using bioawk to process the FASTA file: https://github.com/lh3/bioawk