I have multiple FASTA-format sequences in a Notepad file. I want to delete all bases beyond the first 500 bp in each sequence. What script can I use?
Thanks
I don't know of any scripts that do that, but it's really easy to write one yourself. Try this one, which uses Biopython:
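from Bio import SeqIO
import sys

# Usage: python parse_fasta.py <input.fasta> <length>
fastafile, length = sys.argv[1], int(sys.argv[2])
entries = SeqIO.parse(fastafile, "fasta")
for entry in entries:
    # Print the header, then the first `length` bases of the sequence
    print(">" + entry.name)
    print(entry.seq[:length])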
Save it to a file (e.g. parse_fasta.py) and then run it in a terminal (or cmd if you're on Windows) with:
python parse_fasta.py <input.fasta> <sequence_lengths> > <output.fasta>
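One detail worth knowing before running this: Python slicing does not raise an error when the stop index is larger than the sequence, so entries shorter than the cutoff pass through unchanged. A quick illustration with plain strings (the variable names are made up for the example; Biopython's Seq objects follow the same slicing convention):

```python
cutoff = 500

long_seq = "ACGT" * 200   # 800 bp, longer than the cutoff
short_seq = "ACGT" * 50   # 200 bp, shorter than the cutoff

# Slicing keeps at most `cutoff` characters and never fails on short input.
print(len(long_seq[:cutoff]))    # 500
print(len(short_seq[:cutoff]))   # 200
```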
Or, slightly more elegantly, using SeqIO.write and a generator expression:
from Bio import SeqIO
import sys
fastafile, length = sys.argv[1], int(sys.argv[2])
trimmed = (entry[:length] for entry in SeqIO.parse(fastafile, "fasta"))
SeqIO.write(trimmed, sys.stdout, "fasta")
Running this is as simple as saving the script to a file (like parse_fasta.py from my post) and running the python interpreter over that script (with the commands I mentioned at the end of my post).
If you have a fresh installation, you'll probably be missing the Biopython library. In that case you can install it by following the steps mentioned here: http://centosn00b.blogspot.be/2012/05/biopython-tip.html
Remember to prepend "sudo" to the commands if you are not logged in as an administrator (root).
pip install pyfaidx
python
And then paste the script, changing your file name.
A simple sed command might do the trick if your fasta sequences are on one line (i.e. not wrapped):
Or using awk (I like awk!):
Thanks for the quick reply.
Can I use this command on CentOS Linux terminal directly or do I need to download any other package?
These commands should work on most Linux or Unix computers without installing any additional package.
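If you would rather not depend on the sequences being on one line, the same trimming can also be sketched in plain Python with no extra packages. This is only a minimal sketch, not the commands referred to above: the function name trim_fasta and the demo records are made up for illustration, and it assumes a well-formed FASTA file where every record starts with a ">" header line.

```python
import io


def trim_fasta(lines, max_len=500):
    """Yield (header, sequence) pairs, each sequence cut to max_len bases.

    Wrapped sequences (spread over several lines) are joined before trimming.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)[:max_len]
            header, chunks = line, []
        else:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)[:max_len]


# Small in-memory demo; with a real file, pass open("input.fasta") instead.
demo = io.StringIO(">seq1\nACGTACGT\nACGT\n>seq2\nTTTT\n")
for header, seq in trim_fasta(demo, max_len=10):
    print(header)
    print(seq)
```

With max_len=500 and a real file handle, redirecting the printed output to a new file gives the trimmed FASTA.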
Hi Frederic,
I have one more problem; I hope you can help me with that too.
I have a single contig of around 5000 bp, which looks like
I want to split it into multiple contigs of 100 bp each, like
and so on...
What command should I use?
Best!
Hi, the thread A: How To Split One Big Sequence File Into Multiple Files With Less Than 1000 Seque might answer your question.
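For the contig question above, cutting one sequence into consecutive 100 bp pieces can be sketched in a few lines of plain Python. This is a rough sketch under assumptions: the function name split_contig and the "_partN" header naming are invented for the example, and the last piece is simply shorter when the contig length is not a multiple of the chunk size.

```python
def split_contig(header, seq, size=100):
    """Yield (header, chunk) pairs for consecutive chunks of `size` bases."""
    # Use the first word of the original header as the base name.
    name = header.lstrip(">").split()[0]
    for i, start in enumerate(range(0, len(seq), size), 1):
        # Hypothetical naming scheme: <name>_part1, <name>_part2, ...
        yield f">{name}_part{i}", seq[start:start + size]


# Demo with a made-up 240 bp contig.
contig = "ACGT" * 60
for chunk_header, chunk in split_contig(">contig1", contig, size=100):
    print(chunk_header, len(chunk))
```

The demo yields three pieces of 100, 100 and 40 bp. Printing each header and chunk on separate lines would give a multi-FASTA file of the pieces.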
Thanks, Frederic, it works. Thanks a lot.
Cross posted (to Linkedin???)