I am new to computational biology and NGS. As part of my work, I need code that calculates chromosome sizes from a genome FASTA file. The program should take the genome FASTA file as input, store each sequence length in an array, and then print the size of each chromosome. Can anyone please help me with this?
My best advice is to try it yourself and ask if you get stuck. Writing code is the only way to learn. The BioPerl HOWTOs are a good place to start and there are complete examples for doing things like getting sequence stats.
Here is a Biopython script that reads a FASTA file and prints each record's ID and sequence length, separated by a tab. Run it like: python script.py file.fasta
import sys
from Bio import SeqIO

# Parse the FASTA file named on the command line, one record at a time
with open(sys.argv[1]) as handle:
    for rec in SeqIO.parse(handle, 'fasta'):
        # rec.id is the first word of the header; len(rec.seq) is the sequence length
        print("%s\t%d" % (rec.id, len(rec.seq)))
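If Biopython is not installed, the same idea can be sketched with only the standard library. This is a minimal sketch under stated assumptions: the function name fasta_lengths is made up for illustration, the input is assumed to be a plain-text (uncompressed) FASTA file, and the ID is taken to be the first whitespace-delimited word of each '>' header line. It collects the lengths in a dictionary before printing, which is close to the "store the sequence length in an array" step in the question.

```python
import sys


def fasta_lengths(lines):
    """Return {sequence_id: length} for an iterable of FASTA lines.

    Assumption: each record starts with a '>' header line and the ID is
    the first word of that header. A repeated ID overwrites the earlier one.
    """
    lengths = {}  # dicts keep insertion order, so file order is preserved
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current] = 0
        elif current is not None:
            # sequence data may be wrapped over many lines; add them up
            lengths[current] += len(line)
    return lengths


# Only runs when called as: python script.py file.fasta
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for name, size in fasta_lengths(handle).items():
            print("%s\t%d" % (name, size))
```

Because the function accepts any iterable of lines, it can be tried on a small list of strings before pointing it at a whole genome.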
written 9.1 years ago by Jon; updated 5.1 years ago by Ram
In this Perl script there is a line, my $sequence_length = length($sequence); just add a line to print that value and your work will be done. You can find other scripts for nucleotide sequence analysis in that repository as well.
written 9.1 years ago by venu; updated 5.1 years ago by Ram
Sounds like homework. Try it yourself. As SES suggested, check http://www.bioperl.org/wiki/HOWTO:SeqIO for some help. As you are just starting and will want to double-check your results, use 'faSize' from the Kent utils at https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64
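One way to do that double check is to compare your own per-sequence lengths against a reference listing. The sketch below assumes the reference is a text file with one "name<TAB>size" pair per line. I believe faSize can print that layout with its -detailed option, but that is an assumption, so confirm it against the tool's usage message. The names read_sizes and compare_sizes are hypothetical helpers written for this example, not part of any library.

```python
def read_sizes(lines):
    """Parse 'name<whitespace>size' lines into {name: size}."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def compare_sizes(mine, reference):
    """Return (name, my_size, reference_size) for every disagreement.

    A size of None means the name is missing on that side.
    """
    mismatches = []
    for name in sorted(set(mine) | set(reference)):
        a = mine.get(name)
        b = reference.get(name)
        if a != b:
            mismatches.append((name, a, b))
    return mismatches
```

An empty list from compare_sizes means both sides agree on every sequence name and length; anything else points at the sequences worth inspecting by hand.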
Thank you all :)