Biopython or Bioperl help
9.1 years ago
iamtuttu5 ▴ 40

Hello all,

I am new to computational biology and NGS. As part of my work I need code that calculates chromosome sizes by analyzing genome FASTA files. The program should take a genome FASTA file as input, store the sequence lengths in an array, and then print the size of each chromosome. Could anyone please help me with this?

Thank you for your help :)

next-gen RNA-Seq biopython sequence • 3.3k views

My best advice is to try it yourself and ask if you get stuck. Writing code is the only way to learn. The BioPerl HOWTOs are a good place to start and there are complete examples for doing things like getting sequence stats.


Sounds like homework. Try it yourself. As SES proposed, check http://www.bioperl.org/wiki/HOWTO:SeqIO for some help. Since you are just starting and will want to double-check your results, use 'faSize' from kent utils at https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64


Thank you all :)

9.1 years ago
Jon ▴ 360

Here is a Biopython script that reads a FASTA file and prints each sequence ID and the length of its sequence. Run it like: python script.py file.fasta

import sys
from Bio import SeqIO

# Read the FASTA file named on the command line and
# print "id<TAB>length" for every record in it.
with open(sys.argv[1]) as handle:
    for rec in SeqIO.parse(handle, 'fasta'):
        print("%s\t%d" % (rec.id, len(rec.seq)))
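
If the lengths also need to be kept in an array, as the question asks, something like the following might work without Biopython. This is only a minimal sketch: the function name fasta_lengths and the in-memory example records are made up for illustration, and it assumes a plain-text FASTA where every header line starts with '>' and the first word of the header is the sequence ID.

```python
def fasta_lengths(lines):
    """Return a list of (sequence_id, length) tuples from FASTA-formatted lines."""
    sizes = []                 # one (id, length) entry per sequence, in file order
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue           # skip blank lines
        if line.startswith(">"):
            # A new header: store the previous record before starting the next one.
            if name is not None:
                sizes.append((name, length))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            length += len(line)  # a sequence may span several lines
    if name is not None:
        sizes.append((name, length))  # store the last record
    return sizes


# Hypothetical in-memory example; an open file handle works the same way.
example = [">chr1 test", "ACGT", "AC", ">chr2", "GGG"]
for seq_id, size in fasta_lengths(example):
    print("%s\t%d" % (seq_id, size))
```

For a real genome, pass an open file instead of the example list, for instance by opening the FASTA file in a with block and calling fasta_lengths on the handle. The file is read line by line, so whole chromosomes are never held in memory, only their running lengths.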
9.1 years ago
venu 7.1k

In this Perl script there is a line my $sequence_length = length($sequence); so just add a line to print that value and your work will be done. You can find other scripts for nucleotide sequence analysis in that repository as well.
