How To Read And Write Several FASTA Sequences Stored In One File In Perl?
13.6 years ago

Hey. I'm totally new to programming in Perl, but I have to do a project and I have no idea how to approach it. Maybe someone can help me. The script has to load several sequences in FASTA format from a single file. Example input file (for example 'gens.txt'):

>A
AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA
CCTACAGTAAAGAAGCAGCGTCTGGATATCTGGAAGACAA
CGGATTGAAGCTTGTAGAAAAAGAAGCATACTCAGATGAT
GTTCCAGAAGGACAGGTTGTCAAACAAAAACCAGCAGCAG
GTACGGCAGTAAAGCCGGGAAACGAAGTTGAAGTGACATT
CTCTCTCGGACCAGAGAAAAAACCTGCGAAAACAGTGAAA
GAAAAGGTCAAGATCCCCTACGAACCAGAAAATGAAGGGG
ACGAGCTTCAAGTGCAAATCGCGGTTGACGATGCGGATCA
>B
CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT
GATTTCGTTCAAGTGACAGAAGACCATTCGCTTGTAAATG
AACTGGTTCGCACTGGAGAGATTTCCAGAGAAGACGCTGA
ACATCATCCGCGAAAAAATGTGTTGACGAAGGCGCTTGGA
ACAGACCAGTTAGTCAGTATTGACACCCGTTCCTTTGATA
TAGAACCCGGAGACAAACTGCTTCTATGTTCTGACGGACT
GACAAATAAAGTGGAAGGCACTGAGTTAAAAGACATCCTG
TGGACAAAGCCAATCAGAATGGCGGAGAAGGCGGAGAAGC
>C
ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA
ATCCTCGCTTCTCAACGCGATCAGTCCGGAGCTCGGATTA
AGAACAAACGAGATTTCCGAGCATTTGGGCCGCGGGAAAC
ACACAACCCGCCACGTGGAGCTGATTCACACGTCCGGAGG
TTTGGTTGCAGATACACCGGGATTCAGCTCGCTTGAATTT
ACAGACATTGAGGAAGAAGAGCTGGGCTATACCTTCCCTG
ATATCAGAGAAAAAAGCTCTTCATGCAAATTTAGAGGCTG
TTTACATCTGAAAGAGCCGAAATGTGCGGTGAAACAAGCT

The script should then check how similar the sequences are and print their percent identity, and it should also generate a consensus sequence.

perl fasta alignment consensus • 3.1k views

If you really have no idea where to begin, then handing you a ready-made answer will be no help whatsoever. First, learn some Perl basics. Second, learn some BioPerl. Third, learn to identify the correct tool for the job. There is plenty of alignment software available to do this task: it's not really a job for Perl.

Wouldn't it be great if the Perl script also provided the A+ for the assignment? OK, kidding aside: if you are really asking for help, I would suggest reading up on hashes (to store your sequences), the "open" function (to read the file), "for" loops (to iterate over the lines) and the "join" function (to merge each sequence's lines into a single string). Once you get that far, you will surely find some help here.
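
As a starting point, the pieces mentioned above (a hash, "open", a "for" loop and "join") could fit together roughly like this. It is only a minimal sketch in core Perl: read_fasta and write_fasta are made-up helper names, the record ID is assumed to be the first word after ">", and the example reads from a short in-memory string rather than the real 'gens.txt'.

```perl
use strict;
use warnings;

# Parse FASTA text from an open filehandle.
# Returns a hash ref (id => sequence) and an array ref with the ids in file order.
# read_fasta is a hypothetical helper name, not part of any library.
sub read_fasta {
    my ($fh) = @_;
    my (%lines_for, @order, $id);
    for my $line (<$fh>) {
        chomp $line;
        $line =~ s/\r$//;             # tolerate Windows line endings
        next if $line =~ /^\s*$/;     # skip blank lines
        if ($line =~ /^>(\S+)/) {     # header line starts a new record
            $id = $1;
            push @order, $id;
            $lines_for{$id} = [];
        }
        elsif (defined $id) {         # sequence line of the current record
            push @{ $lines_for{$id} }, $line;
        }
    }
    # Join each record's lines into one string.
    my %seq_for = map { $_ => join('', @{ $lines_for{$_} }) } @order;
    return (\%seq_for, \@order);
}

# Write records as FASTA, wrapping sequence lines at $width characters.
# write_fasta is a hypothetical helper name as well.
sub write_fasta {
    my ($fh, $seq_for, $order, $width) = @_;
    $width ||= 40;
    for my $id (@$order) {
        print {$fh} ">$id\n";
        my $seq = $seq_for->{$id};
        print {$fh} "$1\n" while $seq =~ /(.{1,$width})/g;
    }
}

# Usage: an in-memory string stands in for the file here. For a real file:
#   open(my $in, '<', 'gens.txt') or die "Cannot open gens.txt: $!";
my $text = ">A\nAGTATC\nGGACCC\n>B\nCCATAT\nCGGAGA\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
my ($seq_for, $order) = read_fasta($in);
close $in;

write_fasta(\*STDOUT, $seq_for, $order, 40);
```

Keeping the IDs in a separate array preserves the order of the records in the file, which a plain hash would not.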

13.6 years ago

Unless your assignment is really just about basic I/O rather than bioinformatics, and you are not allowed to use outside libraries for the mundane stuff, BioPerl is a great way to do things like this. There is no need to reinvent the wheel.

Once you've downloaded BioPerl, this section shows how to read/write FASTA sequences from/to a file: http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_file
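
Once the sequences are loaded, the two calculations the question asks for could be sketched in plain Perl as follows. This is only an illustration under strong assumptions: the sequences are taken to be already aligned and of equal length, so it compares them position by position and does no alignment of its own. The names percent_identity and consensus are made up for the example, the three short sequences are toy data rather than the ones from the question, and reporting tied columns as "N" is just one possible convention.

```perl
use strict;
use warnings;

# Percent identity of two equal-length, already aligned sequences.
# percent_identity is a hypothetical helper name.
sub percent_identity {
    my ($s1, $s2) = @_;
    die "Sequences differ in length\n" if length($s1) != length($s2);
    return 0 if length($s1) == 0;
    my $matches = 0;
    for my $i (0 .. length($s1) - 1) {
        $matches++ if uc(substr($s1, $i, 1)) eq uc(substr($s2, $i, 1));
    }
    return 100 * $matches / length($s1);
}

# Majority-rule consensus of equal-length, already aligned sequences.
# A column where two letters tie for first place is reported as N (assumption).
sub consensus {
    my @seqs = map { uc $_ } @_;
    my $len  = length $seqs[0];
    for my $s (@seqs) {
        die "Sequences differ in length\n" if length($s) != $len;
    }
    my $cons = '';
    for my $i (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $i, 1) }++ for @seqs;
        # Most frequent letter first; alphabetical order breaks sorting ties.
        my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        my $tie = @ranked > 1 && $count{ $ranked[0] } == $count{ $ranked[1] };
        $cons .= $tie ? 'N' : $ranked[0];
    }
    return $cons;
}

# Toy data, not the sequences from the question.
my %seq_for = (
    A => 'AGTATCGGAC',
    B => 'AGTTTCGGAC',
    C => 'CGTATCGGAC',
);
my @ids = sort keys %seq_for;

# Compare every pair of sequences once.
for my $i (0 .. $#ids - 1) {
    for my $j ($i + 1 .. $#ids) {
        printf "%s vs %s: %.1f%% identity\n", $ids[$i], $ids[$j],
            percent_identity($seq_for{ $ids[$i] }, $seq_for{ $ids[$j] });
    }
}
print "Consensus: ", consensus(@seq_for{@ids}), "\n";
```

For sequences that differ in length or contain insertions and deletions, run a proper alignment program first and apply this kind of column-wise comparison to its output.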
