Hey. I'm totally new to programming in Perl, but I have to do a project and I have no idea how to approach it. Maybe someone can help me. The script has to load several sequences in FASTA format, all stored in one file. Example input file attached (for example 'gens.txt'):
>A
AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA
CCTACAGTAAAGAAGCAGCGTCTGGATATCTGGAAGACAA
CGGATTGAAGCTTGTAGAAAAAGAAGCATACTCAGATGAT
GTTCCAGAAGGACAGGTTGTCAAACAAAAACCAGCAGCAG
GTACGGCAGTAAAGCCGGGAAACGAAGTTGAAGTGACATT
CTCTCTCGGACCAGAGAAAAAACCTGCGAAAACAGTGAAA
GAAAAGGTCAAGATCCCCTACGAACCAGAAAATGAAGGGG
ACGAGCTTCAAGTGCAAATCGCGGTTGACGATGCGGATCA
>B
CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT
GATTTCGTTCAAGTGACAGAAGACCATTCGCTTGTAAATG
AACTGGTTCGCACTGGAGAGATTTCCAGAGAAGACGCTGA
ACATCATCCGCGAAAAAATGTGTTGACGAAGGCGCTTGGA
ACAGACCAGTTAGTCAGTATTGACACCCGTTCCTTTGATA
TAGAACCCGGAGACAAACTGCTTCTATGTTCTGACGGACT
GACAAATAAAGTGGAAGGCACTGAGTTAAAAGACATCCTG
TGGACAAAGCCAATCAGAATGGCGGAGAAGGCGGAGAAGC
>C
ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA
ATCCTCGCTTCTCAACGCGATCAGTCCGGAGCTCGGATTA
AGAACAAACGAGATTTCCGAGCATTTGGGCCGCGGGAAAC
ACACAACCCGCCACGTGGAGCTGATTCACACGTCCGGAGG
TTTGGTTGCAGATACACCGGGATTCAGCTCGCTTGAATTT
ACAGACATTGAGGAAGAAGAGCTGGGCTATACCTTCCCTG
ATATCAGAGAAAAAAGCTCTTCATGCAAATTTAGAGGCTG
TTTACATCTGAAAGAGCCGAAATGTGCGGTGAAACAAGCT
The script should then check how similar the sequences are and print the percent identity, and it should also generate a consensus sequence.
If you really have no idea where to begin, then providing a ready-made answer will be no help to you whatsoever. First, learn some Perl basics. Second, learn some BioPerl. Third, learn to identify the correct tool for the job. There is plenty of alignment software available for this task: it's not really a job for Perl.
Wouldn't it be great if the Perl script also provided the A+ for the assignment? OK, kidding aside: if you are really asking for help, I would suggest reading about hashes (to store your sequences in), as well as "open" (to read the file), "for" (to loop through the data lines) and "join" (to merge each sequence's lines into a single string). Once you get there, you will surely find some help here.
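As a starting point, the pieces mentioned above (a hash, "open", a "for" loop and "join") might fit together roughly like the sketch below. It is only a sketch under some loud assumptions: the sub names read_fasta, percent_identity and consensus are made up for illustration, the sequences are assumed to be already aligned and of equal length (as they are in the example file), identity is counted position by position with no real alignment step, and the consensus is a simple majority vote per column. For sequences of different lengths a proper alignment tool is still the right choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from an open filehandle into a hash: name => sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%lines_for, $name);
    for my $line (<$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(\S+)/) {         # header line, e.g. ">A"
            $name = $1;
            $lines_for{$name} = [];
        }
        elsif (defined $name) {           # sequence line of the current record
            push @{ $lines_for{$name} }, $line;
        }
    }
    my %seq_for;
    for my $id (keys %lines_for) {
        # join the record's lines into one upper-case string
        $seq_for{$id} = uc( join('', @{ $lines_for{$id} }) );
    }
    return %seq_for;
}

# Percent of positions at which two equal-length sequences agree.
# ASSUMPTION: the sequences are already aligned; no gaps are introduced.
sub percent_identity {
    my ($s, $t) = @_;
    die "Sequences differ in length\n" unless length($s) == length($t);
    my $len = length($s);
    return 0 unless $len;
    my $same = 0;
    for my $i (0 .. $len - 1) {
        $same++ if substr($s, $i, 1) eq substr($t, $i, 1);
    }
    return 100 * $same / $len;
}

# Majority-vote consensus over equal-length sequences.
# Ties are broken alphabetically so the result is reproducible.
sub consensus {
    my @seqs = @_;
    return '' unless @seqs;
    my $len = length($seqs[0]);
    for my $s (@seqs) {
        die "Sequences differ in length\n" unless length($s) == $len;
    }
    my $out = '';
    for my $i (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $i, 1) }++ for @seqs;
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $out .= $best;
    }
    return $out;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    my %seq_for = read_fasta($fh);
    close $fh;

    my @ids = sort keys %seq_for;
    for my $i (0 .. $#ids - 1) {
        for my $j ($i + 1 .. $#ids) {
            printf "%s vs %s: %.1f%% identity\n", $ids[$i], $ids[$j],
                percent_identity($seq_for{ $ids[$i] }, $seq_for{ $ids[$j] });
        }
    }
    print "Consensus: ", consensus(@seq_for{@ids}), "\n";
}
```

Saved under a name of your choice (identity.pl is just a placeholder), it could be run as `perl identity.pl gens.txt`. It should print one identity line per pair of sequences followed by the consensus. BioPerl's Bio::SeqIO can replace the hand-written parser once you are comfortable with the basics.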