Hello everyone,
I just completed a bioinformatics aptitude test where the questions were quite open-ended, and I am not sure whether I answered them correctly. I am trying to correct my mistakes and could use some help from the community to recognise them, as the wording is throwing me off.
Question 1:
The following short DNA sequences have all been derived from the same longer contiguous stretch of sequence (or "contig").
TGCATGATGG
ATGCGCTGC
ATGATGGATACCCC
Assemble them into the original contig
My answer:
ATGCGCTGCATGATGGATACCCC (Original Contig)
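As a sanity check on the hand assembly, here is a minimal greedy overlap sketch in Python. It is only an illustration of the suffix/prefix idea, not how a real assembler works; the names `overlap` and `greedy_assemble` and the minimum overlap of 3 bases are my own assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs (a single one if everything joins up).
    """
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # nothing overlaps any more; do not guess an order
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads


reads = ["TGCATGATGG", "ATGCGCTGC", "ATGATGGATACCCC"]
print(greedy_assemble(reads))  # ['ATGCGCTGCATGATGGATACCCC']
```

The first merge uses the 7-base overlap ATGATGG and the second uses the 3-base overlap TGC, which matches the contig written by hand above.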
Question 2:
These additional sequences have been derived from a larger contig that incorporates the original contig.
GGTCGCTTCGCGGCC
GCGGCCGCTAATCGGGG
Assemble them into the larger sequence.
My answer:
GGTCGCTTCGCGGCCGCTAATCGGGG (result of sequences)
Larger sequence (I was very confused about how to do this part and what the correct method was):
ATGCGCTGCATGATGGATACCCC + GGTCGCTTCGCGGCCGCTAATCGGGG
Answer:
ATGCGCTGCATGATGGATACCCCGGTCGCTTCGCGGCCGCTAATCGGGG
How does this task differ from the first task?
Task 1 involved taking a set of sequences extracted from a contig and overlapping the three of them to reconstruct a single assembled sequence (the original contig).
Task 2 used a similar approach to assemble the second contig from its sequences. It then involved joining the two contigs in the right order; I recognised that ATG is a start codon and placed the original contig first, on the assumption that protein synthesis would begin there.
Translate the DNA sequence into a protein sequence
DNA Sequence: ATG-CGC-TGC-ATG-ATG-GAT-ACC-CCG-GTC-GCT-TCG-CGG-CCG-CTA-ATC-GGG-G
Reading frame: ATG CGC TGC ATG ATG GAT ACC CCG GTC GCT TCG CGG CCG CTA ATC GGG G
Translated Protein Sequence:
Met-Arg-Cys-Met-Met-Asp-Thr-Pro-Val-Ala-Ser-Arg-Pro-Leu-Ile-Gly
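The translation can be double-checked with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) and reading frame 0; the function name `translate` is just illustrative, and the input is the sequence from the reading frame line above with the separators removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate codon by codon; stop at the first stop codon.

    A trailing partial codon (1 or 2 leftover bases) is ignored.
    """
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGCGCTGCATGATGGATACCCCGGTCGCTTCGCGGCCGCTAATCGGGG"
print(translate(seq))  # MRCMMDTPVASRPLIG
```

In one-letter code, MRCMMDTPVASRPLIG is the same as the three-letter sequence above; the final lone G is left untranslated.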
What do you find that is notable?
The start codon "ATG" initiates translation, and translation continues until one of the stop codons (TAA, TAG, or TGA) is encountered in frame. This sequence contains no in-frame stop codon, which would indicate the end of protein synthesis. This suggests that the sequence may be part of an open reading frame (ORF).
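One way to check the stop codon observation is to scan codon by codon in a chosen reading frame rather than searching the raw string, since a triplet such as TAA can occur out of frame without terminating translation. A minimal sketch, assuming the joined 49-base sequence from the translation step; `stop_positions` is a made-up helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(dna: str, frame: int = 0) -> list[int]:
    """0-based start positions of stop codons when read in the given frame."""
    return [
        i
        for i in range(frame, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


seq = "ATGCGCTGCATGATGGATACCCCGGTCGCTTCGCGGCCGCTAATCGGGG"
for frame in range(3):
    print(frame, stop_positions(seq, frame))
# 0 []
# 1 [10, 40]
# 2 []
```

So the frame that starts at the ATG has no stop codon, while frame 1 hits a TGA and a TAA.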
(I included the following but wasn't sure whether it was relevant.)
Sequence 1 has a GC content of approximately 56.52% (13 of 23 bases).
Sequence 2 has a higher GC content of approximately 73.08% (19 of 26 bases).
Therefore, the difference in GC content between the two sequences may indicate differences in their biological roles, such as their propensity for gene regulation, protein binding, or other molecular interactions.
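GC content is easy to get slightly wrong when counting by hand, so here is a small Python sketch for double-checking the percentages. It assumes "Sequence 1" and "Sequence 2" mean the two assembled contigs from Questions 1 and 2; `gc_content` is an illustrative name.

```python
def gc_content(dna: str) -> float:
    """Percentage of bases that are G or C."""
    dna = dna.upper()
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


seq1 = "ATGCGCTGCATGATGGATACCCC"      # contig from Question 1
seq2 = "GGTCGCTTCGCGGCCGCTAATCGGGG"   # contig from Question 2

for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(f"{name}: {len(seq)} bases, GC = {gc_content(seq):.2f}%")
```

With sequences this short, a single base changes the figure by several percentage points, so it is worth being cautious about reading biological meaning into the difference.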
Conservation of Amino Acids: In the first sequence, there is a repetition of methionine (Met) and arginine (Arg) codons. Similarly, in the second sequence, there is a repetition of glycine (Gly) and arginine (Arg) codons. This repetition could imply certain functional motifs or domains within the protein sequences.
Hi Phillip,
Thanks for providing an answer and clearing up some confusion. Regarding what you said about older assemblers reverse complementing the 2nd sequence to accommodate the 1st sequence: can you give an example of such software, and maybe some insight into why assembly has moved away from this method?
Thanks, Ricardo
Hi Ricardo! CAP3 is an 'older' overlap assembler from 1999 which is perfectly suited for these kinds of 'small' tasks. I used it last in my undergrad to assemble some ESTs....
Nowadays we use de Bruijn graph-based assemblers like MEGAHIT, SPAdes, and Velvet, which should work here too but are probably overkill.
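For sequences this small, the reverse-complement idea mentioned above can also be tried by hand or with a few lines of Python. This is only a sketch of the overlap check, not what CAP3 or any other assembler does internally; `reverse_complement` and `overlap` are illustrative helpers, and the minimum overlap of 3 bases is an arbitrary assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


contig1 = "ATGCGCTGCATGATGGATACCCC"      # assembled in Question 1
contig2 = "GGTCGCTTCGCGGCCGCTAATCGGGG"   # assembled in Question 2
rc2 = reverse_complement(contig2)

for label, candidate in (("forward", contig2), ("reverse complement", rc2)):
    n = overlap(contig1, candidate)
    print(f"{label}: overlap of {n} bases")
    if n:
        print("  merged:", contig1 + candidate[n:])
```

In the forward orientation the two contigs do not overlap at all, whereas the reverse complement of the second contig starts with the same CCCC that ends the first. A 4-base overlap is short enough to occur by chance, so treat the merged result as a candidate rather than a confirmed answer.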
Hi Phillip,
Thanks for sharing your knowledge. Funnily enough, this really was an aptitude test. Anyway, your input has been extremely helpful and I really do appreciate the time you've taken to read my post. I've created another post using your insights and looking into the working a bit more, just to clear up further confusion: "Genome Assembly task + Protein Translation, assignment advice on a question". Here it is if you would like to read it and maybe contribute further.
Thanks again, Ricardo