Question 9.1 The following short DNA sequences have all been derived from the same longer contiguous stretch of sequence (or "contig")
TGCATGATGG
ATGCGCTGC
ATGATGGATACCCC
Assemble them into the original contig.
Worked this part out (my alignment for genome assembly)
ATGCGCTGC
......TGCATGATGG
.........ATGATGGATACCCC
Result -> ATGCGCTGCATGATGGATACCCC
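The manual alignment above can be cross-checked with a short script. This is only a minimal sketch: the helper names `overlap` and `merge` are made up for illustration, and the order of the reads is taken from the hand alignment rather than discovered automatically.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge(left, right, min_len=3):
    """Join two reads on their suffix/prefix overlap."""
    k = overlap(left, right, min_len)
    if k == 0:
        raise ValueError("no overlap of at least %d bases" % min_len)
    return left + right[k:]

reads = ["TGCATGATGG", "ATGCGCTGC", "ATGATGGATACCCC"]

# Order taken from the hand alignment: read 2, then read 1, then read 3.
contig = merge(merge(reads[1], reads[0]), reads[2])
print(contig)  # ATGCGCTGCATGATGGATACCCC
```

The overlaps it finds are 3 bases (TGC) and 7 bases (ATGATGG), matching the alignment worked out by hand.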
These additional sequences have been derived from a larger contig that incorporates the original contig.
GGTCGCTTCGCGGCC GCGGCCGCTAATCGGGG
Question 9.2
Assemble them into the larger sequence. (This part I am quite stuck on and is holding me back a little)
GGTCGCTTCGCGGCC
.........GCGGCCGCTAATCGGGG
I think this alignment is correct, which gives the sequence: GGTCGCTTCGCGGCCGCTAATCGGGG
This then leads on to assembling the larger contig. I think the new sequence has to be reverse complemented so that it overlaps the original contig, which is what my background reading and a comparison of the nucleotide sequences suggest?
ATGCGCTGCATGATGGATACCCC
...................CCCCGATTAGCGGCCGCGAAGCGACC (reverse complement)
Final sequence : ATGCGCTGCATGATGGATACCCCGATTAGCGGCCGCGAAGCGACC
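One way to sanity-check the orientation is to test the new sequence against the original contig both as given and reverse complemented. The sketch below is illustrative only: `reverse_complement` and `end_overlap` are hypothetical helper names, and it looks only for exact suffix/prefix matches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def end_overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0

contig = "ATGCGCTGCATGATGGATACCCC"       # result of 9.1
new_seq = "GGTCGCTTCGCGGCCGCTAATCGGGG"   # merged reads from 9.2
rc = reverse_complement(new_seq)

overlaps = {}
for label, cand in (("forward", new_seq), ("reverse complement", rc)):
    # Try the candidate to the right of the contig, then to the left.
    overlaps[label] = (end_overlap(contig, cand), end_overlap(cand, contig))
    print(label, overlaps[label])

final = contig + rc[end_overlap(contig, rc):]
print(final)
```

As given, the new sequence has no end overlap with the contig on either side. Its reverse complement overlaps the right-hand end by four bases (CCCC). Four bases is a short overlap that could also arise by chance, so this supports the orientation rather than proving it.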
How does this task differ from the first task?
The task in question 9.1 involves assembling short DNA sequences into the original contig, which is essentially reconstructing the original contiguous stretch of DNA from fragmented sequences. This requires aligning the sequences based on overlapping regions and identifying the correct order to reconstruct the original sequence.
In contrast, question 9.2 involves assembling additional sequences into a larger contig that incorporates the original contig. This task builds upon the first one by introducing additional sequences that extend beyond the boundaries of the original contig. Here, the challenge is to align and integrate these new sequences with the existing contig, considering potential overlaps and ensuring continuity in the final assembled sequence.
Furthermore, question 9.2 raises the possibility of reverse complementation, which adds another layer of complexity. DNA is double stranded and a fragment can be read from either strand, so a new sequence may only overlap the existing contig once it has been reverse complemented. Here the new sequence shares no end overlap with the original contig as given, but its reverse complement begins with CCCC, which matches the last four bases of the contig.
Overall, while both tasks involve sequence assembly, question 9.2 introduces the complexity of extending the contig with additional sequences and potentially involves reverse complementation to achieve accurate alignment and assembly.
Translate the DNA sequence into a protein sequence
Sequence(3s): ATG CGC TGC ATG ATG GAT ACC CCG ATT AGC GGC CGC GAA GCG ACC
Codons: ATG CGC TGC ATG ATG GAT ACC CCG ATT AGC GGC CGC GAA GCG ACC
Amino Acids: M R C M M D T P I S G R E A T
Reading frame(expasy) : Met R C Met Met D T P I S G R E A T - 5' to 3' reading frame 1
What do you find that is notable?
The sequence begins with the start codon "ATG", encoding methionine (M), which is where translation would initiate. The 45 bases split into exactly 15 codons with no stop codon in frame 1, so the reading frame stays open to the end of the contig. Read as one-letter codes, the peptide is MRCMMDTPISGREAT, which ends in the readable phrase "IS GREAT".
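The frame 1 translation of the final sequence from 9.2 can be cross-checked with a few lines of code. This is a sketch assuming the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `translate` are illustrative and are not taken from Expasy.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate complete codons from `frame` (0, 1 or 2); '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

final = "ATGCGCTGCATGATGGATACCCCGATTAGCGGCCGCGAAGCGACC"
protein = translate(final)
print(len(final), protein)  # 45 MRCMMDTPISGREAT
```

Splitting the sequence in code avoids slips when grouping codons by hand, because a single extra or missing base shifts every codon downstream.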
My questions: Is my methodology correct in the way I have approached this? What reason would I use to justify the decision for reverse complementing in the 2nd question (as I initially guessed since some old assemblers used to do this... I think)? What other notable distinctions can be made from the AA sequence?