Hi, this is my first post. I'm Fran from Spain, and I'm currently working on the final thesis for my master's degree in bioinformatics. I need to obtain the transcript sequences affected by a mutation. To do this, I first get the original sequence of the transcript and then introduce the mutation by modifying the string.
I use the following function in order to perform the operation:
# param 0 -> TranscriptVariationAllele object
# return  -> Sequence of the variation including 5' and 3' regions.
sub get_variation_seq {
    my ($tva) = @_;

    # translateable_seq returns the coding part of the transcript
    # (it removes introns and 5' and 3' UTR)
    # my $seq = $tva->transcript->translateable_seq;
    # seq contains 5' and 3' regions.
    my $seq = $tva->transcript->seq->seq;

    # cDNA coordinates are 1-based; substr expects 0-based offsets.
    my $variation_start = $tva->transcript_variation->cdna_start - 1;
    my $variation_end   = $tva->transcript_variation->cdna_end - 1;

    # If it is a deletion, feature_seq is '-', so we will use '' instead
    # to build the final sequence.
    my $feature_seq = $tva->feature_seq eq "-" ? "" : $tva->feature_seq;

    print $tva->display_codon_allele_string . "\n";
    print $tva->transcript_variation->variation_feature->variation_name . "\t$variation_start-$variation_end\n";
    print "$seq\n";

    # Replace the reference bases with the variant allele.
    substr($seq, $variation_start, $variation_end - $variation_start + 1) = $feature_seq;
    print $seq . "\n";

    return $seq;
}
This function receives a TranscriptVariationAllele object and returns the complete mutated transcript sequence, including the 5' and 3' UTRs. It works for dbSNP variations, but for COSMIC or HGMD-PUBLIC variations $tva->feature_seq does not contain the variant sequence; it only contains a string such as "COSMIC".
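To make the coordinate handling easier to check, here is a minimal sketch of the same splice on a toy string, with no Ensembl objects involved. The helper name apply_allele and the toy sequence are hypothetical, and the insertion case assumes the convention that start = end + 1 for insertions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): splice an allele into a
# sequence using 1-based, inclusive coordinates, as cdna_start/cdna_end are.
sub apply_allele {
    my ($seq, $start, $end, $allele) = @_;

    # '-' marks a deletion, so nothing is inserted in place of the reference.
    $allele = '' if $allele eq '-';

    # 4-argument substr replaces the span in place. The length is 0 when
    # start = end + 1 (assumed insertion convention), so nothing is removed.
    substr($seq, $start - 1, $end - $start + 1, $allele);

    return $seq;
}

print apply_allele('ATGGCC', 4, 4, 'T'),  "\n";   # substitution: ATGTCC
print apply_allele('ATGGCC', 4, 5, '-'),  "\n";   # deletion:     ATGC
print apply_allele('ATGGCC', 4, 3, 'AA'), "\n";   # insertion:    ATGAAGCC
```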
How can I get the complete mutated sequence for non-dbSNP variations? Is there another way to do this?
Thank you in advance.
Regards, Fran.
Thanks for your answer, Emily. In that case I would have to exclude variations whose source is different from dbSNP.
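A minimal sketch of one way to do that exclusion: instead of looking up the source name, it checks whether the allele string is a literal nucleotide sequence and skips anything else (such as the "COSMIC" placeholder). This is a different test from filtering by source, and the helper name is_sequence_allele is hypothetical; in the real function it would be applied to $tva->feature_seq before modifying the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the allele string is a literal
# nucleotide sequence (or '-' for a deletion), false for placeholder
# strings such as "COSMIC".
sub is_sequence_allele {
    my ($allele) = @_;
    return 0 unless defined $allele && length $allele;
    return $allele =~ /^(?:-|[ACGTN]+)$/i ? 1 : 0;
}

# Toy allele strings standing in for $tva->feature_seq values.
my @alleles = ('A', '-', 'COSMIC', 'ACGT');
my @usable  = grep { is_sequence_allele($_) } @alleles;

print join(',', @usable), "\n";   # A,-,ACGT
```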