Protein of unclear function
4.7 years ago
jaqx008 ▴ 110

Hello all, I am studying a group of small RNAs that I believe are generated from a particular spliced transcript. This transcript (below), as I saw in IGV, is duplicated three times, with the copies adjacent to each other. The sequences also contain a pattern of repeats. Finding the function of this protein is highly relevant to discussing my results, and I am stuck. I have BLASTed the sequence against NCBI, but the hits do not make much sense, probably due to poor annotation. Is there anything else I can do to find the function of this protein? My organism is Branchiostoma floridae. Thanks

CTGGCACCACTCTTGTCAGCTGAACGCTGGGCATCCCGATCGTCTGTAGACGGTGCGAAGGTTACCCTCTTCCTGGCACCGGTCTTGTTAGCTGGGCGCTGTGCATCCCGGCCGTCTGTAGACTGTGCGGGGGTAGGCACCAGAGAGCTTTGACGGGGCAGGTTGACCGGAGCAGGTCGACCTGTAAGGAATACAAAAAGAATGCAAAACATTTCAAGCATTAGTTCTCTTTAGCTATGAGATGTCCTAGAAAATCAGGACAAGCAAACGCATTTTCACCTTTTTTTAGAAAGGATATTGACATTGCTGCAGCTAGGATTAGGAAAGACTCGTTCTCTATCAAAAGTTTAACGTTTCATGTGTTGTAGTAATCTGTGTAAGCCCCTCCCAACTTAGAAGCCGAAATACGAAATGGTACAGTACTAGTAGATCCTTTACTTGCATATATACATATAATGAGTAGTTCTGGTTCAATATTGATATATAATTTCAAAACAAAAGACAAATATTACACACTTCTTTTTTTAATTTTATTTTTTCATTCTTGCAAATAACGACCAGAATTTCTTTGACCAAAACCATTCTCACCTACAACACCTGCCGGTGATGCGGACTTTCCGGCCCTCCTGGCTTGTGGTGCGTCACCCATAGGTGCGCATGCGCCTGGCCCATTCAGGCTCTCGCGACTCTCTGGCTTCTTGTCGTAGACTCCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACACCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTTGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGAC
ACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACTCCGACACTGGCCATGGTGTCATCTGTGCCAGGGCCACTCTGGTTGTCTGCAAAATAATGCAAAACATTTAACGTTAAATCATCATTTCTCTTTAGGCCTGGGTCACATTTCCAAGCCGGGGCCCGATCGGGATGTTTTAAGAAACGAGAAATCAAATTGTATACCAAGAAAAATACACAAAGTATGCCCTTGAATCTTATTTTGACATCTTGTGTATTTTGATGTCTTTTCTATTATTTGCTTTTCTCCCGATAGCTGCCCGGCCGGGCCCCTTTTTTTTAAATGTGACCTAAGCCTTAGCTATGAGGTGTCCTACAAATCAGGACACATTGTCACTTTTTTTAGAAAGTATATCGACATTGCTGCAGGAGTTCTAAAACAGTTTGGCTTAGGAAAGACTCATTCCATATTAAAAGTTTCATGTTTTATGTGTTGTAGTAATCTGTGTAAGCCCCTCTTATGTTGGAAGGCGAAATACGAAACGGTACAGTACCAGTAGATCCCTTGTTTGCATATATATATATGATTAGTAATTCTCGGTCAATATCAATACATGTTTTGAAAAGAAAAGTCATGTATAGCACACTTCATTCTATTTGAAACCTTTGTTTAACTTATTGCAAATTCCCAATCGTTTATCCCCAGGGCCCTTGCTCTGTTGAATCACAGTTAAGGCACTTTCACATCAACTATCGTATGACTTGTGTCTTACTCATCTTTACCAATATTGTATATATATATTTAAAGTCTGCAATTTGTGT
blast repeat sequences unknownfunction • 1.4k views

Which blast did you use?

I'm not familiar with this reference genome, but I ran blastn and got these hits:

XM_002613548.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613549.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613550.1 (Branchiostoma floridae hypothetical protein, mRNA)

which encode these proteins:

XP_002613594.1

XP_002613595.1

XP_002613596.1

Each contains 2 or 3 conserved domains:

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613594.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613595.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613596.1

(You can change the view from concise results to standard results to see more domains)

DNA polymerase III subunit gamma/tau

UV excision repair protein Rad23

SOG2: RAM signalling pathway protein

If you google each domain name together with the name of the genome, you can find more information, for example:

Branchiostoma floridae + DNA polymerase III subunit gamma/tau

One of the results is

http://www.pantherdb.org/panther/family.do?clsAccession=PTHR11669

On that page, if you click on "Branchiostoma floridae : 4", you can see all the genes in this reference genome that have this domain.

......
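If you want to repeat the conserved-domain lookup for other accessions, the links above all follow the same pattern. Here is a minimal sketch that rebuilds them; the function name `cdd_url` is just something I made up for illustration, and it only constructs the URLs, it does not query NCBI.

```python
from urllib.parse import urlencode

# Base address copied from the CDD links posted above.
CDD_ENDPOINT = "https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi"


def cdd_url(accession: str) -> str:
    """Build a CDD 'live search' link for one protein accession (hypothetical helper)."""
    query = urlencode({"INPUT_TYPE": "live", "SEQUENCE": accession})
    return f"{CDD_ENDPOINT}?{query}"


# The three protein accessions from the blastn hits above.
accessions = ["XP_002613594.1", "XP_002613595.1", "XP_002613596.1"]
urls = [cdd_url(acc) for acc in accessions]

for url in urls:
    print(url)
```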


The alignments with the CDD domains look like false positives to me, because only a partial (non-repeat) region is matched with the repeat region.
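To make the repeat structure explicit before judging domain hits, one rough option is to estimate the tandem-repeat period by comparing the sequence with shifted copies of itself. The sketch below is only an illustration under simple assumptions (no indels, plain identity scoring); `repeat_period` is a made-up name, and dedicated tandem-repeat finders will handle imperfect repeats much better.

```python
def repeat_period(seq: str, min_period: int = 2, max_period: int = 100) -> tuple[int, float]:
    """Return (period, identity) for the shift at which seq best matches itself.

    Identity is the fraction of positions i where seq[i] == seq[i + period].
    Ties keep the smallest period, so multiples of the true period do not win.
    """
    seq = seq.upper()
    best_period, best_identity = 0, 0.0
    for period in range(min_period, min(max_period, len(seq) // 2) + 1):
        matches = sum(a == b for a, b in zip(seq, seq[period:]))
        identity = matches / (len(seq) - period)
        if identity > best_identity:
            best_period, best_identity = period, identity
    return best_period, best_identity


# Toy input: a 30-nt stretch copied from the posted sequence, repeated exactly
# 10 times. The real transcript has mismatches between copies, so its identity
# would be below 1.0.
unit = "ACGGACACTGGCCGTGGTGTCGTCGCAGAC"
print(repeat_period(unit * 10))
```

Once the period is known, the repeat block can be masked or split off, so that any domain hit can be checked against the non-repeat part separately.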


Thank you, @Fatima, for looking this up. I have also seen the DNA polymerase hit, but when I look in other organisms it is not found, which makes it really strange. I am looking through your searches now to see what I can make of them.


Have you performed any wet-lab experiments? I think the standard way to investigate the function of a gene is to check expression -> check translation -> check localization, then knockdown analysis, etc.


Unfortunately, I am unable to perform any wet-lab experiments on this, as we do not have the animal models and the work is not specifically funded. We are just looking to use a bioinformatics approach.


Hmm, all the BLAST hits I can see are hypothetical or predicted proteins... It's quite risky to proceed without evidence that the gene is really expressed and translated.


I am not too worried about the expression, since I can quantify the number of transcripts and use that to assess whether or not they are translated. I just want to predict the function based on what the homologs do in other organisms.


First, you have to make sure that the translated amino acid sequence is what you expect; then you can run a more sensitive search with tools like HH-suite and HMMER.
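As a quick way to do that first check, here is a small sketch that translates a nucleotide sequence in all six frames and reports the longest stretch that starts with M and has no stop codon. It assumes the standard genetic code and that the input is the spliced transcript (no introns); the function names are mine, not from any library. The resulting protein sequence is what you would then feed to HH-suite or HMMER.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(dna: str) -> tuple[str, str]:
    """Return (frame, protein) for the longest M-initiated, stop-free stretch.

    The last stretch in a frame may run off the end of the sequence without a
    stop codon, so a hit there can be a partial ORF.
    """
    best = ("", "")
    for name, protein in six_frames(dna).items():
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (name, segment[start:])
    return best


# Toy example; replace with the transcript sequence from the question.
print(longest_orf("CCATGGCTAAATGACC"))
```

If the longest ORF falls on the minus strand, the posted sequence is probably the reverse complement of the mRNA, which is worth knowing before comparing it with the XP_ protein records.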
