Question

Job: Bioinformatics Research Consultant needed for constructing TFBS and regulatory networks from time series plots

1


8.3 years ago

tfhahn ▴ 50

If you are interested in teaching me any of this but don't have the time to read everything I wrote, please email me at Thomas.F.Hahn3@gmail.com, call my cell phone at 318 243 3940, or send a Skype contact request to tfh002. If my answering machine is full, you can leave a message on my Google Voice number, (501) 301-4890, because that answering machine never fills up.

I am looking for informal guidance to speed up my bioinformatics dissertation focusing on analyzing genomic data.

I am looking for an informal bioinformatics research adviser, trainer and tutor. I urgently need a publication to remain in the program, but since I am almost blind I won't be able to submit one on time unless I can get some help. My vocational rehabilitation agency is giving me funds to pay for assistants and tutors to help me compensate for the shortcomings caused by my visual impairment.

My research involves yeast because my adviser has a yeast lab. We are interested in understanding how and why caloric restriction extends lifespan. We are specifically interested in changes in membrane composition because it is a marker for aging. We would like to improve our understanding of the mechanisms underlying and driving the aging process so that we can eventually reverse it.

I am especially looking for help in learning the most relevant Bioconductor packages for analyzing time series microarray, RNA-Seq, ChIP-Seq, proteome, and any kind of epigenetic, metabolomic and transcription-factor-binding-site data, in order to construct highly predictive, causality-inferring co-expression, regulatory and protein-binding networks. My aim is to use computational methods to predict a novel lifespan-extending intervention for yeast, which I hope my adviser can validate biologically in his yeast lab. I stayed up all night looking for good tutorials, datasets and review articles for learning and developing these skills and techniques, but unfortunately my Firefox browser crashed and I lost all the browser windows in which I had opened this learning material. If you’d like a list of articles that employ the skills and techniques I’d like to gain, I’d gladly repeat the literature search. After posting this text I may start a second post listing the resources I thought could be helpful, with a note for each publication explaining why I think it is useful. But feel free to refer to and use any material that has helped you learn all of this. For example, I have never seen a regulatory network for transcription factors, so I am still not sure whether the colorful circular Figure 6 of one particular article is actually considered a transcription-factor-binding regulatory network, or how to read it.

I started my dissertation by plotting lots of time series curves from yeast microarray data. So far I have only learned how to analyze microarray data generated with the Affymetrix Yeast Genome 2.0 Array, but I'd like to learn how to analyze data from other chips. I was hoping that, after plotting enough time series graphs from different GEO microarray datasets, I would see some aging-related trends in the expression patterns of at least some genes. Unfortunately, I was disappointed because no clear trend became visible for any gene. So far my time series plots look so random that one could assume there is no relationship between temporal gene expression pattern and gene function. The time series plots of genes belonging to the same GO term do not seem to be any more similar to one another than to those of the remaining genes. But if this were the case, co-expression networks would not work, because one only connects genes with an edge when their plots are highly correlated. If we used this approach on my data, almost none of the genes belonging to a given GO term would be grouped together. Therefore, before trusting co-expression networks, I would like to establish that there is indeed a relationship between time series curves and gene function. My analysis may be flawed because I did not use any Bioconductor package to analyze my microarray data. If you could teach me how to do that, maybe my data would look better, because I did not exclude the genes that are not differentially expressed from the analysis. I was not aware of this requirement when I did this work.

I would also like help in finding a good way to rank the similarities between my different time series plots. When I used the regular Pearson correlation in R to group together genes whose time series curves had a correlation above 0.85, I found almost no enrichment, indicating that the genes I had grouped together based on this criterion were not at all functionally related. This made me even more skeptical about whether co-expression networks can really tell us what we expect, i.e. which genes are working together. But since so many people are publishing co-expression networks, I think something may be wrong with my analysis for me to get such counterintuitive results.
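As a point of reference for the grouping step described above, here is a minimal Python sketch (the original analysis was done in R). The gene names and values are hypothetical toy data, and the 0.85 cutoff is simply the one mentioned in the text; this is an illustration of the thresholding idea, not the actual analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat curve has no defined correlation; treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(curves, threshold=0.85):
    """Return (gene1, gene2, r) for every pair whose curves correlate above threshold."""
    genes = sorted(curves)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(curves[g1], curves[g2])
            if r > threshold:
                pairs.append((g1, g2, r))
    return pairs

# Hypothetical toy data: gene name -> expression at successive time points.
curves = {
    "geneA": [1.0, 2.0, 3.0, 2.0, 1.0],
    "geneB": [2.0, 4.1, 6.0, 3.9, 2.0],  # roughly geneA scaled by 2
    "geneC": [3.0, 1.0, 3.0, 1.0, 3.0],  # different shape
}
pairs = correlated_pairs(curves)
```

Note that Pearson correlation ignores scale, so geneA and geneB pair up despite a two-fold difference in level. With only a handful of time points, high correlations also arise by chance, which may be worth keeping in mind when interpreting a lack of enrichment.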

According to my understanding, the trajectory of a time series plot is determined by the way each particular gene is regulated. Therefore, I am much more in favor of regulatory networks. If nobody has done this already, I would like to show that genes with more similar promoter regulatory regions have more highly correlated time series curves. That at least sounds logical to me. But I need help to figure out whether somebody has already done this kind of research.

I need help constructing transcription-factor-based regulatory networks. I spent all night hoping to find all the components plus instructions to build them, but unfortunately I could not find enough material. I would like to learn how regulatory networks for transcription factor binding sites can be constructed.

I have read that many of the regulatory and co-expression networks have been constructed based on ChIP-chip and ChIP-seq data. Therefore, I would like somebody to teach me how to do that.

Last week I plotted the time series curves for each of the 5,116 genes on the Yeast 2 chips. In those datasets the transcriptome was measured at 10- to 30-minute intervals. Finally I could see something in these plots that I know is true. Within the first 100 minutes of the cell cycle, about half of the genes had either a big peak or a deep valley. I need help quantifying the exact percentages of these motifs. The cell cycle has driver and passenger genes. The driver genes drive the cell cycle forward across its checkpoints. The time series curves of the passenger genes follow the expression pattern of the driver genes. That is why I would like to use these cell cycle datasets to define functional / regulatory units. Such a unit consists of at least one driver gene and all of its passenger genes with a similar pattern. I need help in identifying genes that can be grouped together based on their time series curves.
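One possible way to quantify the peak and valley percentages is sketched below in Python. Several things here are assumptions for illustration only: the values are taken to be log2 expression, a "big" peak or "deep" valley is defined as at least 1 log2 unit (2-fold) away from the gene's own median, and the gene names and numbers are hypothetical.

```python
from statistics import median

def classify_motif(times, values, window=100.0, min_log2_change=1.0):
    """Label a curve 'peak', 'valley', 'both' or 'none' within the time window.

    Assumes log2 expression values; the 1-log2-unit threshold is an assumption.
    """
    baseline = median(values)  # the gene's own typical level
    in_window = [v for t, v in zip(times, values) if t <= window]
    if not in_window:
        return "none"
    has_peak = max(in_window) - baseline >= min_log2_change
    has_valley = baseline - min(in_window) >= min_log2_change
    if has_peak and has_valley:
        return "both"
    if has_peak:
        return "peak"
    if has_valley:
        return "valley"
    return "none"

def motif_percentages(times, curves, **kwargs):
    """Percentage of genes falling into each motif class."""
    counts = {"peak": 0, "valley": 0, "both": 0, "none": 0}
    for values in curves.values():
        counts[classify_motif(times, values, **kwargs)] += 1
    total = len(curves)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical toy data, sampled every 20 minutes.
times = [0, 20, 40, 60, 80, 100, 120, 140]
curves = {
    "g1": [5.0, 5.1, 7.5, 5.2, 5.0, 4.9, 5.0, 5.1],  # peak at 40 min
    "g2": [8.0, 8.1, 7.9, 6.0, 8.0, 8.2, 8.0, 7.9],  # valley at 60 min
    "g3": [3.0, 3.1, 2.9, 3.0, 3.1, 3.0, 2.9, 3.0],  # flat
    "g4": [6.0, 6.0, 6.1, 5.9, 6.0, 6.0, 8.5, 6.0],  # peak at 120 min, outside the window
}
percentages = motif_percentages(times, curves)
```

The threshold and the choice of the median as baseline both change the resulting percentages, so they would need to be justified for real data.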

I was surprised to see the reality of the cell cycle reflected in the time series plots for the cell cycle data, because I could not find anything that definitely resembled reality in all those time series plots where the measurement time points were more than half a yeast cell cycle apart. Yeast can divide in 2-3 hours. Some of the major cell cycle genes, such as RNR1, change by more than 128-fold within the period of one cell cycle. Therefore, even if there is a linear trend in an absolute reference frame (y=0), we might never see it, because it is just chance whether we measure the expression of such cyclical genes when they have reached their maximum, their minimum or any level in between.
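The sampling argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the cell cycle period (150 minutes), the oscillation amplitude (3.5 log2 units, i.e. 128-fold from trough to peak), the slow drift, and the sampling schedules are all invented numbers, not measurements.

```python
import math

def ls_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# All numbers below are illustrative assumptions.
PERIOD = 150.0      # minutes per cell cycle (within the 2-3 hour range)
AMPLITUDE = 3.5     # log2 units, so trough-to-peak is 7 log2 units = 128-fold
TRUE_SLOPE = 0.001  # log2 units per minute of slow, age-related drift

def expression(t):
    """Hypothetical log2 expression: slow linear drift plus cell-cycle oscillation."""
    return TRUE_SLOPE * t + AMPLITUDE * math.sin(2 * math.pi * t / PERIOD)

# Dense sampling: every 10 minutes over 3000 minutes.
dense_times = list(range(0, 3001, 10))
dense_slope = ls_slope(dense_times, [expression(t) for t in dense_times])

# Sparse sampling: 4 measurements 1000 minutes apart, starting at different offsets.
sparse_slopes = {}
for offset in (0, 40, 80, 120):
    sample_times = [offset + 1000 * k for k in range(4)]
    sparse_slopes[offset] = ls_slope(sample_times, [expression(t) for t in sample_times])

dense_error = abs(dense_slope - TRUE_SLOPE)
worst_sparse_error = max(abs(s - TRUE_SLOPE) for s in sparse_slopes.values())
```

In this toy setup the estimated trend from the sparse schedule depends on where in the cycle the first measurement happens to fall, while the dense schedule recovers the drift more closely. Real data would also contain measurement noise and loss of synchrony, which this sketch ignores.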

We have not been able to make much progress in understanding and manipulating the aging process. I have wondered for a long time why I could not find an aging-related gene expression pattern in my time series plots. But since I could not find anything despite knowing that there must be something that causes aging, I thought maybe we are looking for the wrong thing. We are primarily looking for trends affecting the absolute amount of transcription or translation. But maybe aging is not caused by this kind of change in an absolute reference frame. Maybe aging is caused by relative temporal expression changes between groups of genes with respect to one another.

The gene expression pattern of many human genes might also be cyclical because of our circadian rhythm. Therefore, the cyclical changes could totally overshadow linear changes. For our life processes to take place properly, the expression of many subgroups of genes must be tightly controlled and regulated in time. For example, for sleep to occur, the eyes must be closed. When I was younger I felt that there was no time gap between the moment I fell asleep and the moment I woke up the next morning. But now I can tell that lots of time elapsed in between. This could be caused by a gradually increasing deregulation of gene expression patterns that must be synchronized. Therefore, this temporal deregulation of initially totally synchronized processes could serve as a marker of aging. If their synchronization is completely lost, then the life processes that depend on it can no longer take place, thus causing death. If this is indeed the case, then aging could be reversed by restoring synchronicity. Therefore, I would like to find out whether the initial synchronicity of cyclical co-expression is lost over time.

I feel that the cell cycle data is ideal for defining initial groups of co-expression. Out of the maybe 12-16 measurement time points in my cell cycle data, maybe I should look at which genes behave like a group in the first 3 time points. The cell-cycle-regulating genes will be the driver genes. But I found just by visual inspection that many lipid-related proteins of unknown function follow their expression pattern with smaller variance, and thus a much smaller range. I refer to these genes as passenger genes. I’d like to check whether the synchronicity of the initially very highly synchronized gene expression pattern has declined by the last 3 time points. If this is the case, then I’d like to check whether the synchronicity of expression is higher in the first than in the last replication. But for that we’d need new data, since I am not aware that this kind of data already exists.

Can expression values of different genes be directly compared on an absolute scale for microarray and RNA-Seq data? I mean, if the measured intensity of gene A is twice as high as that of gene B, can I conclude that the expression of gene A is double that of gene B? If there is indeed a relationship between time series curve and gene function, then we can only find it if we can distinguish between the time series curves at a high enough resolution to distinguish between their functions. But from visual inspection it looks like the time series curves of genes of the same molecular function or pathway, which must work together, are not more correlated to one another than they are to all the other genes. I am looking for help to verify this programmatically. But if this is the case, then trying to cluster genes of the same function into a clique in a network must inevitably fail, because most of the genes, despite belonging to the same GO term, could not be clustered together based on the similarity of their time series curves.
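One way such a programmatic check could look is a permutation test: compare the mean pairwise correlation within a GO term against randomly drawn gene sets of the same size. The sketch below is in Python with hypothetical toy data; the function names, the number of permutations and the gene labels are all assumptions made for illustration.

```python
import itertools
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat curve: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_correlation(curves, genes):
    """Average Pearson correlation over all pairs in the gene set."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(curves[a], curves[b]) for a, b in pairs) / len(pairs)

def term_coherence_test(curves, term_genes, n_perm=200, seed=0):
    """Compare a term's coherence with random gene sets of the same size."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(curves, term_genes)
    all_genes = sorted(curves)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(term_genes))
        if mean_pairwise_correlation(curves, sample) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical toy data: 3 "term" genes sharing one shape, 17 random background genes.
rng = random.Random(1)
shape = [0.0, 1.0, 3.0, 6.0, 3.0, 1.0, 0.0, 0.0]
curves = {f"term{i}": [(i + 1) * v + rng.gauss(0, 0.05) for v in shape] for i in range(3)}
for i in range(17):
    curves[f"bg{i}"] = [rng.gauss(0, 1) for _ in shape]

observed, p_value = term_coherence_test(curves, ["term0", "term1", "term2"])
```

Run over every GO term, a test like this would show whether the visual impression holds, although the resulting p-values would need multiple-testing correction.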

Is it generally assumed that the time series curves of genes involved in the same function are more highly correlated with each other than with the remaining genes? Why do people appear to assume that, in order to change the rate of a function, many if not all of its genes must be changed by the same factor? If I were evolution, I would find it much easier to change the level of only one rate-limiting protein / enzyme than to regulate all the other enzymes of the same pathway. I assume that there are certain pathways / functions / processes whose rate can be controlled by changing the expression of one or only very few rate-limiting enzymes. But there appear to be other pathways / functions / processes for which almost all of the enzymes must be changed by about the same factor to affect the overall rate. For most of them, we don’t seem to know whether they can be up- or down-regulated by changing the expression of only a very few of their enzymes or whether almost all of them must change. If this is the case, then the concept on which GO-term-based gene enrichment rests is flawed, because the speed of some pathways can be changed by changing the expression of only one of their genes, versus many at the same time for others. I think it is wrong to conclude that a pathway for which more genes are differentially expressed is more affected than another one for which far fewer genes are differentially expressed but by a much larger factor. Who determines which genes are grouped together into a particular GO term, and on which criteria are such decisions based? Every year the definitions of some GO terms are changed. This means that they were wrong. Then, most likely, many GO terms that we believe today to be functionally or regulatorily related may not be.
How, for example, can it be that genes of unknown function are assigned to a particular GO term? How is the entity of a GO term defined? What criteria must a group of genes satisfy to be considered a particular GO term? I think we need to compare the expression changes of all genes belonging to a particular GO term with the resulting overall metabolic change of the entire GO term, to determine for each GO term individually how many of its members must change in expression to achieve a particular overall change. But for this kind of analysis we’d need metabolomic, transcriptomic and proteomic data.

If you are interested in helping me find a way to still earn a PhD in bioinformatics before my funding runs out, please email me quickly at TFHahn@UALR.edu, send a Skype contact request to my Skype user name, tfh002, or call my cell phone at 318 243 3940. You can also reach me at my Google Voice number, (501) 301-4890. Please let me know if you have any questions. I am very much looking forward to any reply.

Thanks a lot in advance

Thomas

Bioconductor network Transcription-Factors • 4.5k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 8.3 years ago by tfhahn ▴ 50

0


Rather than trying to look for specific signatures, you could go one step back and ask whether your transcriptomes would allow you to distinguish yeast of different ages and to predict their age (e.g. by building classifiers, or perhaps even linear models), and then look at the genes (similar to Peters et al. 2015).
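A minimal sketch of that idea in Python, using a nearest-centroid classifier with leave-one-out validation as a stand-in for whatever classifier or linear model one would actually choose. The sample values, gene names and the "young"/"old" labels are hypothetical, and this is not the method of Peters et al.

```python
def centroid(samples):
    """Per-gene mean across a list of expression profiles."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_centroid_loo(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumes every class keeps at least one sample when one is left out.
    """
    correct = 0
    for i, (x, true_label) in enumerate(zip(samples, labels)):
        centroids = {}
        for label in set(labels):
            members = [s for j, (s, lab) in enumerate(zip(samples, labels))
                       if j != i and lab == label]
            centroids[label] = centroid(members)
        predicted = min(centroids, key=lambda lab: squared_distance(x, centroids[lab]))
        correct += predicted == true_label
    return correct / len(samples)

def rank_genes(samples, labels, genes, group_a="young", group_b="old"):
    """Rank genes by absolute difference between the two class centroids."""
    ca = centroid([s for s, lab in zip(samples, labels) if lab == group_a])
    cb = centroid([s for s, lab in zip(samples, labels) if lab == group_b])
    diffs = {g: b - a for g, a, b in zip(genes, ca, cb)}
    return sorted(diffs, key=lambda g: abs(diffs[g]), reverse=True)

# Hypothetical toy data: rows are samples, columns are genes (log2 expression).
genes = ["g1", "g2", "g3"]
samples = [
    [5.0, 3.0, 7.0], [5.2, 3.1, 6.9], [4.9, 2.9, 7.1],  # young
    [7.0, 3.0, 7.0], [7.2, 3.1, 7.1], [6.9, 2.9, 6.8],  # old
]
labels = ["young"] * 3 + ["old"] * 3

accuracy = nearest_centroid_loo(samples, labels)
ranking = rank_genes(samples, labels, genes)
```

If the held-out accuracy is clearly above chance, the top-ranked genes are the natural place to start looking; if it is not, that already says something about whether an age signal is present in the data at all.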

ADD REPLY • link 8.3 years ago by unksci ▴ 180