Dear bioinformaticians, I am posting this question on behalf of another researcher, who needs help.
"Hello,
I have a set of co-expressed genes from the human genome and would like to find common transcription factor binding sites among them (or a subset). My biological story is already written up and, based on that, I would like a certain set of genes to show up in the analysis. Therefore, I am thinking about the following strategy: I will try various online databases or services with all of my co-expressed genes, then pick and cite the one that shows the highest number of my preferred genes. Is that acceptable? How do the reviewers verify the predicted transcription factor binding sites, or do they accept the program and claims at face value? I come from a psychology background and do not know any statistics or bioinformatics. Any help is welcome."
Edit: I am trying to learn how bioinformaticians handle the above kind of 'scams' when they read or review a paper. For example, consider the first suggestion of oPOSSUM or MEME. An author tries both programs and sees the 'expected' result with MEME. In his paper, he reports that MEME gave him a motif with a certain short list of 'expected' genes and ignores the oPOSSUM result. The paper will look more sophisticated in terms of bioinformatic analysis than one by someone who did not try to look for promoter binding sites. Given that we have so many published software programs for every stage of analysis, an unethical user can bias each step to get to the 'right' biological result and publish in a top journal. How do you guys handle such issues? Based on what I have experienced so far, most (bioinformatics) reviewers are happy if the paper speaks the right statistical/bioinformatic lingo, and they leave the biological or medical part to the 'expert biologist'. With so many tools out there, isn't there room for huge subjective bias in the whole process? What are the rules for evaluating the judgement of the expert biologists? How do we know that an entire subfield is not being biased through the opinions of a few experts?
On the other hand, we do not (and possibly cannot) require each author to use every software tool and report all results, and then ask the expert biologist to evaluate all options. That would require the biologist to learn and understand the algorithmic differences between programs, which is nearly impossible. Neither can we require the biologist to check each selected gene in the lab before saying anything about the experiment. Moreover, with the biologist typically being in control of the grant and thus the entire process, the bioinformatician has little room to act differently and voice his opinion.
Under those considerations, how do we make sure that an entire subfield is not being created to 'defraud' the larger scientific community?
Among the various types of popular programs, (a) TF binding site prediction software, (b) miRNA target prediction software and (c) gene analysis based on positive selection appear, in my opinion, especially prone to this kind of bias.
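To make the tool-shopping concern concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not a model of oPOSSUM, MEME or any real package: it assumes there is no true signal, that each tool reports a well-calibrated p-value, and that the tools behave as independent tests (real tools run on the same gene set will be correlated, so the inflation would usually be smaller). The function names are hypothetical.

```python
import random


def tool_shopping_false_positive_rate(n_tools, alpha=0.05, n_trials=20000, seed=0):
    """Estimate how often at least one of n_tools reports p < alpha
    when there is no real signal and the author keeps the best result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Under the null hypothesis a calibrated p-value is uniform on [0, 1].
        # Assumption: the tools are independent of each other.
        p_values = [rng.random() for _ in range(n_tools)]
        # "Shopping": report only the tool with the smallest p-value.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_trials


def expected_rate(n_tools, alpha=0.05):
    """Analytic false positive rate for n_tools independent tests."""
    return 1 - (1 - alpha) ** n_tools


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        simulated = tool_shopping_false_positive_rate(k)
        print(k, round(simulated, 3), round(expected_rate(k), 3))
```

Under these assumptions, trying five tools and reporting only the best one raises the chance of a spurious "significant" result from 5% to roughly 23%, which is why a reviewer may reasonably ask which tools were tried and not only which one was cited.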
The quotes indicate that you are citing someone verbatim. Yet I have a hard time imagining any scientist stating the above.
I'm sure that will all come out in the blog.
Istvan, what I wrote is based on a paper that I just went through. It appears very sound and sophisticated in terms of the steps of bioinformatic analysis and statistical jargon, but I cannot be sure that they did not shop for the right promoter binding package. How do you judge the validity of that particular step? I often have similar questions about (a) miRNA target prediction programs and (b) positive selection, but most reviewers would rather see the calculation done than not done, and the papers appear sophisticated with those bioinformatics blocks. However, the biological description always appears subjective ("Our set of 97 genes includes gene X previously known to be related to aging. Therefore our analysis tells the truth."), and yet the bioinformatics behind it is from a well-cited paper and hence technically sound. Still, there is huge room for subjectivity. What are the criteria for evaluating such papers?
On a similar note, a few weeks back I was at a talk at the University of Washington by a professor who is setting up their cancer detection pipeline. He described a large number of alignment programs, GATK, etc. and mentioned that they use very strict statistical cutoffs to make sure they get only a handful (100-200) of variants. Then he mentioned that this small set of variants is reviewed by a panel of three senior cancer biologists, which included Mary-Claire King, to make sure everything is in order. To me, the last step appears to be the place for huge human bias, but most bioinformaticians I know operate at the beck and call of some biologist or medical doctor. How do you judge the validity of papers then?
One problem is that the computational steps are also rife with bias, from the choice of parameters to the order in which the various operations take place, the order in which samples are merged and normalized, etc. So having human oversight is not necessarily bad. What that human actually does matters more.
You can try playing around with oPOSSUM, or see "Find The List Of Tf Likely To Bind To A Promoter Or A Genome Location".
Hi, the tags you choose should reflect the content of your question, not your background. Or are you trying to make a scam? The intention of the question the person is asking is not very clear to me (edit: it is indeed very clear, I call it fraud). Are you playing mind games with us, or are you trying to test us? Are you implicitly trying to post something "provocative" as a way of criticizing current practices, or are you really trying to get support for forging facts? Or is it a joke? Please help me out. Regards.
Hello Michael,
I apologize if my post comes across differently, but I am trying to learn how bioinformaticians handle the above kind of 'scams' when they read or review a paper. For example, consider the previous suggestion of oPOSSUM or MEME. Let us say an author tries both programs and sees the 'expected' result with MEME. In his paper, he reports that MEME gave him a motif with a certain short list of 'expected' genes and ignores the oPOSSUM result. The paper will look more sophisticated in terms of bioinformatic analysis than one by someone who did not try to look for promoter binding sites. Given that we have so many published software programs for every stage of analysis, an unethical user can bias each step to get to the 'right' biological result and publish in a top journal. How do you guys handle such issues? Based on what I have experienced so far, most (bioinformatics) reviewers are happy if a paper speaks the right statistical/bioinformatic lingo, and they leave the biological or medical part to the 'expert'. With so many tools out there, isn't there room for huge subjective bias in the whole process? What are the rules for evaluating the judgement of the expert biologists?
I understand your motivation quite well now and I see the point. But why didn't you ask this question directly, instead of making up some story around it, which just looked like an attempt to be provocative (and a failed one)? I am very much a supporter of asking directly and on topic (without irony, catchy stories, and the like), and I am convinced this suits the format of BioStar best, even though it might turn out to be more boring.
If you allow me, I can replace my original question with the above paragraph. There is no specific reason for asking in one way versus another, and I thought my selected way of posing the question was interesting, and with appropriate tags and quotes, I could make the intention fairly clear.