Hi All,
As a short introduction: I'm new to bioinformatics/gene sequencing, but I have a background in mathematics/statistics. I have done some basic reading on molecular biology, NGS technology and the steps in NGS data analysis. I'm planning to apply for funding for my PhD project, which is about developing more accurate statistical models to estimate gene- or transcript-level expression in the presence of shared reads and shared exons. I'm aware of existing methods such as Salmon, eXpress, RSEM, Kallisto and Sailfish, but my potential PhD supervisor has a couple of novel ideas that may work and may improve on the current state of the art. It certainly won't be easy, because these methods are already good, but this topic could be a good challenge for a PhD.
1) In my project proposal, I need to describe precisely the economic impact of the project results on the pharma/biotech industry. I could say that more accurate estimates, accounting for shared reads and exons, may lead to a better understanding of disease mechanisms at the molecular level, and that this may eventually result in the development of new diagnostic tests and medicines. However, I think such reasoning is not enough. Do you have any ideas on how to be more specific? I need to convince the jury with some real arguments. Should I provide some numbers/statistics? I mostly have one application in mind, differential expression, but maybe I should mention other applications?
2) Some of you work in private pharma/biotech companies. Do you think there is a need/space for other statistical methods that deal with mapping ambiguity (shared reads and exons)? Or is Salmon perhaps already good enough, so that there is no need to spend energy on new models?
3) Based on your experience, do shared reads and exons have a serious impact on data analysis? Or can you just discard multireads (reads that map to more than one gene or transcript) most of the time?
I don't have experience in this field, so your opinions and answers will be very much appreciated.
Thanks for the suggestion about the microbiome, I will take a closer look at this topic.