Dear community, I have a rather broad question I would like to have your input on. I am trained in structural biology and proteomics and, following the needs of my project, I have recently ventured into learning how to analyze transcript-level RNA-seq data. As with protein identification in proteomics, my fellow biologists seem recalcitrant to the idea that RNA-seq experimental data can be used to infer gene isoforms and, even better, to calculate their abundances. It is literally impossible to convey the message that the Excel spreadsheet they get at the end of the analysis is not merely a list of predictions that one needs to validate with a vast number of convoluted PCR and/or cloning experiments. In the case of mass spec data, I have well-founded arguments to show that an MS/MS fragmentation pattern explains, for example, a peptide sequence or a phosphorylation site, provided the statistical parameters are good enough. In the case of isoform reconstruction from RNA-seq data, I am not sure I have all the arguments at hand. I would therefore appreciate it if any (or many) of you could give me the point of view of bioinformaticists. A few specific questions: Are the available bioinformatics tools (e.g. TopHat-Cufflinks-Cuffdiff) mature enough to reconstruct isoforms? Furthermore, in a reference-guided analysis, what can I make of novel isoforms, in particular those tagged with class_code "j"? Your input will be appreciated. G.
This post seems relevant: http://www.biostars.org/post/show/16649/how-are-rnaseq-transcripts-assigned/#16649
Very good thread, thanks a lot!