This is one of those cases where reality and the desired course of action diverge, and data service providers seem to have to choose between what people think they want and the complex realities of science. Are life scientists the proper audience to "democratically" decide what "primary" means?
In my opinion, the term "primary" leads people to believe that a subset of the transcripts is more important than the others - they will study these more, hence becoming a self-fulfilling prophecy of 'importance'. It sets back science rather than promoting it.
I can't see the benefit of new terminology for things that are already defined. Clinically relevant, longest exons, high abundance, low abundance: we all know what these words mean. Whatever the temporary benefit of a seemingly consistent naming pattern might be, the information will start changing the next day. And then we have to deal with those changes via a new and potentially misleading term. Why not just call one set "clinically relevant (as of 2018)", the other "high abundance", etc., and let people filter by those?
The real challenges are in matching and summarizing one data release against another (or across versions), finding out what the differences between them are, and visualizing those differences easily.
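To make the release-comparison point concrete, here is a minimal sketch of the kind of diff I mean. Everything in it is an assumption for illustration: the transcript IDs, the gene names, the `exons` attribute and the `diff_releases` helper are all made up, and a real comparison would parse actual annotation files rather than hand-written dictionaries.

```python
from typing import Dict, List, NamedTuple


class ReleaseDiff(NamedTuple):
    added: List[str]    # transcript IDs only present in the new release
    removed: List[str]  # transcript IDs only present in the old release
    changed: List[str]  # IDs present in both, but with different attributes


def diff_releases(old: Dict[str, dict], new: Dict[str, dict]) -> ReleaseDiff:
    """Compare two releases keyed by transcript ID (hypothetical helper)."""
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    changed = sorted(t for t in old_ids & new_ids if old[t] != new[t])
    return ReleaseDiff(added, removed, changed)


# Made-up records standing in for two annotation releases.
old_release = {
    "TX_A": {"gene": "GENE1", "exons": 5},
    "TX_B": {"gene": "GENE1", "exons": 3},
    "TX_C": {"gene": "GENE2", "exons": 7},
}
new_release = {
    "TX_A": {"gene": "GENE1", "exons": 5},
    "TX_B": {"gene": "GENE1", "exons": 4},  # attribute changed
    "TX_D": {"gene": "GENE2", "exons": 7},  # new transcript
}

diff = diff_releases(old_release, new_release)
print(diff)
```

The point is only that "what was added, removed or changed" should be a one-liner for users, not something everyone reimplements.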
What we really need are accurate transcripts and ways to annotate or filter them based on observed abundances in tissues or conditions. What we need is information that helps cut down on the busy patchwork of "custom" little scripts written to figure out simple things.
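As a rough illustration of filtering by observed abundance, something like the following would do. It is only a sketch under stated assumptions: the transcript IDs, tissues, TPM values and the `filter_by_abundance` function are hypothetical, and the threshold is arbitrary.

```python
from typing import Dict, List


def filter_by_abundance(
    expression: Dict[str, Dict[str, float]], tissue: str, min_tpm: float
) -> List[str]:
    """Return transcript IDs whose abundance in `tissue` is at least `min_tpm`.

    Transcripts with no measurement for the tissue are treated as 0.0.
    """
    return sorted(
        tx
        for tx, by_tissue in expression.items()
        if by_tissue.get(tissue, 0.0) >= min_tpm
    )


# Made-up abundances (TPM) per transcript and tissue.
expression = {
    "TX_A": {"liver": 42.0, "brain": 0.5},
    "TX_B": {"liver": 0.0, "brain": 12.3},
    "TX_C": {"liver": 3.1},  # no brain measurement
}

liver_expressed = filter_by_abundance(expression, "liver", 1.0)
brain_expressed = filter_by_abundance(expression, "brain", 1.0)
print(liver_expressed, brain_expressed)
```

With explicit labels and thresholds like this, the selection criterion stays visible instead of being hidden behind the word "primary".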
We had ~1900 unique users on Biostars in the last hour. Surely more of you can find the time to complete the survey :-D
Be fair, some of them work with proteins.
One thing to consider is that wet lab scientists come to these tools to find sequences for their uses. It's already hard enough to reconcile the gene name reported in a paper (e.g. Hsc70) with the myriad of things with that name (e.g. the 20 or so HSPA8s) before you get to the transcripts.
For a wet lab scientist, a primary transcript could be a very nice thing to see but, depending on how it is defined, it may be misleading or incorrect. Some people tend to think of it as a case where one transcript is "the right one", the wild-type one found in their cells/animals/etc., and the remaining transcripts are special cases in the sense of "if you needed that one, you'd know". This isn't correct from a bioinformatics standpoint, but in a larger scope it makes sense.
In some sense, a primary transcript is best defined by whichever transcript has historically been used experimentally or referenced in the literature. Even if that transcript isn't the most abundant, contains some odd allele, etc., the most important information about that gene comes from these publications and in particular from the wet lab experiments. We may think that in bioinformatics we can just pick the longest/highest abundance/etc. and be okay, but we often interpret the significance of our findings largely through what the literature tells us. If we cite papers that refer to transcript A to lend significance to our findings on transcript B, we're in trouble. The same goes for the wet lab biologist: if they clone transcript B based on all the papers on transcript A, they're in trouble.
Not a new problem, but I'm wondering if identifying a primary transcript will, on average, worsen or improve this issue.
This is the logic behind considering this option. It's a bit of a Wild Wild West at the moment, with people picking the one transcript they're going to study by fairly arbitrary means, and not always picking the same one. If an authority has defined this, at least it will solve one problem.
Also, people ask us for primary transcripts all the time.
A possibly relevant post: How to tell which transcript is the canonical transcript?