Question

How far is the pangenome from being mainstream?

8 months ago

guntul ▴ 40

Hi everyone, the popularity and use of pangenomes relative to linear references seem to be increasing rapidly. What challenges or issues need to be solved before pangenomes become mainstream over linear references?

pangenome • 588 views

updated 8 months ago by dsull ★ 7.6k • written 8 months ago by guntul ▴ 40

score 2 · Answer 1 · 2024-09-27

I don’t think it will ever be “mainstream” over a linear reference. A linear reference is straightforward, easy, and effective for my studies, so it’s all I need. I will only use the pangenome if it can give me an answer to a question that I’m interested in that other approaches can’t, not because it can map a small fraction of my sequencing reads more accurately under certain circumstances or because it could find some satellites or variants that I don’t particularly care about.

On a less pessimistic note, it is a very cool idea and I do think it has important applications; I just don’t think it will be “mainstream” in the sense that it will replace what we already have (it will merely add to it).

E.g. I can’t imagine that people will be making a “pangenome Cell Ranger” as the default option for their 10x single-cell analyses, but I’ll be happy to revisit this comment in a few years. :)

score 2 · Answer 2 · 2024-09-27

I think pangenomics will become mainstream in about 5-10 years, though it will still be used far less frequently than linear methods.

Short-read mapping to pangenomes has many technical and speed issues (at least with the many I have built and tried), even for tiny plant references such as three Arabidopsis sequences.

Long-read mapping looks better, as some tools are quicker and there are fewer long reads to start with.
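
To make the read-count point concrete, here is a rough back-of-the-envelope sketch. The genome size (about 135 Mb, roughly Arabidopsis-sized), the 30x coverage, and the 150 bp and 15 kb read lengths are all illustrative assumptions, not numbers from my own runs, and `reads_needed` is just a helper name made up for this example.

```python
def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Approximate number of reads needed to reach a target mean coverage."""
    total_bases = genome_size_bp * coverage
    # Ceiling division using integers only, so no floating-point rounding.
    return -(-total_bases // read_length_bp)


# Illustrative assumptions (hypothetical values, see text above).
GENOME_SIZE_BP = 135_000_000  # roughly Arabidopsis-sized genome
COVERAGE = 30                 # target mean depth

short_reads = reads_needed(GENOME_SIZE_BP, COVERAGE, 150)     # short reads
long_reads = reads_needed(GENOME_SIZE_BP, COVERAGE, 15_000)   # long reads

print(short_reads, long_reads, short_reads // long_reads)
# prints: 27000000 270000 100
```

Under these assumptions the mapper has to place about 100 times fewer reads for the same depth, although each long read is of course more work to align, so this is only part of the picture.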

SNP callers for pangenomes remain very scarce, as do SV callers.

General toolkits lack many basic conversion and visualization functions, and the standard formats are nowhere near as understandable or efficient as, for example, BAM/CRAM.

I believe many more computational groups need to get involved and build more robust and efficient software, but linear references are far easier to understand and work on.

By the way, I maintain this list of tools together with the community: https://github.com/colindaven/awesome-pangenomes