Hi, I have some questions about pangenomes. I have 8 genome assemblies of a single eukaryotic species. Could I build a pan-genome from these? Should I follow the de novo approach? Resequencing data is not available.
I maintain an open list of pangenome resources and tools here: https://github.com/colindaven/awesome-pangenomes
For beginners, I would look at minigraph by Heng Li, especially because it's performant and robust, but then also consider PGGB, which is more compatible with the vg and odgi toolkit ecosystem.
There's a lot to learn with new formats for references and read alignments, and many roads to consider. Best practices for eukaryotic pangenome software do not exist yet.
It depends on what your question is!
Are your assemblies high-quality (~HiFi) or are they draft-level assemblies? People usually use high-quality assemblies to build graph pangenomes (see https://www.nature.com/articles/s41587-023-01793-w#Sec18 or https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9958381/ ).
Are you interested only in gene content? Then running something like OrthoFinder on the 8 annotations might already be enough. Are you interested in TE content? Look at PanEDTA: https://github.com/oushujun/EDTA. We've built many Illumina-only pangenomes since 2016, but you might now struggle to publish such a short-read-only pangenome...
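If gene content is the question, the usual next step after orthology clustering is to split orthogroups into core, dispensable and private. Below is a minimal Python sketch of that step. It assumes a tab-separated count table with one row per orthogroup, one column per genome, and an optional "Total" column that is ignored; OrthoFinder writes a gene count table in roughly this shape, but check the exact file name and columns for your version. The function name classify_orthogroups and the three labels are my own choices for illustration, not part of any tool.

```python
import csv
import io


def classify_orthogroups(tsv_text):
    """Label each orthogroup as core, dispensable, private or absent.

    Assumed input: tab-separated text with a header row, the orthogroup
    ID in the first column and per-genome gene counts in the remaining
    columns. A column named "Total" is skipped if present.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    genome_cols = [i for i, name in enumerate(header)
                   if i > 0 and name != "Total"]
    n_genomes = len(genome_cols)

    classes = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        present = sum(1 for i in genome_cols if int(row[i]) > 0)
        if present == n_genomes:
            label = "core"          # found in every genome
        elif present == 1:
            label = "private"       # found in exactly one genome
        elif present == 0:
            label = "absent"        # all counts are zero
        else:
            label = "dispensable"   # found in some but not all genomes
        classes[row[0]] = label
    return classes


if __name__ == "__main__":
    demo = (
        "Orthogroup\tg1\tg2\tg3\tTotal\n"
        "OG1\t1\t2\t1\t4\n"
        "OG2\t1\t0\t1\t2\n"
        "OG3\t0\t0\t3\t3\n"
    )
    for og, label in classify_orthogroups(demo).items():
        print(og, label)
```

With draft-level assemblies, an "absent" gene can simply be an assembly or annotation gap, so treat the dispensable and private classes with caution.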
bioinfo -
First, one can make a pangenome from as few as 2 genomes, though the practice is not meaningful until N >= 3.
Regarding the best, most up-to-date answer: it is hard to provide. The 'currently best' answer is changing very rapidly as HPRC and other groups publish additional analyses, head-to-head reviews, etc.
bioRxiv abounds with recent articles on pangenome assembly, both for human subpopulations and in comparative genomics. To get started, you might google "[eukaryotic] genome assembly" and filter out everything more than 1 or 2 years old.
It could also be worth googling "pangenome assembly" plus your species name, to identify potential colleagues with similar goals.
Thank you for your response, sir. I am new to pangenomics and want to learn the basics using one eukaryotic species. It has 5 assemblies on NCBI, and no resequencing WGS data is available. Can I proceed with these assemblies to build a pangenome pipeline? I have not seen any paper that uses only assemblies of a single species to construct a pangenome.
I guess you can go ahead and run a pangenome pipeline, but if you don't have a unique biological question, publishing a pangenome based on draft-level genomes will be hard; the most recent pangenomes are all based on almost-complete HiFi genomes.
My former group has been publishing the kind of pangenomes you describe since about 2016 ( https://www.nature.com/articles/ncomms13390 ), but we also struggled to publish similar pangenomes starting around 2018/2019.
Excellent points above. Whether or not N >= 3, an ideal check would be to run a whole-genome BLAST using the NW algorithm and check for duplications across the genomes.
Simple awk one-liners can be used to count duplications from the m8 file. That will also let you check conserved regions and pick appropriate genomes for pangenome assembly.
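The same duplication check can be sketched in Python if awk is unfamiliar. This is only a rough illustration: it assumes the standard 12-column BLAST tabular (m8) layout (query, subject, % identity, alignment length, mismatches, gap opens, qstart, qend, sstart, send, e-value, bit score). The function name duplicated_queries and the identity/length cut-offs are placeholders I picked, not recommended values.

```python
from collections import defaultdict


def duplicated_queries(m8_lines, min_identity=90.0, min_length=1000):
    """Return {query: sorted subject locations} for queries hitting >1 place.

    m8_lines: iterable of lines in 12-column BLAST tabular (m8) format.
    The thresholds are illustrative defaults only.
    """
    hits = defaultdict(set)
    for line in m8_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete m8 record
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        s_start, s_end = int(fields[8]), int(fields[9])
        if identity < min_identity or length < min_length:
            continue
        # Normalise coordinates so reverse-strand hits compare cleanly.
        location = (subject, min(s_start, s_end), max(s_start, s_end))
        hits[query].add(location)
    # Keep only queries with more than one distinct subject location.
    return {q: sorted(locs) for q, locs in hits.items() if len(locs) > 1}


if __name__ == "__main__":
    demo = [
        "geneA\tchr1\t99.0\t1500\t5\t0\t1\t1500\t100\t1599\t0.0\t2000",
        "geneA\tchr2\t95.0\t1400\t10\t0\t1\t1400\t5000\t6399\t0.0\t1800",
        "geneB\tchr1\t98.0\t1200\t3\t0\t1\t1200\t9000\t10199\t0.0\t1900",
    ]
    for query, locations in duplicated_queries(demo).items():
        print(query, len(locations), locations)
```

Note this is crude: overlapping HSPs from the same locus are counted as separate locations, so you would want to merge overlapping intervals before trusting the counts.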
Prash
Yes, please go ahead, but you should also employ a de novo-based approach to check the alignments before you reach a consensus!
https://www.nature.com/articles/s41588-018-0273-y This is Sherman et al.'s work; you may want to go through it!
nice - will bookmark this
colindaven - do you know if de Bruijn graphs have a formalizable relationship to Steiner sets?
VAL