Pan genome based assembly
19 months ago
bioinfo223 ▴ 10

Hi, I have some questions about pangenomes. I have 8 genome assemblies of a single eukaryotic species. Could I build a pangenome from these? Should I follow a de novo approach? Resequencing data is not available.

denovo ngs • 2.7k views
19 months ago

I maintain an open list of pangenome resources and tools here: https://github.com/colindaven/awesome-pangenomes

For beginners, I would look at minigraph by Heng Li, especially because it's performant and robust, but then also consider PGGB, which is more compatible with the vg and odgi toolkit ecosystem.

There's a lot to learn, with new formats for references and read alignments and many roads to consider. Best practices for eukaryotic pangenome software do not exist yet.
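To make the "new formats" point concrete: graph builders such as minigraph and PGGB write the pangenome as a GFA file rather than a linear FASTA. Below is a minimal Python sketch, assuming plain GFA 1 text with tab-separated `S` (segment) and `L` (link) lines. The function name `summarize_gfa` and the toy graph are made up for illustration and are not part of any tool.

```python
from collections import Counter

def summarize_gfa(lines):
    """Count segments, links, and total segment bases in GFA 1 text."""
    stats = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            stats["segments"] += 1
            # A '*' sequence means the bases are not stored in the file.
            if fields[2] != "*":
                stats["bases"] += len(fields[2])
        elif fields[0] == "L":
            stats["links"] += 1
    return dict(stats)

# Hand-written toy graph: one bubble (s2 vs s3) between s1 and s4.
example_gfa = """H\tVN:Z:1.0
S\ts1\tACGT
S\ts2\tG
S\ts3\tTT
S\ts4\tCCA
L\ts1\t+\ts2\t+\t0M
L\ts1\t+\ts3\t+\t0M
L\ts2\t+\ts4\t+\t0M
L\ts3\t+\ts4\t+\t0M
""".splitlines()

print(summarize_gfa(example_gfa))
```

For a real graph you would pass an open file handle instead of the toy list. Dedicated tools (odgi, vg) report far richer statistics; this only shows how simple the basic record layout is.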


nice - will bookmark this


colindaven - do you know if de Bruijn graphs have a formalizable relationship to Steiner sets?

VAL

19 months ago

It depends on what your question is!

Are your assemblies high-quality (~HiFi) or are they draft-level assemblies? People usually use high-quality assemblies to build graph pangenomes (see https://www.nature.com/articles/s41587-023-01793-w#Sec18 or https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9958381/ ).
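One quick, rough way to tell where each of your assemblies sits between draft and high-quality is to compare contiguity, for example with N50. Here is a minimal Python sketch, assuming plain uncompressed FASTA text. The helper names `contig_lengths` and `n50` are my own, and N50 is only one indicator (it says nothing about completeness or base accuracy).

```python
def contig_lengths(fasta_lines):
    """Yield the sequence length of each record in FASTA text."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length

def n50(lengths):
    """Return the length L such that contigs of length >= L
    together cover at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0

# Toy example: two records of 6 bp and 2 bp.
toy_fasta = [">contig_a", "ACGT", "AC", ">contig_b", "GG"]
print(n50(contig_lengths(toy_fasta)))
```

Running this over each of the 8 assemblies (with `open(path)` in place of the toy list) gives a first impression of which ones are fragmented drafts.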

Are you interested only in gene content? Then running something like OrthoFinder on the 8 annotations might already be enough. Are you interested in TE content? Look at panEDTA: https://github.com/oushujun/EDTA

We have built many Illumina-only pangenomes since 2016, but you might now struggle to publish such a short-read-only pangenome...
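On the gene-content route: once you have an orthogroup count table, splitting it into core, accessory, and private groups is a few lines of Python. The sketch below assumes a tab-separated table shaped like OrthoFinder's `Orthogroups.GeneCount.tsv` (an `Orthogroup` column, one count column per genome, and a `Total` column). Please check that layout against your own output. The function name and the toy table are hypothetical.

```python
import csv
import io
from collections import Counter

def classify_orthogroups(tsv_text):
    """Classify orthogroups by how many genomes carry at least one gene."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Every column except the ID and the row total is treated as a genome.
    genomes = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    counts = Counter()
    for row in reader:
        present = sum(1 for g in genomes if int(row[g]) > 0)
        if present == len(genomes):
            counts["core"] += 1        # found in every genome
        elif present == 1:
            counts["private"] += 1     # found in exactly one genome
        elif present > 1:
            counts["accessory"] += 1   # found in some, but not all
    return dict(counts)

# Toy table with three genomes (A, B, C).
toy_table = (
    "Orthogroup\tA\tB\tC\tTotal\n"
    "OG0\t1\t1\t2\t4\n"
    "OG1\t1\t0\t1\t2\n"
    "OG2\t0\t3\t0\t3\n"
    "OG3\t1\t1\t1\t3\n"
)
print(classify_orthogroups(toy_table))
```

With draft assemblies, keep in mind that a gene missing from one annotation may be an assembly or annotation gap rather than a true absence.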


Hello sir, thank you for the response. I have both complete-level and draft assemblies. Can I proceed with a pangenome if I take all the assemblies from a public database (NCBI)? I don't have resequencing data.

19 months ago
LauferVA 4.5k

bioinfo -

First, one can make a pangenome from as few as 2 genomes, though the practice is not meaningful until N >= 3.

Regarding the best, most up-to-date answer: it is hard to provide. The 'currently best' answer is changing very rapidly as the HPRC and other groups publish additional analyses, head-to-head reviews, etc.

bioRxiv abounds with recent articles on pangenome assembly, both for human subpopulations and in comparative genomics. To get started, you might google "[eukaryotic] genome assembly" and filter out everything more than 1 or 2 years old.

It could also be worth googling "pangenome assembly" plus your species name, to identify potential colleagues with similar goals.


Thank you for your response, sir. I am new to pangenomics and want to learn the basics in one eukaryotic species. It has 5 assemblies on NCBI, and no resequencing WGS data is available. Can I proceed with these assemblies to build a pangenome? I have not seen any paper that used only assemblies of a single species to construct a pangenome.


I guess you can proceed with running a pangenome pipeline, but if you don't have a unique biological question, publishing a pangenome based on draft-level genomes will be hard; the most recent pangenomes are all based on almost-complete HiFi genomes.

My former group has been publishing pangenomes like the ones you describe since about 2016 (https://www.nature.com/articles/ncomms13390), but we also struggled to publish similar pangenomes starting around 2018/2019.


Thank you for your response, sir. But in this paper they also did some sequencing themselves.

19 months ago
Prash ▴ 280

Excellent points above. Regardless of whether N is 3 or more, an ideal check would be a whole-genome BLAST, or a global alignment with the NW (Needleman-Wunsch) algorithm, to look for duplications across the genomes.

Simple awk one-liners could be used to check the amount of duplication from the m8 file. That will also allow us to check conserved regions and find appropriate genomes for pangenome assembly.
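The same kind of duplication check can be sketched in Python instead of awk. This assumes the standard 12-column tabular BLAST output (m8 / `-outfmt 6`: query, subject, % identity, alignment length, mismatches, gap opens, qstart, qend, sstart, send, e-value, bit score). The function name and the thresholds (90% identity, 100 bp) are arbitrary placeholders, not recommended values.

```python
from collections import defaultdict

def duplicated_queries(m8_lines, min_identity=90.0, min_length=100):
    """Return {query: number of distinct subject locations} for queries
    that hit more than one location above the given thresholds."""
    hits = defaultdict(set)
    for line in m8_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        identity, length = float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])
        if identity >= min_identity and length >= min_length:
            # Normalise coordinates so minus-strand hits compare cleanly.
            hits[query].add((subject, min(sstart, send), max(sstart, send)))
    return {q: len(locs) for q, locs in hits.items() if len(locs) > 1}

# Toy m8 records: geneA passes twice, geneB only once (second hit is 70%).
toy_m8 = [
    "\t".join(["geneA", "chr1", "99.0", "500", "5", "0", "1", "500", "1000", "1499", "1e-100", "900"]),
    "\t".join(["geneA", "chr2", "95.0", "480", "20", "2", "10", "490", "2000", "2479", "1e-90", "800"]),
    "\t".join(["geneB", "chr1", "98.0", "300", "6", "0", "1", "300", "5000", "5299", "1e-80", "500"]),
    "\t".join(["geneB", "chr3", "70.0", "300", "90", "0", "1", "300", "100", "399", "1e-10", "100"]),
]
print(duplicated_queries(toy_m8))
```

Note that this counts every distinct coordinate pair as a separate location, so overlapping fragments of one locus would be over-counted; merging overlapping hits first would be a sensible refinement.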

Prash


Hello sir, thank you for the response. I have both complete-level and draft assemblies. Can I proceed with a pangenome if I take all the assemblies from a public database (NCBI)? I don't have resequencing data.

19 months ago
Prash ▴ 280

Yes, please go ahead, but you will also have to use a de novo approach to check the alignments before you reach a consensus!


Sir, could you share a paper on this, so that it is easier for me to follow the pipeline?


https://www.nature.com/articles/s41588-018-0273-y This is Sherman et al.'s work; you may go through it!
