Searching for tool to condense ~900 draft genomes from different isolates into one mega sequence
2.7 years ago
Dr. Jason • 0

Hi,

This question is a possible duplicate of: Bulding a pangenome consensus from many individuals. I'm reposting it since I have run into a similar issue.

Background: We observe high variability in the antibiotic resistance phenotype among different isolates of a single bacterial species. To identify the genetic component underlying this variability, I'm running a GWAS (using DBGWAS) on ~900 isolates with their genome sequences. The GWAS output gives me ~8000 short k-mer sequences (30-70 nt) that point towards genes/regions in the input draft genomes that could be responsible for the phenotype. Usually the next step is to filter the output and find significant candidate genes, but this is often quite difficult to do manually. One workaround I have found is to align all 8000 k-mer sequences to the reference genome of the bacterium using bowtie. This works quite well, and I can visualise which genomic regions have the most k-mers concentrated in them.
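To make the "where are the k-mers concentrated" step concrete, here is a minimal Python sketch that bins aligned k-mers into fixed-size windows along the reference. It assumes the bowtie alignments were written as SAM text; the function name `count_hits_per_window`, the 1000 bp window size, and the example records are all made up for illustration.

```python
from collections import Counter


def count_hits_per_window(sam_lines, window=1000):
    """Count aligned k-mers per (reference name, window start).

    sam_lines: iterable of SAM text lines (header lines are skipped).
    window:    window size in bp (hypothetical default, tune as needed).
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # bitwise FLAG
        rname = fields[2]       # reference sequence name
        pos = int(fields[3])    # 1-based leftmost mapping position
        if flag & 4 or rname == "*":
            continue  # k-mer did not align
        start = (pos - 1) // window * window  # 0-based window start
        counts[(rname, start)] += 1
    return counts


# Made-up SAM records: three aligned k-mers and one unaligned k-mer.
example = [
    "@SQ\tSN:chr\tLN:5000",
    "k1\t0\tchr\t120\t255\t31M\t*\t0\t0\tACGT\tIIII",
    "k2\t16\tchr\t950\t255\t31M\t*\t0\t0\tACGT\tIIII",
    "k3\t0\tchr\t2300\t255\t31M\t*\t0\t0\tACGT\tIIII",
    "k4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
hits = count_hits_per_window(example, window=1000)
for (ref, start), n in hits.most_common():
    print(ref, start, n)
```

On a real run the lines would come from the SAM file itself (e.g. `open("kmers.sam")`, a hypothetical file name), and the windows with the highest counts are the candidate regions to inspect first.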

Problem: Since the gene/region responsible for the phenotype wouldn't be present in all the isolates, I am having difficulty choosing a reference to align all the k-mers to. Right now I'm using Roary to build a pangenome out of all the draft genomes. As far as I understand, this would just result in a core gene set + accessory gene set (or pan-proteome); the problem is that I think this would lose all the SNP variation and the intergenic regions.
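One way to see how much a given reference choice would miss is to check which k-mers occur in which isolates. The sketch below does this by exact string matching on both strands; it is only an illustration on toy sequences, not a replacement for an aligner, and the names `reverse_complement`, `kmer_presence`, `isoA`, `isoB` and the sequences are all invented for the example. At the scale of ~900 genomes and ~8000 k-mers, plain substring search would be slow, so an indexed aligner would be the practical choice.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def kmer_presence(kmers, genomes):
    """Map each k-mer name to the set of isolates containing it.

    kmers:   dict of k-mer name -> sequence
    genomes: dict of isolate name -> list of contig sequences
    A k-mer counts as present if it, or its reverse complement,
    occurs exactly in any contig of the isolate.
    """
    # Upper-case every contig once so matching is case-insensitive.
    upper = {iso: [c.upper() for c in contigs] for iso, contigs in genomes.items()}
    presence = {}
    for name, kmer in kmers.items():
        fwd = kmer.upper()
        rev = reverse_complement(fwd)
        presence[name] = {
            iso
            for iso, contigs in upper.items()
            if any(fwd in c or rev in c for c in contigs)
        }
    return presence


# Toy data: two isolates, three k-mers.
genomes = {
    "isoA": ["AAACCCGGGTTTACGT"],
    "isoB": ["AAACCCTTTGGG"],
}
kmers = {"k1": "CCCGGG", "k2": "AAACCC", "k3": "ACGTAA"}

presence = kmer_presence(kmers, genomes)
# k-mers that would be lost if isoB alone were used as the reference
missing_from_isoB = sorted(k for k, isos in presence.items() if "isoB" not in isos)
print(missing_from_isoB)
```

In this toy case two of the three k-mers are absent from `isoB`, which is the situation described above: any single isolate used as the reference can silently drop the k-mers that matter.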

Is there a better tool to condense multiple genomes together, removing duplicate genes but maintaining variant genes, intergenic regions, etc., to output a reference genome containing all the variant information?

Thank you!

pangenome • 777 views

Is there a better tool to condense multiple genomes together, removing duplicate genes but maintaining variant genes, intergenic regions, etc., to output a reference genome containing all the variant information?

Maybe you could build a pan-genome graph with Pandora.


This looks interesting. I understand it makes de Bruijn graphs for the pan-genome. But do you know what format a pan-genome graph is in? Would it be a .bam file or a multi-FASTA? I'm just wondering whether, in practice, I could use bowtie to map k-mers to the pan-genome graph sequence file.


I have never tried this before, but looking at the documentation, the reference graph looks like a multi-FASTA file of a specific genomic region shared by all the strains used to build the graph: link. This graph is later used in Pandora to output a VCF file with all the variants detected in the graph.

I must say that this pipeline looks quite complex, so I would try the tutorial first.
