Question

Pangenome: Gene presence and absence analysis



5.2 years ago

ysas ▴ 10

Hello. I would like advice on how to determine gene presence and absence for a pangenome analysis.

I have built a pangenome from several yeast strains of interest by merging the CDSs from each strain and removing redundancy with CD-HIT. I then mapped the Illumina reads from each strain against the pangenome individually, generated sorted BAM files, and calculated mean coverage with Qualimap. From this information I want to compute, for each CDS, the fraction of its length that is covered by reads, and call the CDS present if at least 95% of it is covered. Do you have any suggestions for how to run this step, or a better approach? After looking over previous discussions, I tried samtools depth and obtained the read depth at every base position. However, I am still unsure how to transform these data, calculate the covered fraction of each CDS, and use the result for the pangenome analysis. Thank you for your support.
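The per-base output of samtools depth can be collapsed into one covered fraction per CDS with a short script. The following is a minimal sketch, not a tested pipeline. It assumes that each CDS is its own reference sequence in the pangenome FASTA (so the first column of the depth output is the CDS name), and that depth was produced with `samtools depth -aa`, so that zero-depth positions and CDSs without any reads are also reported. The function names, the file name in the usage comment, and the minimum depth of 1 are illustrative choices; only the 95% threshold comes from the question.

```python
from collections import defaultdict


def cds_breadth(depth_lines, min_depth=1):
    """Return {cds_name: fraction of positions with depth >= min_depth}.

    depth_lines: iterable of tab-separated lines "name<TAB>pos<TAB>depth",
    i.e. the output of `samtools depth -aa` for a single BAM file.
    """
    total = defaultdict(int)    # positions reported per CDS
    covered = defaultdict(int)  # positions at or above min_depth per CDS
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, _pos, depth = line.split("\t")[:3]
        total[name] += 1
        if int(depth) >= min_depth:
            covered[name] += 1
    return {name: covered[name] / total[name] for name in total}


def call_presence(breadth, threshold=0.95):
    """Mark a CDS as present when its covered fraction reaches the threshold."""
    return {name: frac >= threshold for name, frac in breadth.items()}


# Hypothetical usage, after running:
#   samtools depth -aa strainA.sorted.bam > strainA.depth.tsv
#
# with open("strainA.depth.tsv") as handle:
#     presence = call_presence(cds_breadth(handle))
```

Running this once per strain and joining the resulting dictionaries on the CDS name would give a presence/absence matrix with one row per CDS and one column per strain. Raising `min_depth` above 1 makes the call stricter; which value is appropriate depends on the sequencing depth of each strain.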

sequencing pangenome • 1.1k views

ADD COMMENT • link updated 21 months ago by colindaven 7.0k • written 5.2 years ago by ysas ▴ 10

score 0 · Answer 1 · 2023-03-08



21 months ago

colindaven 7.0k

Very late (only 3.5 years), but I have used odgi pav successfully for this. There is a nice tutorial here:

https://odgi.readthedocs.io/en/latest/rst/tutorials/presence_absence_variants.html
