Hello. I am looking for ideas on how to determine gene presence and absence for a pangenome analysis.
I built a pangenome from several yeast strains of interest by merging the CDSs of each strain and removing redundant sequences with CD-HIT. I then mapped the Illumina reads from each strain against the pangenome individually, generated sorted BAM files, and calculated mean coverage with Qualimap. Using this information (e.g., mean coverage), I want to determine the fraction of each CDS that is covered and call a CDS present if at least 95% of it is covered by reads. Do you have any suggestions for how to run this step, or a better approach? After looking through previous discussions, I tried samtools depth and obtained the read depth at each base position. However, I am still unsure how to transform these data, calculate the covered fraction of each CDS, and use it for pangenome analysis. Thank you for your support.
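
To make the intended calculation concrete, here is a minimal Python sketch of one possible way to turn per-base depth into a covered fraction per CDS and a presence/absence call. It rests on several assumptions: the depth file is the tab-separated output of samtools depth (sequence name, position, depth), each pangenome CDS is its own reference sequence, and CDS lengths come from a FASTA index (.fai) so that positions missing from the depth output count as uncovered. The function names, the min_depth=1 cutoff, and the example file names are hypothetical; only the 95% threshold comes from the description above.

```python
from collections import defaultdict


def read_fai_lengths(fai_lines):
    """Parse FASTA index (.fai) lines into {cds_name: length}."""
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def covered_fraction(depth_lines, lengths, min_depth=1):
    """Return {cds_name: fraction of bases with depth >= min_depth}.

    depth_lines: lines of 'name<TAB>position<TAB>depth'.
    lengths: {cds_name: length}; a CDS with no depth lines gets 0.0.
    min_depth is an assumed cutoff, not something given in the post.
    """
    covered = defaultdict(int)
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, depth = fields[0], int(fields[2])
        if depth >= min_depth:
            covered[name] += 1
    return {
        name: covered[name] / length
        for name, length in lengths.items()
        if length > 0
    }


def call_presence(fractions, threshold=0.95):
    """Mark a CDS as present when its covered fraction reaches the threshold."""
    return {name: frac >= threshold for name, frac in fractions.items()}


if __name__ == "__main__":
    # Tiny in-memory example instead of real files.
    fai = ["cds1\t4\t6\t4\t5\n", "cds2\t2\t17\t2\t3\n"]
    depth = ["cds1\t1\t3\n", "cds1\t2\t0\n", "cds1\t3\t5\n", "cds1\t4\t1\n"]
    fractions = covered_fraction(depth, read_fai_lengths(fai))
    for name, present in call_presence(fractions).items():
        print(name, round(fractions[name], 3), int(present))
```

With real data, the same functions could be given open file handles, for example open("pangenome.fasta.fai") and open("strainA.depth.txt") (hypothetical file names), once per strain. Collecting the 0/1 calls from every strain into one table would give a presence/absence matrix with one row per CDS and one column per strain.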