3.5 years ago
robert.murphy
I have alignments of the BUSCO genes present across all my assemblies. I want to build a final distance matrix from these individual gene alignments so I can plot an MDS.
I plan to generate either a similarity matrix with R's seqinr package or a distance matrix with RAxML for each gene. First, are similarity and distance here simply the inverse of one another, given that (I believe) the distances are Euclidean?
Second, how can I combine all these matrices into a single one?
Wouldn't it make more sense to concatenate all alignments first and then run your phylogenetic analysis on a single MSA? Calculating distances gene by gene discards the length information of the individual gene alignments unless you explicitly bring it back into the calculation.
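To illustrate what concatenation means here, below is a minimal Python sketch. It assumes each gene alignment has already been read into a dict mapping sample name to aligned sequence, and that every sample is present in every gene. The function names `concatenate_alignments` and `p_distance` and the toy sequences are hypothetical and only for illustration; the distance used is a plain uncorrected p-distance, not the output of seqinr or RAxML.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments end to end into one supermatrix.

    alignments: list of dicts {sample_name: aligned_sequence}, one per gene.
    Assumes every gene alignment contains exactly the same samples.
    """
    samples = sorted(alignments[0])
    for aln in alignments:
        if set(aln) != set(samples):
            raise ValueError("every gene alignment must contain the same samples")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within one alignment must have equal length")
    return {s: "".join(aln[s] for aln in alignments) for s in samples}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy data (made up): two genes, three samples.
gene1 = {"A": "ACGT", "B": "ACGA", "C": "AC-T"}
gene2 = {"A": "GG", "B": "GC", "C": "GG"}

supermatrix = concatenate_alignments([gene1, gene2])
dist_ab = p_distance(supermatrix["A"], supermatrix["B"])  # 2 differences over 6 sites
```

Because the genes are simply joined column-wise, longer genes contribute more sites and so more weight to the final distance without any extra bookkeeping.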
I am not sure I understand what you mean. I am taking, for example, all BUSCO 929at5338 genes across all 46 samples and performing an MSA. I am unsure how I could concatenate that alignment with all the others into one and use it for distance analysis. Could I not just multiply each distance value by the gene length, to weight every distance by length, and then sum all the data frames together?
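The weighting idea can be sketched in Python as follows. This assumes each per-gene matrix is a square list of lists with samples in the same order, and that `lengths` holds the alignment length of each gene; the function name `combine_distance_matrices` and the numbers are hypothetical. Dividing the weighted sum by the total length keeps the result on the same scale as the input distances.

```python
def combine_distance_matrices(matrices, lengths):
    """Length-weighted average of per-gene distance matrices.

    matrices: list of square matrices (lists of lists), same sample order in each.
    lengths:  alignment length of each gene, in the same order as matrices.
    """
    if len(matrices) != len(lengths):
        raise ValueError("need exactly one length per matrix")
    n = len(matrices[0])
    total = sum(lengths)
    combined = [[0.0] * n for _ in range(n)]
    for mat, length in zip(matrices, lengths):
        for i in range(n):
            for j in range(n):
                # Weight each gene's distance by its alignment length.
                combined[i][j] += mat[i][j] * length
    # Normalise by the total length so values stay comparable to the inputs.
    return [[value / total for value in row] for row in combined]


# Toy data (made up): two samples, two genes of length 4 and 2.
gene1_dist = [[0.0, 0.25], [0.25, 0.0]]
gene2_dist = [[0.0, 0.5], [0.5, 0.0]]

combined = combine_distance_matrices([gene1_dist, gene2_dist], [4, 2])
# combined[0][1] is (0.25 * 4 + 0.5 * 2) / 6
```

For uncorrected p-distances on gap-free alignments, this weighted average equals the p-distance computed on the concatenated alignment, since both reduce to total differences divided by total sites. For model-corrected distances, or when gaps and missing data differ between genes, the two are only approximately equivalent.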