Question

Extract the Variables with the Most Influence on the Ordination in an LDA

0

Entering edit mode

9 months ago

Mgs3 ▴ 30

I performed an LDA analysis on my data, and I want to extract the variables that most impacted the ordination. If I understood correctly, I can obtain that information from the "scaling" component of the LDA object. In the script below, I use lda_variable_impact <- as.data.frame(novaseqmod$scaling). Can someone confirm if this is correct?

Additional information: The input file "full" contains samples in columns and bacterial genus in rows, with the percentage count of bacteria in each sample.The "metadata" file associate each sample to a condition

This is my code:

    full[1:4,1:4]                                                                                                    1
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.14613673
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.09421973
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.00000000
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.02412305
                                                                                                      2
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.13213317
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.04552392
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.02481054
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.03015960
                                                                                                      3
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.06054529
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.14525729
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.01825357
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.03187948
                                                                                                      4
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.09844844
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.02838332
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.00000000
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.04247574



    # LDA
nrep <- 1000
ldares <- data.frame(novaseq = rep(NA, nrep))

for (aaa in 1:nrep) {
  cat("Replicate", aaa, "out of", nrep, "\n")
  # Scale for LDA
  scalenovaseq <- data.frame(scale(t(full)))
  scalenovaseq$type <- metadata[, myvar][match(row.names(scalenovaseq), metadata$Sample_name)]
  scalenovaseq <- scalenovaseq[order(as.numeric(row.names(scalenovaseq))), ]

  # Create training set and test set
  chooseme <- sample(c(TRUE, FALSE), nrow(scalenovaseq), replace = TRUE, prob = c(0.7, 0.3))
  trainnovaseq <- scalenovaseq[chooseme, ]
  testnovaseq <- scalenovaseq[!chooseme, ]

  # Remove variables that are not variable within condition
  sdnovaseq <- apply(trainnovaseq[, !names(trainnovaseq) %in% "type"], 2, sd)
  removeme <- sdnovaseq < 0.01
  trainnovaseq <- trainnovaseq[, !removeme]
  testnovaseq <- testnovaseq[, !removeme]

  # Create LDA model
  novaseqmod <- lda(type ~ ., data = trainnovaseq)
  novaseqpredicted <- predict(novaseqmod, testnovaseq)
  ldares[aaa, "novaseq"] <- mean(novaseqpredicted$class == testnovaseq$type)
}

# Calculate explained variance for each linear discriminant
explained_variance <- novaseqmod$svd^2 / sum(novaseqmod$svd^2)

# Extract and save LDA loadings
lda_variable_impact <- as.data.frame(novaseqmod$scaling)

biostatistics LDA r • 406 views

ADD COMMENT • link updated 9 months ago by Pierre Lindenbaum 166k • written 9 months ago by Mgs3 ▴ 30

0

Entering edit mode

cross posted: https://stackoverflow.com/questions/78777739

Please mind that posting the same question to multiple sites can be perceived as bad etiquette, because efforts may be made to address a problem that has already been solved elsewhere in the meantime."

ADD REPLY • link 9 months ago by Pierre Lindenbaum 166k