Extract the Variables with the Most Influence on the Ordination in an LDA
0
0
Entering edit mode
4 months ago
Mgs3 ▴ 30

I performed an LDA analysis on my data, and I want to extract the variables that most impacted the ordination. If I understood correctly, I can obtain that information from the "scaling" component of the LDA object. In the script below, I use lda_variable_impact <- as.data.frame(novaseqmod$scaling). Can someone confirm if this is correct?

Additional information: The input file "full" contains samples in columns and bacterial genus in rows, with the percentage count of bacteria in each sample.The "metadata" file associate each sample to a condition

This is my code:

    full[1:4,1:4]                                                                                                    1
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.14613673
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.09421973
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.00000000
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.02412305
                                                                                                      2
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.13213317
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.04552392
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.02481054
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.03015960
                                                                                                      3
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.06054529
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.14525729
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.01825357
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.03187948
                                                                                                      4
Bacteria;Acidobacteriota;Blastocatellia;Blastocatellales;Blastocatellaceae;Stenotrophobacter 0.09844844
Bacteria;Planctomycetota;Planctomycetes;Gemmatales;Gemmataceae;Fimbriiglobus                 0.02838332
Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Ideonella         0.00000000
Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Rhodobacter     0.04247574



    # LDA
nrep <- 1000
ldares <- data.frame(novaseq = rep(NA, nrep))

for (aaa in 1:nrep) {
  cat("Replicate", aaa, "out of", nrep, "\n")
  # Scale for LDA
  scalenovaseq <- data.frame(scale(t(full)))
  scalenovaseq$type <- metadata[, myvar][match(row.names(scalenovaseq), metadata$Sample_name)]
  scalenovaseq <- scalenovaseq[order(as.numeric(row.names(scalenovaseq))), ]

  # Create training set and test set
  chooseme <- sample(c(TRUE, FALSE), nrow(scalenovaseq), replace = TRUE, prob = c(0.7, 0.3))
  trainnovaseq <- scalenovaseq[chooseme, ]
  testnovaseq <- scalenovaseq[!chooseme, ]

  # Remove variables that are not variable within condition
  sdnovaseq <- apply(trainnovaseq[, !names(trainnovaseq) %in% "type"], 2, sd)
  removeme <- sdnovaseq < 0.01
  trainnovaseq <- trainnovaseq[, !removeme]
  testnovaseq <- testnovaseq[, !removeme]

  # Create LDA model
  novaseqmod <- lda(type ~ ., data = trainnovaseq)
  novaseqpredicted <- predict(novaseqmod, testnovaseq)
  ldares[aaa, "novaseq"] <- mean(novaseqpredicted$class == testnovaseq$type)
}

# Calculate explained variance for each linear discriminant
explained_variance <- novaseqmod$svd^2 / sum(novaseqmod$svd^2)

# Extract and save LDA loadings
lda_variable_impact <- as.data.frame(novaseqmod$scaling)
biostatistics LDA r • 285 views
ADD COMMENT
0
Entering edit mode

cross posted: https://stackoverflow.com/questions/78777739

Please mind that posting the same question to multiple sites can be perceived as bad etiquette, because efforts may be made to address a problem that has already been solved elsewhere in the meantime."

ADD REPLY

Login before adding your answer.

Traffic: 1565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6