My aim is to eliminate duplication in a data frame.
I wrote a program that finds the variables that have the same value in row 17. It then puts these variables into a separate data frame and calculates their correlation matrix. I set the correlation threshold to 95%, so the program creates a vector containing only the names of the variables that are correlated at more than 95%.
For example, the vector contains these variable names:
>Vector
"MT91" "MT92" "MT93"
I want to use this vector to calculate the sum of each of these variables over all the other rows.
For example, I have this data:
Name   MT91         MT93          MT92  MT95
QC_G1  70027.02132  95774.1359    100   24
QC_G2  69578.18634  81479.29575   200   45
QC_G3  69578.18634  87021.95427   10    42545
QC_G4  68231.14338  95558.76738   1000  425
QC_G5  64874.12936  96780.77245   7000  4545
QC_G6  63866.65780  91854.35304   19    455
Ctr1   66954.38799  128861.36163  199   2424
Ctr2   97352.55229  101353.25927  155   344
Ctr3   1252.42545   115683.73755  188   3434
Bti1   81873.96379  112164.14229  1222  444
Bti2   84981.21914  0.00000       100   3443
Bti3   36629.02462  124806.49101  188   3434
Bti4   0.00000      109927.26425  122   1000
rt     13.90181     13.90586      12    13
So I want to use the vector to calculate the sum of each of its variables over all rows except the 17th, and then keep only the variable with the highest sum. Here the vector contains "MT91", "MT92" and "MT93", and MT93 has the highest sum over the 16 remaining rows, so I want to drop MT91 and MT92.
The result would be:
Name   MT93          MT95
QC_G1  95774.1359    24
QC_G2  81479.29575   45
QC_G3  87021.95427   42545
QC_G4  95558.76738   425
QC_G5  96780.77245   4545
QC_G6  91854.35304   455
Ctr1   128861.36163  2424
Ctr2   101353.25927  344
Ctr3   115683.73755  3434
Bti1   112164.14229  444
Bti2   0.00000       3443
Bti3   124806.49101  3434
Bti4   109927.26425  1000
rt     13.90586      13
Note that the vector is produced by a program that generates many such vectors (I am using for loops), so I know neither the length of each vector nor the names of the variables in advance.
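To make the goal concrete, here is a rough sketch of the step I am describing, assuming R. It is only an illustration: the function name `keep_highest_sum`, its arguments, and the data frame name `df` are made up for this example. The row to skip is passed as a row index. I assume it is the `rt` row, which is row 14 in the sample data above. The function makes no assumption about the length of the vector or the names in it.

```r
# Sample data copied from the example above (df is a hypothetical name).
df <- data.frame(
  MT91 = c(70027.02132, 69578.18634, 69578.18634, 68231.14338, 64874.12936,
           63866.65780, 66954.38799, 97352.55229, 1252.42545, 81873.96379,
           84981.21914, 36629.02462, 0.00000, 13.90181),
  MT93 = c(95774.1359, 81479.29575, 87021.95427, 95558.76738, 96780.77245,
           91854.35304, 128861.36163, 101353.25927, 115683.73755, 112164.14229,
           0.00000, 124806.49101, 109927.26425, 13.90586),
  MT92 = c(100, 200, 10, 1000, 7000, 19, 199, 155, 188, 1222, 100, 188, 122, 12),
  MT95 = c(24, 45, 42545, 425, 4545, 455, 2424, 344, 3434, 444, 3443, 3434,
           1000, 13),
  row.names = c("QC_G1", "QC_G2", "QC_G3", "QC_G4", "QC_G5", "QC_G6",
                "Ctr1", "Ctr2", "Ctr3", "Bti1", "Bti2", "Bti3", "Bti4", "rt")
)

# Hypothetical helper: among the columns named in `vars`, keep only the one
# with the highest sum over all rows except `skip_row` (a row index).
keep_highest_sum <- function(data, vars, skip_row) {
  vars <- intersect(vars, names(data))       # ignore names not in the data
  if (length(vars) < 2) return(data)         # nothing to deduplicate
  rows <- setdiff(seq_len(nrow(data)), skip_row)
  sums <- colSums(data[rows, vars, drop = FALSE], na.rm = TRUE)
  winner <- names(sums)[which.max(sums)]     # first column with the largest sum
  losers <- setdiff(vars, winner)
  data[, !(names(data) %in% losers), drop = FALSE]
}

Vector <- c("MT91", "MT92", "MT93")
result <- keep_highest_sum(df, Vector, skip_row = nrow(df))  # skip the rt row
names(result)
# [1] "MT93" "MT95"
```

The skipped row is left out of the sums only and is still present in the result. Inside the for loops, the same call could be repeated for each generated vector, reassigning the data frame each time.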
Please tell me if you need any clarification. Thank you.