Question

Error with numpy integer during Boruta analysis

12 months ago

naveennavinbeast • 0

I am doing an ML analysis and using Boruta for feature selection on my data, but I get an error when I pass my array to the BorutaPy `fit` method. Can anyone help me resolve the issue?

The code is:

from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

X1 = final_data.drop("sample_title", axis=1)
y1 = final_data.loc[:, ["sample_title"]]

# Convert y1 to numeric values (assuming they are in string format)
ya = y1.astype(float).values.ravel()
Xn = X1.astype('int_')
yb = ya.astype('int_')
rf1 = RandomForestClassifier(n_estimators=100, random_state=1)
boruta_feature = BorutaPy(rf1, n_estimators='auto', random_state=1)
boruta_feature.fit(Xn, ya)

# Check which features are selected (support)
selected_features = X1.columns[boruta_feature.support_].to_list()
print('Accepted features:', selected_features)

The error message is

AttributeError                            Traceback (most recent call last)
Cell In[57], line 13
     11 rf1 = RandomForestClassifier(n_estimators=100, random_state=1)
     12 boruta_feature = BorutaPy(rf1, n_estimators='auto', random_state=1)
---> 13 boruta_feature.fit(Xn, ya)
     15 # Check which features are selected (support)
     16 selected_features = X1.columns[boruta_feature.support_].to_list()

File ~/.pyenv/versions/3.10.11/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
    188 def fit(self, X, y):
    189     """
    190     Fits the Boruta feature selection with the provided estimator.
    191 
   (...)
    198         The target values.
    199     """
--> 201     return self._fit(X, y)

File ~/.pyenv/versions/3.10.11/lib/python3.10/site-packages/boruta/boruta_py.py:260, in BorutaPy._fit(self, X, y)
    255 _iter = 1
    256 # holds the decision about each feature:
    257 # 0  - default state = tentative in original code
    258 # 1  - accepted in original code
    259 # -1 - rejected in original code
--> 260 dec_reg = np.zeros(n_feat, dtype=np.int)
    261 # counts how many times a given feature was more important than
    262 # the best of the shadow features
    263 hit_reg = np.zeros(n_feat, dtype=np.int)

File ~/.pyenv/versions/3.10.11/lib/python3.10/site-packages/numpy/__init__.py:324, in __getattr__(attr)
    319     warnings.warn(
    320         f"In the future `np.{attr}` will be defined as the "
    321         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    323 if attr in __former_attrs__:
--> 324     raise AttributeError(__former_attrs__[attr])
    326 if attr == 'testing':
    327     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I have copy-pasted the whole error message for better understanding. I am a beginner and tried several possibilities over three days before posting this error.

ML numpy Boruta • 1.4k views

Updated 12 months ago by Ram • written 12 months ago by naveennavinbeast

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

Reply • 12 months ago by Ram

Thank you, dear Ram. Apologies for the way I posted; this is my first time using Biostars. From now on I will follow your instructions. Thanks for your generous help in structuring my query and for your suggestions on how to post. I hope your edits make it easier for experts to understand the problem and give me valuable input.

Reply • 12 months ago by naveennavinbeast

I'm going to assume there was a misunderstanding because of the way I phrased my initial comment and that's why all of your text is in code formatting.

The instructions were given to help you format the code parts of your post as code. Formatting all of it as code takes away the meaning from formatting the computer generated/input-output parts of the post as code. Look at how I formatted your post and see if you can notice the difference between the parts I used code formatting for and the parts I did not use it for.

Reply • 12 months ago by Ram

Ram · Answer 1 · 2023-11-13

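The installed boruta release still calls `np.int`, which your NumPy version no longer provides (see the traceback). Open https://github.com/scikit-learn-contrib/boruta_py/blob/master/boruta/boruta_py.py, paste that code into your notebook so that the `BorutaPy` class defined there is used instead of the installed one, and then run: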

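# selfeat and y_combined stand for your own feature table and labels
# (for example Xn and yb in the question); .values passes a NumPy array
rf = RandomForestClassifier(n_jobs=-1, class_weight='balanced', max_depth=5)
boruta_selector = BorutaPy(rf, n_estimators='auto', verbose=2, max_iter=1000, random_state=1)
boruta_selector.fit(selfeat.values, y_combined)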
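If you would rather not copy the source file, a commonly used workaround is to restore the removed NumPy aliases before Boruta runs. The following is only a minimal sketch, not an official fix: it assumes the installed boruta release needs nothing more than `np.int` (and possibly `np.float` and `np.bool`, which were removed alongside it), and the last line simply mirrors the `np.zeros(n_feat, dtype=np.int)` call from the traceback, with 5 as a made-up feature count.

```python
import numpy as np

# Restore the removed aliases only when this NumPy version lacks them.
# hasattr() returns False here because NumPy raises AttributeError for them.
for name, builtin in (("int", int), ("float", float), ("bool", bool)):
    if not hasattr(np, name):
        setattr(np, name, builtin)

# Same kind of call that failed inside BorutaPy._fit (5 is an illustrative size).
dec_reg = np.zeros(5, dtype=np.int)
print(dec_reg.dtype)
```

Run this cell before calling `fit`. Because it patches the `numpy` module globally for the session, it is best treated as a stopgap until you can use a boruta version that no longer relies on these aliases.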