API Reference
- class optimask.OptiMask(n_tries=10, max_steps=32, random_state=None, verbose=False)
OptiMask computes which rows and columns of a 2D array or DataFrame to retain so that the result contains no NaN values while preserving as many non-NaN cells as possible. The class uses a heuristic optimization approach; increasing n_tries generally improves the result, which may reach or closely approach the optimum.
- Parameters:
n_tries (int) – The number of optimization attempts. Higher values may lead to better results.
max_steps (int) – The maximum number of steps to perform in each optimization attempt.
random_state (Union[int, None]) – Seed for the random number generator.
verbose (bool) – If True, print verbose information during optimization.
    from optimask import OptiMask
    import numpy as np

    # Create a matrix with NaN values
    m = 120
    n = 7
    data = np.zeros(shape=(m, n))
    data[24:72, 3] = np.nan
    data[95, :5] = np.nan

    # Solve for the largest submatrix without NaN values
    rows, cols = OptiMask().solve(data)

    # Calculate the ratio of non-NaN values in the result
    coverage_ratio = len(rows) * len(cols) / data.size

    # Check if there are any NaN values in the selected submatrix
    has_nan_values = np.isnan(data[rows][:, cols]).any()

    # Print or display the results
    print(f"Coverage Ratio: {coverage_ratio:.2f}, Has NaN Values: {has_nan_values}")
    # Output: Coverage Ratio: 0.85, Has NaN Values: False
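To make the objective concrete, the following is a minimal sketch of an exhaustive reference search for the same problem. It is not OptiMask's algorithm and not part of the optimask package: the function name `brute_force_largest_nan_free` is hypothetical, and the approach (trying every non-empty subset of columns) is only practical when the number of columns is small. It can serve as a sanity check for heuristic results on small inputs.

```python
from itertools import combinations

import numpy as np


def brute_force_largest_nan_free(data):
    """Exhaustively search for the NaN-free submatrix with the most cells.

    Hypothetical helper for illustration only (not part of optimask).
    The cost grows as 2**n_columns, so use it on narrow arrays only.
    """
    nan_mask = np.isnan(data)
    n_cols = data.shape[1]
    best_rows = np.array([], dtype=int)
    best_cols = np.array([], dtype=int)
    best_area = 0
    for k in range(1, n_cols + 1):
        for subset in combinations(range(n_cols), k):
            cols = np.array(subset)
            # For a fixed set of columns, the best choice of rows is
            # every row that has no NaN in those columns.
            rows = np.flatnonzero(~nan_mask[:, cols].any(axis=1))
            area = len(rows) * len(cols)
            if area > best_area:
                best_rows, best_cols, best_area = rows, cols, area
    return best_rows, best_cols


# Same matrix as in the example above
data = np.zeros((120, 7))
data[24:72, 3] = np.nan
data[95, :5] = np.nan

rows, cols = brute_force_largest_nan_free(data)
coverage_ratio = len(rows) * len(cols) / data.size
print(f"Rows: {len(rows)}, Cols: {len(cols)}, Coverage Ratio: {coverage_ratio:.2f}")
# Output: Rows: 119, Cols: 6, Coverage Ratio: 0.85
```

On this matrix the exhaustive search drops row 95 and column 3, which gives the same 0.85 coverage ratio reported in the example above.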
- solve(X: ndarray | DataFrame, return_data: bool = False) → Tuple[ndarray, ndarray] | Tuple[Index, Index]
Solves the problem of optimally removing NaNs from a 2D array or DataFrame.
- Parameters:
X (Union[np.ndarray, pd.DataFrame]) – The input 2D array or DataFrame with NaN values.
return_data (bool) – If True, returns the resulting data; otherwise, returns the indices.
- Returns:
If return_data is True, returns the resulting 2D array or DataFrame; otherwise, returns the indices of rows and columns to retain.
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[pd.Index, pd.Index]]
- Raises:
ValueError – If the input data is not a numpy array or a pandas DataFrame, if the input numpy array is not two-dimensional (ndim != 2), or if the OptiMask algorithm encounters an error during optimization.
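Because solve rejects inputs that are not 2D numpy arrays or DataFrames, callers holding nested lists or other array-likes may want to normalize them first. The sketch below shows one way to do that with NumPy alone; the helper name `as_2d_float_array` is hypothetical and not part of optimask, and the choice to raise ValueError simply mirrors the documented behavior.

```python
import numpy as np


def as_2d_float_array(X):
    """Coerce array-like input to a 2D float ndarray, or raise ValueError.

    Hypothetical pre-check for illustration only (not part of optimask).
    """
    arr = np.asarray(X, dtype=float)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D input, got ndim={arr.ndim}")
    return arr


# A nested list becomes a 2D ndarray that keeps its NaN entries
table = as_2d_float_array([[1.0, np.nan], [3.0, 4.0]])
print(table.shape)

# A 1D input is rejected before it ever reaches the solver
try:
    as_2d_float_array([1.0, 2.0, 3.0])
except ValueError as exc:
    print(f"Rejected: {exc}")
```

The returned array can then be passed to solve as in the class-level example.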