philsupertramp/game-math
#include <Scaler.h>
Public Member Functions | |
StandardScaler (bool withMeans=true, bool withStd=true) | |
void | fit (const Matrix< double > &X, const Matrix< double > &y) override |
Matrix< double > | transform (const Matrix< double > &in) override |
Public Member Functions inherited from Transformer | |
Matrix< double > | predict (const Matrix< double > &in) override |
virtual void | fit (const Matrix< double > &X, const Matrix< double > &y)=0 |
virtual Matrix< double > | predict (const Matrix< double > &)=0 |
virtual Matrix< double > | transform (const Matrix< double > &)=0 |
Public Attributes | |
Matrix< double > | means |
Matrix< double > | std_deviations |
Private Attributes | |
bool | with_std = true |
bool | with_means = true |
Standardize features by removing the mean and scaling to unit variance.
The standard score of a sample $x \in X$ with $X \in \mathbf{R}^{N\times M}$ is calculated as $$ \tilde{x} = \frac{x - \mu}{\sigma} $$ with $\mu$ the mean and $\sigma$ the standard deviation of the feature.
Note: Standardization of a data set is a common requirement for many ML estimators: they might behave badly if the individual features do not look more or less like standard normally distributed data.
For instance, many elements used in the objective function of a learning algorithm assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features as expected.[^1]
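As a sketch of the formula above (an illustration in plain C++, not the library's implementation), standardizing a single feature column reduces it to zero mean and unit variance:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the standard score \tilde{x} = (x - mu) / sigma for one
// feature column; illustrative only, not StandardScaler's actual code.
std::vector<double> standardize(const std::vector<double>& x) {
    const double n = static_cast<double>(x.size());
    double mu = 0.0;
    for (double v : x) mu += v;
    mu /= n;
    double var = 0.0;
    for (double v : x) var += (v - mu) * (v - mu);
    const double sigma = std::sqrt(var / n);  // population standard deviation
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back((v - mu) / sigma);
    return out;
}
```

After standardization the resulting column has mean 0 and variance 1, which puts all features on a comparable scale.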
StandardScaler::StandardScaler (bool withMeans = true, bool withStd = true) | inline |
void StandardScaler::fit (const Matrix< double > &X, const Matrix< double > &y) | inline override virtual |
Computes the means and standard deviations to be used later in scaling.
X | matrix used to calculate the means and standard deviations |
y | unused |
Implements Predictor.
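A minimal sketch of what fit computes (column-wise means and population standard deviations; this is an assumed illustration using std::vector in place of Matrix< double >, not the repository's code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for StandardScaler::fit: computes the per-column
// mean and (population) standard deviation of X; y is ignored by fit.
struct ScalerState {
    std::vector<double> means;
    std::vector<double> std_deviations;
};

ScalerState fit(const std::vector<std::vector<double>>& X) {
    const std::size_t rows = X.size(), cols = X[0].size();
    ScalerState s{std::vector<double>(cols, 0.0), std::vector<double>(cols, 0.0)};
    // First pass: column means.
    for (const auto& row : X)
        for (std::size_t j = 0; j < cols; ++j) s.means[j] += row[j];
    for (std::size_t j = 0; j < cols; ++j) s.means[j] /= static_cast<double>(rows);
    // Second pass: column standard deviations.
    for (const auto& row : X)
        for (std::size_t j = 0; j < cols; ++j) {
            const double d = row[j] - s.means[j];
            s.std_deviations[j] += d * d;
        }
    for (std::size_t j = 0; j < cols; ++j)
        s.std_deviations[j] = std::sqrt(s.std_deviations[j] / static_cast<double>(rows));
    return s;
}
```

The fitted statistics correspond to the public attributes means and std_deviations, which transform reads later.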
Performs standardization by centering and scaling.
in | Data to transform |
Implements Predictor.
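The transform step can be pictured as applying the fitted statistics element-wise. The sketch below is an illustration under assumed semantics of the with_means/with_std flags (center only when with_means, scale only when with_std), not the library's code, shown for one column:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for StandardScaler::transform on one column:
// subtract the fitted mean when with_means is set, then divide by the
// fitted standard deviation when with_std is set.
std::vector<double> transform_column(const std::vector<double>& in,
                                     double mean, double sd,
                                     bool with_means = true,
                                     bool with_std = true) {
    std::vector<double> out;
    out.reserve(in.size());
    for (double v : in) {
        if (with_means) v -= mean;
        if (with_std) v /= sd;
        out.push_back(v);
    }
    return out;
}
```

With both flags set this realizes $\tilde{x} = (x - \mu)/\sigma$; disabling with_std yields pure centering, and disabling with_means yields pure scaling.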
Matrix<double> StandardScaler::means |
Matrix<double> StandardScaler::std_deviations |
bool StandardScaler::with_std = true | private |
bool StandardScaler::with_means = true | private |