What is Data Standardization and Normalization?
Data standardization and normalization are feature-scaling techniques that rescale numeric input features to a common scale or range, so that no feature dominates simply because of the units it is measured in.
Standardization
Standardization transforms each input feature to have zero mean and unit variance. This is done by subtracting the feature's mean from each value and dividing the result by the feature's standard deviation. Standardization is useful when the input features have different scales and we want to treat them equally.
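To make the arithmetic concrete, here is a minimal NumPy sketch of the calculation described above. The data values and the variable names (mean, std, X_manual_std) are made up for illustration, and the population standard deviation (ddof=0) is an assumption of this sketch.

```python
import numpy as np

# Illustrative data: two features on very different scales (made-up values)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Per-feature (column-wise) mean and standard deviation
mean = X.mean(axis=0)
std = X.std(axis=0)  # population standard deviation (ddof=0), an assumption here

# Subtract the mean, then divide by the standard deviation
X_manual_std = (X - mean) / std

print(X_manual_std.mean(axis=0))  # approximately 0 for each column
print(X_manual_std.std(axis=0))   # approximately 1 for each column
```

After scaling, both columns hold the same values even though the raw features differed by two orders of magnitude.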
Normalization
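Normalization transforms the input data to a specific range, typically 0 to 1 or -1 to 1. Min-max normalization does this by subtracting each feature's minimum and dividing by the feature's range (maximum minus minimum). Normalization is useful when the input features have widely different ranges and a bounded output is needed. Because the minimum and maximum define the scale, min-max normalization is sensitive to outliers: a single extreme value compresses all other values into a narrow band.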
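As a rough illustration of scaling to the 0 to 1 range, the following sketch applies the common min-max formula, (x - min) / (max - min), by hand. The data and variable names are hypothetical, and the sketch does not guard against a constant feature, where max equals min and the division would fail.

```python
import numpy as np

# Illustrative single feature with one extreme value (made-up data)
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling to the range [0, 1]
x_min, x_max = x.min(), x.max()
x_manual_norm = (x - x_min) / (x_max - x_min)

print(x_manual_norm)
```

The smallest value maps to 0 and the largest to 1. In this sample the extreme value 100 sets the scale, so the four remaining values all end up below 0.05, which is worth keeping in mind when a feature contains outliers.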
Implementing Data Standardization and Normalization Using Scikit-Learn
Scikit-learn provides easy-to-use transformer classes for data standardization and normalization in its sklearn.preprocessing module. Let’s explore some examples of how to use them.
Standardization Using Scikit-Learn
The StandardScaler class in Scikit-learn provides a simple way to standardize the input data. Here’s an example of how to use it:
from sklearn.preprocessing import StandardScaler
import numpy as np
# create some example data
X = np.array([[1, 2], [3, 4], [5, 6]])
# create a StandardScaler object
scaler = StandardScaler()
# fit the scaler to the data and transform it
X_std = scaler.fit_transform(X)
print(X_std)
In this example, we created a StandardScaler object and used it to standardize the input data X. Each column of the resulting output X_std has zero mean and unit variance.
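In practice the scaler is often fitted on one dataset and then reused on data it has not seen. The sketch below shows one way to do that, assuming a hypothetical split into X_train and X_new; the fitted statistics are available through the mean_ and scale_ attributes.

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical training data and unseen data (made-up values)
X_train = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
X_new = np.array([[3, 4], [7, 8]], dtype=float)

# Learn the per-column mean and standard deviation from the training data only
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # per-column means learned from X_train
print(scaler.scale_)  # per-column standard deviations learned from X_train

# Apply the same learned statistics to both datasets
X_train_std = scaler.transform(X_train)
X_new_std = scaler.transform(X_new)
print(X_new_std)
```

Calling transform on the new data, rather than fit_transform, keeps both datasets on the same scale. The transformed new data will generally not have exactly zero mean and unit variance, because the statistics come from the training data.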
Normalization Using Scikit-Learn
The MinMaxScaler class in Scikit-learn provides a simple way to normalize the input data to a specific range. Here’s an example of how to use it:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# create some example data
X = np.array([[1, 2], [3, 4], [5, 6]])
# create a MinMaxScaler object
scaler = MinMaxScaler()
# fit the scaler to the data and transform it
X_norm = scaler.fit_transform(X)
print(X_norm)
In this example, we created a MinMaxScaler object and used it to normalize the input data X to the default range of 0 to 1. In the resulting output X_norm, the minimum of each column maps to 0 and the maximum maps to 1.
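For a range other than 0 to 1, such as -1 to 1, MinMaxScaler accepts a feature_range argument. The following is a small sketch using the same example data; the variable names X_scaled and X_restored are illustrative.

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Same example data as above
X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Scale each column to the range [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)
print(X_scaled)

# Map the scaled values back to the original units
X_restored = scaler.inverse_transform(X_scaled)
print(X_restored)
```

Each column's minimum maps to -1 and its maximum to 1, and inverse_transform recovers the original values.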
Conclusion
Data standardization and normalization are important preprocessing steps in machine learning, and Scikit-learn's StandardScaler and MinMaxScaler classes make both easy to apply. In this tutorial, we explored the basics of each technique and how to implement them using Scikit-learn.