For a loss (or cost) function L(θ), gradient descent works as follows:

1. Start with a random parameter vector θ
2. Calculate the gradient ∇L(θ)
3. Update: θ ← θ − η ∇L(θ)
4. Repeat 2-3 until some stopping criterion is met

where η is the learning rate.
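The steps above can be sketched in a few lines. This is a minimal illustration on a one-dimensional toy objective, f(w) = (w − 3)², whose gradient is 2(w − 3); the learning rate and stopping tolerance are illustrative choices, not prescribed values.

```python
import random

def gradient_descent(eta=0.1, tol=1e-8, max_steps=10_000):
    w = random.uniform(-10, 10)      # 1. start with a random value
    for _ in range(max_steps):
        grad = 2 * (w - 3)           # 2. calculate the gradient of f(w) = (w - 3)^2
        w = w - eta * grad           # 3. update: step against the gradient
        if abs(grad) < tol:          # 4. stop once the gradient (nearly) vanishes
            break
    return w

print(gradient_descent())  # converges to the minimum at w = 3
```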

Main takeaway: features should be more or less on the same scale

Standardize: x' = (x − μ) / σ
Normalize: x' = (x − min) / (max − min)
def standardize(X, mu, sigma):
    return (X - mu) / sigma

# compute the statistics on the training data only
mu = X_train.mean()
sigma = X_train.std()
X_train = standardize(X_train, mu, sigma)

# ... later on, during inference, reuse the same statistics
X = standardize(X, mu, sigma)
What would happen if `standardize` instead computed `mu` and `sigma` on the fly?
What else am I missing here?
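One concrete failure mode, as a sketch with made-up numbers: if the statistics are computed from the input at inference time rather than from the training data, a single example is standardized against itself and all scale information is destroyed.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=1000)

mu, sigma = X_train.mean(), X_train.std()

x_new = np.array([9.0])  # a genuinely unusual example, ~2 sigma above the mean

correct = (x_new - mu) / sigma  # uses the stored training statistics
# "on the fly": statistics computed from the single input itself
# (epsilon added only to avoid dividing by the zero std of one element)
on_the_fly = (x_new - x_new.mean()) / (x_new.std() + 1e-12)

print(correct)     # roughly [2.0] -- the model sees "this is unusual"
print(on_the_fly)  # [0.0] -- the information about the scale is gone
```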
You may run across magic numbers, e.g. from the PyTorch tutorials:
import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
This really should have a comment! These numbers are the per-channel mean and std derived from ImageNet.
Pipelines

- `fit` computes parameters from the training data
- `transform` applies the transformation
- `fit_transform` does both -- only use on training data!
- `Pipeline` chains transformations (and optionally a final estimator)
- `make_pipeline` is slightly simpler syntax (no names needed)

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(
    StandardScaler(),
    SGDClassifier()
)

# or, equivalently:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('sgd', SGDClassifier())
])
ColumnTransformer
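A minimal sketch of `ColumnTransformer`: it applies different preprocessing to different columns, e.g. scaling the numeric ones and one-hot encoding the categorical one. The column names and data here are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({
    "age":    [25, 32, 47, 51],
    "income": [40_000, 55_000, 80_000, 62_000],
    "city":   ["oslo", "bergen", "oslo", "trondheim"],
})

ct = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),  # scale numeric columns
    ("cat", OneHotEncoder(), ["city"]),            # one-hot encode the category
])

X = ct.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled columns + 3 one-hot columns
```

Like any transformer, it slots directly into a `Pipeline` as the first step.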
The variance of a single feature: Var(x_j) = (1/n) Σ_i (x_ij − μ_j)²
The covariance between two features: Cov(x_j, x_k) = (1/n) Σ_i (x_ij − μ_j)(x_ik − μ_k)
Independent variables will have low covariance, but low covariance does not necessarily mean independence!
The covariance matrix collects the covariances between all pairs of features in a matrix: Σ_jk = Cov(x_j, x_k)
Just like regular variance, covariance scales with the data
If you normalize it (divide each entry by the product of the two standard deviations), you get the correlation matrix
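A quick NumPy sketch of both matrices, on synthetic data. It also shows the scaling point: multiplying a feature by 10 changes its (co)variance, but leaves the correlation matrix untouched.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
X[:, 1] += 0.8 * X[:, 0]             # make features 0 and 1 correlated

cov = np.cov(X, rowvar=False)        # 3x3 covariance matrix
corr = np.corrcoef(X, rowvar=False)  # normalized: the correlation matrix

print(np.round(cov, 2))
print(np.round(corr, 2))

# Scaling a feature changes its covariance but not its correlation:
X_scaled = X.copy()
X_scaled[:, 0] *= 10
print(np.round(np.corrcoef(X_scaled, rowvar=False), 2))  # same as corr
```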
Pass `n_components` to `PCA()`, e.g.:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)  # just two axes
X_reduced = pca.fit_transform(X)
For LDA, `n_components` must be <= min(n_classes - 1, n_features). It has a longer list of pros, but still... don't do it unless it's demonstrably beneficial
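As a sketch of the n_components cap, here is LDA on the iris data: with 3 classes and 4 features, at most min(3 − 1, 4) = 2 components are available. Note that, unlike PCA, fitting uses the labels.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)    # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # supervised: fit needs the labels y
print(X_reduced.shape)  # (150, 2)
```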
draw on the board
Discussion time