4: Categorical data

Drawbacks
- Only useful for naturally ordered data
- Assumes a linear relationship between the encoded value and the outcome
- Difficult to handle novel categories (values not seen during training)
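The technique these drawbacks belong to is not named here; they read like the drawbacks of ordinal (integer) encoding, so the sketch below assumes that. The `SIZE_ORDER` list, the `ordinal_encode` helper, and the `-1` sentinel for unseen values are all hypothetical choices made for illustration.

```python
# Hypothetical ordered category; the order itself is an assumption.
SIZE_ORDER = ["small", "medium", "large"]
SIZE_TO_CODE = {label: code for code, label in enumerate(SIZE_ORDER)}

# Sentinel for values that were not present when the mapping was built.
UNKNOWN_CODE = -1


def ordinal_encode(values, mapping=SIZE_TO_CODE, unknown=UNKNOWN_CODE):
    """Map each category to its integer rank, or to `unknown` if unseen."""
    return [mapping.get(value, unknown) for value in values]


encoded = ordinal_encode(["small", "large", "medium", "x-large"])
print(encoded)
```

The codes are equally spaced, so a linear model treats the step from "small" to "medium" as the same size as the step from "medium" to "large". The unseen "x-large" has no natural position in the mapping and falls back to the sentinel.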

Drawbacks
- Poorly suited to high-cardinality features -> consider feature hashing or combining categories
- Collinearity among the encoded columns -> if this is a concern for the model, use dummy encoding instead, dropping one level
- Novel categories -> represented by all zeros when not using dummy encoding; alternatively, define an explicit "other" category
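These points appear to describe one-hot encoding and its dummy-encoding variant. Under that assumption, the minimal sketch below shows how an unseen category behaves in each case. The helpers `fit_categories`, `one_hot`, and `with_other`, and the colour data, are hypothetical and not taken from any library.

```python
OTHER = "other"  # hypothetical catch-all label


def fit_categories(values):
    """Return the sorted list of distinct categories seen in training."""
    return sorted(set(values))


def one_hot(value, categories, drop_first=False):
    """One-hot encode `value`; with drop_first=True the first level is the baseline."""
    columns = categories[1:] if drop_first else categories
    return [1 if value == column else 0 for column in columns]


def with_other(value, categories):
    """Map any value outside `categories` to the catch-all label."""
    return value if value in categories else OTHER


train = ["red", "green", "blue", "green"]
cats = fit_categories(train)                            # ['blue', 'green', 'red']

full_unseen = one_hot("purple", cats)                   # all zeros: distinct from every known level
dummy_unseen = one_hot("purple", cats, drop_first=True)
dummy_baseline = one_hot("blue", cats, drop_first=True)

cats_other = cats + [OTHER]
explicit_other = one_hot(with_other("purple", cats), cats_other)
```

With full one-hot encoding the unseen value gets an all-zero row that no known category produces. With one level dropped, the same all-zero row already means the baseline level, so the unseen value becomes indistinguishable from it. An explicit "other" column avoids that ambiguity.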

Drawbacks
- Does not work for unsupervised learning
- Loses the category's relationship with other features
- The mean is unreliable for categories with few examples
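These drawbacks read like those of target (mean) encoding, where each category is replaced by the mean of the target within it. Assuming that, one common way to soften the few-examples problem is to shrink each category mean toward the global mean. The sketch below uses a simple additive smoothing scheme; the function name, the smoothing weight `m`, and the data are hypothetical.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, m=5.0):
    """Return ({category: smoothed target mean}, global mean).

    Smoothed mean = (sum of targets in category + m * global mean) / (count + m),
    so categories with few rows are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + m * global_mean) / (counts[category] + m)
        for category in counts
    }
    return encoding, global_mean


cities = ["a", "a", "a", "a", "b"]
clicked = [1, 1, 0, 1, 1]
encoding, prior = fit_target_encoding(cities, clicked, m=2.0)

# Category "b" has a single row with target 1; its raw mean would be 1.0.
smoothed_b = encoding["b"]
# An unseen category can fall back to the global mean.
unseen = encoding.get("c", prior)
```

The single-row category "b" ends up between its raw mean of 1.0 and the global mean of 0.8 rather than at the extreme. The encoding needs a target column to be computed at all, which is why the approach has no unsupervised counterpart.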