What do the dots on box plots represent?
As usual, very data- and model-dependent. Tree based methods are particularly impacted by outliers!
Any other ideas?
Can you think of an example?
A few other options that might fall into "encoding":
If your outliers are:
Then you probably want to keep them!
Consider using a RobustScalar to standardize if you have lots of outliers
- OKCupid dataset: negative values for income, placeholders in categories - Property assessments: very low property values - Traffic dataset: some streets have no speed limit
Reminder of rows vs columns