Many analysts building predictive models invest a great deal of time and effort in learning how to tune the parameters of their chosen technique, whether that technique is logistic regression or a neural network, in pursuit of the most accurate model possible. In this series of videos we look at some often-overlooked approaches that apply in the same way to a wide variety of algorithms and can lead to better predictive accuracy. In all of our examples we'll focus on improving the accuracy of a predictive model applied to a classification problem.
Feature engineering
Feature engineering is really just a fancy term for creating new data from the data you already have. Very often we can help an algorithm build better models by preparing the input data in a way that allows it to detect a clearer signal in otherwise noisy data. In machine learning, variables are often referred to as ‘features’, so feature engineering refers to the transformation of existing variables or the creation of new ones. Typical examples of feature engineering include the following (sketched in code after the list):
- Re-scaling predictor fields
- Replacing missing values
- Excluding outliers and extreme values
- Creating new fields based on the ratio of one variable to another
- Using Factor Analysis/PCA to create new linear combinations of existing correlated variables
- Using Cluster Analysis to create groups in the data based on the similarity of cases
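As a taster, here is a minimal sketch of several of these steps using Python with pandas and scikit-learn. The data, column names, and library choice are all our own illustrative assumptions rather than anything prescribed by the videos:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical predictor fields; in practice, use your own data
df = pd.DataFrame({
    "income": [42000.0, 58000.0, None, 71000.0, 39000.0, 95000.0],
    "debt":   [12000.0, 9000.0, 15000.0, None, 4000.0, 30000.0],
    "age":    [34.0, 45.0, 29.0, 52.0, 23.0, 61.0],
})

# Replace missing values with the column mean
features = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns
)

# Create a new field from the ratio of one variable to another
features["debt_to_income"] = features["debt"] / features["income"]

# Exclude outliers and extreme values: keep only cases that lie
# within 3 standard deviations of the mean on every field
z_scores = (features - features.mean()) / features.std()
features = features[(z_scores.abs() <= 3).all(axis=1)]

# Re-scale the predictor fields to zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

# Use PCA to create new linear combinations of correlated variables
pcs = PCA(n_components=2).fit_transform(scaled)
features["pc1"], features["pc2"] = pcs[:, 0], pcs[:, 1]

# Use cluster analysis to group similar cases; the cluster label
# becomes a new categorical feature
features["cluster"] = KMeans(
    n_clusters=2, n_init=10, random_state=0
).fit_predict(scaled)

print(features)
```

Note that in a real project you would fit transformations like the imputer, scaler, and PCA on the training data only and then apply them to the test data, so that no information about the hold-out cases leaks into the model.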
Watch this video to find out more.
Check out the other videos in this series:
- 6 secrets of building better models part one: bootstrap aggregation
- 6 secrets of building better models part two: boosting
- 6 secrets of building better models part three: feature engineering
- 6 secrets of building better models part four: ensemble modelling
- 6 secrets of building better models part five: meta models
- 6 secrets of building better models part six: split models