Improving classification supported by ensemble technique

Passio Consulting
5 de ago. de 2021
1 min de leitura

Atualizado: 3 de jul.

One of the hardest tasks while developing a predictive model is the selection of the right algorithm. In my predictive data mining lectures, I came across with ensemble technique and it had a major role in the developed project.

Ensemble is based on the psychological phenomena from the wisdom of the crowd, where instead of having a single answer, we will have many different answers, combining them will, in the end, create a better solution.

The basic idea of ensemble classifiers is the ability to learn from a set of classifiers simultaneously, from several models, allowing them to vote simultaneously. The final goal is improving the model's effectiveness and accuracy, which performs better and, on the other hand, reduces the general error. There are several techniques for voting:

Majority Vote
Bagging supported with bootstrap
Random forest
Boosting
Stacking

To have this diversity is important, the combination of good models and poor models for the final accuracy classification, that way we are enabling diversity of our final classification accuracy. This combination feature is an important characteristic of the ensemble technique.

Ensemble is defined as a machine learning technique, for a classification purpose, that combines several classification models' results to produce the optimal classification model (ex., aggregate several DT outputs).

Of course, like everything, there are pros and cons:

Pros: We generally have an improvement in the accuracy of the predictive model

Cons: the model that results from the combination of several models is harder to interpret, due to the feeding by several classifiers. On the other hand, understanding each model outcome individually it`s easier.

by Pedro Veiga

Data Analyst Consultant @ Passio Consulting