One of the most important decisions when building a machine learning model is choosing the right algorithm. For many problems, there isn’t a simple answer, and it’s not uncommon for machine learning engineers to try multiple algorithms before selecting the best approach. In recent years, deep learning has become the dominant machine learning technique, but it still has competitors. In particular, gradient boosted trees often produce solutions that match or beat deep learning, and with less hassle. In this post, we’ll look at the differences between these two popular machine learning techniques and how to pick the right one for your problem.
Machine learning is advancing at an incredible rate, with new state-of-the-art results generated regularly. Of the myriad machine learning techniques, however, deep learning and gradient boosted trees have consistently outperformed their competitors. They are powerful algorithms that have proven themselves across a wide range of problem domains and datasets.
Computer Vision and Natural Language Processing
Deep learning has been particularly successful in the fields of computer vision and natural language processing (NLP). Its performance in these domains is unrivaled, so there’s currently little reason to consider any other technique when working on problems in these fields. Deep learning works so well here because it addresses something called the “representation problem”: turning raw inputs such as pixels or words into features a model can actually use. For example, deep convolutional neural networks (CNNs) are a type of neural network architecture commonly used in computer vision. They work by looking at groups of neighboring pixels simultaneously, which lets them use the spatial relationships between pixels to learn higher-order concepts like edges and patterns. Contextual representation is just as important in NLP. An individual word in a sentence is not very useful by itself; the surrounding words are necessary to understand its context and derive its meaning. In both cases, deep learning finds patterns by combining features that carry little information on their own. Gradient boosted trees, by contrast, work best when each feature is informative on its own. If a feature doesn’t carry much information by itself, gradient boosted trees will have a tough time finding a good solution. If your problem involves computer vision or NLP, deep learning is the best approach.
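To make the "groups of pixels" idea concrete, here is a minimal sketch of the sliding-window operation at the heart of a CNN layer, written with NumPy. The toy image and the hand-written, Sobel-like edge filter are assumptions chosen for illustration; a real CNN learns its filter weights from data and stacks many such layers.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends on a whole kh x kw patch of pixels at once.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x6 image (assumed data): dark (0) on the left half, bright (1) on the right.
image = np.array([[0, 0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written vertical-edge filter; in a CNN these weights would be learned.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

response = convolve2d(image, kernel)
print(response)  # every row is [0. 4. 4. 0.]
```

No single pixel says "there is an edge here," but the filter responds strongly exactly where a patch straddles the dark-to-bright boundary and stays at zero everywhere else.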
Tabular Data
If you’re working with tabular data (e.g. spreadsheet-type data), gradient boosted trees can be an excellent choice. Gradient boosted trees require very little data pre-processing and handle missing data automatically. But there’s an important caveat: deep learning can sometimes solve complex tabular data problems with a higher degree of success than gradient boosted trees. In general, gradient boosted trees work best on tabular data problems whose categorical features have a limited number of possible values. A categorical feature is an input that can take on one of a fixed number of possible values (e.g. high, medium, low). If your problem contains categorical features with tens of thousands of possible values, deep learning is a better choice.
Explainability
Deep learning models are notoriously difficult to interpret. It’s common for modern deep learning models to have hundreds of hidden layers and billions of parameters, resulting in very low explainability. Conversely, gradient boosted trees are relatively easy to interpret and have good explainability. Generating feature importance plots on a trained gradient boosted tree is simple and lets you see directly which features the model relies on most. If interpretability and explainability are requirements for your machine learning model, gradient boosted trees are the best choice.
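Here is a minimal sketch of pulling feature importances out of a trained model, using scikit-learn's GradientBoostingClassifier and its feature_importances_ attribute. The column names and the synthetic data are hypothetical: the label is built mostly from the first column, slightly from the second, and not at all from the third.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical column names for a synthetic 3-column table.
feature_names = ["signal", "weak_signal", "noise"]
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)  # the third column is never used

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:12s} {score:.3f}")
```

The same scores can be passed to a bar chart to get the familiar feature importance plot. Keep in mind that they rank how much the model uses each feature; they don't by themselves show the direction or shape of the relationship.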
Speed
Interestingly, deep learning and gradient boosted trees take roughly the same amount of time to train. However, gradient boosted trees will almost certainly be faster at making predictions once they’ve been trained. For problems where low model inference latency is required, gradient boosted trees should be preferred.
Deep Learning Pros:
Deep Learning Cons:
Gradient Boosted Tree Pros:
Gradient Boosted Tree Cons:
Use Neural Networks when:
Use Gradient Boosted Trees when: