In my last post I explained how I went about cleaning and preparing the dataset for modelling. In this post I will go into detail on my thought process, model evaluation, hyperparameter optimization, and key takeaways from the results.

Model Evaluation

I tried out a variety of models: logistic regression, KNN, decision trees, random forest, and XGBoost, and used normalized confusion matrices to compare their performance.
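For reference, here is a minimal sketch of that comparison, assuming X and y hold the prepared features and colour labels from the last post (the variable names and split settings are illustrative, not the project's actual code):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# X and y are assumed to hold the prepared features and colour labels;
# XGBoost was compared the same way via its scikit-learn wrapper
# (XGBClassifier) and is omitted here for brevity.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # normalize="true" scales each row to sum to 1, so the diagonal reads
    # as per-colour recall and the matrices are directly comparable
    cm = confusion_matrix(y_test, model.predict(X_test), normalize="true")
    print(name, cm.round(2), sep="\n")
```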

The best performing model appears to be logistic regression. I used GridSearch to find the optimal hyperparameters and found that logistic regression with C=1 performed best. (Notably, KNN performs rather poorly on our dataset.)
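A sketch of what that search might look like with scikit-learn's GridSearchCV; the exact grid of C values here is an assumption:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# C is the inverse of regularization strength: smaller C = stronger penalty
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)  # C=1 won out in this project
```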

Accuracy is a bit of a misleading metric. In the world of Magic design, there are quite a few cards that are designed ‘incorrectly’, commonly referred to as color design breaks. Take a look at these three cards:

The first two cards are both classified as white by our logistic regression model, but the third is also predicted to be white when it was actually a blue card. This doesn't mean the model is necessarily wrong; it's simply identifying what the colour of the card should be. This is what I mean by accuracy being a misleading metric: the model can be right even when it's wrong.

This does make model evaluation confusing, and it may require a more qualitative approach. Instead of looking at accuracy, it makes more sense to look at the coefficients: how the model weighs the different features when deciding what colour a card should be. Take a look at the coefficients for red:

We can compare what our model sees to this article about the mechanical color pie: haste, drawing and discarding, and menace are all mechanics heavily connected to red.
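Here is a rough sketch of how those coefficients can be read off the fitted model. It assumes red cards are labelled "R" and that feature_names comes from the text vectorizer (e.g. get_feature_names_out() on a TfidfVectorizer); both are assumptions about the pipeline rather than confirmed details:

```python
import numpy as np

best_model = search.best_estimator_
# coef_ has one row per class for multiclass logistic regression
red_idx = list(best_model.classes_).index("R")  # assumes red is labelled "R"
red_coefs = best_model.coef_[red_idx]

# feature_names would come from the vectorizer,
# e.g. vectorizer.get_feature_names_out()
for i in np.argsort(red_coefs)[::-1][:10]:
    print(f"{feature_names[i]}: {red_coefs[i]:.3f}")
```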

Another key observation is that the most common misclassification for each colour is one of its ally colours. Every colour in MTG has two ally colours, and colours often share mechanics with their allies. These misclassifications are a good indicator that the model is picking up on that structure (see the sketch after the list below).

For example:

  • Green cards are most often mistaken for White cards. (Common mechanic: gaining life)
  • Black cards are most often mistaken for Red cards. (Common mechanic: menace)
  • White cards are most often mistaken for Blue cards. (Common mechanic: flying)
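This pattern can be read straight off the confusion matrix: zero out the diagonal, and the largest remaining entry in each row is that colour's most common misclassification. A small sketch, reusing the fitted model and test split from the earlier snippets:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, best_model.predict(X_test), normalize="true")
errors = cm.copy()
np.fill_diagonal(errors, 0)  # keep only the off-diagonal mistakes

for true_idx, row in enumerate(errors):
    pred_idx = row.argmax()
    print(f"{best_model.classes_[true_idx]} is most often mistaken for "
          f"{best_model.classes_[pred_idx]} ({row[pred_idx]:.0%} of cards)")
```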

Results

I now have a classification model with roughly 80% accuracy that can classify a card given its name, text, and the other aforementioned features. From this model, we can obtain a list of coefficients showing how each feature positively or negatively impacts the probability of a card being a certain colour. In terms of potential business value added:

  • Card designers can run future cards through the model and check whether or not a card was designed ‘appropriately’. (If the card is red but the model thinks it's white, something might be wrong there; see the sketch after this list.)
  • Card designers may also want to go in the opposite direction and make an innovative card that the model will almost certainly misclassify. They can look at the most negative coefficients for each class (maybe goblins have never been blue before) and explore designing unique cards in that design space.
  • Instead of predicting colour identity, this project could be inverted using a more advanced model, to see whether it is possible to train an AI to make custom Magic cards (in this case, name + text are the dependent variables). On a broader scale, this project aims to showcase how natural language processing can be used to ensure consistency between a new product in a category and older products in that same category.
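To illustrate the design-check idea from the first bullet, here is a sketch of running a prospective card through the model; prepare_features is a hypothetical helper standing in for the cleaning pipeline from the last post:

```python
# prepare_features is hypothetical: it stands in for whatever transformation
# turned a raw card (name, text, etc.) into the feature matrix X above
card = prepare_features(name="Lightning Bolt",
                        text="Lightning Bolt deals 3 damage to any target.")

probs = best_model.predict_proba(card)[0]
for colour, p in sorted(zip(best_model.classes_, probs), key=lambda t: -t[1]):
    print(f"{colour}: {p:.1%}")
# If the intended colour comes out with a low probability, the design
# may be a colour break worth a second look.
```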

A massive shoutout to Brian Yee’s own color classifier project, which you can find documented in this repository.