Data Analyst and Data Scientist
Proficient in Python, R, SQL, TensorFlow and Scikit-learn
Dungeons & Dragons nerd, founder of DnDwithToph.com
View My LinkedIn
View My Articles on Medium
This project involves using Scikit-learn and Pandas to create a machine learning model for a multiple classification problem aimed at predicting the race of a Dungeons & Dragons character based on their ability scores and general features.
This project aims to predict the race of Dungeons & Dragons characters based on their ability scores and other features. The goal is to create a machine learning model that can accurately classify the race of a character.
The original dataset was obtained from Kaggle, provided by Andrew Abeles: DND Character Stats Dataset.
Aim To achieve a classification accuracy that matches or surpasses 51.8%, the model accuracy from Andrew Abeles’s project (KNN tuned model).
The dataset contains 10,000 samples, each with 9 input features and 1 target feature.
Input Features:
Target Variable:
Correlation matrix to visualize the relationships between features.
The correlation heatmap showed strong relationships between height, weight, and speed, indicating their significance in predicting the character’s race.
Further comparison of height and weight features using a scatter plot.
The scatter plot revealed trends in three clear groups/clusters of character sizes: small, medium, and large.
Data Preprocessing: The dataset was loaded into a Pandas DataFrame. There were no missing values.
Exploratory Data Analysis (EDA): Data visualization was performed to understand feature correlations and distributions.
Model Selection: Several machine learning models were evaluated, including Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Random Forest Classifier, AdaBoost Classifier, and Logistic Regression.
Among the baseline comparison of models, Random Forest and Logistic Regression demonstrated the highest accuracy scores.
Fine-tuning Random Forest Classifier against n_estimators.
Model Comparison: The best Random Forest model was selected based on the hyperparameters obtained from Grid Search CV.
The mean scores of accuracy, precision, recall, and F1-score were as follows:
Accuracy: 0.6748 Precision: 0.6609 Recall: 0.6762 F1-score: 0.6690
The project involved building and tuning a machine learning model to predict the race of a DND character based on their ability scores and general features. The Random Forest model achieved an accuracy of around 67.25% after hyperparameter tuning and cross-validation, improving upon the previous accuracy of 51.8% found in the orignal KNN model from the Dataset.