Characteristics of Underfitting
- High Training Error: The model performs poorly on the training dataset, indicating that it is unable to capture the patterns in the data.
- High Validation/Test Error: The model also performs poorly on the validation or test dataset, showing that it fails to generalize to new, unseen data.
- Simple Model: Underfitting often occurs with models that are too simple for the task, such as linear models applied to non-linear data or shallow neural networks applied to complex problems.
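A quick way to confirm these symptoms is to compare training and validation error as the training set grows: an underfit model plateaus at a high error on both curves. Below is a minimal diagnostic sketch using scikit-learn's learning_curve; the synthetic data are placeholders, so substitute your own X and y.
Code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Placeholder data: a non-linear target that a linear model cannot capture
np.random.seed(0)
X = np.random.rand(200, 1) * 10
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 200)

# Mean train/validation MSE at increasing training-set sizes
sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
)
print("Train MSE:     ", -train_scores.mean(axis=1))
print("Validation MSE:", -val_scores.mean(axis=1))
# Underfitting signature: both errors stay high and close together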
Causes of Underfitting
1. Too Simple Model: Using a model that lacks the capacity to capture the structure of the data. Examples include:
- Linear regression for a problem that requires polynomial regression.
- Decision trees with too few splits.
- Neural networks with too few layers or neurons.
2. Insufficient Training: Stopping training too early, before the model has completed enough epochs or iterations to learn the patterns in the data.
3. Inadequate Features: Using too few or irrelevant features, so the input does not carry enough information for the model to learn effectively.
4. High Regularization: Applying too much regularization can penalize the model to the point where it cannot learn the training data properly.
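To see the regularization cause in isolation, here is a small sketch using ridge regression on data with a clear linear signal; the alpha values are illustrative assumptions, not recommendations. An extreme penalty shrinks the learned slope toward zero, so even the training error stays high:
Code:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 3 * X.ravel() + np.random.normal(0, 0.5, 100)  # clear linear signal

# A huge penalty forces the coefficient toward zero; a moderate one does not
over_regularized = Ridge(alpha=1e6).fit(X, y)
reasonable = Ridge(alpha=1.0).fit(X, y)

print("Slope (alpha=1e6):", over_regularized.coef_[0])  # close to 0
print("Slope (alpha=1.0):", reasonable.coef_[0])        # close to 3
print("Train MSE (alpha=1e6):", mean_squared_error(y, over_regularized.predict(X)))
print("Train MSE (alpha=1.0):", mean_squared_error(y, reasonable.predict(X)))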
Mitigation Strategies
1. Increase Model Complexity: Use a more complex model that has the capacity to learn the underlying patterns in the data.
- Use polynomial regression instead of linear regression if the relationship is non-linear.
- Increase the depth of decision trees.
- Use deeper neural networks with more layers and neurons.
2. Feature Engineering: Give the model more informative inputs, for example by:
- Creating new features based on domain knowledge.
- Using techniques like polynomial features, interactions, and transformations.
3. Reduce Regularization: Lower the regularization strength so the penalty no longer prevents the model from fitting the training data.
4. Train Longer: Allow the model to train for more epochs or iterations, ensuring it has enough time to learn from the data.
5. Hyperparameter Tuning: Adjust the model’s hyperparameters to find a better balance between bias and variance.
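As a concrete illustration of strategy 5, the polynomial degree used in the example below is itself a hyperparameter and can be tuned with cross-validation instead of being picked by hand. This is a sketch only; the degree range and scoring metric are assumptions:
Code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, 100)

# make_pipeline names the step "polynomialfeatures", hence the parameter key below
model = make_pipeline(PolynomialFeatures(), LinearRegression())
search = GridSearchCV(
    model,
    param_grid={"polynomialfeatures__degree": range(1, 10)},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print("Best degree:", search.best_params_["polynomialfeatures__degree"])
print("Best CV MSE:", -search.best_score_)
Degree 1 reproduces the underfitting shown next, while very high degrees would eventually overfit; cross-validation finds the balance between the two.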
Example of Underfitting and Mitigation in Python
Here’s an example using a linear regression model to fit non-linear data:
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate synthetic data
np.random.seed(0)
X = np.sort(np.random.rand(100, 1) * 10, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Create a linear regression model (which will underfit the non-linear data)
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
# Predict and evaluate
y_train_pred = linear_model.predict(X_train)
y_test_pred = linear_model.predict(X_test)
print(f"Train MSE (Linear Model): {mean_squared_error(y_train, y_train_pred):.3f}")
print(f"Test MSE (Linear Model): {mean_squared_error(y_test, y_test_pred):.3f}")
# Plotting the results for the linear model
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, linear_model.predict(X), color='blue', label='Linear Model')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Underfitting Example')
plt.show()
# Now, create a polynomial regression model to mitigate underfitting
degree = 5  # A degree-5 polynomial is flexible enough to follow the sine curve
poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
poly_model.fit(X_train, y_train)
# Predict and evaluate
y_train_pred_poly = poly_model.predict(X_train)
y_test_pred_poly = poly_model.predict(X_test)
print(f"Train MSE (Polynomial Model): {mean_squared_error(y_train, y_train_pred_poly):.3f}")
print(f"Test MSE (Polynomial Model): {mean_squared_error(y_test, y_test_pred_poly):.3f}")
# Plotting the results for the polynomial model
plt.scatter(X, y, color='black', label='Data')
plt.plot(X, poly_model.predict(X), color='red', label=f'Polynomial Model (degree {degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Mitigating Underfitting with Polynomial Regression')
plt.show()
Explanation of the Code
1. Data Generation: We generate synthetic data from a sine function and add Gaussian noise.
2. Linear Model Training: We train a linear regression model on the data, which is too simple for this non-linear data, leading to underfitting.
3. Evaluation: We calculate and print the mean squared error (MSE) for both the training and test sets, showing poor performance due to underfitting.
4. Visualization: We plot the original data and the linear model's predictions to visually inspect underfitting.
5. Polynomial Model Training: We then train a polynomial regression model with a degree of 5, which is more appropriate for the non-linear data.
6. Evaluation and Visualization: We evaluate the polynomial model, print the MSE, and plot its predictions, showing improved performance and mitigation of underfitting.