Skip to content
On this page

Chapter 6: Feature Engineering


6.1 What is a Feature?

  • Feature = input variable used by the model
  • Represents information about data

Examples:

  • Age
  • Salary
  • Location

Features are the “inputs” that help the model make predictions.

In Machine Learning, a feature is simply a piece of information given to the model. The model does not see the real world directly—it only sees features. For example, in house price prediction, the model does not “see” a house. It only sees features like size, number of rooms, and location. These features act as clues that help the model understand patterns.


6.2 Raw Data vs Useful Features

  • Raw data is not always useful
  • Features are processed, meaningful data

Not all data is helpful for learning.

For example, a dataset may contain a date of birth. This is raw data. But for prediction, age is more useful than date of birth. So we convert date of birth → age. This transformed value becomes a better feature.

The key idea is:

Raw data → processed → meaningful feature


6.3 What is Feature Engineering?

  • Process of creating better features from raw data
  • Improves model performance

Feature Engineering = Improving input to improve output

Feature engineering is the process of transforming raw data into features that better represent the problem. Instead of directly using available data, we modify or create new features that make patterns easier for the model to learn.


6.4 Why Feature Engineering is More Important Than Models

  • Good features → simple model works well
  • Bad features → even complex models fail

Simple model + good features > complex model + poor features

In real-world Machine Learning, feature engineering often has a bigger impact than choosing the algorithm. A well-prepared dataset can make even a basic model perform very well. On the other hand, poor features can confuse even advanced models.


Example

House Price Prediction:

Raw features:

  • Area
  • Number of rooms

Engineered features:

  • Price per square foot
  • Location category

These engineered features give better insight, making it easier for the model to learn patterns.


6.5 Thinking Like an ML Engineer

  • Ask:

    • What information helps prediction?
    • What is irrelevant?

Focus on meaningful data, not just available data.

An ML engineer does not use all data blindly. Instead, they think about which features actually matter. For example, in predicting exam scores, features like study hours are useful, but features like favorite color are not. This ability to select meaningful features is a key skill.


6.6 Types of Feature Engineering


1. Feature Creation

  • Creating new features from existing data

Examples:

  • Total rooms = bedroom + living room
  • Age from date of birth

Feature creation involves combining or modifying existing data to create new, more useful features. These new features often capture patterns better than raw data.


2. Feature Transformation

  • Changing the form of data

Examples:

  • Scaling values
  • Log transformation

Sometimes data needs to be transformed to make patterns clearer. For example, large values can be scaled down, or skewed data can be adjusted using transformations. This helps models learn more effectively.


3. Feature Extraction

  • Extracting useful information from complex data

Examples:

  • From date → day, month
  • From text → word count

Feature extraction focuses on converting complex or unstructured data into meaningful numerical features. For example, instead of using raw text, we may use the number of words or frequency of certain keywords.


6.7 Real-World Examples


Example 1: House Price Prediction

Raw data:

  • Area
  • Rooms

Engineered features:

  • Price per square foot
  • Area per room

These new features provide better insights into property value.


Example 2: E-commerce

Raw data:

  • Purchase history

Engineered features:

  • Average spending
  • Purchase frequency

These features help understand customer behavior more effectively.


6.8 Good Features vs Bad Features

  • Good features:

    • Relevant
    • Informative
  • Bad features:

    • Irrelevant
    • Noisy

The quality of features directly affects model performance.

If features are irrelevant, the model may learn incorrect patterns. For example, using random or unrelated data can confuse the model. Good features highlight important relationships, making learning easier.


6.9 Feature Engineering and Overfitting

  • Too many features → overfitting
  • Too few features → underfitting

Balance is important.

Adding too many features can make the model memorize data instead of learning patterns. On the other hand, using too few features may not provide enough information. The goal is to find the right balance.


6.10 Common Mistakes

  • Using irrelevant features
  • Ignoring data understanding
  • Creating overly complex features
  • Data leakage through features

These mistakes can reduce model performance and lead to incorrect results. For example, including future information in features (data leakage) can make the model appear accurate but fail in real-world scenarios.


6.11 A Clear Mental Model

  • Data → Features → Model → Prediction

Better features = better learning

Always remember that the model depends entirely on the features it receives. Improving features directly improves predictions.


6.12 Why This Chapter Matters

  • Most important practical skill in ML
  • Used in every real-world project

Feature engineering is what separates beginners from experienced ML engineers. It requires thinking, creativity, and understanding of data.


6.13 What Comes Next

Now that you understand how to prepare and improve data, we will start working with tools.

Next: Python and NumPy — the foundation for implementing Machine Learning


Chapter Summary

  • Features are inputs to the model
  • Raw data must be transformed into meaningful features
  • Feature engineering improves model performance
  • Good features matter more than complex models
  • Creating, transforming, and extracting features are key techniques

Built with VitePress