Data representation and feature engineering

Mar 11

Before cooking students ever turn on the oven, they set up their kitchen with all ingredients washed, chopped, and measured so that cooking goes smoothly. Data scientists do the same with information. Raw data is like a messy countertop piled with fruits, veggies, and spices. We first clean up mistakes and fill in missing values, like washing off dirt. Then we scale numbers to a common range, much like chopping veggies into uniform pieces. Next we convert categories such as “red apple” or “green apple” into a format the computer can read, such as one-hot labels, which is like laying out each ingredient in its own bowl. Sometimes we combine two basic features, like temperature and humidity, into one richer feature, just as you would blend garlic and oil into a single sauce, so that the model sees a stronger signal. This prep work, called feature engineering, helps the computer learn from the real flavors in the data instead of getting lost in the clutter.
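To make the kitchen prep concrete, here is a small sketch of the four steps in plain Python: filling a missing value, building a combined feature, scaling, and one-hot encoding. The sample records, the column names (`temp_c`, `humidity`, `fruit`), and the helper functions are all made up for illustration. Filling with the mean and multiplying temperature by humidity are just one simple choice each, not the only way to do it.

```python
from statistics import mean

# Hypothetical raw records: one temperature is missing (None).
raw = [
    {"fruit": "red apple", "temp_c": 20.0, "humidity": 0.50},
    {"fruit": "green apple", "temp_c": None, "humidity": 0.80},
    {"fruit": "red apple", "temp_c": 30.0, "humidity": 0.20},
]

def fill_missing(rows, key):
    """Washing: replace None with the mean of the values we do have."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def add_interaction(rows, a, b, name):
    """Blending: multiply two features into one combined feature."""
    return [{**r, name: r[a] * r[b]} for r in rows]

def min_max_scale(rows, key):
    """Chopping to size: rescale a numeric column to the range 0..1."""
    lo = min(r[key] for r in rows)
    hi = max(r[key] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
    return [{**r, key: (r[key] - lo) / span} for r in rows]

def one_hot(rows, key):
    """Separate bowls: one 0/1 column per category value."""
    categories = sorted({r[key] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != key}
        for c in categories:
            new[f"{key}={c}"] = 1 if r[key] == c else 0
        encoded.append(new)
    return encoded

rows = fill_missing(raw, "temp_c")
rows = add_interaction(rows, "temp_c", "humidity", "temp_x_humidity")
rows = min_max_scale(rows, "temp_c")
rows = one_hot(rows, "fruit")

for r in rows:
    print(r)
```

Each printed row ends up with only numbers in it: a scaled temperature, the original humidity, the combined feature, and one 0/1 column per fruit type. In practice a library such as pandas or scikit-learn would usually handle these steps, but the idea is the same.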

Henry Polvorosa
