An important part of machine learning is selecting the right dataset, and several statistical methods can help with that selection. Before going into the details, it is worth asking why these methods are needed: they help avoid outliers in the collected data, eliminate redundant features, and remove highly correlated variables. The following are some of the methods (the sketch after the list illustrates the first few):
- Descriptive Statistics
- Correlation & Redundancy Analysis
- Statistical Significance Tests (e.g., ANOVA, chi-square)
- Feature Importance via Modeling
- Dimensionality Reduction (Statistical Decomposition)
- Outlier & Anomaly Detection
- Information Gain & Entropy-Based Methods
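As a minimal sketch of the first few items, assuming the data lives in a pandas DataFrame; the toy columns x1 to x3, the 0.9 correlation cutoff, and the 3-sigma outlier rule are hypothetical choices for illustration, not fixed rules:

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; in practice, load your own DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200)})
df["x2"] = df["x1"] * 0.95 + rng.normal(scale=0.1, size=200)  # nearly duplicates x1
df["x3"] = rng.normal(size=200)

# 1. Descriptive statistics: quick look at range, spread, and extremes.
print(df.describe())

# 2. Correlation & redundancy: drop one column from each highly correlated
#    pair (0.9 is an assumed threshold, tune it for your data).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=redundant)  # drops x2, which nearly duplicates x1

# 3. Outlier detection: drop rows more than 3 standard deviations
#    from the column mean (a common but assumed rule of thumb).
z = (df - df.mean()) / df.std()
df_clean = df[(z.abs() < 3).all(axis=1)]
print(f"dropped {len(df) - len(df_clean)} outlier rows")
```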
Selecting the right features is the backbone of any successful machine learning model. Statistical techniques help identify which variables truly matter by analyzing relationships, significance, and variability in the data. Methods like correlation analysis remove redundant features, while ANOVA and chi-square tests reveal statistically relevant ones. Mutual information captures non-linear dependencies, and regularization methods like Lasso automatically eliminate weak predictors. Techniques such as random forest importance ranking and PCA further refine the dataset by ranking or compressing features. Together, these methods ensure your dataset is both efficient and information-rich, laying a strong foundation for accurate predictions.
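As a minimal sketch of how these ideas look in practice, the snippet below applies ANOVA F-tests, mutual information, Lasso, random forest importances, and PCA via scikit-learn. The bundled breast-cancer dataset, k=10, the 95% variance target, and the use of LassoCV on a binary label treated as numeric are all illustrative assumptions, not prescriptions from the text above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Assumed example dataset: 30 numeric features, binary target.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# ANOVA F-test: ranks features by between-class variance (k=10 is arbitrary).
anova = SelectKBest(f_classif, k=10).fit(X, y)

# Mutual information: captures non-linear feature/target dependencies.
mi = mutual_info_classif(X, y, random_state=0)

# Lasso: the L1 penalty shrinks weak predictors' coefficients to exactly zero.
# (Here the 0/1 label is treated as numeric for illustration.)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
kept = (lasso.coef_ != 0).sum()

# Random forest: impurity-based importances rank features by predictive power.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# PCA: compresses correlated features into orthogonal components,
# keeping enough to explain 95% of the variance (an assumed target).
pca = PCA(n_components=0.95).fit(X)

print("ANOVA top-10 mask:", anova.get_support())
print("MI scores:", mi.round(2))
print("Lasso kept", kept, "of", X.shape[1], "features")
print("RF importances:", rf.feature_importances_.round(2))
print("PCA components for 95% variance:", pca.n_components_)
```

Standardizing first matters here because Lasso's penalty and PCA's variance criterion are both scale-sensitive; the tree-based importances would work unscaled as well.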