In this article, we'll take a look at the most widely used supervised learning algorithms and the key ideas behind each one.
4. SUPERVISED LEARNING ALGORITHMS
Supervised learning uses labeled data, meaning the model learns from input-output pairs (X → y). The algorithm tries to map inputs (features) to correct outputs (targets/labels).
4.1) Linear Regression
Used for predicting continuous values (e.g., predicting house prices, temperature).
4.1.1) Simple vs. Multiple Linear Regression
Simple Linear Regression: One input (X) to predict one output (Y). Example: Predicting salary from years of experience.
Multiple Linear Regression: Multiple inputs (X1, X2, …, Xn). Example: Predicting price based on area, location, and age.
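For a concrete picture, here is a minimal scikit-learn sketch of both cases; the salary and house-price numbers below are invented purely for illustration.

```python
# Minimal scikit-learn sketch; the toy data below is invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: years of experience -> salary
X_simple = np.array([[1], [2], [3], [4], [5]])   # one feature
y_salary = np.array([40, 45, 52, 58, 65])        # salary in thousands

simple_model = LinearRegression().fit(X_simple, y_salary)
print(simple_model.predict([[6]]))               # predicted salary for 6 years

# Multiple linear regression: area, location score, age -> price
X_multi = np.array([[1200, 8, 5],
                    [1500, 6, 10],
                    [900,  9, 2],
                    [2000, 7, 15]])
y_price = np.array([300, 320, 280, 400])

multi_model = LinearRegression().fit(X_multi, y_price)
print(multi_model.coef_, multi_model.intercept_) # one coefficient per feature
```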
4.1.2) Gradient Descent and Normal Equation
Gradient Descent: An iterative method to minimize error (cost function).
Normal Equation: A direct way to find weights using linear algebra: θ = (XᵀX)⁻¹ Xᵀy. Works for small datasets.
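To make the two approaches concrete, here is a small NumPy sketch that estimates the same weights both ways on an invented toy problem (the learning rate and iteration count are arbitrary choices):

```python
# Gradient descent vs. normal equation on a toy problem (data invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(0, 10, 50)]   # column of ones for the intercept
y = 3 + 2 * X[:, 1] + rng.normal(0, 0.5, 50)     # true relationship: y = 3 + 2x + noise

# Normal equation: theta = (X^T X)^(-1) X^T y
theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# Gradient descent: repeatedly step against the gradient of the MSE cost
theta_gd = np.zeros(2)
lr = 0.01
for _ in range(5000):
    gradient = (2 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= lr * gradient

print(theta_ne)   # both should be close to [3, 2]
print(theta_gd)
```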
4.1.3) Regularization (L1, L2)
Prevents overfitting by adding a penalty:
- L1 (Lasso): Can reduce coefficients to 0 (feature selection).
- L2 (Ridge): Shrinks coefficients but doesn’t make them 0.
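A minimal scikit-learn sketch of both penalties (the alpha values below are arbitrary):

```python
# L1 (Lasso) vs. L2 (Ridge) regularization; alpha values chosen arbitrarily.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 100)  # only two features matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(lasso.coef_)   # irrelevant features are typically driven to exactly 0
print(ridge.coef_)   # irrelevant features are shrunk, but stay non-zero
```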
4.2) Logistic Regression
Used for classification problems (e.g., spam vs. not spam).
4.2.1) Binary vs. Multiclass Classification
- Binary: 2 outcomes (e.g., 0 or 1)
- Multiclass: More than 2 classes (handled using One-vs-Rest or Softmax)
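A small scikit-learn sketch of both settings; LogisticRegression handles the multiclass case internally (one-vs-rest or softmax, depending on the solver):

```python
# Binary vs. multiclass classification with scikit-learn's LogisticRegression.
from sklearn.datasets import load_iris, make_classification
from sklearn.linear_model import LogisticRegression

# Binary: two classes (0 or 1)
X_bin, y_bin = make_classification(n_samples=200, n_classes=2, random_state=0)
binary_clf = LogisticRegression().fit(X_bin, y_bin)
print(binary_clf.predict(X_bin[:5]))          # outputs are 0 or 1

# Multiclass: the 3-class Iris dataset; one-vs-rest or softmax is applied internally
X_iris, y_iris = load_iris(return_X_y=True)
multi_clf = LogisticRegression(max_iter=1000).fit(X_iris, y_iris)
print(multi_clf.predict_proba(X_iris[:2]))    # one probability per class, rows sum to 1
```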
4.2.2) Sigmoid and Cost Function
Sigmoid Function: Converts outputs to values between 0 and 1. Formula: sigmoid(z) = 1 / (1 + e^(-z))
Cost Function: Log loss used to measure prediction error.
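The two formulas written out directly in NumPy (a minimal sketch, not a full training loop):

```python
# Sigmoid and log loss written out directly with NumPy.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred_prob, eps=1e-15):
    p = np.clip(y_pred_prob, eps, 1 - eps)      # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))                               # values squashed into (0, 1)

y_true = np.array([0, 0, 1])
print(log_loss(y_true, sigmoid(z)))             # lower is better
```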
4.2.3) Regularization
L1 and L2 regularization help prevent overfitting in logistic regression as well.
4.3) K-Nearest Neighbors (KNN)
A simple classification (or regression) algorithm that uses proximity.
4.3.1) Distance Metrics
- Euclidean Distance: Straight line between two points.
- Manhattan Distance: Sum of absolute differences.
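Both metrics take one line of NumPy; the vectors below are invented for illustration:

```python
# Euclidean vs. Manhattan distance between two feature vectors.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences

print(euclidean)   # 5.0
print(manhattan)   # 7.0
```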
4.3.2) Choosing K
- K is the number of neighbors to consider.
- Too low a K → sensitive to noise (overfitting)
- Too high a K → decision boundary becomes overly smooth (underfitting)
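A quick way to see the effect is to compare validation accuracy for a few values of K (a scikit-learn sketch on the Iris dataset; the K values are arbitrary):

```python
# Comparing a few values of K on a held-out validation split (K values chosen arbitrarily).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for k in (1, 5, 15, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_val, y_val))   # very low or very high K usually scores worse
```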
4.3.3) Advantages & Disadvantages
- Simple and easy to implement
- Slow for large datasets, sensitive to irrelevant features
4.4) Support Vector Machines (SVM)
Powerful classification model for small to medium-sized datasets.
4.4.1) Hyperplanes and Margins
SVM finds the best hyperplane that separates data with maximum margin.
4.4.2) Linear vs. Non-Linear SVM
- Linear SVM: Works when data is linearly separable.
- Non-linear SVM: Uses kernel trick for complex datasets.
4.4.3) Kernel Trick
- Implicitly maps data into a higher-dimensional space where it becomes separable, without computing the transformation explicitly.
- Common kernels: RBF (Gaussian), Polynomial, Sigmoid
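A short sketch comparing a linear and an RBF-kernel SVM on a toy dataset that is not linearly separable (scikit-learn's make_moons, chosen here just for illustration):

```python
# Linear vs. RBF-kernel SVM on a non-linearly-separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)  # kernel trick: implicit higher-dimensional mapping

print(linear_svm.score(X, y))   # limited by the straight decision boundary
print(rbf_svm.score(X, y))      # usually much higher on this data
```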
4.5) Decision Trees
Tree-like structure used for classification and regression.
4.5.1) Gini Impurity and Entropy
- Both measure how impure (mixed) a node is:
- Gini Impurity: Probability of misclassification.
- Entropy: Measure of randomness/information.
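Both can be computed directly from a node's class proportions; a minimal sketch:

```python
# Gini impurity and entropy for a node's class distribution.
import numpy as np

def gini(p):
    p = np.asarray(p)
    return 1 - np.sum(p ** 2)            # probability of misclassifying a random sample

def entropy(p):
    p = np.asarray(p)
    p = p[p > 0]                         # ignore empty classes (log(0) is undefined)
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # most impure 2-class node: 0.5, 1.0
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # pure node: 0.0, 0.0
```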
4.5.2) Overfitting and Pruning
- Overfitting: The tree memorizes the training data.
- Pruning: Removes unnecessary branches to reduce overfitting.
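In scikit-learn, overfitting can be limited with pre-pruning limits such as max_depth or with cost-complexity pruning via ccp_alpha; the parameter values below are arbitrary:

```python
# Unpruned vs. pruned decision trees (pruning parameters chosen arbitrarily for illustration).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                     random_state=0).fit(X_train, y_train)

print(full_tree.score(X_train, y_train), full_tree.score(X_test, y_test))      # ~1.0 train, lower test
print(pruned_tree.score(X_train, y_train), pruned_tree.score(X_test, y_test))  # smaller gap
```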
4.6) Random Forest
An ensemble of decision trees to improve accuracy and reduce overfitting.
4.6.1) Bootstrapping
Randomly samples subsets of the training data (with replacement) to train each tree.
4.6.2) Bagging
Combines predictions of multiple trees (majority vote or average).
4.6.3) Feature Importance
Measures which features contribute most to the model's predictions.
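A short sketch tying these pieces together; scikit-learn's RandomForestClassifier performs the bootstrapping and bagging internally and exposes impurity-based importances:

```python
# Random forest with feature importances; bootstrapping and bagging happen internally.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names

forest = RandomForestClassifier(n_estimators=200, bootstrap=True, random_state=0).fit(X, y)

# Impurity-based importance: how much each feature contributes to the trees' splits
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(name, round(importance, 3))
```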
4.7) Gradient Boosting Machines (GBM)
Boosting is an ensemble method where models are trained sequentially, each new model correcting the errors of the previous ones.
4.7.1) XGBoost, LightGBM, CatBoost
Advanced boosting libraries:
- XGBoost: Popular, fast, and accurate
- LightGBM: Faster, uses leaf-wise growth
- CatBoost: Handles categorical features automatically
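As a minimal illustration, here is an XGBoost sketch (assuming the xgboost package is installed); LightGBM and CatBoost expose very similar fit/predict interfaces:

```python
# Minimal XGBoost sketch (assumes `pip install xgboost`); LightGBM and CatBoost are used similarly.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))
```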
4.7.2) Hyperparameter Tuning
- Adjust parameters like:
- Learning rate
- Number of estimators (trees)
- Max depth
- Tools: GridSearchCV, RandomizedSearchCV
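A minimal GridSearchCV sketch over the parameters above, using scikit-learn's own GradientBoostingClassifier so no extra packages are needed (the grid values are arbitrary):

```python
# Grid search over a few boosting hyperparameters (grid values chosen arbitrarily).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
}

search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```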
4.7.3) Early Stopping
Stops training if the model stops improving on the validation set.
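One way to do this with scikit-learn's GradientBoostingClassifier is the validation_fraction / n_iter_no_change pair; XGBoost and LightGBM offer their own early-stopping options:

```python
# Early stopping: training halts when the internal validation score stops improving.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

model = GradientBoostingClassifier(
    n_estimators=1000,         # upper bound on the number of trees
    validation_fraction=0.1,   # hold out 10% of the training data internally
    n_iter_no_change=10,       # stop after 10 rounds with no improvement
    random_state=0,
).fit(X, y)

print(model.n_estimators_)     # actual number of trees fitted, usually far below 1000
```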
4.8) Naive Bayes
Probabilistic classifier based on Bayes' Theorem and a strong feature-independence assumption.
4.8.1) Gaussian, Multinomial, Bernoulli
- Gaussian NB: For continuous features (assumes normal distribution)
- Multinomial NB: For text data, counts of words
- Bernoulli NB: For binary features (0/1)
4.8.2) Assumptions and Applications
- Assumes all features are independent (rarely true, but still works well)
- Commonly used in spam detection, sentiment analysis, and document classification
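A short spam-style sketch with MultinomialNB on word counts; the tiny corpus below is made up for illustration:

```python
# Multinomial Naive Bayes on word counts; the tiny "spam" corpus is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "free money click now",
         "meeting at noon tomorrow", "see you at the office"]
labels = [1, 1, 0, 0]                         # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)           # word counts per message

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free prize at the office"])))
```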
