5. UNSUPERVISED LEARNING ALGORITHMS

Unsupervised learning finds hidden patterns in unlabeled data. Unlike supervised learning, it doesn’t rely on labeled outputs (no predefined target).

5.1) K-Means Clustering

5.1.1) Algorithm Overview
K-Means is a clustering algorithm that divides data into K clusters based on similarity.
It works by the following steps (a minimal sketch follows the list):
  1. Select K random centroids.
  2. Assign each point to the nearest centroid.
  3. Update each centroid to the mean of its assigned points.
  4. Repeat steps 2–3 until the centroids stop changing.
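
A minimal NumPy sketch of this loop (the function name kmeans is illustrative; in practice scikit-learn’s KMeans class does this, and more robustly):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Minimal K-Means: assign points to the nearest centroid, then recompute means."""
        rng = np.random.default_rng(seed)
        # Step 1: pick K random data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Step 2: assign each point to its nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: move each centroid to the mean of its assigned points
            # (assumes no cluster ends up empty, which a real implementation must handle).
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            # Step 4: stop once the centroids no longer move.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return centroids, labels
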
5.1.2) Elbow Method
  • Used to choose the optimal number of clusters (K).
  • Plot the number of clusters (K) vs. Within-Cluster-Sum-of-Squares (WCSS).
  • The point where the WCSS curve bends (the elbow) suggests a good K: beyond it, additional clusters yield diminishing reductions in WCSS (see the sketch below).
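
A short sketch of the elbow plot, assuming scikit-learn and matplotlib; sklearn’s inertia_ attribute is exactly the WCSS of a fitted model, and the blob data here is synthetic:

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    # Fit K-Means for a range of K values and record the WCSS of each fit.
    ks = range(1, 11)
    wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

    plt.plot(ks, wcss, marker="o")
    plt.xlabel("Number of clusters (K)")
    plt.ylabel("WCSS (inertia)")
    plt.title("Elbow method")
    plt.show()
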
5.1.3) K-Means++ Initialization
  • Improves basic K-Means by choosing the initial centroids more carefully, reducing the chance of poor clustering.
  • Starts with one uniformly random centroid; each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far (sketched below).
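
A NumPy sketch of the K-Means++ seeding step (kmeanspp_init is a hypothetical helper; scikit-learn’s KMeans applies this seeding by default via init="k-means++"):

    import numpy as np

    def kmeanspp_init(X, k, seed=0):
        """K-Means++ seeding: sample each new centroid with probability
        proportional to its squared distance from the nearest chosen centroid."""
        rng = np.random.default_rng(seed)
        centroids = [X[rng.integers(len(X))]]  # first centroid: uniformly at random
        for _ in range(k - 1):
            # Squared distance from every point to its nearest chosen centroid.
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            probs = d2 / d2.sum()
            centroids.append(X[rng.choice(len(X), p=probs)])
        return np.array(centroids)
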

 

5.2) Hierarchical Clustering

5.2.1) Agglomerative vs. Divisive Clustering
  • Agglomerative (bottom-up): Start with each point as its own cluster and merge the closest clusters.
  • Divisive (top-down): Start with one large cluster and recursively split it.

Agglomerative is more commonly used.
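
A minimal agglomerative example with scikit-learn, on synthetic blob data (the parameter choices are illustrative):

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

    # Bottom-up merging; Ward linkage merges the pair of clusters that
    # least increases the total within-cluster variance.
    agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
    labels = agg.fit_predict(X)
    print(labels[:10])
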

5.2.2) Dendrogram and Optimal Cut
  • A dendrogram is a tree-like diagram that shows how clusters are formed at each step.
  • The height of branches represents the distance between clusters.
  • Cutting the dendrogram at a chosen height yields the desired number of clusters (see the sketch below).
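
A sketch using SciPy’s hierarchy tools (the cut height t=10.0 is an assumed value; in practice it is read off the dendrogram):

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

    Z = linkage(X, method="ward")   # merge history: one row per merge
    dendrogram(Z)                   # branch height = distance between merged clusters
    plt.ylabel("Merge distance")
    plt.show()

    # Cutting the tree at a chosen height gives a flat clustering.
    labels = fcluster(Z, t=10.0, criterion="distance")
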
 

5.3) Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique used to simplify datasets while retaining most of the important information.
5.3.1) Dimensionality Reduction
  • PCA transforms the data into a new coordinate system whose axes (the principal components) are ordered by the variance they capture; keeping only the first few components reduces the dimensionality.
  • Useful for visualization, speeding up downstream algorithms, and mitigating the curse of dimensionality (see the sketch below).
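
A minimal scikit-learn example, projecting the 4-feature Iris dataset down to 2 dimensions:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data            # 150 samples, 4 features

    pca = PCA(n_components=2)       # keep the top 2 principal components
    X_2d = pca.fit_transform(X)     # project the 4-D data onto a 2-D plane
    print(X_2d.shape)               # (150, 2)
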
5.3.2) Eigenvalue Decomposition
  • PCA is based on the eigenvectors and eigenvalues of the covariance matrix of the (mean-centered) data.
  • The eigenvectors define the new axes (principal components).
  • The eigenvalues indicate the amount of variance each component captures (illustrated below).
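
A sketch of the same computation done by hand with NumPy’s eigendecomposition; it should match scikit-learn’s PCA up to the sign of each component:

    import numpy as np
    from sklearn.datasets import load_iris

    X = load_iris().data
    X_centered = X - X.mean(axis=0)          # PCA is computed on mean-centered data
    cov = np.cov(X_centered, rowvar=False)   # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices

    # Sort descending, so the first component captures the most variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    X_2d = X_centered @ eigvecs[:, :2]       # project onto the top-2 components
    print(eigvals / eigvals.sum())           # fraction of variance per component
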
5.3.3) Scree Plot and Explained Variance
  • Scree Plot: A plot of eigenvalues to help decide how many components to keep.
  • The explained variance ratio shows how much of the data’s variance each component captures (see the sketch below).
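
A scree-plot sketch with scikit-learn and matplotlib, again on the Iris data:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    pca = PCA().fit(load_iris().data)        # fit with all components kept

    components = range(1, len(pca.explained_variance_ratio_) + 1)
    plt.plot(components, pca.explained_variance_ratio_, marker="o")
    plt.xlabel("Principal component")
    plt.ylabel("Explained variance ratio")
    plt.title("Scree plot")
    plt.show()
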

 

5.4) DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a density-based clustering algorithm that groups closely packed points and marks outliers as noise.

5.4.1) Density-Based Clustering

Unlike K-Means, DBSCAN doesn’t require specifying the number of clusters. Clusters are formed based on dense regions in the data.

5.4.2) Epsilon and MinPts Parameters
  • Epsilon (ε): Radius around a point to search for neighbors.
  • MinPts: Minimum number of points required to form a dense region.
  • Points are classified as:
    • Core Point: Has MinPts within ε.
    • Border Point: Not a core but within ε of a core.
    • Noise: Neither core nor border.
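
A minimal DBSCAN example on scikit-learn’s two-moons data, a shape K-Means handles poorly (the ε and MinPts values are illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

    db = DBSCAN(eps=0.3, min_samples=5).fit(X)   # eps = ε, min_samples = MinPts
    labels = db.labels_                          # the label -1 marks noise points

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"clusters: {n_clusters}, noise points: {np.sum(labels == -1)}")
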

 
