5. UNSUPERVISED LEARNING ALGORITHMS

Unsupervised learning finds hidden patterns in unlabeled data. Unlike supervised learning, it doesn’t rely on labeled outputs (no predefined target).

5.1) K-Means Clustering

5.1.1) Algorithm Overview
K-Means is a clustering algorithm that divides data into K clusters based on similarity.
It works by the following steps (a minimal sketch follows the list):
  1. Select K random centroids.
  2. Assign each point to the nearest centroid.
  3. Update each centroid to the mean of its assigned points.
  4. Repeat steps 2–3 until the centroids stop changing.
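
A minimal NumPy sketch of this loop (the function name kmeans is illustrative; in practice scikit-learn’s KMeans class does this, and more robustly):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Minimal K-Means: assign points to the nearest centroid, then recompute means."""
        rng = np.random.default_rng(seed)
        # Step 1: pick K random data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Step 2: assign each point to its nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: move each centroid to the mean of its assigned points
            # (assumes no cluster ends up empty, which a real implementation must handle).
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            # Step 4: stop once the centroids no longer move.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return centroids, labels
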
5.1.2) Elbow Method
  • Used to choose the optimal number of clusters (K).
  • Plot the number of clusters (K) vs. Within-Cluster-Sum-of-Squares (WCSS).
  • The point where the WCSS curve bends (the elbow) suggests a good K: beyond it, additional clusters yield diminishing reductions in WCSS (see the sketch below).
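
A short sketch of the elbow plot, assuming scikit-learn and matplotlib; sklearn’s inertia_ attribute is exactly the WCSS of a fitted model, and the blob data here is synthetic:

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

    # Fit K-Means for a range of K values and record the WCSS of each fit.
    ks = range(1, 11)
    wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

    plt.plot(ks, wcss, marker="o")
    plt.xlabel("Number of clusters (K)")
    plt.ylabel("WCSS (inertia)")
    plt.title("Elbow method")
    plt.show()
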
5.1.3) K-Means++ Initialization
  • Improves basic K-Means by choosing the initial centroids more carefully, reducing the chance of poor clustering.
  • Starts with one uniformly random centroid; each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far (sketched below).
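
A NumPy sketch of the K-Means++ seeding step (kmeanspp_init is a hypothetical helper; scikit-learn’s KMeans applies this seeding by default via init="k-means++"):

    import numpy as np

    def kmeanspp_init(X, k, seed=0):
        """K-Means++ seeding: sample each new centroid with probability
        proportional to its squared distance from the nearest chosen centroid."""
        rng = np.random.default_rng(seed)
        centroids = [X[rng.integers(len(X))]]  # first centroid: uniformly at random
        for _ in range(k - 1):
            # Squared distance from every point to its nearest chosen centroid.
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            probs = d2 / d2.sum()
            centroids.append(X[rng.choice(len(X), p=probs)])
        return np.array(centroids)
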

 

5.2) Hierarchical Clustering

5.2.1) Agglomerative vs. Divisive Clustering
  • Agglomerative (bottom-up): Start with each point as its own cluster and merge the closest clusters.
  • Divisive (top-down): Start with one large cluster and recursively split it.

Agglomerative is more commonly used.
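
A minimal agglomerative example with scikit-learn, on synthetic blob data (the parameter choices are illustrative):

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

    # Bottom-up merging; Ward linkage merges the pair of clusters that
    # least increases the total within-cluster variance.
    agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
    labels = agg.fit_predict(X)
    print(labels[:10])
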

5.2.2) Dendrogram and Optimal Cut
  • A dendrogram is a tree-like diagram that shows how clusters are formed at each step.
  • The height of branches represents the distance between clusters.
  • Cutting the dendrogram at a chosen height yields the desired number of clusters (see the sketch below).
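
A sketch using SciPy’s hierarchy tools (the cut height t=10.0 is an assumed value; in practice it is read off the dendrogram):

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

    Z = linkage(X, method="ward")   # merge history: one row per merge
    dendrogram(Z)                   # branch height = distance between merged clusters
    plt.ylabel("Merge distance")
    plt.show()

    # Cutting the tree at a chosen height gives a flat clustering.
    labels = fcluster(Z, t=10.0, criterion="distance")
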
 

5.3) Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique used to simplify datasets while retaining most of the important information.
5.3.1) Dimensionality Reduction
  • PCA transforms the data into a new coordinate system whose axes (the principal components) are ordered by the variance they capture; keeping only the first few components reduces the dimensionality.
  • Useful for visualization, speeding up downstream algorithms, and mitigating the curse of dimensionality (see the sketch below).
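
A minimal scikit-learn example, projecting the 4-feature Iris dataset down to 2 dimensions:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data            # 150 samples, 4 features

    pca = PCA(n_components=2)       # keep the top 2 principal components
    X_2d = pca.fit_transform(X)     # project the 4-D data onto a 2-D plane
    print(X_2d.shape)               # (150, 2)
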
5.3.2) Eigenvalue Decomposition
  • PCA is based on the eigenvectors and eigenvalues of the covariance matrix of the (mean-centered) data.
  • The eigenvectors define the new axes (principal components).
  • The eigenvalues indicate the amount of variance each component captures (illustrated below).
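
A sketch of the same computation done by hand with NumPy’s eigendecomposition; it should match scikit-learn’s PCA up to the sign of each component:

    import numpy as np
    from sklearn.datasets import load_iris

    X = load_iris().data
    X_centered = X - X.mean(axis=0)          # PCA is computed on mean-centered data
    cov = np.cov(X_centered, rowvar=False)   # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices

    # Sort descending, so the first component captures the most variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    X_2d = X_centered @ eigvecs[:, :2]       # project onto the top-2 components
    print(eigvals / eigvals.sum())           # fraction of variance per component
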
5.3.3) Scree Plot and Explained Variance
  • Scree Plot: A plot of eigenvalues to help decide how many components to keep.
  • The explained variance ratio shows how much of the data’s variance each component captures (see the sketch below).
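
A scree-plot sketch with scikit-learn and matplotlib, again on the Iris data:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    pca = PCA().fit(load_iris().data)        # fit with all components kept

    components = range(1, len(pca.explained_variance_ratio_) + 1)
    plt.plot(components, pca.explained_variance_ratio_, marker="o")
    plt.xlabel("Principal component")
    plt.ylabel("Explained variance ratio")
    plt.title("Scree plot")
    plt.show()
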

 

5.4) DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a density-based clustering algorithm that groups closely packed points and marks outliers as noise.

5.4.1) Density-Based Clustering

Unlike K-Means, DBSCAN doesn’t require specifying the number of clusters. Clusters are formed based on dense regions in the data.

5.4.2) Epsilon and MinPts Parameters
  • Epsilon (ε): Radius around a point to search for neighbors.
  • MinPts: Minimum number of points required to form a dense region.
  • Points are classified as:
    • Core Point: Has MinPts within ε.
    • Border Point: Not a core but within ε of a core.
    • Noise: Neither core nor border.
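
A minimal DBSCAN example on scikit-learn’s two-moons data, a shape K-Means handles poorly (the ε and MinPts values are illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

    db = DBSCAN(eps=0.3, min_samples=5).fit(X)   # eps = ε, min_samples = MinPts
    labels = db.labels_                          # the label -1 marks noise points

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"clusters: {n_clusters}, noise points: {np.sum(labels == -1)}")
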

 
