A curated collection of quick-reference Stanford guides covering core topics in Artificial Intelligence, Machine Learning, and Deep Learning. These AI/ML cheatsheets are designed to help students, developers, and researchers quickly recall important concepts and formulas.

 

1. INTRODUCTION TO AI AND ML

1.1) What is Artificial Intelligence (AI)?

Artificial Intelligence (AI) is the ability of machines or computer programs to perform tasks that typically require human intelligence. These tasks can include understanding language, recognizing patterns, solving problems, and making decisions.
Simple explanation: AI is when machines are made smart enough to think and act like humans.
Examples:

  • Voice assistants like Alexa
  • Image recognition systems
  • Chatbots
  • Self-driving cars

1.2) Types of AI: Narrow AI vs. General AI

Narrow AI (Weak AI):
  • Designed to perform one specific task
  • Cannot do anything beyond its programming
  • Examples: Email spam filters, facial recognition, recommendation systems
General AI (Strong AI):
  • Still under research and development
  • Can learn and perform any intellectual task a human can do
  • Would have reasoning, memory, and decision-making abilities similar to a human

1.3) What is Machine Learning (ML)?

Machine Learning is a subfield of AI that allows machines to learn from data and improve their performance over time without being explicitly programmed.
Simple explanation: Instead of writing rules for everything, we give the machine data, and it figures out the rules on its own.
Example: A machine learns to identify spam emails by studying thousands of examples.

1.4) Supervised vs. Unsupervised vs. Reinforcement Learning

Supervised Learning:
  • The training data includes both the input and the correct output (labels)
  • The model learns by comparing its output with the correct output
  • Example: Predicting house prices based on features like size, location, etc.
Unsupervised Learning:
  • The data has no labels
  • The model tries to find patterns or groupings in the data
  • Example: Grouping customers based on purchasing behavior
Reinforcement Learning:
  • The model learns through trial and error
  • It receives rewards or penalties based on its actions
  • Example: A robot learning to walk or a program learning to play chess
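As a minimal sketch of the supervised case (with made-up toy data), the house-price example above can look like this: the model sees inputs paired with correct outputs and learns the rule connecting them.

```python
import numpy as np

# Labeled training data: inputs (house size in sq. meters) paired
# with the correct outputs (price in thousands). Toy numbers only.
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150.0, 240.0, 300.0, 360.0, 450.0])

# Fit a line price = w * size + b by least squares:
# the model "learns" w and b by minimizing the error against the labels.
X = np.column_stack([sizes, np.ones_like(sizes)])
w, b = np.linalg.lstsq(X, prices, rcond=None)[0]

# Use the learned rule on an unseen 90 sq. meter house.
prediction = w * 90.0 + b
print(round(prediction, 1))
```

Unsupervised learning would receive only `sizes`-style inputs with no `prices`, and reinforcement learning would receive rewards instead of labeled examples.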

1.5) Key Terminologies

Model: A program or function that makes predictions or decisions based on data.
Feature: An input variable used in making predictions (e.g., age, income, temperature).
Target: The value the model is trying to predict (e.g., house price, spam or not).
Algorithm: A step-by-step method or set of rules used to train the model.
Training: The process of teaching the model using a dataset.
Testing: Evaluating the trained model on new, unseen data to measure its accuracy.

1.6) Applications of AI and ML

  • Recommendation systems (YouTube, Amazon, Netflix)
  • Fraud detection in banking
  • Language translation
  • Healthcare diagnosis
  • Self-driving vehicles
  • Stock market prediction
  • Chatbots and customer support
  • Social media content moderation

 

2. MATHEMATICS FOR ML/AI

Mathematics is the foundation of AI and Machine Learning. It helps us understand how algorithms work under the hood and how to fine-tune models for better performance.

2.1) Linear Algebra

Linear Algebra deals with numbers organized in arrays and how these arrays interact. It is used in almost every ML algorithm.

2.1.1) Vectors, Matrices, and Tensors
  • Vector: A 1D array of numbers. Example: [3, 5, 7]. Used to represent features such as height, weight, and age.
  • Matrix: A 2D array (rows and columns). Used to store datasets or model weights.
  • Tensor: A generalization of vectors and matrices to three or more dimensions. Example: an image is a 3D tensor (height, width, color channels).
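In NumPy, the three structures differ only in their number of dimensions, as this small sketch shows:

```python
import numpy as np

# Vector: a 1D array (e.g., one sample's features: height, weight, age).
v = np.array([3, 5, 7])

# Matrix: a 2D array (rows = samples, columns = features).
M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Tensor: 3D or higher; here, a tiny 2x2 RGB image (height, width, channels).
T = np.zeros((2, 2, 3))

print(v.ndim, M.ndim, T.ndim)  # number of dimensions: 1, 2, 3
```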
2.1.2) Matrix Operations

Addition/Subtraction: Add or subtract corresponding elements of two matrices.
Multiplication: Used to combine weights and inputs in ML models.
Transpose: Flip a matrix over its diagonal.
Dot Product: Fundamental in calculating output in neural networks.
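The four operations above can be sketched in NumPy with small hand-checkable matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

added = A + B        # element-wise addition
subtracted = A - B   # element-wise subtraction
product = A @ B      # matrix multiplication (e.g., weights times inputs)
transposed = A.T     # transpose: flip the matrix over its diagonal

# Dot product: the core of one neuron's output (weights . inputs).
weights = np.array([0.5, -0.5])
inputs = np.array([1.0, 2.0])
dot = np.dot(weights, inputs)
print(dot)
```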

2.1.3) Eigenvalues and Eigenvectors

Eigenvector: A direction that a transformation only stretches or shrinks, without rotating it.
Eigenvalue: The factor by which the corresponding eigenvector is stretched or shrunk.
These are used in algorithms like Principal Component Analysis (PCA) for dimensionality reduction.
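As a small sketch, NumPy can compute both for a symmetric matrix (the kind PCA works with, e.g., a covariance matrix); the toy matrix below is made up for illustration:

```python
import numpy as np

# A symmetric 2x2 matrix, standing in for a covariance matrix in PCA.
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's routine for symmetric matrices; eigenvalues come back
# in ascending order, eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Defining property: C @ v = eigenvalue * v (the direction is unchanged,
# only stretched by the eigenvalue).
v = eigenvectors[:, 0]
assert np.allclose(C @ v, eigenvalues[0] * v)

print(eigenvalues)
```

In PCA, the eigenvectors with the largest eigenvalues are the directions along which the data varies most.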
 

2.2) Probability and Statistics

Probability helps machines make decisions under uncertainty, and statistics helps us understand data and model performance.

2.2.1) Mean, Variance, Standard Deviation

Mean: The average value.
Variance: How spread out the values are from the mean.
Standard Deviation: The square root of variance. It measures spread in the same units as the original data.
Used to understand the distribution and behavior of features in datasets.
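All three statistics are one-liners in NumPy; the small dataset below is chosen so the results are easy to verify by hand:

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = values.mean()      # average value: 40 / 8 = 5
variance = values.var()   # average squared distance from the mean: 32 / 8 = 4
std = values.std()        # square root of variance: 2

print(mean, variance, std)
```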

2.2.2) Bayes Theorem

A formula that relates the conditional probability of A given B to the reverse conditional probability of B given A:

P(A|B) = P(B|A) × P(A) / P(B)

Used in Naive Bayes classifiers for spam detection, document classification, etc.
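As a sketch with made-up numbers, Bayes' theorem answers the spam-filter question "given that an email contains the word 'free', how likely is it spam?":

```python
# Illustrative (made-up) probabilities for a toy spam filter.
p_spam = 0.2                  # P(spam): 20% of all email is spam
p_word_given_spam = 0.6       # P("free" | spam)
p_word_given_ham = 0.05       # P("free" | not spam)

# Total probability of seeing the word "free" in any email:
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)
```

With these numbers the word "free" raises the spam probability from 20% to 75%, which is exactly how a Naive Bayes classifier accumulates evidence from words.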

2.2.3) Conditional Probability

The probability of one event occurring given that another event has already occurred. Example: Probability that a user clicks an ad given that they are between 20-30 years old.

2.2.4) Probability Distributions

Normal Distribution: Bell-shaped curve. Common in real-world data, like height, exam scores.
Binomial Distribution: Used for yes/no type outcomes. Example: Flipping a coin 10 times.
Poisson Distribution: For events happening over a time period. Example: Number of customer calls per hour.
These distributions help in modeling randomness in data.
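The three distributions can be sampled directly with NumPy's random generator; the parameters below (mean height, coin bias, call rate) are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Normal: bell-shaped, e.g., heights with mean 170 cm and std 10 cm.
heights = rng.normal(loc=170, scale=10, size=10_000)

# Binomial: number of heads in 10 fair coin flips, repeated many times.
heads = rng.binomial(n=10, p=0.5, size=10_000)

# Poisson: number of customer calls per hour, averaging 4 per hour.
calls = rng.poisson(lam=4, size=10_000)

# Sample means land close to the distribution parameters.
print(heights.mean(), heads.mean(), calls.mean())
```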

2.3) Calculus for Optimization

Calculus helps in training models by optimizing them to reduce errors.

2.3.1) Derivatives and Gradients

Derivative: Measures how a function changes as its input changes.
Gradient: A vector of partial derivatives (one per input) that gives the slope of a function in multiple dimensions.
Used to find the direction in which the model should adjust its weights.
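A derivative can be sketched numerically with the central-difference approximation; here the function f(x) = x² is used because its exact derivative, 2x, is easy to check:

```python
def f(x):
    # Example function: f(x) = x^2, whose exact derivative is 2x.
    return x ** 2

def numerical_derivative(f, x, h=1e-6):
    # Central difference: slope of f between x - h and x + h.
    return (f(x + h) - f(x - h)) / (2 * h)

slope = numerical_derivative(f, 3.0)  # exact answer: 2 * 3 = 6
print(round(slope, 3))
```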

2.3.2) Gradient Descent

An optimization algorithm used to minimize the loss (error) function.
How it works:

  • Start with random values
  • Calculate the gradient (slope)
  • Move slightly in the opposite direction of the gradient
  • Repeat until the loss is minimized

Gradient Descent is the core of many training algorithms in ML and DL.
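The four steps above can be sketched on a toy one-parameter loss, f(w) = (w − 4)², whose minimum is known to be at w = 4:

```python
# Minimize the loss f(w) = (w - 4)^2 with gradient descent.
# Its gradient is f'(w) = 2 * (w - 4), and the minimum is at w = 4.

w = 0.0               # start with an arbitrary value
learning_rate = 0.1   # how far to move on each step

for step in range(100):
    gradient = 2 * (w - 4)            # slope of the loss at the current w
    w = w - learning_rate * gradient  # move opposite to the gradient

print(round(w, 4))  # converges to the minimum at w = 4
```

In a real model, w would be a whole vector of weights and the gradient would come from backpropagation, but the update rule is the same.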
