7. NEURAL NETWORKS & DEEP LEARNING

7.1) Introduction to Neural Networks

A neural network is a computational model inspired by the human brain. It consists of neurons (nodes) organized in layers and is capable of learning patterns from data.
7.1.1) Perceptrons
The perceptron is the simplest type of neural network, with: Inputs → Weights → Summation → Activation Function → Output. It’s like a yes/no decision maker (binary classification).
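The pipeline above (inputs → weights → summation → activation → output) can be sketched in a few lines of plain Python. This is an illustrative toy, not library code; here it learns the logical AND function with a step activation:

```python
# Minimal perceptron sketch: inputs -> weights -> summation -> step activation.
# Illustrative only; names like train_perceptron are not from any library.

def predict(weights, bias, x):
    # Weighted sum of inputs plus bias, then a yes/no step activation.
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(data, lr=0.1, epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            # Perceptron learning rule: nudge weights toward the target.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# AND gate: output 1 only when both inputs are 1 (a binary classification).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(data)
print([predict(weights, bias, x) for x, _ in data])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the single perceptron converges; a famous limitation is that it cannot learn XOR, which is why hidden layers are needed.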
7.1.2) Activation Functions
These introduce non-linearity, allowing the network to learn complex functions:
  • Sigmoid: Outputs between 0 and 1. Good for probability-based outputs.
  • ReLU (Rectified Linear Unit): The most popular choice. Cheap to compute and mitigates the vanishing gradient problem.
    • ReLU(x) = max(0, x)
  • Tanh: Like sigmoid, but outputs between -1 and 1.
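The three activations listed above are one-liners in plain Python (an illustrative sketch; frameworks apply them element-wise over whole tensors):

```python
import math

def sigmoid(x):
    # Squashes any real number into (0, 1); useful for probabilities.
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives.
    return max(0.0, x)

def tanh(x):
    # Like sigmoid, but centered at 0 with range (-1, 1).
    return math.tanh(x)

print(sigmoid(0))              # 0.5
print(relu(-3.0), relu(3.0))   # 0.0 3.0
print(tanh(0))                 # 0.0
```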
7.1.3) Forward Propagation and Backpropagation
  • Forward Propagation: Input data flows through the network to produce an output.
  • Backpropagation: Propagates the error backward through the network and updates the weights using gradients of the loss function (via the chain rule).
This is how neural networks learn from data.
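Both passes can be shown end-to-end on the smallest possible case: a single sigmoid neuron with a squared-error loss. This is an illustrative sketch, not a full multi-layer network:

```python
import math

w, b = 0.5, 0.0      # weight and bias
x, y = 1.0, 1.0      # one training example: input and target
lr = 0.1             # learning rate

# Forward propagation: weighted sum, then activation, then the loss.
z = w * x + b
y_hat = 1.0 / (1.0 + math.exp(-z))
loss = (y_hat - y) ** 2

# Backpropagation: chain rule gives the gradient of the loss
# with respect to each parameter.
dL_dyhat = 2 * (y_hat - y)
dyhat_dz = y_hat * (1 - y_hat)    # derivative of the sigmoid
dL_dw = dL_dyhat * dyhat_dz * x
dL_db = dL_dyhat * dyhat_dz

# Gradient-descent update: step each parameter against its gradient.
w -= lr * dL_dw
b -= lr * dL_db

# Re-running the forward pass shows the loss has decreased.
new_loss = (1.0 / (1.0 + math.exp(-(w * x + b))) - y) ** 2
print(loss, new_loss)
```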
7.1.4) Loss Functions
They measure how far off the prediction is from the actual result.
  • MSE (Mean Squared Error): Used in regression problems.
  • Cross-Entropy Loss: Used in classification tasks.
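Both losses reduce to short formulas; here is a plain-Python sketch (binary cross-entropy shown, assuming predictions are probabilities strictly between 0 and 1):

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences (regression).
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    # Cross-entropy for binary classification; y_pred must be in (0, 1).
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0], [1.0, 3.0]))               # 0.5
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105 (confident, correct predictions)
```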
 

7.2) Deep Neural Networks (DNN)

A Deep Neural Network has multiple hidden layers between the input and the output.
7.2.1) Architecture and Layers
  • Input Layer: Where the data comes in
  • Hidden Layers: Where computation happens (many neurons per layer)
  • Output Layer: Final predictions
7.2.2) Training Process and Optimizers
  • During training, the network:
    • Makes predictions
    • Calculates the loss
    • Updates weights via optimizers like:
      • SGD (Stochastic Gradient Descent): steps weights against the gradient computed on a mini-batch
      • Adam: adapts the learning rate per parameter using estimates of the gradient's first and second moments
      • RMSProp: scales each update by a running average of squared gradients
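The optimizers above differ only in how they turn a gradient into an update. A toy comparison on the one-parameter function f(w) = (w − 3)², whose minimum is at w = 3 (illustrative sketch; real optimizers apply this per parameter across millions of weights):

```python
def grad(w):
    return 2 * (w - 3)          # derivative of (w - 3)^2

# Plain SGD: w <- w - lr * gradient
w_sgd = 0.0
for _ in range(100):
    w_sgd -= 0.1 * grad(w_sgd)

# RMSProp-style update: divide the step by a running average of squared
# gradients, so the effective learning rate adapts over time.
w_rms, avg_sq = 0.0, 0.0
for _ in range(200):
    g = grad(w_rms)
    avg_sq = 0.9 * avg_sq + 0.1 * g * g
    w_rms -= 0.1 * g / (avg_sq ** 0.5 + 1e-8)

print(round(w_sgd, 3), round(w_rms, 2))  # both approach the minimum at w = 3
```

Adam combines the RMSProp-style scaling with a running average of the gradients themselves (momentum).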
7.2.3) Overfitting and Regularization
  • Overfitting happens when the model learns noise instead of patterns.
  • Regularization techniques help:
    • Dropout: Randomly turns off neurons during training.
    • L2 Regularization: Penalizes large weights (weight decay).
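Both techniques are small formulas in practice. A sketch, assuming the common "inverted dropout" formulation, where kept activations are rescaled so their expected value is unchanged (seeded here for reproducibility):

```python
import random

def dropout(activations, keep_prob=0.8, rng=random.Random(0)):
    # Each activation is kept with probability keep_prob and scaled by
    # 1/keep_prob; dropped ones become 0. Applied only during training.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

def l2_penalty(weights, lam=0.01):
    # L2 regularization: lambda * sum(w^2) is added to the loss,
    # pushing weights toward zero (weight decay).
    return lam * sum(w * w for w in weights)

print(dropout([1.0, 1.0, 1.0, 1.0]))   # some entries zeroed, rest scaled to 1.25
print(l2_penalty([3.0, -4.0]))          # 0.01 * (9 + 16) = 0.25
```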
 

7.3) Convolutional Neural Networks (CNN)

CNNs are specialized for image data.
7.3.1) Convolutional Layers, Pooling Layers
  • Convolutional Layers: Apply filters to detect features (edges, corners).
  • Pooling Layers: Reduce size of feature maps (e.g., Max Pooling).
7.3.2) Filters/Kernels and Strides
  • Filters (Kernels): A small matrix that slides over the input to extract features.
  • Strides: The step size with which the filter moves across the input.
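Convolution and pooling are simple loops at heart. A minimal plain-Python sketch (illustrative; real CNN layers also handle padding, channels, and many filters, using optimized tensor libraries):

```python
def conv2d(image, kernel, stride=1):
    # Slide the kernel over the image; at each position, multiply
    # element-wise with the patch underneath and sum.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    # Keep only the maximum of each size x size window (Max Pooling).
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A vertical-edge detector on a 4x4 image whose left half is bright.
image = [[1, 1, 0, 0]] * 4
kernel = [[1, -1]]            # responds where brightness drops left-to-right
features = conv2d(image, kernel)
print(features[0])            # [0, 1, 0] -- peak exactly at the edge
print(max_pool(features))     # [[1], [1]]
```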
7.3.3) Applications
  • Image Classification
  • Face Recognition
  • Object Detection

 

7.4) Recurrent Neural Networks (RNN)

RNNs are designed for sequential data (time series, text, etc.).
7.4.1) Basic RNN vs. LSTM vs. GRU
  • Basic RNN: Loops over time steps but struggles to retain information across long sequences.
  • LSTM (Long Short-Term Memory): Uses gates to preserve long-range dependencies.
  • GRU (Gated Recurrent Unit): Similar to an LSTM but with fewer gates, so it is faster to train.
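The defining idea of all three is a hidden state carried forward through the sequence. One forward step of a basic single-unit RNN cell, as a hypothetical sketch (the weights shown are arbitrary, and the same weights are reused at every time step):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    # New hidden state mixes the previous state with the current input.
    return math.tanh(w_h * h + w_x * x + b)

h = 0.0                        # initial hidden state
for x in [1.0, 0.0, -1.0]:     # a tiny input sequence
    h = rnn_step(h, x)
print(round(h, 4))             # final state summarizes the whole sequence
```

LSTM and GRU cells replace this single tanh update with several gated updates, but the loop over time steps is the same.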
7.4.2) Time-Series Prediction and NLP Applications
  • Predict stock prices, weather, or language sequences.
  • Used in chatbots, translation, and speech recognition.
7.4.3) Vanishing and Exploding Gradients
  • During backpropagation through many time steps, gradients can shrink toward zero (vanish) or grow uncontrollably (explode), making training unstable.
  • LSTM and GRU mitigate this with their gate mechanisms.
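The effect is easy to see with a toy scalar model: backpropagating through T steps multiplies the gradient by a (roughly constant) factor once per step, so a factor below 1 shrinks it exponentially and a factor above 1 blows it up:

```python
def gradient_after(steps, factor):
    # Repeatedly scale a unit gradient, once per time step.
    g = 1.0
    for _ in range(steps):
        g *= factor
    return g

print(gradient_after(50, 0.9))  # vanishes: ~0.005
print(gradient_after(50, 1.1))  # explodes: ~117
```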

 

7.5) Generative Adversarial Networks (GANs)

GANs are powerful models for generating new data.
7.5.1) Generator and Discriminator
  • Generator: Creates fake data
  • Discriminator: Tries to distinguish real from fake data
  • They compete with each other (like a forger and a detective).
7.5.2) Training Process
  • The generator tries to fool the discriminator
  • Discriminator improves to detect fakes
  • Both improve over time, leading to increasingly realistic generated data
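The competition above is captured by the two losses in the standard GAN objective. A hypothetical sketch for a single sample (not a full training loop), where D(x) is the discriminator's probability that x is real:

```python
import math

def discriminator_loss(d_real, d_fake):
    # The discriminator wants d_real -> 1 and d_fake -> 0.
    return -(math.log(d_real) + math.log(1 - d_fake))

def generator_loss(d_fake):
    # The generator wants its fake to be called "real" (d_fake -> 1).
    return -math.log(d_fake)

# A confident discriminator (real scored 0.9, fake scored 0.1) has low loss,
# while the generator's loss is high because its fake was easily detected.
print(round(discriminator_loss(0.9, 0.1), 3))  # 0.211
print(round(generator_loss(0.1), 3))           # 2.303
```

Training alternates gradient steps that lower each player's loss, which is exactly the forger-versus-detective dynamic described above.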
7.5.3) Applications
  • Image Generation (e.g., fake faces)
  • Art and Style Transfer
  • Data Augmentation for training other ML models
