6.S191


Q&A

  • How many ways are there to split the training process into multiple steps, where each step can be done at a different place?

Reference 1

Intro - Perceptron

Perceptron, 1958.

Activation Functions: non-linearity

Two inputs, one output.

Multiple outputs.

Training

Initialize weights randomly –> wrong predictions –> large loss

Multiple data points –> total loss over all predictions

Loss function

Binary Cross Entropy Loss: for outputs that should be 0 or 1 (binary classification).

Mean Squared Error Loss: for output to be continuous real numbers.
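Both losses can be written in a few lines of NumPy. This is a generic sketch, not the course's implementation:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross entropy: penalizes confident wrong 0/1 predictions."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mse_loss(y_true, y_pred):
    """Mean squared error: for continuous real-valued outputs."""
    return np.mean((y_true - y_pred) ** 2)

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.1, 0.8])
print(bce_loss(y, p))  # small loss, since the predictions are good
print(mse_loss(np.array([1.5, 2.0]), np.array([1.0, 2.5])))  # 0.25
```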

Loss Optimization

Find the weights $W$ that achieve the least loss.

Gradient Descent:

  • pick a random weight
  • compute gradient (find the right direction to move down hill)
  • update weights
  • loop the previous two steps until convergence
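The steps above can be sketched on a toy one-dimensional loss $J(w) = (w-3)^2$ (the loss function and learning rate are illustrative choices, not from the lecture):

```python
import numpy as np

# Minimize J(w) = (w - 3)^2; gradient dJ/dw = 2(w - 3).
rng = np.random.default_rng(0)
w = rng.uniform(-10, 10)       # 1. pick a random weight
lr = 0.1                       # fixed learning rate
for _ in range(200):           # 4. loop until convergence
    grad = 2 * (w - 3)         # 2. compute gradient (downhill direction)
    w = w - lr * grad          # 3. update weights
print(w)  # ≈ 3.0, the minimum of J
```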

==> The key: compute the gradient ==> backpropagation

Backpropagation

Given a loss, and weight, how do we know which way to move to reach the lowest point of the loss function?

E.g., a model with one input, one hidden unit, one output.

input $x$ (layer 0) – $w_1$ (layer 1) – $w_2$ (layer 2) – output $J(W)$

How does a small change in one weight (e.g., $w_2$) affect the final loss $J(W)$? Apply the chain rule: $\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}$.
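The chain-rule gradient can be checked against a numerical estimate. A minimal NumPy sketch, where the tiny linear model and all weight values are hypothetical:

```python
# Hypothetical 1-in, 1-hidden, 1-out linear model: h = w1*x, yhat = w2*h,
# loss J = (yhat - y)^2.  Chain rule: dJ/dw2 = dJ/dyhat * dyhat/dw2.
x, y = 2.0, 1.0
w1, w2 = 0.5, -1.5

h = w1 * x
yhat = w2 * h
analytic = 2 * (yhat - y) * h           # backprop gradient for w2

eps = 1e-6                              # finite-difference check
J = lambda w: ((w * h) - y) ** 2
numeric = (J(w2 + eps) - J(w2 - eps)) / (2 * eps)
print(analytic, numeric)  # the two estimates should agree closely
```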

Training Optimization

Dec 2017, Visualizing the loss landscape of neural nets.

Learning rate

Fixed learning rate:

  • small: moves slowly, can get trapped in local minima.
  • large: overshoots the minima, diverges.

Adaptive learning rate: the size of the weight updates is adaptive, depending on

  • how large the gradient is;
  • how fast learning is happening;
  • the size of particular weights;

Algorithms: 1952 ~ 2014

  • SGD, 1952[^sgd1952]
  • Adagrad, 2011[^adagrad2011]
  • Adadelta, 2012[^adadelta2012]
  • RMSProp
  • Adam, 2014[^adam2014]
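As one example of adaptivity, a minimal sketch of the Adam update rule on a toy quadratic loss; the hyperparameter values are the commonly cited defaults, but the setup is illustrative, not the lecture's code:

```python
import numpy as np

# Sketch of the Adam update on J(w) = (w - 3)^2.
w, lr = 10.0, 0.05
beta1, beta2, eps = 0.9, 0.999, 1e-8
m = v = 0.0
for t in range(1, 2001):
    g = 2 * (w - 3)                     # gradient
    m = beta1 * m + (1 - beta1) * g     # 1st-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g**2  # 2nd-moment estimate: adapts step size
    m_hat = m / (1 - beta1**t)          # bias correction
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(w)  # approaches the minimum at 3
```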

Mini-batches

To compute the gradient: use a single point (stochastic) vs. all the points (full batch) vs. a set of points: mini-batch.

Mini-batch: much quicker to compute than the full batch, with less noisy gradients than a single point.
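A mini-batch loop might look like the following sketch; the helper name and batch size are illustrative:

```python
import numpy as np

# Sketch: split a dataset into shuffled mini-batches for gradient descent.
def minibatches(X, y, batch_size, rng):
    idx = rng.permutation(len(X))        # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]         # gradient is averaged over this batch

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1).astype(float)
y = np.arange(10).astype(float)
sizes = [len(xb) for xb, _ in minibatches(X, y, batch_size=4, rng=rng)]
print(sizes)  # [4, 4, 2]
```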

Avoiding overfitting

Regularization.

  • Dropout
  • Early Stopping
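Dropout can be sketched as randomly masking activations during training. This is the common "inverted dropout" formulation, not code from the lecture:

```python
import numpy as np

# Inverted dropout: zero each unit with probability `rate`, and scale the
# survivors so the expected activation is unchanged.
def dropout(a, rate, rng):
    keep = 1.0 - rate
    mask = rng.random(a.shape) < keep    # keep each unit with prob `keep`
    return a * mask / keep               # inverted scaling

rng = np.random.default_rng(0)
a = np.ones(100_000)
out = dropout(a, rate=0.5, rng=rng)
print(out.mean())  # ≈ 1.0 in expectation; roughly half the units are zero
```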

Recurrent Neural Networks

Deep Sequence Modeling

A ball: where does it go next? We need a sequence of its positions over time.

A sequence modeling problem: Predict the Next Word.

“This morning I took my cat for a ____“. (Walk)

Fixed Window:

  • Cannot model long-term dependencies.

–> count words in the entire sequence (bag of words)

==> counts do not preserve order information

–> a big fixed window:

==> no parameter sharing; what is learned at one position does not transfer to others

Standard feed-forward network

Recurrent Neural Network: a feedback loop; can be viewed as multiple copies of the same sub-network connected in sequence.

Inside, the RNN has a recurrent cell that is fed by the input as well as its own previous output (the hidden state).
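One step of such a recurrent cell can be sketched in NumPy; the weight names `W_xh`, `W_hh` and all sizes are hypothetical:

```python
import numpy as np

# One recurrent cell: the hidden state h is updated from the current
# input x_t and the previous h (the feedback loop).
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden (recurrence)
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):    # unroll over a length-5 sequence
    h = rnn_step(x_t, h)
print(h.shape)  # (4,)
```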

Backpropagation Through Time

TODO

Convolutional Neural Networks

Vision

For computers, images are just numbers.

(LLM: For humans, images are ???.)

Tasks in computer vision: regression, classification.

Feature detection.

  • Manual feature extraction.
    • LLM: How do humans extract features under the hood?
    • Use domain knowledge to define features. (How does this happen?)
  • Learned features.

Learning visual features

Dense (fully connected) Neural Network

Convolution:

  • Connect a patch of the input to a single neuron in the hidden layer.
  • Use a sliding window to define the connections.
  • Weight each patch with a filter (matrix) to detect a particular feature, such as boundaries, sharpening, etc. This generates the feature maps.

Input image –> convolution (filters, feature maps) –> max pooling –> fully connected layer as output.

Three main parts:

  • convolution. learn weights of filters in convolutional layers. tf.keras.layers.Conv2D
  • non-linearity. Often ReLU. tf.keras.activations.*
  • pooling. Downsampling operation on each feature map. tf.keras.layers.MaxPool2D
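The pooling step can be sketched directly in NumPy: a 2x2, stride-2 max pool, analogous to the default behavior of `tf.keras.layers.MaxPool2D` (the feature-map values are illustrative):

```python
import numpy as np

# 2x2 max pooling with stride 2: downsample a feature map by keeping
# the strongest activation in each window.
def maxpool2x2(fmap):
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [4, 3, 1, 0],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]], dtype=float)
print(maxpool2x2(fmap))
# [[4. 1.]
#  [2. 8.]]
```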

Convolution operation: applying filters to generate feature maps; one filter per feature.

e.g. `tf.keras.layers.Conv2D(filters=d, kernel_size=(h, w), strides=s)`
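The underlying operation (technically cross-correlation, which is what `Conv2D` computes) can be sketched for a single filter; the image and filter values here are illustrative:

```python
import numpy as np

# Slide a filter over the image; each patch-filter dot product is one
# entry of the feature map.
def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # weight the patch by the filter
    return out

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge = np.array([[-1, 1],
                 [-1, 1]], dtype=float)         # detects vertical boundaries
print(conv2d(image, edge))  # strongest response at the 0 -> 1 boundary
```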

Generative modeling

The Allegory of the Cave (Plato): find the hidden (latent) variables even when only the observations are given; finding hidden causes/reasons/laws.

Autoencoders/decoders.

Input object –> latent code –> reconstructed object.
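An untrained linear autoencoder can be sketched to show the shapes involved; all dimensions and weights here are hypothetical:

```python
import numpy as np

# Encode the input into a smaller latent code, then decode back into a
# reconstruction (weights are random, i.e. untrained).
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 8)) * 0.1   # 8-dim input -> 2-dim latent
W_dec = rng.normal(size=(8, 2)) * 0.1   # 2-dim latent -> 8-dim reconstruction

x = rng.normal(size=8)                  # input object
z = W_enc @ x                           # hidden (latent) variables
x_rec = W_dec @ z                       # reconstructed object
loss = np.mean((x - x_rec) ** 2)        # training would minimize this MSE
print(z.shape, x_rec.shape)  # (2,) (8,)
```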

Reinforcement Learning

Learning in dynamic environment.

Supervised, unsupervised, vs. reinforcement learning.

TODO….


nitty-gritty: the essential facts; the essence


Created May 16, 2020 // Last Updated Aug 31, 2021
