Neural Network and Deep Learning (Practice)
Econ 425T / Biostat 203B
1 Learning sources
This lecture draws heavily on the following sources.
Deep Learning with Python by Francois Chollet.
Deep Learning Tuning Playbook by Google Research.
Learning Deep Learning lectures by Dr. Qiyang Hu (UCLA Office of Advanced Research Computing): https://github.com/huqy/deep_learning_workshops
2 Software
High-level software focuses on a user-friendly interface for specifying and training models: Keras, PyTorch, scikit-learn, …
Lower-level software focuses on developer tools for implementing deep learning models: TensorFlow, PyTorch, CNTK, Theano (development stopped!), Caffe, Torch, …
Most tools are developed in Python plus a low-level language (C/C++, CUDA).
Source: https://www.simplilearn.com/keras-vs-tensorflow-vs-pytorch-article
3 TensorFlow
Developed by the Google Brain team for internal Google use. Formerly DistBelief.
Open sourced in Nov 2015.
OS: Linux, macOS, and Windows (since Nov 2016).
GPU support: NVIDIA CUDA.
TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.
Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.
TensorFlow supports distributed training.
TensorFlow does not support Apple Silicon (M1/M2) directly, but Apple provides the tensorflow-macos package for running on M1/M2 GPUs.
Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, …
When you have a hammer, everything looks like a nail.
4 Workflow for a deep learning network
4.1 Step 1: Data ingestion, preparation, and processing
Source: CrowdFlower
The most time-consuming but also the most creative job. It takes >80% of the time and requires experience and domain knowledge.
Determines the upper limit for the goodness of DL. Garbage in, garbage out.
Data prep for structured/tabular data.
Data prep for special DL tasks (a sketch follows this list):
- Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.
- Data tokenization: break sequences into units, map units to vectors, align and pad sequences.
- Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; can be part of model training.
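To make the image and text preparation steps above concrete, here is a minimal sketch using Keras preprocessing layers. The specific layers and parameter values (rescaling factor, flip/rotation settings, a vocabulary of 10,000 tokens, sequence length 50, embedding dimension 16) are illustrative assumptions, not recommendations.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Image data: pixel scaling and train-time augmentation as Keras layers.
image_prep = keras.Sequential([
    layers.Rescaling(1.0 / 255),      # scale pixels from [0, 255] to [0, 1]
    layers.RandomFlip("horizontal"),  # random horizontal flips (train time only)
    layers.RandomRotation(0.1),       # rotate by up to +/-10% of a full turn
])
# Apply to a toy batch of 4 fake 32x32 RGB images (training=True enables augmentation).
augmented = image_prep(tf.random.uniform((4, 32, 32, 3), maxval=255.0), training=True)

# Text data: tokenization (break into units, map to integer ids, align/pad sequences),
# followed by a dense embedding that is learned as part of model training.
vectorizer = layers.TextVectorization(
    max_tokens=10_000,              # vocabulary size (assumed)
    output_sequence_length=50,      # pad/truncate every sequence to length 50
)
vectorizer.adapt(["a tiny example corpus", "another toy sentence"])

embedding = layers.Embedding(input_dim=10_000, output_dim=16)  # sparse ids -> dense vectors

tokens = vectorizer(tf.constant(["a tiny example corpus"]))
dense_vectors = embedding(tokens)   # shape: (1, 50, 16)
```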
4.2 Step 2: Select neural network
- Architecture.
Source: https://www.asimovinstitute.org/neural-network-zoo/
- Activation function.
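To make the architecture and activation-function choices concrete, here is a minimal sketch of a fully connected network in Keras; the input shape, layer widths, and 10-class softmax output are assumptions for illustration only.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture: a small multilayer perceptron for, say, 28x28 grayscale images.
# Activation functions: ReLU in the hidden layers, softmax on the output layer.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),                        # flatten each image into a vector
    layers.Dense(256, activation="relu"),    # hidden layer 1
    layers.Dense(128, activation="relu"),    # hidden layer 2
    layers.Dense(10, activation="softmax"),  # 10-class output probabilities
])
model.summary()
```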
4.3 Step 3: Select loss function
- Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.
- Classification loss: cross-entropy loss, …
- Customized losses.
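A brief sketch of how these losses can be specified in Keras; the hand-written custom loss (a plain squared error) is only an illustrative assumption.

```python
import tensorflow as tf
from tensorflow import keras

# Regression losses.
mse = keras.losses.MeanSquaredError()    # MSE / quadratic loss / L2 loss
mae = keras.losses.MeanAbsoluteError()   # mean absolute error / L1 loss

# Classification loss (integer labels; use CategoricalCrossentropy for one-hot labels).
xent = keras.losses.SparseCategoricalCrossentropy()

# A customized loss is just a function of (y_true, y_pred) returning per-sample losses.
def my_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

# The chosen loss is passed to compile(), e.g.:
# model.compile(optimizer="adam", loss=xent, metrics=["accuracy"])
```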
4.4 Step 4: Train and evaluate model
- Choose an optimization algorithm: generalization (SGD) vs. convergence rate (adaptive methods).
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
- Fitness of model: underfitting vs overfitting.
Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- Model selection: \(K\)-fold cross validation.
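Putting Step 4 together, here is a minimal sketch of training and evaluation with an optimizer choice, early stopping to guard against overfitting, and \(K\)-fold cross-validation via scikit-learn; the toy data, optimizer settings, and epoch/fold counts are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import KFold

# Toy data (assumed): 1000 samples, 20 features, 10 classes.
rng = np.random.default_rng(425)
X = rng.normal(size=(1000, 20)).astype("float32")
y = rng.integers(0, 10, size=1000)

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    # Optimizer choice: plain SGD often generalizes well; Adam (adaptive) converges fast.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# K-fold cross-validation for model selection/assessment.
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = build_model()
    # Monitoring validation loss (early stopping) guards against overfitting.
    model.fit(X[train_idx], y[train_idx],
              epochs=20, batch_size=32, verbose=0,
              validation_data=(X[val_idx], y[val_idx]),
              callbacks=[keras.callbacks.EarlyStopping(patience=3,
                                                       restore_best_weights=True)])
    scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])

print("mean CV accuracy:", np.mean(scores))
```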
5 Keras examples
The following are selected examples from the collection of Keras code examples.