Neural Network and Deep Learning (Practice)
Econ 425T / Biostat 203B
1 Learning sources
This lecture draws heavily on the following sources.
Deep Learning with Python by Francois Chollet.
Deep Learning Tuning Playbook by Google Research.
Learning Deep Learning lectures by Dr. Qiyang Hu (UCLA Office of Advanced Research Computing): https://github.com/huqy/deep_learning_workshops
2 Software
High-level software focuses on a user-friendly interface for specifying and training models: Keras, PyTorch, scikit-learn, …
Lower-level software focuses on developer tools for implementing deep learning models: TensorFlow, PyTorch, CNTK, Theano (development stopped!), Caffe, Torch, …
Most tools are developed in Python plus a low-level language (C/C++, CUDA).
Source: https://www.simplilearn.com/keras-vs-tensorflow-vs-pytorch-article
3 TensorFlow
Developed by the Google Brain team for internal Google use. Formerly DistBelief.
Open sourced in Nov 2015.
OS: Linux, macOS, and Windows (since Nov 2016).
GPU support: NVIDIA CUDA.
TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.
Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.
TensorFlow supports distributed training.
TensorFlow does not support Apple Silicon (M1/M2) directly, but Apple provides the tensorflow-macos package for running on M1/M2 GPUs.
Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, …
When you have a hammer, everything looks like a nail.
4 Workflow for a deep learning network
4.1 Step 1: Data ingestion, preparation, and processing
Source: CrowdFlower
The most time-consuming but also the most creative job. It takes >80% of the time and requires experience and domain knowledge.
Determines the upper limit for the goodness of DL. Garbage in, garbage out.
Data prep for structured/tabular data.
Data prep for special DL tasks (a sketch follows this list):
- Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.
- Data tokenization: break sequences into units, map units to vectors, align and pad sequences.
- Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; can be part of model training.
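To make the image and text preparation steps above concrete, here is a minimal sketch using Keras preprocessing layers. The specific layers and parameter values (rescaling factor, flip/rotation settings, a vocabulary of 10,000 tokens, sequence length 50, embedding dimension 16) are illustrative assumptions, not recommendations.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Image data: pixel scaling and train-time augmentation as Keras layers.
image_prep = keras.Sequential([
    layers.Rescaling(1.0 / 255),      # scale pixels from [0, 255] to [0, 1]
    layers.RandomFlip("horizontal"),  # random horizontal flips (train time only)
    layers.RandomRotation(0.1),       # rotate by up to +/-10% of a full turn
])
# Apply to a toy batch of 4 fake 32x32 RGB images (training=True enables augmentation).
augmented = image_prep(tf.random.uniform((4, 32, 32, 3), maxval=255.0), training=True)

# Text data: tokenization (break into units, map to integer ids, align/pad sequences),
# followed by a dense embedding that is learned as part of model training.
vectorizer = layers.TextVectorization(
    max_tokens=10_000,              # vocabulary size (assumed)
    output_sequence_length=50,      # pad/truncate every sequence to length 50
)
vectorizer.adapt(["a tiny example corpus", "another toy sentence"])

embedding = layers.Embedding(input_dim=10_000, output_dim=16)  # sparse ids -> dense vectors

tokens = vectorizer(tf.constant(["a tiny example corpus"]))
dense_vectors = embedding(tokens)   # shape: (1, 50, 16)
```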
4.2 Step 2: Select neural network
- Architecture.
Source: https://www.asimovinstitute.org/neural-network-zoo/
- Activation function.
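To make the architecture and activation-function choices concrete, here is a minimal sketch of a fully connected network in Keras; the input shape, layer widths, and 10-class softmax output are assumptions for illustration only.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture: a small multilayer perceptron for, say, 28x28 grayscale images.
# Activation functions: ReLU in the hidden layers, softmax on the output layer.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),                        # flatten each image into a vector
    layers.Dense(256, activation="relu"),    # hidden layer 1
    layers.Dense(128, activation="relu"),    # hidden layer 2
    layers.Dense(10, activation="softmax"),  # 10-class output probabilities
])
model.summary()
```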
4.3 Step 3: Select loss function
- Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.
- Classification loss: cross-entropy loss, …
- Customized losses.
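A brief sketch of how these losses can be specified in Keras; the hand-written custom loss (a plain squared error) is only an illustrative assumption.

```python
import tensorflow as tf
from tensorflow import keras

# Regression losses.
mse = keras.losses.MeanSquaredError()    # MSE / quadratic loss / L2 loss
mae = keras.losses.MeanAbsoluteError()   # mean absolute error / L1 loss

# Classification loss (integer labels; use CategoricalCrossentropy for one-hot labels).
xent = keras.losses.SparseCategoricalCrossentropy()

# A customized loss is just a function of (y_true, y_pred) returning per-sample losses.
def my_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

# The chosen loss is passed to compile(), e.g.:
# model.compile(optimizer="adam", loss=xent, metrics=["accuracy"])
```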
4.4 Step 4: Train and evaluate model
- Choose an optimization algorithm: generalization (SGD) vs. convergence rate (adaptive methods).
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
- Fitness of model: underfitting vs overfitting.
Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- Model selection: \(K\)-fold cross validation.
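Putting Step 4 together, here is a minimal sketch of training and evaluation with an optimizer choice, early stopping to guard against overfitting, and \(K\)-fold cross-validation via scikit-learn; the toy data, optimizer settings, and epoch/fold counts are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import KFold

# Toy data (assumed): 1000 samples, 20 features, 10 classes.
rng = np.random.default_rng(425)
X = rng.normal(size=(1000, 20)).astype("float32")
y = rng.integers(0, 10, size=1000)

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    # Optimizer choice: plain SGD often generalizes well; Adam (adaptive) converges fast.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# K-fold cross-validation for model selection/assessment.
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = build_model()
    # Monitoring validation loss (early stopping) guards against overfitting.
    model.fit(X[train_idx], y[train_idx],
              epochs=20, batch_size=32, verbose=0,
              validation_data=(X[val_idx], y[val_idx]),
              callbacks=[keras.callbacks.EarlyStopping(patience=3,
                                                       restore_best_weights=True)])
    scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])

print("mean CV accuracy:", np.mean(scores))
```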
5 Keras examples
The following are selected examples from the collection of Keras code examples.