Neural Network and Deep Learning (Practice)

Econ 425T / Biostat 203B


Dr. Hua Zhou @ UCLA


March 14, 2023

1 Learning sources

This lecture draws heavily on the following sources.

2 Software

  • High-level software focuses on a user-friendly interface for specifying and training models.
    Keras, PyTorch, scikit-learn, …

  • Lower-level software focuses on developer tools for implementing deep learning models.
    TensorFlow, PyTorch, CNTK, Theano (stopped development!), Caffe, Torch, …

  • Most tools are developed in Python plus a low-level language (C/C++, CUDA).


3 TensorFlow

  • Developed by Google Brain team for internal Google use. Formerly DistBelief.

  • Open sourced in Nov 2015.

  • OS: Linux, macOS, and Windows (since Nov 2016).

  • GPU support: NVIDIA CUDA.

  • TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.

  • Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.

  • TensorFlow supports distributed training.

  • TensorFlow does not support Apple Silicon (M1/M2) directly, but Apple provides the tensorflow-macos package for running on M1/M2 GPUs.

  • Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, …

When you have a hammer, everything looks like a nail.

4 Workflow for a deep learning network

4.1 Step 1: Data ingestion, preparation, and processing

Source: CrowdFlower

  • The most time-consuming but also the most creative job. It takes >80% of the time and requires experience and domain knowledge.

  • Determines the upper limit on how well a DL model can perform. Garbage in, garbage out.

  • For structured/tabular data.

  • Data prep for special DL tasks.

    • Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.

    • Data tokenization: break sequences into units, map units to vectors, align and pad sequences.

    • Data embedding: sparse to dense, merge diverse data, preserve relationship, dimension reduction, Word2Vec, be part of model training.
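The data-preparation steps above can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: the toy corpus, the `<pad>`/`<unk>` ids, the right-padding choice, and the randomly initialised embedding table are all assumptions made for the example.

```python
import random

# Hypothetical toy corpus (not from the lecture).
texts = ["the movie was great", "the plot was thin and the acting was flat"]

def build_vocab(texts):
    """Map each distinct word to an integer id; 0 = padding, 1 = unknown word."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    """Tokenize on whitespace and map tokens to ids (unknown words map to 1)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

def pad(seq, maxlen, pad_id=0):
    """Truncate or right-pad a sequence so that it has exactly maxlen entries."""
    return (seq + [pad_id] * maxlen)[:maxlen]

vocab = build_vocab(texts)
encoded = [pad(encode(t, vocab), maxlen=6) for t in texts]

# Pixel scaling: map 8-bit intensities from [0, 255] to [0, 1].
pixels = [0, 64, 128, 255]
scaled = [p / 255.0 for p in pixels]

# Embedding: a lookup table from integer id to a dense vector.
# The rows are random here; in a real model they are learned during training.
rng = random.Random(0)
embed_dim = 3
embedding = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in vocab]
embedded = [[embedding[i] for i in seq] for seq in encoded]  # 2 x 6 x 3 nested list
```

Each review becomes a fixed-length list of integer ids, and the embedding lookup replaces each id with a short dense vector, which is the "sparse to dense" step.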

4.2 Step 2: Select neural network

  • Architecture.


  • Activation function.
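To make the two choices concrete, here is a minimal sketch of a forward pass through a tiny fully connected network using common activation functions. The layer sizes and weights are hypothetical values picked for illustration; a real model would learn them.

```python
import math

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return [max(0.0, v) for v in z]

def sigmoid(z):
    """Logistic function, applied elementwise (fine for moderate inputs)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, W, b):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hypothetical architecture: 2 inputs -> 3 hidden units (ReLU) -> 2 classes (softmax).
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b2 = [0.0, 0.0]

x = [1.0, 2.0]
hidden = relu(dense(x, W1, b1))           # hidden-layer activations
probs = softmax(dense(hidden, W2, b2))    # class probabilities
```

The architecture fixes how the `dense` layers are stacked; the activation function decides what nonlinearity sits between them.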

4.3 Step 3: Select loss function

  • Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.

  • Classification loss: cross-entropy loss, …

  • Customized losses.
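The losses listed above are short enough to write out directly. The sketch below uses plain Python lists; the Huber loss is included only as one possible example of a customized loss, not as the lecture's choice.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error (L2 loss)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error (L1 loss)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(labels, probs, eps=1e-12):
    """Mean categorical cross-entropy.

    labels: true class indices; probs: one row of predicted class
    probabilities per observation. eps guards against log(0).
    """
    return -sum(math.log(max(p[k], eps)) for k, p in zip(labels, probs)) / len(labels)

def huber(y_true, y_pred, delta=1.0):
    """Example of a customized loss: quadratic near zero, linear in the tails."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
losses = {
    "mse": mse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
    "huber": huber(y_true, y_pred),
}
```

On this toy data the single large residual dominates the MSE more than the MAE or Huber loss, which is the usual reason to prefer the latter two when outliers are present.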

4.4 Step 4: Train and evaluate model

  • Choose optimization algorithm. Generalization (SGD) vs convergence rate (adaptive).

    • Stochastic GD.

    • Adding momentum: classical momentum, Nesterov acceleration.

    • Adaptive learning rate: AdaGrad, AdaDelta, RMSprop.

    • Combining acceleration and adaptive learning rate: Adam (default in many libraries).

    • Beyond Adam: Lookahead, RAdam, AdaBound/AMSBound, Ranger, AdaBelief.
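As a rough illustration of the update rules named above, the following sketch runs gradient descent with classical momentum and Adam on a toy quadratic. It is a simplification under stated assumptions: the objective, starting point, and hyperparameters are hypothetical, and the exact gradient is used in place of the noisy minibatch gradient that stochastic GD would use.

```python
import math

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [w[0], 10.0 * w[1]]

def gd_momentum(w, lr=0.05, beta=0.9, steps=300):
    """Gradient descent with classical (heavy-ball) momentum."""
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + gi for vi, gi in zip(v, g)]      # accumulate velocity
        w = [wi - lr * vi for wi, vi in zip(w, v)]        # step along velocity
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: momentum on the gradient plus a per-coordinate adaptive step size."""
    m = [0.0] * len(w)   # running mean of gradients
    s = [0.0] * len(w)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        s = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]       # bias correction
        s_hat = [si / (1 - beta2 ** t) for si in s]
        w = [wi - lr * mh / (math.sqrt(sh) + eps)
             for wi, mh, sh in zip(w, m_hat, s_hat)]
    return w

w_momentum = gd_momentum([5.0, 2.0])
w_adam = adam([5.0, 2.0])
```

Both runs should end close to the minimizer at the origin. Adam rescales each coordinate by its own gradient history, so the steep and shallow directions of the quadratic are treated more evenly than under a single global learning rate.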

A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang.

  • Fitness of model: underfitting vs overfitting.


  • Model selection: \(K\)-fold cross validation.
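A minimal sketch of \(K\)-fold cross-validation for model selection follows. The synthetic data and the two candidate models (a constant predictor that underfits, and a straight-line fit) are hypothetical stand-ins chosen to keep the example short; the same loop applies to neural networks with different architectures or tuning parameters.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    """Average validation MSE over k folds."""
    folds = kfold_indices(len(xs), k)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(
            sum((ys[i] - model(xs[i])) ** 2 for i in held_out) / len(held_out)
        )
    return sum(errors) / k

# Synthetic data: a linear signal plus Gaussian noise (assumed for illustration).
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

folds = kfold_indices(len(xs), 5)
scores = {"mean": cv_mse(fit_mean, xs, ys), "line": cv_mse(fit_line, xs, ys)}
best = min(scores, key=scores.get)   # model with the lowest CV error
```

Every observation is held out exactly once, so the averaged validation error estimates out-of-sample performance without setting aside a separate test set.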

5 Keras examples

The following are selected examples from the collection of Keras code examples.

6 Example: MNIST - MLP

qmd, html.

7 Example: CIFAR100 - CNN

qmd, html.

8 Example: Using Pretrained Resnet50 to classify natural images

qmd, html.

9 Example: IMDB review sentiment analysis - Lasso, MLP, RNN, LSTM, Transformer

10 Example: Generate Artificial Faces with GAN

11 Example: Neural style transfer