Checklist on your resume
- Machine learning methods.
- Supervised learning (classification, prediction, forecasting)
- Scikit-Learn
- Keras+Tensorflow (PyTorch if applicable)
- XGBoost
- Know the material in ISL2 and ESL2 by heart.
- Create a GitHub repo for the coursework and show your work to back your resume.
What’s not covered
Make sure to acquire data importing (text files, internet, databases), data wrangling, data visualization, high performance computing, and software engineering skills.
- Linux scripting
- Git/GitHub (put your GitHub handle on resume)
- Data wrangling (pandas)
- Data visualization (matplotlib, seaborn, plotly)
- Frontend development (shiny, web app)
- Databases, SQL (Google BigQuery)
- Cloud computing (GCP, AWS, Azure, OCI)
- High-performance computing (HPC) on cluster (if you use Hoffman2)
- Computational algorithms. Numerical linear algebra and numerical optimization algorithms.
- Economics or business applications.
- Be open to languages. R is equally popular in the data science community. Julia is attractive for high-performance scientific computing. JavaScript is dominant in web applications. Scala is popular for implementing distributed programs.
Course evaluation
Please do it NOW.
QAs
- Course materials: git clone or fork.
- Search strategy besides exhaustive search: GridSearchCV(), RandomizedSearchCV(), … See reference.
- HW6. Preparing time series data for RNN/LSTM. This tutorial can be helpful: https://keras.io/examples/timeseries/timeseries_weather_forecasting/
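On the search strategy question, here is a minimal sketch of how the two scikit-learn searchers differ. The synthetic data set, the logistic regression model, the grid of `C` values, and the log-uniform sampling range are all assumptions chosen for illustration, not settings from the coursework.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data stands in for a real training set (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# Exhaustive search: every value listed in the grid is cross-validated.
grid = GridSearchCV(model, param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)

# Randomized search: n_iter values of C are drawn from a distribution,
# so the cost is fixed no matter how wide the search range is.
rand = RandomizedSearchCV(
    model,
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=8,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print("grid best C:", grid.best_params_["C"], "score:", round(grid.best_score_, 3))
print("random best C:", rand.best_params_["C"], "score:", round(rand.best_score_, 3))
```

The grid search fits 4 candidates and the randomized search fits 8, each with 5-fold cross-validation. With several hyperparameters the grid grows multiplicatively, while `n_iter` keeps the randomized budget under your control.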
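For the HW6 question, the core of preparing time series data for an RNN/LSTM is slicing the series into overlapping windows of shape (samples, time steps, features). The sketch below does this with NumPy only; the helper name `make_windows`, its `lookback` and `horizon` parameters, and the toy series are hypothetical and not taken from the homework.

```python
import numpy as np


def make_windows(series, lookback, horizon=1):
    """Slice a (time, features) array into LSTM inputs and targets.

    Returns X with shape (samples, lookback, features) and y with shape
    (samples,), where y is the first feature observed `horizon` steps
    after the end of each window.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]  # treat a 1-D series as a single feature
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback/horizon")
    # Each input window covers time steps i, ..., i + lookback - 1.
    X = np.stack([series[i : i + lookback] for i in range(n_samples)])
    # The matching target sits `horizon` steps past the window's last step.
    start = lookback + horizon - 1
    y = series[start : start + n_samples, 0]
    return X, y


# Toy series 0, 1, ..., 9 (made-up data).
X, y = make_windows(np.arange(10), lookback=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])  # [0. 1. 2.] 3.0
```

Split the series into training and validation parts chronologically before windowing, so that no validation window overlaps the training period. Keras also provides `keras.utils.timeseries_dataset_from_array` for the same windowing step.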
Plan for today
- Overview of select topics: unsupervised learning, survival analysis, A/B testing, causal analysis, conformal prediction.