GADS7

GA Data Science NY Section 7

Welcome to the General Assembly Data Science handout page. Here I’ll be assembling handouts, walkthroughs, and links so that everyone has references to follow up on after class.

####Final Project Details

Key Dates
  • Feb 6: Project Presentations
  • Feb 20: Data Review and Processing Presentations
  • Last Day of Class: Final Presentations

###General Resources

####Books
- Python for Data Analysis
- Elements of Statistical Learning (great reference for the theory behind a lot of the techniques)
- Machine Learning For Hackers (R code examples and walkthroughs)

###Lesson 1: Introduction to Data Science & Basic Data Manipulation

####Slides
- Lesson 1 Slides

####Handouts
- Unix Basics: Intro to the Command Line

####Links
- How To Start Thinking Like a Data Scientist
- Data Science Workflow
- Video Tutorials for Command Line Basics
- Command Line Data Manipulation
- Git Tutorial from Atlassian
- Git Tutorial from CodeSchool

###Lesson 2: Data Storage and Extraction

####Slides
- Lesson 2 Slides

####Handouts
- MySQL Tutorial
- Introduction to Python
- Python Exercises

####Links
- Comparison of NoSQL Databases
- Interactive Python Tutorial
- Introduction to Python
- Python Data Structures

###Lesson 3: Python and Data Manipulation

####Handouts
- Python Exercises

####Links
- Introduction to Pandas

###Lesson 4: Data Visualization

####Assignment 1: Due Jan 9

####Handouts
- Pandas and Data Viz Notebook

####Slides
- Lesson 4 Slides

####Links
- Linear Algebra in 4 Pages
- Narrative Visualization: Telling Stories With Data
- Bokeh
- Vincent
- Python ggplot
- Matplotlib

###Lesson 5: Introduction to Machine Learning

####Slides
- Lesson 5 Slides

####Handouts
- Sklearn and K-Nearest Neighbors

####Links
- Scikit-learn User Guide
- A Few Useful Things to Know About Machine Learning

###Lesson 6: Linear Regression

####Assignment 1: Due Jan 24

####Slides
- Lesson 6 Slides

####Handouts
- Linear Regression

####Links
- A Few Useful Things to Know About Machine Learning
- Statsmodels Documentation
- Python 538 Model

###Lesson 7: Logistic Regression and Regularization

####Slides
- Lesson 7 Slides

####Links
- Logistic Regression Walkthrough
- Logistic Regression w/ Statsmodel
- Well Switching in Bangladesh
- Odds Ratio Explanation
- Fast Logistic Regression: Mahout
- Fast Logistic Regression: Vowpal Wabbit
- Fast Logistic Regression: LIBLINEAR

###Lesson 8: Naive Bayes and Bayesian Estimators

####Slides

####Exercise

####Links
- Insult Detection Kaggle Submission
- Holy Trinity of Bayesian Estimation
- History of Bayes
- Mathematical Exploration of Bayes Theorem
- Naive Bayes v. Logistic Regression

###Lesson 9: Decision Trees and Random Forests

####Slides
- Lesson 9 Slides

####Handouts
- Random Forest on Text Data

####Links

###Lesson 10: Classification Review

####Slides
- Lesson 10 - Review Slides

####Handouts
- In Class Review

####Links
- Choosing a ML Classifier
- Machine Learning Cheat Sheet for Sklearn

###Lesson 11: Ensemble Learning

####Slides
- Lesson 11 - Ensemble Learning
- Lesson 11 - K Means

####Handouts
- Random Forest on Text Data

####Links
- KMeans IPython Notebook
- Text Clustering in Sklearn
- Cloudera ML KMeans

###Lesson 12: K-Means Clustering

####Exercise

###Lesson 13: PCA and Unsupervised Learning

####Slides
- Lesson 13 - PCA and SVD

####Links
- A Tutorial on PCA
- Stanford PCA Tutorial

###Lesson 14: Recommendation Systems

####Links

####Exercise

###Lesson 15: Further Topics in Unsupervised Learning

####Slides
- Lesson 15: More Unsupervised Learning

####Links
- LDA from EChen Blog
- Denoising Autoencoders in Theano
- Deep Learning Reading List

###Lesson 16: Hadoop

####Slides
- Lesson 16: Hadoop

####Handouts

####Links

###Lesson 17: Distributed Data Processing

####Slides
- Lesson 17: Spark

####Handouts

####Links

####Readings

###Lessons 18-19: Distributed Data Processing

####Links

Top Contributors

arahuja