
Parallel Machine Learning

Parallel Machine Learning with scikit-learn and IPython

Video Tutorial
Video recording of this tutorial given at PyCon in 2013. The tutorial material has since been partly rearranged and extended. Check the titles of the notebooks to follow along with the presentation.

Scope of this tutorial:

  • Learn common machine learning concepts and how they map to the scikit-learn Estimator API.
  • Learn about scalable feature extraction for text classification and clustering.
  • Learn how to run cross-validation and hyperparameter grid search in parallel with IPython.
  • Learn to analyze the common kinds of errors predictive models are subject to, and how to refine your modeling to take this analysis into account.
  • Learn to optimize memory allocation on your computing nodes with NumPy memory mapping features.
  • Learn how to run a cheap IPython cluster for interactive predictive modeling on Amazon EC2 spot instances using StarCluster.

Target audience

This tutorial targets developers with some experience with scikit-learn and machine learning concepts in general.
It is recommended to first go through one of the tutorials hosted at scikit-learn.org if you are new to scikit-learn.
You might also want to have a look at the SciPy Lecture Notes first if you are new to the NumPy / SciPy / matplotlib ecosystem.

Setup

Install the latest stable versions of NumPy, SciPy, matplotlib, IPython, psutil, and scikit-learn (e.g. IPython 2.2.0 and scikit-learn 0.15.2 at the time of writing).
You can find up-to-date installation instructions on scikit-learn.org and ipython.org.
To check your installation, launch the ipython interactive shell in a console and type the following import statements to check each library:
>>> import numpy
>>> import scipy
>>> import matplotlib
>>> import psutil
>>> import sklearn
If you don't get any message, everything is fine. If you get an error message, please ask for help on the mailing list of the matching project. Don't forget to mention the version of the library you are trying to install, along with your platform and its version (e.g. Windows 8.1, Ubuntu 14.04, OS X 10.9...).
You can exit the ipython shell by typing exit.
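
The manual check above can also be scripted. The following is a minimal sketch, assuming Python 3; the helper name check_libraries and the REQUIRED list are made up for this example and are not part of the tutorial repository. It tries to import each library by its import name and reports the version it finds, which is also the information to include when asking for help on a mailing list.

```python
import importlib

# Import names of the libraries listed in the setup instructions
# (note that scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "IPython", "psutil", "sklearn"]


def check_libraries(names):
    """Map each import name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Some modules do not expose __version__; report that explicitly.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_libraries(REQUIRED).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:12s} {status}")
```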

Fetching the data

It is recommended to fetch the datasets before diving into the tutorial material itself. To do so, run the fetch_data.py script in this folder:
python fetch_data.py

Using the IPython notebook to follow the tutorial

The tutorial material and exercises are hosted in a set of IPython executable notebook files.
To run them interactively do:
$ cd notebooks
$ ipython notebook
This should automatically open a new browser window listing all the notebooks of the folder.
You can then execute the cells in order by pressing "Shift-Enter": the output is displayed directly under the cell and the cursor moves on to the next cell. Go to the "Help" menu for links to the notebook tutorial.

Credits

Some of this material is adapted from the SciPy 2013 tutorial:
http://github.com/jakevdp/sklearn_scipy2013
Original authors:
