Jupyter Notebooks (née IPython Notebooks) feel like an important technology to me. It’s a way to interactively build up a computer program, then save the output and share it with other people. You can see a sample notebook I made here or check out this gallery of fancy notebooks. It’s particularly popular with data scientists. If you’re an old Python fogey like me it’s kind of a new thing and it’s exciting and worth learning about. I’m focussing on Python here, but Jupyter is now language-agnostic and supports lots of languages like R, Go, Java, even C++.

The notebook is basically a REPL hosted in a web browser. You type a snippet of code into the web page, run it, and it shows you the output and saves it to the notebook. Because the output is in the browser it can display HTML and images; see cells 6 and 7 in my sample notebook. There’s excellent matplotlib support for quick data visualization and lots of fancier plugins for D3, Leaflet, etc if you need.

Notebooks are made to be shareable. My sample is basically a static snapshot of a program’s output, there’s no Python running when you view it on GitHub. But you can also download that file, run the Python code on your own computer, and modify it however you want. That makes it an incredibly useful pedagogical tool. There are complex notebooks you can download that are effectively whole college courses in computing topics. And you can keep your own notebooks around as documentation of the work you’ve done. It’s a very powerful tool.

Behind the scenes what’s going on is the browser window acts like an attachable debugger. There’s a headless IPython server process running somewhere and the browser connects to it. It’s easy to run IPython yourself on your own machine or there are other options including cloud hosted live notebooks. Most of the display magic works by having objects define their own _repr_html_ methods, that’s what lets Pandas show that nice HTML table in cell 6.

Installing and starting IPython is pretty simple, just install it with pip and run ipython notebook. You’ll also want %matplotlib inline for inline plots. Notebooks seem particularly popular with data scientists; if you want a full Python data environment with Pandas, scikit-learn, etc then the Anaconda distribution is an easy way to get going.

  2015-09-18 14:45 Z