Lightweight Scala kernel for Jupyter / IPython 3

Jupyter Scala

Jupyter Scala is a Scala kernel for Jupyter (formerly known as IPython). It aims to be a versatile and easily extensible alternative to other Scala kernels or notebook UIs.

Table of contents

  1. Quick start
  2. Why
  3. Special commands / API
  4. Writing libraries using the interpreter API
  5. Jupyter installation
  6. Examples (deprecated)
  7. Internals
  8. Compiling it

Quick start

First ensure you have Jupyter installed. Running jupyter --version should print a value >= 4.0. If it does not, see Jupyter installation.

Then download and run the Jupyter Scala launcher with

$ curl -L -o jupyter-scala https://git.io/vrHhi && chmod +x jupyter-scala && ./jupyter-scala && rm -f jupyter-scala

This downloads the bootstrap launcher of Jupyter Scala, then runs it. If no previous version is installed, this simply sets up the kernel in ~/Library/Jupyter/kernels/scala211 (OSX) or ~/.local/share/jupyter/kernels/scala211 (Linux). Note that on first launch, it will download its dependencies from Maven repositories. These can be found under ~/.jupyter-scala/bootstrap.

Once installed, the downloaded launcher can be removed, as it copies itself to ~/Library/Jupyter/kernels/scala211 or ~/.local/share/jupyter/kernels/scala211.

For the Scala 2.10.x version, run instead

$ curl -L -o jupyter-scala-2.10 https://git.io/vrHh7 && chmod +x jupyter-scala-2.10 && ./jupyter-scala-2.10 && rm -f jupyter-scala-2.10

The Scala 2.10.x version shares its bootstrap dependencies directory, ~/.jupyter-scala/bootstrap, with the Scala 2.11.x version. It installs itself in ~/Library/Jupyter/kernels/scala210 (OSX) or ~/.local/share/jupyter/kernels/scala210 (Linux).

Some options can be passed to the jupyter-scala (or jupyter-scala-2.10) launcher.

  • The kernel ID (scala211) can be changed with --id custom (this makes it possible to install the kernel alongside already installed Scala kernels).
  • The kernel name, which appears in the Jupyter Notebook UI, can be changed with --name "Custom name".
  • If a kernel with the same ID is already installed and should be overwritten, the --force option must be specified.
  • Dependencies can be added to all the sessions based on this kernel with -d org:name:version.
  • Repositories can be added, both for the dependencies added via -d and for those added during the session, with -r https://repo.com/base.

You can check that a kernel is installed with

$ jupyter kernelspec list

which should print one line per installed Jupyter kernel.

Why

There are already a few notebook UIs or Jupyter kernels for Scala out there:

(Zeppelin is worth mentioning too. Although not related to Jupyter, it provides similar features, and has some support for Spark, Flink, and Scalding in particular.)

Most of them target a single use, like Spark calculations (and you have to have Spark around even if you just do bare Scala!), or just Scala calculations (with no way of adding Spark on-the-fly). They share no code with each other, so features are typically added to a single kernel or project, and need to be re-implemented in the ones targeting other usages. That also makes it hard to share code between these various uses.

Jupyter Scala is an attempt at having a more modular and versatile kernel to do Scala from Jupyter.

Jupyter Scala aims to be closer to what Ammonite achieves in the terminal in terms of completion or pretty-printing. Also, like with Ammonite, users interact with the interpreter via a Scala API rather than ad-hoc special commands that are hard to reuse or automate. Jupyter Scala also publishes this API in a separate module (scala-api), which makes it possible to write external libraries that interact with the interpreter. In particular it has a Spark bridge, which straightforwardly adds Spark support to a session. More bridges like this should come soon, to interact with other big data frameworks, or for plotting.

Thanks to this modularity, Jupyter Scala shares its interpreter and most of its API with ammonium, the fork of Ammonite it is based on. One can switch between notebook-based UIs and the terminal more confidently, since everything that can be done on one side is possible on the other (a bit like what Jupyter / IPython allows with its console and notebook commands, but with the additional niceties of Ammonite, also available in ammonium, like syntax highlighting of the input). In teams where some people prefer terminal interfaces to web-based ones, some can use Jupyter Scala and others its terminal cousins, according to their tastes.

Special commands / API

The content of an instance of jupyter.api.API is automatically imported when a session is opened (a bit like Ammonite does with its "bridge"). Its methods can be called straight away, and replace the so-called "special" or "magic" commands of other notebooks or REPLs.

show

show(value)

Print value - its pprint representation - with no truncation. (Same as in Ammonite.)

classpath.add

classpath.add("organization" % "name" % "version")

Add Maven / Ivy module "organization" % "name" % "version" - and its transitive dependencies - to the classpath. Like in SBT, replace the first % with two percent signs, %%, for Scala-specific modules. This adds a Scala version suffix to the name that follows. E.g. in Scala 2.11.x, "organization" %% "name" % "version" is equivalent to "organization" % "name_2.11" % "version".

Can be called with several modules at once, like

classpath.add(
  "organization" % "name" % "version",
  "other" %% "name" % "version"
)

(Replaces load.ivy in Ammonite.)
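
The %% shorthand described above can be pictured with a small standalone sketch. crossName is a hypothetical helper written for illustration only; it is not part of the Jupyter Scala or SBT API, and it assumes the suffix is the first two components of the Scala version, as in the 2.11.x example.

```scala
// Hypothetical helper, for illustration only: it mimics how %% derives
// the artifact name from the Scala version (assumption: the suffix is
// the "major.minor" part of the version, as in the 2.11.x example).
def crossName(name: String, scalaVersion: String): String = {
  val binaryVersion = scalaVersion.split('.').take(2).mkString(".")
  s"${name}_$binaryVersion"
}

// With Scala 2.11.8, "name" becomes "name_2.11".
val expanded = crossName("name", "2.11.8")
```

In a session, classpath.add does this for you; the helper is only there to make the naming rule explicit.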

classpath.addInConfig

classpath.addInConfig("config")(
  "organization" % "name" % "version",
  "organization" %% "name" % "version"
)

Add Maven / Ivy modules to a specific configuration. Like in SBT, which itself follows what Ivy does, dependencies are added to so-called configurations. Jupyter Scala uses three configurations:

  • compile: default configuration, the one of the class loader that compiles and runs things,
  • macro: configuration of the class loader that runs macros - it inherits compile, and initially has scala-compiler in it, that some macros require,
  • plugin: configuration for compiler plugins - put compiler plugin modules in it.

classpath.dependencies

classpath.dependencies: Map[String, Set[(String, String, String)]]

Return the previously added dependencies (values) in each configuration (keys).
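
As a rough illustration of the shape of that value, the sketch below flattens a map of that type into one line per dependency. The sample data and the render helper are assumptions made up for this example (the coordinates are fictitious); in a session, the map would come from classpath.dependencies instead.

```scala
// A module is an (organization, name, version) triple, as in the
// signature of classpath.dependencies.
type Module = (String, String, String)

// Made-up sample data, standing in for classpath.dependencies.
val sample: Map[String, Set[Module]] = Map(
  "compile" -> Set(("org.example", "lib_2.11", "1.0.0")),
  "plugin"  -> Set(("org.example", "plugin_2.11", "0.2.0"))
)

// Hypothetical helper: one "config: organization:name:version" line
// per dependency, sorted so that the output is stable.
def render(deps: Map[String, Set[Module]]): Seq[String] =
  deps.toSeq.sortBy(_._1).flatMap { case (config, modules) =>
    modules.toSeq.sorted.map { case (org, name, version) =>
      s"$config: $org:$name:$version"
    }
  }

val lines = render(sample)
```

The organization:name:version rendering follows the format accepted by the -d launcher option.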

classpath.addRepository

classpath.addRepository("repository-address")

Add a Maven / Ivy repository, for dependency lookup. For Maven repositories, add the base address of the repository, like

classpath.addRepository("https://oss.sonatype.org/content/repositories/snapshots")

For Ivy repositories, add the base address along with the pattern of the repository, like

classpath.addRepository(
  "https://repo.typesafe.com/typesafe/ivy-releases/" +
    "[organisation]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)" +
    "[revision]/[type]s/[artifact](-[classifier]).[ext]"
)
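
To see what such a pattern stands for, here is a toy expansion of the pattern above into a concrete URL. It is only a sketch: expandPattern is a hypothetical helper, not the code Ivy or Jupyter Scala actually runs, the org.example / demo coordinates are made up, and it assumes the usual Ivy convention that a parenthesized part is kept only when the tokens inside it have a value.

```scala
import scala.util.matching.Regex

// Toy expansion of an Ivy-style pattern (illustration only).
def expandPattern(pattern: String, values: Map[String, String]): String = {
  val token: Regex = """\[(\w+)\]""".r
  val optional: Regex = """\(([^()]*)\)""".r

  // An optional part "(...)" is kept only if all its tokens have a value.
  val withoutOptionals = optional.replaceAllIn(pattern, m => {
    val body = m.group(1)
    val defined = token.findAllMatchIn(body).forall(t => values.contains(t.group(1)))
    Regex.quoteReplacement(if (defined) body else "")
  })

  // Substitute every [token] that has a value; leave unknown ones untouched.
  token.replaceAllIn(
    withoutOptionals,
    m => Regex.quoteReplacement(values.getOrElse(m.group(1), m.matched))
  )
}

val ivyPattern =
  "https://repo.typesafe.com/typesafe/ivy-releases/" +
    "[organisation]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)" +
    "[revision]/[type]s/[artifact](-[classifier]).[ext]"

// Made-up coordinates; no Scala / SBT version or classifier is given,
// so the three optional parts are dropped.
val url = expandPattern(ivyPattern, Map(
  "organisation" -> "org.example",
  "module"       -> "demo",
  "revision"     -> "1.0",
  "type"         -> "jar",
  "artifact"     -> "demo",
  "ext"          -> "jar"
))
```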

By default, ~/.ivy2/local, Central (https://repo1.maven.org/maven2/), and Sonatype releases (https://oss.sonatype.org/content/repositories/releases) are used. Sonatype snapshots (https://oss.sonatype.org/content/repositories/snapshots) is also added in snapshot versions of Jupyter Scala.

classpath.repositories

classpath.repositories: Seq[String]

Returns a list of the previously added repositories.

classpath.addPath

classpath.addPath("/path/to/file.jar")
classpath.addPath("/path/to/directory")

Adds one or several JAR files or directories to the classpath.

classpath.classLoader (advanced)

classpath.classLoader(config: String): ClassLoader

Returns the ClassLoader of configuration config (see above for the available configurations).

classpath.addedClasses (advanced)

classpath.addedClasses(config: String): Map[String, Array[Byte]]

Returns a map of the byte code of classes generated during the REPL session. Each line or cell in a session gets compiled, and added to this map - before getting loaded by a ClassLoader and run.

The ClassLoader used to compile and run code mainly contains initial and added dependencies, and things from this map.
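
The idea of loading classes out of such a map can be sketched with a plain JVM ClassLoader. MapClassLoader below is a hypothetical class written for this illustration, not the loader Jupyter Scala uses internally, and it assumes the keys of the map are fully qualified class names.

```scala
// Illustration only: a ClassLoader that defines classes from an
// in-memory map of byte code, and delegates everything else to its
// parent (standard parent-first behavior of java.lang.ClassLoader).
// Assumption: keys are fully qualified class names.
class MapClassLoader(classes: Map[String, Array[Byte]], parent: ClassLoader)
    extends ClassLoader(parent) {

  // Called by loadClass only when the parent could not find the class.
  override protected def findClass(name: String): Class[_] =
    classes.get(name) match {
      case Some(bytes) => defineClass(name, bytes, 0, bytes.length)
      case None        => throw new ClassNotFoundException(name)
    }
}

// With an empty map, the loader simply behaves like its parent.
val loader = new MapClassLoader(Map.empty, ClassLoader.getSystemClassLoader)
val stringClass = loader.loadClass("java.lang.String")
```

In a session, a map like the one returned by classpath.addedClasses("compile") could be passed in place of Map.empty, under the key assumption stated above.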

classpath.onPathsAdded (advanced)

Registers a hook called whenever things are added to the classpath.

classpath.path (advanced)

classpath.path()

Returns the classpath of the current session.

Add Spark support

See the example in the Ammonium README.

Writing libraries using the interpreter API

Most classes available from a notebook (classpath, eval, setup, interpreter, ...) are:

  • defined in the scala-api module ("com.github.alexarchambault.jupyter" % "scala-api_2.11.8" % "0.3.0-M3") - or its dependencies (jupyter-kernel-api or ammonium-interpreter-api),
  • available implicitly from a notebook (implicitly[ammonite.api.Classpath] would be the same as just classpath).

This makes it possible to write libraries that can easily interact with Jupyter Scala. E.g. one can define methods in a library, like

def doSomething()(implicit eval: ammonite.api.Eval): Unit = {
  eval("some code")
}

def displaySomething(implicit publish: jupyter.api.Publish[jupyter.api.Evidence], ev: jupyter.api.Evidence): Unit = {
  publish.display("text/html" -> "<div id='myDiv'></div>")
  publish.display("application/javascript" -> "// some JS")
}

then load this library via classpath.add, and call these methods from the session.
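
Because such methods only depend on implicit parameters, they can also be exercised outside a notebook by supplying a stand-in implementation. The sketch below shows the pattern with a simplified Eval trait defined locally; it is an assumption standing in for ammonite.api.Eval (whose real definition may differ), and RecordingEval is a hypothetical test double.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for ammonite.api.Eval (assumption: the real trait
// can be applied to a string of code, as eval("some code") suggests).
trait Eval {
  def apply(code: String): Unit
}

// Library side: same shape as doSomething above.
def doSomething()(implicit eval: Eval): Unit =
  eval("some code")

// Hypothetical test double: records the code instead of running it.
final class RecordingEval extends Eval {
  val seen: ListBuffer[String] = ListBuffer.empty
  def apply(code: String): Unit = { seen += code; () }
}

// Caller side: an instance in implicit scope is picked up automatically,
// the way a session provides eval, classpath, etc.
implicit val eval: RecordingEval = new RecordingEval

val recorded: List[String] = {
  doSomething()
  eval.seen.toList
}
```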

Jupyter installation

Check that you have Jupyter installed by running jupyter --version. It should print a value >= 4.0. If it does not, a quick way of setting it up is to install the Anaconda Python distribution (or its lightweight counterpart, Miniconda), and then run

$ pip install jupyter

or

$ pip install --upgrade jupyter

jupyter --version should then print a value >= 4.0.

Examples

Warning: these examples are somewhat outdated, and should be updated.

Some example notebooks can be found in the examples directory: you can follow macrology 201 in a notebook, use compiler plugins like simulacrum from notebooks, use a type level library to parse CSV, set up a notebook for psp-std, etc.

Internals

Jupyter Scala uses the Scala interpreter of ammonium, in particular its interpreter and interpreter-api modules. The interaction with Jupyter (the Jupyter protocol, ZMQ concerns, etc.) is handled in a separate project, jupyter-kernel. In a way, Jupyter Scala is just a bridge between these two projects.

The API as seen from a Jupyter Scala session is defined in the scala-api module, which itself depends on the interpreter-api module of ammonium. The core of the kernel is in the scala module, in particular with an implementation of an Interpreter for jupyter-kernel, based on interpreter from ammonium, and implementations of the interfaces / traits defined in scala-api. It also has a third module, scala-cli, which deals with command-line argument parsing, and launches the kernel itself. The launcher consists of this third module.

The launcher itself is generated with coursier.

Compiling it

Clone the sources:

$ git clone https://github.com/alexarchambault/jupyter-scala.git
$ cd jupyter-scala

Compile and publish them:

$ sbt publishLocal

Edit the launch script, and set VERSION to 0.3.0-SNAPSHOT (the version being built / published locally). Launch it:

$ ./launch --help

launch behaves like the jupyter-scala launcher above, and accepts the same options (--id custom-id, --name "Custom name", --force, etc. - see --help for more information). When launched, it will download (on first launch) and launch coursier, which will itself launch the kernel out of the artifacts published locally. If you install a Jupyter kernel through it, it will copy the coursier launcher to a Jupyter kernel directory (like ~/Library/Jupyter/kernels/scala211 on OSX), and set up a kernel.json file in it that launches the copied coursier launcher with the right options, so that coursier will then launch the Jupyter kernel out of the locally published artifacts.

Once a kernel is set up this way, there's no need to run the launcher again if the sources change. Just publishing them locally with sbt publishLocal is enough for them to be used by the kernel on the next (re-)launch. One can also run sbt "~publishLocal" for the sources to be watched for changes, and built / published after each of them.

If one wants to make changes to jupyter-kernel or ammonium, and test them via Jupyter Scala, just clone their sources,

$ git clone https://github.com/alexarchambault/jupyter-kernel

or

$ git clone https://github.com/alexarchambault/ammonium

build them and publish them locally,

$ cd jupyter-kernel
$ sbt publishLocal

or

$ cd ammonium
$ sbt publishLocal

Then adjust the ammoniumVersion or jupyterKernelVersion in the build.sbt of jupyter-scala (set them to 0.3.0-SNAPSHOT or 0.4.0-SNAPSHOT), reload the sbt session that compiles / publishes jupyter-scala (type reload, or exit and relaunch it), and build / publish jupyter-scala locally again (sbt publishLocal). That will make the locally published artifacts of jupyter-scala depend on the locally published ones of ammonium or jupyter-kernel.

To generate a launcher using these modified ammonium / jupyter-kernel / jupyter-scala, run

$ VERSION=0.3.0-SNAPSHOT project/generate-launcher.sh -s

The VERSION environment variable tells the script to use the locally published jupyter-scala version. The -s option makes it generate a standalone launcher, rather than a thin one. A thin launcher requires the ammonium / jupyter-kernel / jupyter-scala versions it uses to be published on a (Maven) repository accessible to the users. That is the case for the launcher in the jupyter-scala repository, but likely not if you just modified the sources. A standalone launcher embeds all the JARs it needs, including the ones you published locally on your machine - at the cost of an increased size (~40 MB). Note that as this solution is a bit hackish, you shouldn't change the versions of the locally published projects (these should stay the default 0.x.y-SNAPSHOT), so that the dependency management in the kernel can still find corresponding public artifacts - although the embedded ones will take priority in practice.

Released under the Apache 2.0 license, see LICENSE for more details.
