The openapi Project

Wiki New Zealand

What do you need to connect people with data?

Access to data
Domain knowledge
Data Science skills
Statistical Graphics skills
Graphical Design skills

Few individuals possess all of these

Wiki New Zealand

Some problems:

Access to data
Domain knowledge
Data Science skills
Statistical Graphics skills
Graphical Design skills

Some solutions:

Open Data, Open Government, Open Access
Education and experience
Software that does everything for you

The openapi Project

Some problems:

Domain knowledge
Data Science skills
Statistical Graphics skills
Graphical Design skills

Some solutions:

Education and experience
Software that does everything for you
Software that lets everyone do a small piece

Allows more ways to contribute
Allows small contributions
Allows contributions to be combined

openapi is Data and Scripts

LTD404701_20140509_101154_10.csv

"Birth rates - DFMA (Annual-Dec)",""
"","Total Population"
1855,39.25
1856,37.81
1857,39.47
1858,38.24
...

birthrate-file.R

# Read in original data source from Stats NZ ...
#   'brsrcfile' ("LTD404701_20140509_101154_10.csv")
# ... and tidy it to produce nicer CSV ...
#   "birthrate.csv"
lines <- readLines(brsrcfile)
# Drop any line that does not start with a digit
writeLines(lines[grep("^[0-9]", lines)], "birthrate.csv")
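
The same tidying step can be sketched in Python, to show that the script behind a module need not be R. This is a minimal sketch, not part of the openapi project: the function name tidy_lines is hypothetical, and the sample rows are the ones shown on the slide above.

```python
import re

def tidy_lines(lines):
    """Keep only the lines that start with a digit (the data rows)."""
    return [line for line in lines if re.match(r"[0-9]", line)]

# Sample rows copied from the Stats NZ file shown above
raw = [
    '"Birth rates - DFMA (Annual-Dec)",""',
    '"","Total Population"',
    "1855,39.25",
    "1856,37.81",
]

tidy = tidy_lines(raw)
print(tidy)
```

The two header rows are dropped because they start with a quote character, which is the same rule as the "^[0-9]" pattern in birthrate-file.R.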

openapi is Modules

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="fileSystem"/>
  <description><![CDATA[This module provides a CSV file call ...
  <output name="brsrcfile" type="external"
          ref="data/LTD404701_20140509_101154_10.csv"/>
</module>

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="R"/>
  <description><![CDATA[This module takes a CSV file and pro ...
  <input name="brsrcfile" type="external"/>
  <output name="brfile" type="external" ref="birthrate.csv"/>
  <source ref="src/birthrate-file.R"><![CDATA[]]></source>
</module>
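
Because a module is plain XML, any language can read it. The following is a rough sketch of how a glue system might extract a module's platform, inputs, outputs, and source; it is not the oaglue implementation. The function describe_module is hypothetical, and the description element is left out because it is truncated on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace used by the module files shown above
NS = {"oa": "http://www.openapi.org/2014/"}

MODULE_XML = """<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="R"/>
  <input name="brsrcfile" type="external"/>
  <output name="brfile" type="external" ref="birthrate.csv"/>
  <source ref="src/birthrate-file.R"><![CDATA[]]></source>
</module>
"""

def describe_module(xml_text):
    """Summarise a module: platform, input names, output refs, source ref."""
    root = ET.fromstring(xml_text)
    return {
        "platform": root.find("oa:platform", NS).get("name"),
        "inputs": [e.get("name") for e in root.findall("oa:input", NS)],
        "outputs": {e.get("name"): e.get("ref")
                    for e in root.findall("oa:output", NS)},
        "source": root.find("oa:source", NS).get("ref"),
    }

info = describe_module(MODULE_XML)
print(info)
```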

openapi is Pipelines

<pipeline xmlns="http://www.openapi.org/2014/" version="0.1">
  <component name="brsource"/>
  <component name="birthrate"/>
  <component name="brplot-R"/>
  <pipe>
    <start component="brsource" name="brsrcfile"/>
    <end component="birthrate" name="brsrcfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R" name="brfile"/>
  </pipe>
</pipeline>
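
A pipeline only states which output feeds which input, so the order in which components run has to be worked out from the pipes. One way to do that is a topological sort, sketched below with Python's standard graphlib module (Python 3.9 or later). This is an illustration under that assumption and says nothing about how oaglue actually orders components; run_order is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

NS = {"oa": "http://www.openapi.org/2014/"}

PIPELINE_XML = """<pipeline xmlns="http://www.openapi.org/2014/" version="0.1">
  <component name="brsource"/>
  <component name="birthrate"/>
  <component name="brplot-R"/>
  <pipe>
    <start component="brsource" name="brsrcfile"/>
    <end component="birthrate" name="brsrcfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R" name="brfile"/>
  </pipe>
</pipeline>
"""

def run_order(xml_text):
    """Return component names in an order that respects every pipe."""
    root = ET.fromstring(xml_text)
    # Each component starts with no prerequisites
    deps = {c.get("name"): set() for c in root.findall("oa:component", NS)}
    # A pipe makes its end component depend on its start component
    for pipe in root.findall("oa:pipe", NS):
        start = pipe.find("oa:start", NS).get("component")
        end = pipe.find("oa:end", NS).get("component")
        deps[end].add(start)
    return list(TopologicalSorter(deps).static_order())

order = run_order(PIPELINE_XML)
print(order)
```

No loops or conditionals are needed in the pipeline itself; the ordering falls out of the pipes.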

openapi is a Glue System

library(oaglue)

p <- readPipeline("birthrate-pipe")
results <- runPipeline(p)

          compname         name        type       format formatType
brsrcfile "birthrate-pipe" "brsrcfile" "external" ""     "text"    
brfile    "birthrate-pipe" "brfile"    "external" ""     "text"    
brsvg     "birthrate-pipe" "brsvg"     "external" ""     "text"    
          ref                                                 
brsrcfile "data/LTD404701_20140509_101154_10.csv"             
brfile    "birthrate-pipe/Components/birthrate/birthrate.csv" 
brsvg     "birthrate-pipe/Components/brplot-R/birthrate-R.svg"
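
The table above suggests what a glue system has to track: for every output, a reference to where the result lives. A rough sketch of the matching step, where each pipe hands the start component's output reference to the end component's input, might look like this. The data layout and the name resolve_inputs are assumptions for illustration only; the refs are the ones printed above.

```python
# Output refs as reported in the results table; pairing each output with a
# component comes from the pipeline definition (an assumed layout).
outputs = {
    ("brsource", "brsrcfile"): "data/LTD404701_20140509_101154_10.csv",
    ("birthrate", "brfile"): "birthrate-pipe/Components/birthrate/birthrate.csv",
}

# Each pipe: (start component, output name) -> (end component, input name)
pipes = [
    (("brsource", "brsrcfile"), ("birthrate", "brsrcfile")),
    (("birthrate", "brfile"), ("brplot-R", "brfile")),
]

def resolve_inputs(outputs, pipes):
    """Map every piped input to the ref of the output that feeds it."""
    return {end: outputs[start] for start, end in pipes}

inputs = resolve_inputs(outputs, pipes)
for (component, name), ref in inputs.items():
    print(f"{component}.{name} <- {ref}")
```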

An openapi example

Andrew Balemi wanted to add an annotation to the Wiki New Zealand plot of NZ birth rate to show the end of World War II (the onset of the baby boomers)

An openapi example


births <- read.csv(brfile, col.names=c("year", "births"))

svg("birthrate-R.svg")
plot(births, type="l")
abline(v=1945 +
       as.numeric(as.Date("1945-09-02") - as.Date("1945-01-01"))/365)
dev.off()
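
The abline() position converts 2 September 1945 into a decimal year: the number of days since 1 January (244) divided by 365, added to 1945. The arithmetic can be checked with a short Python sketch; decimal_year is a hypothetical helper, and it keeps the slide's fixed 365-day year rather than handling leap years.

```python
from datetime import date

def decimal_year(d, days_per_year=365):
    """Year plus the fraction of the year elapsed since 1 January."""
    start = date(d.year, 1, 1)
    return d.year + (d - start).days / days_per_year

days = (date(1945, 9, 2) - date(1945, 1, 1)).days
x = decimal_year(date(1945, 9, 2))
print(days, round(x, 3))
```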

An openapi example

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="R"/>
  <description><![CDATA[This module reads a CSV file and pro ...
  <input name="brfile" type="external"/>
  <output name="brsvg" type="external" ref="birthrate-R.svg"/>
  <source ref="src/birthrate-plot-custom.R"><![CDATA[]]></source>
</module>

An openapi example

<pipeline xmlns="http://www.openapi.org/2014/" version="0.1">
  <component name="brsource"/>
  <component name="birthrate"/>
  <component name="brplot-R"/>
  <component name="brplot-R-custom"/>
  <pipe>
    <start component="brsource" name="brsrcfile"/>
    <end component="birthrate" name="brsrcfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R" name="brfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R-custom" name="brfile"/>
  </pipe>
</pipeline>
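
Adding a contribution to a pipeline amounts to one new component and one new pipe, which is simple enough to automate. The sketch below makes that edit with Python's standard XML library, starting from the original three-component pipeline. It is an illustration only: add_branch is a hypothetical helper, not a function of oaglue or conduit.

```python
import xml.etree.ElementTree as ET

OA = "http://www.openapi.org/2014/"
ET.register_namespace("", OA)  # serialise without a namespace prefix

PIPELINE_XML = """<pipeline xmlns="http://www.openapi.org/2014/" version="0.1">
  <component name="brsource"/>
  <component name="birthrate"/>
  <component name="brplot-R"/>
  <pipe>
    <start component="brsource" name="brsrcfile"/>
    <end component="birthrate" name="brsrcfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R" name="brfile"/>
  </pipe>
</pipeline>
"""

def add_branch(root, source, output, new_component, input_name):
    """Add a component and a pipe feeding it from an existing output."""
    comps = root.findall(f"{{{OA}}}component")
    # Place the new component right after the last existing one
    idx = list(root).index(comps[-1]) + 1
    root.insert(idx, ET.Element(f"{{{OA}}}component", {"name": new_component}))
    pipe = ET.SubElement(root, f"{{{OA}}}pipe")
    ET.SubElement(pipe, f"{{{OA}}}start", {"component": source, "name": output})
    ET.SubElement(pipe, f"{{{OA}}}end",
                  {"component": new_component, "name": input_name})
    return root

root = ET.fromstring(PIPELINE_XML)
add_branch(root, "birthrate", "brfile", "brplot-R-custom", "brfile")
names = [c.get("name") for c in root.findall(f"{{{OA}}}component")]
print(names)
```

The existing components and pipes are untouched, so the original plot and the annotated plot are both produced from the same tidied data.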

Another openapi example

birthrate-plot.py

import numpy as np
import matplotlib.pyplot as plt

# 'brfile' is supplied by the glue system; it names the tidied CSV
year, births = np.loadtxt(brfile, unpack=True, delimiter=",")

# Years are plain numbers, not matplotlib date values, so use plot()
plt.plot(year, births, "r-")
plt.grid(True)
plt.savefig("birthrate-py.svg")

Another openapi example

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="python"/>
  <description><![CDATA[This module reads a CSV file and pro ...
  <input name="brfile" type="external"/>
  <output name="brsvg" type="external" ref="birthrate-py.svg"/>
  <source ref="src/birthrate-plot.py"><![CDATA[]]></source>
</module>

Another openapi example

<pipeline xmlns="http://www.openapi.org/2014/" version="0.1">
  <component name="brsource"/>
  <component name="birthrate"/>
  <component name="brplot-R"/>
  <component name="brplot-py"/>
  <pipe>
    <start component="brsource" name="brsrcfile"/>
    <end component="birthrate" name="brsrcfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-R" name="brfile"/>
  </pipe>
  <pipe>
    <start component="birthrate" name="brfile"/>
    <end component="brplot-py" name="brfile"/>
  </pipe>
</pipeline>

Yet another openapi example

What if all of the New Zealand youth (18-24) who did NOT vote in 2011 voted for the Internet Party in 2014?

Yet another openapi example

What do I need?

Access to data
Domain knowledge
Data Science skills
Statistical Graphics skills
Graphical Design skills

Yet another openapi example

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="fileSystem"/>
  <output name="nvfile" type="external" ref="data/non-voters.csv"/>
</module>

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="fileSystem"/>
  <output name="popfile" type="external" 
  ref="data/TABLECODE7511_Data_821b2c90-79e3-4462-9994-4ae796f6e654.csv"/>
</module>

Yet another openapi example

<?xml version="1.0"?>
<module xmlns="http://www.openapi.org/2014/" version="0.1">
  <platform name="R"/>
  <input name="nvfile" type="external"/>
  <input name="popfile" type="external"/>
  <output name="nonvoters" type="internal"/>
  <output name="pop2013" type="internal"/>
  <output name="pop2013grouped" type="internal"/>
  <source ref="src/tidy.R"><![CDATA[]]></source>
</module>

Yet another openapi example

What if you could search for an existing script?
Or request a script?
Or request a module wrapper for an existing script?
Or write your own wrapper on an existing script?

Yet another openapi example

Where did I get 5.51 from?
My statement is informed
We can share and remix the data and the code
Our discussion can be informed

Summary

We want to connect people with data
- more informed individuals
- more informed discussion
We propose a framework that enables small contributions
- more ways to contribute
- more contributors
- more contributions
We have made a start
- modules
- pipelines
- glue systems

References and links

Two (partly compatible) experimental frameworks are under development (both on GitHub): 'oaglue' and 'conduit'.
These are R packages that have functions to read, write, and run modules and pipelines.
These slides and the resources used to create them, including example data, scripts, modules, and pipelines, are available at: https://www.stat.auckland.ac.nz/~paul/Talks/OpenAPI2014/.

Acknowledgements

Bradley Drayton helped with some early exploratory work supported by a Faculty of Science Research Development Grant.
Ashley Noel Hinton was supported by a Faculty of Science Research Development Grant.
Wiki New Zealand
The birth rate data, 2011 election data, and population estimates are all from Statistics NZ.
The 2011 election results pie chart came from the Rock Enrol web site.

openapi wants to be ...

As simple as possible
- no looping constructs or conditionals
As few concepts as possible
- modules, pipelines, glue system
As independent as possible
- script author, module author, pipeline author can all be different people
As open as possible
- open source, open data, open access
As portable as possible
- cross platform, language agnostic

openapi could be (with funding) ...

Multiple glue systems
Repositories of modules and pipelines
Request sites for modules and pipelines
Search, ratings, and recommendations for modules and pipelines
Tools for creating modules and pipelines
(including GUIs)
Automated pipeline creation
(from simple script markup)

openapi challenges

Debugging modules and pipelines
Documentation of modules and pipelines
Reliability of modules and pipelines
- Availability of required packages/libraries
Speed of execution, strong typing, security
Bundling modules and pipelines (and resources)
Relative/absolute and remote/local resources
Creating general-purpose modules
Capturing "non-functional" scripts as modules
Modules for "annotating" data
Why should people contribute?

openapi is NOT visual programming
