Data Hacking for Fun and Profit

Randy J. Fortier
randy.fortier@uoit.ca
@randy_fortier

Outline

  • What is hacking?
    • What is data hacking?
  • Who can program?
  • Open data
  • Obesity data example
  • Bike share data example

Data Hacking

What is hacking?

Hacking

  • What hacking isn't:
    • Stealing identities
    • Looking at top-secret Pentagon files
    • Espionage
    • Electronic heists

Hacking

  • What hacking is:
    • Finding a way to make something help you solve a problem
    • It may involve:
      • Figuring out how the thing works
      • Using the thing in a way that was not intended

Hacking

  • Often, hacking involves using hardware:
    • Raspberry Pi
    • Arduino
    • Sensors
    • Wireless router
    • etc.

Data Hacking

  • Data hacking is simply hacking with data
    • Open data
      • e.g. Government statistics
    • Commercial web services
      • e.g. PayPal, Twitter

Data Hacking

  • Writing programs to access data
    • Quick
    • Unconventional
    • Probably buggy (but who cares?)

Data Hacking

Who can Program?

Who can Program?

  • Rules for who can be a programmer:
    1. You have to be a math whiz
    2. You have to be male
    3. You have to play Dungeons and Dragons
  • FALSE!!!!
    • Everyone can code
    • You just need a fun reason to learn

Data Hacking

Open Data

Open Data

  • Government of Canada Open Data (CSV, XML, JSON):
    • http://open.canada.ca/data/en/dataset
  • Government of Ontario Open Data (mostly CSV, XLSX):
    • https://www.ontario.ca/open-data
  • City of Toronto Open Data (CSV, XML, JSON):
    • http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=1a66e03bb8d1e310VgnVCM10000071d60f89RCRD

CSV


Phillip,Fry
Turanga,Leela
Bender,Rodriguez
Hermes,Conrad
Amy,Wong
Hubert,Farnsworth
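
A minimal sketch of reading the CSV sample above with Python's standard csv module. The data is pasted inline so the example runs on its own; a real script would read a downloaded file instead. The names CSV_TEXT and read_names are made up for this illustration, and the sketch assumes the file has no header row, as on the slide.

```python
import csv
import io

# Sample data copied from the slide (no header row).
CSV_TEXT = """Phillip,Fry
Turanga,Leela
Bender,Rodriguez
Hermes,Conrad
Amy,Wong
Hubert,Farnsworth
"""


def read_names(text):
    """Return a list of (first_name, last_name) tuples from headerless CSV text."""
    reader = csv.reader(io.StringIO(text))
    # Skip blank lines; each remaining row is a list of two strings.
    return [(row[0], row[1]) for row in reader if row]


names = read_names(CSV_TEXT)
for first, last in names:
    print(f"{last}, {first}")
```

Using csv.reader rather than splitting on commas by hand means quoted fields (for example, a name containing a comma) would still be handled correctly.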
                        

XML


<?xml version="1.0" ?>
<class-list>
   <student firstName="Phillip" lastName="Fry" />
   <student firstName="Turanga" lastName="Leela" />
   <student firstName="Bender" lastName="Rodriguez" />
   <student firstName="Hermes" lastName="Conrad" />
   <student firstName="Amy" lastName="Wong" />
   <instructor firstName="Hubert" lastName="Farnsworth" />
</class-list>
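
A minimal sketch of reading the XML sample above with Python's standard xml.etree.ElementTree module. The data is pasted inline so the example is self-contained; XML_TEXT and list_people are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the slide.
XML_TEXT = """<?xml version="1.0" ?>
<class-list>
   <student firstName="Phillip" lastName="Fry" />
   <student firstName="Turanga" lastName="Leela" />
   <student firstName="Bender" lastName="Rodriguez" />
   <student firstName="Hermes" lastName="Conrad" />
   <student firstName="Amy" lastName="Wong" />
   <instructor firstName="Hubert" lastName="Farnsworth" />
</class-list>"""


def list_people(xml_text):
    """Return (role, first_name, last_name) for each child of the root element."""
    root = ET.fromstring(xml_text)
    # Iterating an element yields its direct children in document order;
    # the tag name (student or instructor) serves as the role.
    return [(el.tag, el.get("firstName"), el.get("lastName")) for el in root]


people = list_people(XML_TEXT)
for role, first, last in people:
    print(f"{role}: {first} {last}")
```

Unlike the CSV version, the element name carries extra information here: the instructor can be told apart from the students without a separate column.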
                        

JSON


{"courseCode": "CSCI 0010u",
 "courseName": "Data Hacking 101",
 "classList": [
    {"sid": "100000001", "fName": "Phillip", "lName": "Fry"},
    {"sid": "100000001", "fName": "Turanga", "lName": "Leela"},
    {"sid": "100000001", "fName": "Hubert",  "lName": "Farnsworth"},
    {"sid": "100000001", "fName": "Amy",     "lName": "Wong"},
    {"sid": "100000001", "fName": "Bender",  "lName": "Rodriguez"},
    {"sid": "100000001", "fName": "Hermes",  "lName": "Conrad"}
 ]
}
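
A minimal sketch of reading the JSON sample above with Python's standard json module. The data is pasted inline exactly as it appears on the slide; JSON_TEXT, course and last_names are hypothetical names for this illustration. Because every sid value in the sample is identical, the sketch does not use sid as a key.

```python
import json

# Sample data copied from the slide.
JSON_TEXT = """
{"courseCode": "CSCI 0010u",
 "courseName": "Data Hacking 101",
 "classList": [
    {"sid": "100000001", "fName": "Phillip", "lName": "Fry"},
    {"sid": "100000001", "fName": "Turanga", "lName": "Leela"},
    {"sid": "100000001", "fName": "Hubert",  "lName": "Farnsworth"},
    {"sid": "100000001", "fName": "Amy",     "lName": "Wong"},
    {"sid": "100000001", "fName": "Bender",  "lName": "Rodriguez"},
    {"sid": "100000001", "fName": "Hermes",  "lName": "Conrad"}
 ]
}
"""

# json.loads turns the text into ordinary Python dicts and lists.
course = json.loads(JSON_TEXT)

# Collect the last names from the class list and sort them alphabetically.
last_names = sorted(person["lName"] for person in course["classList"])

print(course["courseCode"], "-", course["courseName"])
for name in last_names:
    print(name)
```

Once parsed, the data is just nested dictionaries and lists, so the usual Python tools (loops, sorted, comprehensions) apply directly.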
                        

Data Hacking

Demos

Youth weight categories

Bike Share stations

Wrap-Up

  • In this talk, we answered the following questions:
    • What is hacking?
    • Who can do it?
    • Where can I find data?
    • How do I do it?