Training at Data Harvest 2014

Revised 15th of May 2014

Kaas & Mulvad gave a number of presentations and training sessions at Data Harvest in Brussels, 8th to 11th of May 2014.

Get stories out of fresh Farmsubsidy data

We have now extracted 2013 data from 25 countries, totalling 26.1 billion euro. Last year we got data on 26.6 billion euro in total.
The Google spreadsheet with links to data and documentation is here:
https://docs.google.com/spreadsheet/ccc?key=0Ajagl3TOC7X_dFlzQ0ljaUxUWVNmNE40TGdweWNlcEE&hl=en#gid=5
The current status is:
Finished data from 16 countries:
BG, CZ, DE, DK, FI, FR, HU, IT, LT, LU, LV, NL, PT, SE, SK, SI
Raw data from 9 countries:
AT, BE, EE, ES, GB, IE, MT, PL, RO
No data yet (2 countries):
CY, GR
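
The status list can be sanity-checked with a few lines of Python. This is only a sketch: the country codes and totals are copied from the text above, while the variable names are made up for the illustration. The totals are written in million euro to keep the arithmetic exact.

```python
# Country codes copied from the status list above.
finished = "BG, CZ, DE, DK, FI, FR, HU, IT, LT, LU, LV, NL, PT, SE, SK, SI".split(", ")
raw = "AT, BE, EE, ES, GB, IE, MT, PL, RO".split(", ")
missing = "CY, GR".split(", ")

# No country should appear in more than one group.
all_codes = finished + raw + missing
assert len(all_codes) == len(set(all_codes))

extracted = len(finished) + len(raw)
print(f"Finished: {len(finished)}, raw: {len(raw)}, no data yet: {len(missing)}")
print(f"Extracted so far: {extracted} of {len(all_codes)} countries")

# Totals from the text, in million euro (26.1 and 26.6 billion).
total_this_year = 26_100
total_last_year = 26_600
difference = total_last_year - total_this_year
share = 100 * difference / total_last_year
print(f"Difference to last year: {difference} million euro ({share:.1f} %)")
```

The same kind of check is worth running whenever the spreadsheet is updated, so the text and the data do not drift apart.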

Importing PDF
A tipsheet with an overview of good tools for importing PDFs

Scraping with Helium
http://bit.ly/1ts0Vba

Visualisation with Google Fusion Tables
Training material
Dataset with recipients
Dataset on municipalities
Danish municipalities

Saturday 10th of May 2014

Fighting the secrecy about Multi-Resistant Bacteria

Three deaths from hospital bacteria spread by pigs

Danish pigs spread hospital bacteria

Possible punishment for revealing the names of pig farms

Pig-related types of MRSA in the Netherlands: mostly in rural areas with many pig farms.

Animal-related types of MRSA in the Netherlands

Open Refine - cleaning the really dirty data
Training material

Taking scrapers to the next level
11 tips for scrapers at the next level

Friedrich Lindenberg also has recommendations for this, collected here by Crina Boros:

Scraper Wiki

https://scraperwiki.com/

Scrape Twitter; extract PDFs; scrape the web

 

Planning alerts – schedule scrapers or run them manually; write your own scrapers; get email alerts for broken scrapers

http://morph.io/planningalerts
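
The idea of getting an alert when a scraper breaks can be sketched in plain Python. This is not how morph.io does it; it is a minimal illustration under stated assumptions. The function `run_with_alert`, the two example scrapers and the sample row are all hypothetical, and the alerts are collected in a list here, where a real setup would send an email instead.

```python
import traceback


def run_with_alert(name, scraper, notify):
    """Run scraper(); on failure pass a short report to notify() and return None."""
    try:
        return scraper()
    except Exception:
        notify(f"Scraper {name!r} is broken:\n{traceback.format_exc()}")
        return None


def working_scraper():
    # Hypothetical scraper that returns one made-up row.
    return [{"recipient": "Example farm", "amount_eur": 1000}]


def broken_scraper():
    # Hypothetical scraper that fails, for example after a site redesign.
    raise ValueError("page layout changed")


alerts = []  # stand-in for an email inbox
rows = run_with_alert("working", working_scraper, alerts.append)
failed = run_with_alert("broken", broken_scraper, alerts.append)
print(len(rows), "row(s) scraped;", len(alerts), "alert(s) raised")
```

Wrapping every scheduled run this way means a scraper that silently stops working is noticed the same day, not when the story is due.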

 

Lobby

OKFN – Lobbyfacts data API

http://api.lobbyfacts.eu/

 

JENKINS

It runs your scraper at the frequency you set

http://norton.pudo.org/jenkins/ – it requires a user name and password

http://jenkins-ci.org/

 

What if data spoke to me?

IF THIS THEN THAT (IFTTT) is a service that lets you create powerful connections with one simple statement: if this then that.

You can scrape your own emails as well

www.ifttt.com/myrecipes/personal

 

80LEGS

A spider / web crawler. It collects large amounts of data.

http://www.80legs.com/

 

import.io – Web Data Extraction Made Easy

https://import.io/

 

KIMONO

It turns websites into structured APIs from your browser in seconds

www.kimonolabs.com

 

Rapid Miner

Data miner; analytics. Get the open version. It includes scrapers for data mining.

http://it.toolbox.com/wiki/index.php/RapidMiner

 

Python for journalists – Write your own scrapers

https://p2pu.org/en/groups/python-for-journalists-20112012/

Join the mailing list for journalists
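
As a first taste of writing your own scraper, the sketch below pulls links out of a piece of HTML using only Python's standard library. The class name `LinkCollector` and the sample page are made up for the illustration; a real scraper would download the page first and would usually need more error handling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical sample page, using two links from this list.
SAMPLE_HTML = """
<ul>
  <li><a href="https://scraperwiki.com/">Scraper Wiki</a></li>
  <li><a href="http://morph.io/planningalerts">Planning alerts</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for href, text in collector.links:
    print(text, "->", href)
```

The standard library is enough for simple pages; for larger jobs a dedicated framework is usually less work.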

 

Scrapy – a scraping framework for Python

http://doc.scrapy.org/en/latest/

https://pypi.python.org/pypi/Scrapy

 

Ruby on Rails

An open-source web framework

http://rubyonrails.org/

 
