Revised 15th of May 2014
Kaas & Mulvad gave a number of presentations and training sessions at Data Harvest in Brussels, 8th to 11th of May 2014
Get stories out of fresh Farmsubsidy data
We have now extracted 2013 data from 25 countries, totalling 26.1 billion euros. Last year we got data on 26.6 billion euros in total.
The Google spreadsheet with links to data and documentation is here:
https://docs.google.com/spreadsheet/ccc?key=0Ajagl3TOC7X_dFlzQ0ljaUxUWVNmNE40TGdweWNlcEE&hl=en#gid=5
The current status is:
Finished data from 16 countries:
BG, CZ, DE, DK, FI, FR, HU, IT, LT, LU, LV, NL, PT, SE, SK, SI
Raw data from 9 countries:
AT, BE, EE, ES, GB, IE, MT, PL, RO
No data yet (2 countries):
CY, GR
Importing PDF
A tipsheet with an overview of good tools for importing PDFs
Scraping with Helium
http://bit.ly/1ts0Vba
Visualisation with Google Fusion
Training material
Dataset with recipients
Dataset on municipalities
Danish municipalities
Saturday 10th of May 2014
Fighting the secrecy about Multi-Resistant Bacteria
Three deaths of hospital bacteria spread by pigs
Danish pigs spread hospital bacteria
Possible punishment for revealing the names of pig farms
Pig-related types of MRSA in the Netherlands: mostly in rural areas with lots of pig farms.
Animal related types of MRSA in Netherlands
Open Refine - cleaning the really dirty data
Training material
Taking scrapers to the next level
11 tips for scrapers at the next level
Friedrich Lindenberg also has recommendations for this, collected here by Crina Boros:
Scraper Wiki
Scrape Twitter; extract PDFs; scrape the web
Planning alerts – schedule scrapers or run them manually; write your own scrapers; email alerts for broken scrapers
http://morph.io/planningalerts
Lobby
OKFN – lobby facts data api
JENKINS
It runs your scraper at the frequency you set
http://norton.pudo.org/jenkins/ – it requires a user name and password
What if data spoke to me?
IF THIS THEN THAT (IFTTT) is a service that lets you create powerful connections with one simple statement: if this then that.
You can scrape your own emails as well
www.ifttt.com/myrecipes/personal
80LEGS
A spider / webcrawler. It collects large amounts of data
import.io – Web Data Extraction Made Easy
KIMONO
It turns websites into structured APIs from your browser in seconds
Rapid Miner
Data miner; analytics. Get the open version. It includes scrapers for data mining.
http://it.toolbox.com/wiki/index.php/RapidMiner
Python for journalists – Write your own scrapers
https://p2pu.org/en/groups/python-for-journalists-20112012/
Join the mailing list for journalists
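To give an idea of what writing your own scraper can look like, here is a minimal sketch that pulls the rows out of an HTML table using only the Python standard library. It is not taken from the course material. The class name TableScraper, the function scrape_table and the sample HTML with its farm names and amounts are all made up for illustration, and a real scraper would first download the page and would have to cope with messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text pieces of the current cell, or None

    def _flush_cell(self):
        # Store the current cell in the current row, if both exist.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a previous cell left unclosed
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page; the names and amounts are not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Recipient</th><th>Amount</th></tr>
  <tr><td>Example Farm A</td><td>1200</td></tr>
  <tr><td>Example Farm B</td><td>800</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The first row holds the column headers, so rows[1:] is the data. For larger jobs a dedicated library is usually a better choice than hand-written parsing.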
Scrapy – scraping framework for Python
http://doc.scrapy.org/en/latest/
https://pypi.python.org/pypi/Scrapy
Ruby on Rails
An open-source web framework