Open Data - the How?

How to open data that is in a closed format?

Electronically available data is often not in an open or machine-readable format, but in closed formats such as PDF documents or HTML (web text). It can also be poorly structured. This makes it difficult for people and programs to work with the datasets, so it becomes essential to extract the data, that is, to turn it from closed data into open data. Extracting data from websites and PDFs into an open format is also called data scraping or web scraping. There are a number of ways to extract data from a closed format, ranging from simple point-and-click software to intensive coding.
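
As a rough illustration of the coding end of that range, the sketch below pulls the rows of an HTML table into CSV using only the Python standard library. The class name `TableExtractor`, the function `html_table_to_csv` and the sample table are all invented for this example; real pages are usually messier (nested tables, merged cells) and may need one of the dedicated tools listed under Helpful Resources.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside <tr>
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample data, standing in for a scraped web page.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Schools</th></tr>
  <tr><td>Kathmandu</td><td>120</td></tr>
  <tr><td>Lalitpur</td><td>85</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

The same idea applies to a downloaded page: read its HTML into a string and pass it to `html_table_to_csv`.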

Helpful Resources

Open Data Manual – Open Knowledge Nepal: http://odap.oknp.org/files/Open%20Data%20Book%20Manual.pdf
Data Extraction Tools: https://bbvaopen4u.com/en/actualidad/data-extraction-tools-beginners-an…
Web Scraping (Python): https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scrapin…

How to clean data, especially after scraping from a closed format?

Data cleaning is the process of identifying and removing incomplete, incorrect, inaccurate and irrelevant data from a dataset. Cleaning tidies up the values and ensures that the dataset is clear to understand and useful to work with. Data cleaning is also known as data scrubbing.
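
A minimal sketch of what such cleaning can look like in Python is shown here. The function `clean_rows`, the field names `district` and `schools`, and the sample rows are assumptions made up for illustration; which rows count as incomplete or incorrect always depends on the dataset at hand.

```python
def clean_rows(rows):
    """Return cleaned copies of `rows`: trimmed, validated, de-duplicated."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalise whitespace and capitalisation of the text field.
        district = (row.get("district") or "").strip().title()
        # Remove thousands separators such as "1,200" before validating.
        raw_count = (row.get("schools") or "").strip().replace(",", "")

        if not district or not raw_count:
            continue  # incomplete: a required value is missing
        if not raw_count.isdecimal():
            continue  # incorrect: the count is not a whole number

        key = (district, raw_count)
        if key in seen:
            continue  # irrelevant: exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({"district": district, "schools": int(raw_count)})
    return cleaned


# Invented sample rows, as they might look straight after scraping.
RAW_ROWS = [
    {"district": " kathmandu ", "schools": "1,200"},
    {"district": "Lalitpur", "schools": "85"},
    {"district": "Lalitpur", "schools": "85"},    # duplicate
    {"district": "Bhaktapur", "schools": "n/a"},  # not a number
    {"district": "", "schools": "40"},            # missing name
]

for row in clean_rows(RAW_ROWS):
    print(row)
```

This sketch silently drops bad rows; for real work it is often better to log them so that they can be checked and corrected at the source.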

Helpful Resources

Data Cleaning Tools: https://www.datasciencecentral.com/profiles/blogs/5-data-cleansing-tools
Data Cleaning Tools: https://analyticsindiamag.com/10-best-data-cleaning-tools-get-data/
School of Data: Scraping: http://schoolofdata.org/handbook/courses/scraping/
Scraping Website using the Scraper extension for Chrome: http://schoolofdata.org/handbook/recipes/scraper-extension-for-chrome/
Data Journalism Handbook: http://datajournalismhandbook.org/1.0/en/getting_data_3.html
QuickCode – Python and R Data Analysis: https://quickcode.io

How to analyse and visualize Open Data?

After the data is scraped and cleaned, one needs to work with the dataset to find relationships and patterns, and apply statistical or logical techniques to build, support or falsify arguments. This process is called data analysis: evaluating data with statistical or logical reasoning. Presenting the data or findings in an easy-to-understand visual form, such as graphs or charts, is called data visualization.
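
To make the two steps concrete, the following sketch computes a few summary statistics and then draws a very simple text bar chart, using only the Python standard library. The helper names `summarise` and `text_bar_chart` and the figures in `SCHOOLS` are invented for the example; spreadsheet or charting tools such as those listed under Helpful resources would normally be used for anything beyond a quick look.

```python
from statistics import mean, median


def summarise(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def text_bar_chart(data, width=20):
    """Render a {label: value} mapping as horizontal bars of '#' characters.

    Assumes all values are positive; the largest value gets `width` marks.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Invented sample figures: number of schools per district.
SCHOOLS = {"Kathmandu": 120, "Lalitpur": 85, "Bhaktapur": 60}

print(summarise(list(SCHOOLS.values())))
print(text_bar_chart(SCHOOLS))
```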

Helpful Resources

School of Data: Analysing data: http://schoolofdata.org/handbook/courses/analyzing-data/
Data Visualization Tools: https://www.forbes.com/sites/bernardmarr/2017/07/20/the-7-best-data-vis…
Open Source Data Visualization Tools: https://blog.capterra.com/free-and-open-source-data-visualization-tools/
Data Analysis in Microsoft Excel: http://www.excel-easy.com/data-analysis.html
Data Analysis in SPSS: https://students.shu.ac.uk/lits/it/documents/pdf/analysing_data_using_s…
Data Analysis in Google Spreadsheets: https://support.google.com/docs/table/25273?hl=en&page=table.cs&rd=1
Pitfalls while analyzing data: http://schoolofdata.org/handbook/courses/common-misconceptions/

How to publish data in an open format?

To publish data in an open format, first pick the dataset that you wish to publish and apply an open license to it. Then choose a format for the data and make it available online for public use.
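
One possible way to prepare such a release is sketched below: the dataset is written as CSV (an open format) next to a small metadata file that records the chosen license. The function `publish_dataset`, the file names `data.csv` and `metadata.json`, and the metadata field names are assumptions made for this example, not a required standard, and `CC-BY-4.0` is used only as an example of an open license. Uploading the resulting folder to a website or data portal is left out.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish_dataset(rows, fieldnames, out_dir, title, license_id):
    """Write `rows` as CSV plus a JSON metadata file naming the license."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: save the data itself in an open, machine-readable format.
    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Step 2: record the title, license and format alongside the data.
    metadata = {
        "title": title,
        "license": license_id,
        "format": "csv",
        "file": data_path.name,
        "fields": list(fieldnames),
    }
    meta_path = out_dir / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# Invented sample dataset.
ROWS = [
    {"district": "Kathmandu", "schools": 120},
    {"district": "Lalitpur", "schools": 85},
]

with tempfile.TemporaryDirectory() as tmp:
    data_file, meta_file = publish_dataset(
        ROWS, ["district", "schools"], tmp,
        title="Schools per district (sample)", license_id="CC-BY-4.0",
    )
    print(data_file.read_text(encoding="utf-8"))
    print(meta_file.read_text(encoding="utf-8"))
```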

Helpful Resources

How to Open Data: https://okfn.org/opendata/how-to-open-data/
Choosing the right format for open data: https://www.europeandataportal.eu/elearning/en/module9/#/id/co-01
Open File Formats: http://opendatahandbook.org/guide/en/appendices/file-formats/
Open Data: What is it and why should you care: http://www.govtech.com/data/Got-Data-Make-it-Open-Data-with-These-Tips…
