Open data has been heralded as a powerful, emerging movement. But just how usable is the data involved?
Access to open data has increased significantly over the past few years. Pressure is being placed on companies, organisations and governments alike to release their raw data in a number of formats. Similarly, open initiatives emerging in the business world endeavour to collect, publish and make effective use of valuable datasets. The movement as a whole aims to increase public transparency and to allow for enhanced data-driven policy and interactions. But, despite growing momentum and broad potential, the open data movement has some limitations which creators and users alike are seeking to address.
At the University of Southampton, a recent research festival showcased some innovative ways for computer programmers to make better use of valuable open datasets. In particular, the work looked at the format of datasets hosted on Government websites such as Gov.uk, which currently provides data sourced from local councils in both WMS (Web Map Service) and WFS (Web Feature Service) formats, a percentage of which do not work correctly. More than 2,500 datasets are hosted, and in some cases datasets overlap with one another but cannot easily be merged semantically. For example, one dataset could provide data about disabled parking in the city of Southampton, whereas another could provide data on all the different types of parking in the same city. The second set holds the same disabled parking data, but that specific data takes more time to find.
Above: An example of the open data available on Data.Gov.UK (Tree Preservation Orders sourced from local councils, and provided in WMS/WFS formats), 2016.
The coverage and relevancy of WMS/WFS datasets may also be an area for improvement. Currently, these are the only two options offered for such datasets on Gov.uk, and they commonly have few or no maintenance notes at all. This means that programmers looking to use the data may struggle to find information about its history, or even to identify whether some items are new. Most WMS servers are built on WFS data, yet there is no clear way to link between the WMS and WFS APIs for a dataset. Although more open data is better, the format and standards of these datasets are integral to their effective use and overall value.
The Southampton team, led by open data expert Christopher Gutteridge, explained that the WMS/WFS data types can pose a problem for individual programmers who wish to manipulate the data, as only certain software can access it automatically. Such software, including QGIS (free) and ArcGIS (commercial), consists of big, complex programs with a learning curve, and is very alien to the web programmer or command-line programmer who just wants data in a file or a standardised API. The APIs in this situation require human intervention and interpretation to select datasets, which is fine for researchers but terrible for someone who wants to make a UK car-park discovery phone app. Therefore, to get hold of the actual data the individual is forced to cut and paste the URL and explore it using services such as an OAI explorer, which can take time and effort.
Above: The home page of the ‘Geo-Explorer’ application created by Christopher Gutteridge and team at the University of Southampton, 2016.
Although many datasets hosted on Gov.uk feature useful metadata, there is no enforced standard for identifying and creating metadata titles, descriptions, owners or licences in the first place. As a result, each API endpoint has a schema built from scratch by the person who created it. Ultimately, varying schemas can cause confusion, especially when combined with such diverse implementations of the API. Problems range from invalid XML and failure to support the “post” function to invalid dataset names (which must never contain spaces, but often do). The team highlighted these as areas where the quality of a dataset can be developed further, thus increasing its usability.
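Two of the problems described here, invalid XML and dataset names containing spaces, can be checked for mechanically. The following Python sketch is purely illustrative and is not the team's code: it assumes the text of an endpoint's capabilities document has already been downloaded, and the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_capabilities(xml_text):
    """Return a list of human-readable problems found in a capabilities document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The document is not well-formed XML, so nothing else can be checked.
        return [f"invalid XML: {exc}"]

    problems = []
    for element in root.iter():
        # Match on the local name so the check is independent of the WFS version's namespace.
        if local_name(element.tag) != "FeatureType":
            continue
        for child in element:
            if local_name(child.tag) != "Name":
                continue
            name = (child.text or "").strip()
            if not name:
                problems.append("feature type with an empty name")
            elif any(ch.isspace() for ch in name):
                problems.append(f"feature type name contains whitespace: {name!r}")
    return problems


# Hypothetical, heavily trimmed capabilities document for illustration only.
sample = (
    '<WFS_Capabilities xmlns="http://www.opengis.net/wfs/2.0">'
    "<FeatureTypeList>"
    "<FeatureType><Name>car parks</Name></FeatureType>"
    "<FeatureType><Name>ns:trees</Name></FeatureType>"
    "</FeatureTypeList>"
    "</WFS_Capabilities>"
)
for problem in check_capabilities(sample):
    print(problem)
```

A check like this only covers what can be read from the document itself; whether an endpoint accepts a given request type would need to be tested against the live service.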
In just three days the team began work on an application which works specifically with the University of Southampton’s endpoint. The goal was to be able to recommend application profiles, support open data standards, and to be able to define fields for creators to follow: all of which are essential steps towards being able to semantically merge relevant open data. This resulted in the creation of ‘Geo-Explorer’ (shown above), a tool for programmers and individuals alike wishing to access and use open map data easily. If you have a WFS that you want to get data out of, you simply paste the link into http://geo-explore.ecs.soton.ac.uk/ and it will build the query for you to get the data out, or view WFS/WMS data on a map.
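To give a sense of the kind of query involved, the sketch below assembles a WFS 2.0 "GetFeature" request URL in Python using the standard key-value parameters (service, version, request, typeNames, count, outputFormat). It is an illustration under stated assumptions, not Geo-Explorer's actual code: the endpoint URL and layer name are hypothetical, and the output formats a given server accepts vary, so that parameter is left optional.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_getfeature_url(endpoint, type_name, count=None, output_format=None):
    """Return a WFS 2.0 GetFeature URL for a single feature type."""
    parts = urlsplit(endpoint)

    # Keep any parameters already on the endpoint (some servers need them),
    # but drop the ones this function sets itself.
    reserved = {"service", "version", "request", "typenames", "count", "outputformat"}
    params = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in reserved]

    params += [
        ("service", "WFS"),
        ("version", "2.0.0"),
        ("request", "GetFeature"),
        ("typeNames", type_name),
    ]
    if count is not None:
        # Limits the number of features returned (WFS 2.0 parameter name).
        params.append(("count", str(count)))
    if output_format is not None:
        # Supported values depend on the server; check its capabilities document.
        params.append(("outputFormat", output_format))

    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


# Hypothetical endpoint and layer name, for illustration only.
url = build_getfeature_url(
    "https://example.org/wfs?map=parking&request=GetCapabilities",
    "parking:disabled_bays",
    count=10,
)
print(url)
```

The resulting URL can be fetched with any HTTP client, which is closer to the "data in a file" workflow that web and command-line programmers expect.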
Although the application is designed for use at the University of Southampton, the team are hopeful that the value of this type of research will be recognised by other open data supporters around the world. As the research project lasted only a very short time, the idea has the potential to be adopted and developed by other open data advocates looking to support and improve the open data movement in the short term. By taking open data that is geared up for specific applications, such as mapping, and then developing applications which make the data more versatile, the team hope that open datasets can become more valuable, usable and flexible.
Above: An example of the visualisation tool in the ‘Geo-Explorer’ application, showing WFS open data of local voting districts rendered as a map overlay.