by Barrett Smith
(Part 4 of the "Open Gov" blog series)
In the previous posts in this series, we’ve talked about the requirements of the Administration’s Open Data mandate in terms of what needs to be implemented. In this post, we’ll talk about how to actually meet the November deadline using Drupal.
As with many things in Drupal, there are several ways you might implement functionality to meet the requirements of the mandate, depending on the distinct needs of your agency. Some aspects that should be considered when implementing a solution are:
How and where is the agency’s Enterprise Inventory stored?
How will public datasets from the inventory be added to the Public Data Catalog?
How will the agency collect and respond to feedback on the datasets listed in the Public Data Catalog?
In Drupal, rolling your own solution is always an option, but there is one gotcha to be aware of. The Views Datasource module seems like the logical pick for generating JSON output. However, because it does not properly render multi-part or multi-valued fields, the resulting JSON is invalid. As a result, the best option at present seems to be custom coding the JSON output page.
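To make the multi-valued field problem concrete, here is a minimal, language-neutral sketch in Python rather than Drupal code. It assumes a hypothetical record layout in which `keyword` is the only multi-valued field; the names `MULTIVALUED_FIELDS`, `to_entry`, and `render_catalog` are invented for illustration and do not come from any Drupal module. The point is only that a multi-valued field should always be emitted as a JSON array, so the output stays valid whether a dataset has one value or several.

```python
import json

# Assumption for this sketch: only "keyword" may hold more than one value.
MULTIVALUED_FIELDS = {"keyword"}


def to_entry(record):
    """Return a copy of the record with multi-valued fields emitted as lists."""
    entry = {}
    for name, value in record.items():
        if name in MULTIVALUED_FIELDS:
            # Always emit a list, even when only a single value is present.
            if isinstance(value, (list, tuple)):
                entry[name] = list(value)
            else:
                entry[name] = [value]
        else:
            entry[name] = value
    return entry


def render_catalog(records):
    """Serialize all records as a single JSON array."""
    return json.dumps([to_entry(r) for r in records], indent=2)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real catalog.
    sample = [
        {"title": "Budget figures", "keyword": ["budget", "finance"]},
        {"title": "Office locations", "keyword": "locations"},
    ]
    print(render_catalog(sample))
```

A custom Drupal page callback would follow the same pattern: collect the field values into native arrays first, then hand the whole structure to a JSON encoder instead of assembling the output string by hand.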
The Public Data Catalog
Unless your agency’s needs are truly unique, there are two pre-rolled solutions for meeting the Public Data Catalog requirement of the Open Data mandate. Both take the approach of providing a node type for storing the required data and providing output lists of those nodes.
At the 2013 Drupal Government Days conference, several attendees spent the second day in a code sprint to develop the Open Data module. The module is based on Features and provides a Dataset node type with fields for all the required elements of the CCM schema and a JSON listing of nodes of the Dataset type at the /data.json URL. An additional, optional module provides a human-readable version of the list at the /data URL.
The module is in active development, with the current recommended version being 7.x-1.x-beta5. The addition of RDFa markup to the human-readable listing is an open issue.
Another option is the DKAN distribution developed and maintained by New Amsterdam Ideas. DKAN is a Drupal implementation of the popular CKAN data publishing platform. Taken as an entire distribution, DKAN bundles Drupal core, a custom theme, faceted search, and two content types for representing a dataset and a resource (i.e., a data file). In addition, the DKAN Datastore module provides the capability to upload a CSV file and have its content stored in the database and made available through an API automatically.
The DKAN distribution is a full-featured solution for agencies without a current site. Those with existing sites may instead choose to install and enable the component modules independently. However, one shortcoming is that DKAN does not currently provide a JSON feed of datasets.
The workflows that could be implemented to meet the customer feedback requirement are too numerous to list. However, the minimal requirements could all be fulfilled using Drupal’s core comment functionality, combined with human practices for reviewing and responding to the submitted feedback. Additional functionality could be provided by using the Fivestar module to allow users to rate a particular dataset.
The Enterprise Inventory
Neither the Open Data module nor DKAN presumes the manner in which the Enterprise Inventory is integrated. With either solution, the Inventory could be maintained on the same site as the Public Data Catalog by creating Dataset nodes for all entries and filtering out those which are not public. Alternatively, the Public Data Catalog could be maintained entirely independently by creating only those nodes which should appear in the Public Data Catalog.
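The first approach, one inventory with the public catalog derived from it by filtering, can be sketched as follows. This is an illustration in Python, not Drupal code, and it assumes each inventory entry carries a hypothetical `access_level` flag; the field name and its values are placeholders, not taken from either module.

```python
def public_catalog(inventory):
    """Return only the inventory entries flagged as public.

    Entries without the (hypothetical) access_level flag are treated as
    non-public, so nothing is published by accident.
    """
    return [entry for entry in inventory if entry.get("access_level") == "public"]


if __name__ == "__main__":
    # Hypothetical inventory entries for illustration only.
    inventory = [
        {"title": "Office locations", "access_level": "public"},
        {"title": "Personnel records", "access_level": "non-public"},
        {"title": "Unreviewed dataset"},
    ]
    for entry in public_catalog(inventory):
        print(entry["title"])
```

In Drupal terms, the same effect would typically come from a filter on the listing that feeds the catalog output. Defaulting unflagged entries to non-public is a design choice in this sketch that matches a review-before-release process.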
Another option which both solutions support is integrating with an external inventory using the Feeds module to read items from the external source and create matching Dataset nodes.
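The general shape of such an import is sketched below in Python. It is not the Feeds module's API; it only illustrates the idea of mapping external columns to dataset fields and creating or updating one record per unique identifier. The column names, the `FIELD_MAP` mapping, and the `import_rows` function are all assumptions made for this example.

```python
import csv
import io

# Hypothetical mapping from external column names to dataset field names.
FIELD_MAP = {"ID": "identifier", "Name": "title", "Summary": "description"}


def import_rows(csv_text, store):
    """Create or update dataset records in store, keyed by identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        record = {target: row[source] for source, target in FIELD_MAP.items()}
        # Re-importing a row with the same identifier updates it in place
        # instead of creating a duplicate.
        store[record["identifier"]] = record
    return store


if __name__ == "__main__":
    # Hypothetical export from an external inventory system.
    export = "ID,Name,Summary\n1,Office locations,Addresses of field offices\n"
    datasets = import_rows(export, {})
    print(datasets["1"]["title"])
```

Keying on a stable identifier is what lets the import run repeatedly, for example on a schedule, without duplicating datasets that already exist.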
So, as you can see, meeting the Open Data mandate’s requirements for November is surprisingly easy in Drupal. The hard part is likely to be the human and business processes necessary to support the technical implementation: the review processes for submitted feedback and the processes for reviewing datasets for releasability. Has your agency started on those yet?