In the last couple of decades, vast amounts of data have been collected about people, environments and situations. That data tends to sit in isolated silos, aggregated only by organizations that pay large sums for it and then keep the new aggregate in silos of their own. This means the data is available for business use, but not necessarily for appropriate public and humanitarian use. In addition, data is generally locked inside the applications it is attached to, so research done by one organization is often difficult to share with another. The Data Anywhere Project is working to solve these problems by creating open databases that can be queried by many different applications.
Drew Hornbein recently wrote a blog post about how Data Anywhere can help the world of Disaster Response management. He described his experience working with the Staten Island Community and Interfaith Long Term Recovery Group, where the data he needed was tied up in a format that could not be queried database-style. It took weeks to gather information that a simple query could have produced under different circumstances. Furthermore, the data lived in Google Documents that could be shared either in their entirety or not at all. In addition to massaging the data into usable information, he also had to export it into a format that could be shared more easily without over-sharing. He explains how Data Anywhere can help in situations like that.
He lays out an ideal picture of how data could be shared in unified Data Anywhere databases:
- Data would have a persistent home.
- Data would be machine readable.
- Data stewards could manage access to the data, making some public and keeping some private.
- Reports could be generated on the fly.
- Third party applications could query the data.
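The picture above can be made concrete with a small sketch. Here an in-memory SQLite database stands in for a shared Data Anywhere store; the table name, columns, and sample rows are all hypothetical, not the project's actual schema. The point is simply that once data is machine readable and has a persistent home, reports really can be generated on the fly with a single query:

```python
import sqlite3

# In-memory SQLite as a stand-in for a shared, persistent data store.
# Table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relief_cases (site TEXT, need TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO relief_cases VALUES (?, ?, ?)",
    [
        ("Midland Beach", "mold remediation", "open"),
        ("New Dorp", "mold remediation", "closed"),
        ("Midland Beach", "supplies", "open"),
    ],
)

# An on-the-fly report: open cases grouped by need -- the kind of
# question that takes weeks to answer from shared documents.
report = conn.execute(
    "SELECT need, COUNT(*) FROM relief_cases "
    "WHERE status = 'open' GROUP BY need ORDER BY need"
).fetchall()
print(report)  # [('mold remediation', 1), ('supplies', 1)]
```

A third-party application would issue the same kind of query over the project's API rather than directly against the database.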
Data privacy is a central concern in this project. One of its goals is granular control over access, so that a collecting organization can store everything it gathers in Data Anywhere while other users of the system see only what they are permitted to. This is a keen challenge for anyone interested in developing tight, secure code for big data.
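One way to picture that granular control is a steward-defined policy that marks certain fields public and redacts the rest for outside users. This is purely a sketch: the field names and the policy format are invented for illustration and are not part of Data Anywhere itself.

```python
# Hypothetical steward policy: only these fields leave the
# collecting organization.
PUBLIC_FIELDS = {"site", "need", "status"}

def redact(record, requester_is_steward):
    """Return the full record for stewards, public fields otherwise."""
    if requester_is_steward:
        return dict(record)
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}

record = {
    "site": "Midland Beach",
    "need": "supplies",
    "status": "open",
    "resident_name": "J. Doe",   # private: never shared outside
    "phone": "555-0100",         # private
}

print(redact(record, requester_is_steward=False))
# {'site': 'Midland Beach', 'need': 'supplies', 'status': 'open'}
```

In a real system this policy would be enforced server-side, before any data crosses the API boundary, rather than in client code.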
At a recent #OccupyDataNYC hack day attended by Hornbein, development on Data Anywhere was led by Gloria W and consisted of setting up a simple self-replicating database and simple scrapers on various virtual machines. Together the attendees attempted to scrape a wide assortment of data as a proof of concept for the project. They also created an API for querying the data from custom applications.
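To give a flavor of the kind of simple scraper involved, here is a rough sketch that pulls table cells out of an HTML page into machine-readable rows using only the standard library. The inline HTML stands in for a fetched page; the hack-day scrapers ran against live sites on separate virtual machines, and nothing here reflects their actual code.

```python
from html.parser import HTMLParser

class CellScraper(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1].append(data.strip())

# Stand-in for a page fetched from the web.
page = "<table><tr><td>Midland Beach</td><td>open</td></tr></table>"
scraper = CellScraper()
scraper.feed(page)
print(scraper.rows)  # [['Midland Beach', 'open']]
```

Rows in this shape can then be loaded straight into a shared database, which is what makes scraped data useful to the query API.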
There are many continuing challenges with this project. Standards need to be created for the format of data. Security features need to be built into the system and tested thoroughly. APIs must be written to handle a wide range of use cases.
Some of these challenges are highly technical, but others call for a different sort of expertise. Subject Matter Experts (SMEs) in research and statistics would be very helpful in designing the data standards and in describing use cases that will shape the APIs. And although Hornbein’s focus is on Disaster Response uses, the initial idea behind Data Anywhere was for a much broader use of existing data on the Web, so SMEs with research experience in a range of specific topics can bring their knowledge to bear.
You can explore the code for this project and fork your own copy on GitHub.