To date, access to this kingmaking power is very unequal. It has been concentrated in the hands of a select few companies, most of which were already quite powerful to begin with. They alone possess the enormous resources required to collect quality data and turn it into value.
Meanwhile, the rest of the world is drowning in data, but that data and the power it confers remain out of our reach. By extension, our ability to use that data for the collective good of society is limited.
Fortunately, that is now changing. New means of providing access to data, as well as novel tools for correlating, contextualizing and analyzing data in real time, promise to usher in an age of democratized data.
In this article, Flux provides a look at how data is being democratized. It focuses on the specific use case of collecting and analyzing environmental data, although the trends discussed below have the potential to play out anywhere that data leads to insight.
The Challenges of Data Democratization
In theory, anyone can collect data or access the various open data sets that are collected by government agencies and other organizations committed to open data.
Anyone can theoretically analyze and process data, too; after all, most of the major frameworks for big data processing, like Hadoop and Spark, are open source. There's no technical or legal barrier stopping someone from downloading and running them.
At a practical level, however, actually collecting, transforming, storing and/or analyzing data on a large scale is unfeasible for most individuals and organizations today. That's true for a number of reasons:
- Data is harvested at high, broad levels. Most open-access environmental data sets focus on broad regions. If you want to study a particular place or microclimate, it can be hard to find the data you need. And while you could theoretically collect the data yourself, deploying and managing your own sensors or other data collectors is often not realistic because of a traditional lack of affordable, open data sensors.
- Data may be biased. Even if you have access to raw open-access data, you can’t be certain that the data is not presented in ways that skew your ability to interpret it accurately or fairly.
- Data storage is expensive. Sure, you can now store data in the public cloud for just pennies per gigabyte. But when your data sets reach terabytes in size, those costs add up and few organizations have the budget to sustain them over the long term. (And if you don't collect data over the long term, you are likely to miss out on important insights, especially in contexts like environmental data where change typically results from infrequent, sudden events.)
- Pre-collected data is outdated data. If you rely on data that was collected by someone else, chances are that the data will be stale by the time you access it. Plus, it will take you more time to transform the data, clean up data-quality issues and run analysis. By the time all of this is done, the insights you can glean from the data may be outdated. The only way to solve these problems is to collect and analyze data in real time, but again, most organizations lack the resources to do this on their own.
- Lack of data interpretation and artificial intelligence (AI) tools. Again, many frameworks for collecting and processing large data sets are open source. But advanced tools for making sense of data, like AI algorithms, tend to be proprietary. The companies that develop these tools invest huge amounts of money in them and rarely make them available to third parties.
- Poor incentives for data and AI sharing. Part of the reason for the challenge described in the preceding point is that few organizations have strong incentives to share their data and proprietary AI tools. To date, most companies that benefit from data monetize it through advertising or internal research; there has been little reward for them in sharing their data and data tools with third parties.
What all of the above means is that data has tended, so far, to be very undemocratic. It increases the power of organizations that are powerful to begin with and therefore have the resources to undertake large-scale, proprietary data collection and analysis programs. It leaves everyone else struggling to make sense of the tidbits of data that are available from open data sets, which are usually of limited value for gaining real-time insights. And even if you find access to meaningful, relevant data, you may not have the advanced AI tools that are necessary to turn the data into value.
This is why we struggle to maximize the value of all of the data that is being generated around us. As Microsoft’s Lucas Joppa noted in Nature:
“Today, we know more than ever about human activity. More than one-quarter of the 7.6 billion people on Earth post detailed information about their lives on Facebook at least once a month. Nearly one-fifth do so daily. ... Yet we are flying blind when it comes to understanding the natural world.”
Joppa continued by pointing to some of the reasons why we do such a poor job of transforming all of the environmental data surrounding us into insight. The problem is not only that scientists “don’t have the kinds of data needed to make such predictions” but also that they “lack the algorithms to convert data into useful information.”
When only a handful of organizations have the power to glean meaningful insight from environmental data, and they are not actually doing it, people who interact in critical ways with the environment, like foresters and builders, cannot make data-driven decisions that are in the best interests of all stakeholders. Nor can anyone hold them to account.
What It Takes to Democratize Data
It doesn’t have to be this way. Data can be democratized in ways that make it practical for any person or group to derive insights from the data surrounding us.
Doing so requires:
- The ability to store and share data in an open, decentralized way. Shared data would not only make more data available to people who need it but would also (and this is a really important part of data democratization) allow us to place data from many different sources on the same plane and in the same context, so that we maximize visibility and insights.
- An incentives system that rewards people for sharing data with each other and makes it feasible to monetize data in ways that are not purely self-serving.
- Access to AI-powered data analysis tools that anyone can use.
- Open, affordable data harvesters that anyone can deploy.
These solutions are all part of the platform that Flux is building. Flux is the antidote to the natural tendency of big data and AI to concentrate power rather than democratize it. The platform combines blockchain technology for open, affordable data storage, advanced AI tools for analyzing data, rewards for organizations that collect and share data, and open-source hardware sensors called MICOs. In summary, Flux is creating a new environmental standard for data collection, storage and intelligence.
Learn more by downloading the Flux white paper.