Managing Big Data’s Big Risks
In the last 15 years, we have witnessed an explosion in the amount of digital data available – and in the computer technologies used to process it. But while Big Data will undoubtedly deliver important scientific, technological, and medical advances, we must not lose sight of four major risks.
NEW YORK – In the last 15 years, we have witnessed an explosion in the amount of digital data available – from the Internet, social media, scientific equipment, smartphones, surveillance cameras, and many other sources – and in the computer technologies used to process it. “Big Data,” as it is known, will undoubtedly deliver important scientific, technological, and medical advances. But Big Data also poses serious risks if it is misused or abused.
Already, major innovations such as Internet search engines, machine translation, and image labeling have relied on applying machine-learning techniques to vast data sets. And, in the near future, Big Data could significantly improve government policymaking, social-welfare programs, and scholarship.
But having more data is no substitute for having high-quality data. For example, a recent article in Nature reports that election pollsters in the United States are struggling to obtain representative samples of the population, because they are legally permitted to call only landline telephones, whereas Americans increasingly rely on cellphones. And while one can find countless political opinions on social media, these aren’t reliably representative of voters, either. In fact, a substantial share of tweets and Facebook posts about politics is computer-generated.