Netflix, Facebook — and the NSA: They’re all in it together

On June 9, the Wall Street Journal reported that for the last few years the National Security Agency has been relying on a software program with the quirky name Hadoop to help it make sense of its enormous collections of data. Named after a toy elephant that belonged to the child of one of the original developers of the program, Hadoop, reported the Journal, is a crucial part of a computing and software revolution: a piece of free software that lets users distribute big-data projects across hundreds or thousands of computers.

"Revolution" is probably the most overused word in the chronicle of Internet history, but if anything, the Wall Street Journal undersold the real story. Hadoop's importance to how we live our lives today is hard to overstate. By making it economically feasible to extract meaning from the massive streams of data that increasingly define our online existence, Hadoop effectively enabled the surveillance state.

And not just in the narrowest, Big Brother, government-is-watching-everyone-all-the-time sense of that term. Hadoop is equally critical to private-sector corporate surveillance. Facebook, Twitter, Yahoo, Amazon, Netflix: just about every big player that gathers the trillions of data events generated by our everyday online actions employs Hadoop as part of its arsenal of Big Data-crunching tools. Hadoop is everywhere. As one programmer told me, "it's taken over the world."

The Journal's description of Hadoop as "a piece of free software" barely scratches the surface of the significance of this particular batch of code. In the past half-decade Hadoop has emerged as one of the triumphs of the non-proprietary, open-source software programming methodology that previously gave us the Apache Web server, the Linux operating system and the Firefox browser. Hadoop belongs to nobody. Anyone can copy it, modify it or extend it as they please. Funny, that: A software program developed collaboratively by programmers who believe that their code should be shared in as open and transparent a process as possible has resulted in the creation of tools that everyone from the NSA to Facebook uses to annihilate any semblance of individual privacy. But what's even more ironic, and fascinating, is the sight of intelligence agencies like the NSA and CIA joining in and becoming integral players in the world of open-source big data software. The NSA doesn't just use Hadoop. NSA programmers have improved and extended Hadoop and donated their changes and additions back to the larger community. The CIA actively invests in start-ups that are commercializing Hadoop and other open-source projects.

They're all in it together. The spooks and the social media titans and the online commerce goliaths are collaborating to improve data-crunching software tools that enable the tracking of our behavior in fantastically intimate ways that simply weren't possible as recently as four or five years ago. It's a new military-industrial-open-source Big Data complex. The gift economy has delivered us the surveillance state.

Hadoop's earliest roots go back to 2002, when Doug Cutting, then the search director at the Internet Archive, and Michael Cafarella, a graduate student at the University of Washington, started working on an open-source search engine called Nutch. But the project did not get serious traction until Cutting joined Yahoo and began to merge his work into Yahoo's larger strategic goal of improving its search engine technology so as to better compete with Google. Significantly, Yahoo executives decided not to make the project proprietary. In 2006, they blessed the formation of Hadoop, an open-source project managed under the auspices of the Apache Software Foundation. (For a much more detailed look, see the four-part history of Hadoop at GigaOm.)

Hadoop is basically a nifty hack. The definition, per Wikipedia, is surprisingly simple: It "supports the running of applications on large clusters of commodity hardware." Bottom line, Hadoop provides a means for distributing both the storage and processing of an enormous amount of data over lots and lots of relatively inexpensive computers. Hadoop turned out to be cheap, fast and scalable, meaning it could expand smoothly in capacity as the flows of data it was crunching burgeoned in size, simply through plugging extra computers into the network. Hadoop was also fundamentally modular: different parts of it could be easily replaced by custom-designed chunks of software, making it seamlessly adaptable to the individual circumstances of different corporations or government agencies.
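For readers who want to see the idea rather than take it on faith, here is a toy sketch of the split-the-work, combine-the-results pattern (known as MapReduce) that Hadoop's processing layer is built around. It is not Hadoop's actual API: everything runs in a single Python process, the "workers" are just list slots, and the function names (`split_into_blocks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def split_into_blocks(records, num_workers):
    """Deal records round-robin into one block per simulated worker."""
    blocks = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        blocks[i % num_workers].append(record)
    return blocks


def map_phase(block):
    """A worker emits a (word, 1) pair for every word in its own block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Gather the pairs from all workers and group the values by word."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each word's list of ones into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_workers=3):
    blocks = split_into_blocks(records, num_workers)
    # On a real cluster each block would be processed on a different machine;
    # here the "workers" simply run one after another.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    lines = ["big data", "Big Brother", "data data"]
    print(word_count(lines, num_workers=2))
```

The totals come out the same whether `num_workers` is 1 or 1,000, which is the point: because each block is processed independently, capacity grows by adding machines rather than by buying a bigger one.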

Hadoop's debut was timely, addressing not only the problems Yahoo faced in managing the enormous amounts of data produced by its users, but also those that the entire Internet industry was simultaneously struggling to cope with. Basically, the Internet had become a victim of its own success. The enormous flows of data generated by users of the likes of Facebook and Twitter far overwhelmed the ability of those companies to make sense of it. There was too much coming in too fast. Hadoop helped companies cope with the tsunami. It was, in the words of Jeff Hammerbacher, an early employee of Facebook, "our tool for exploiting the unreasonable effectiveness of data."

Before Hadoop, you were at the mercy of your data. After Hadoop, you were in charge. You could figure out all kinds of interesting things. You could recognize patterns in the data and start to make inferences about what might happen if you made tweaks to your product. What did users do when the interface was adjusted like this? What kinds of ads made them more likely to pull out their credit cards? What did that batch of millions of Verizon calls reveal about the formation of a potential terrorist cell? Facebook wouldn't be able to exploit the insights of its so-called "social graph" without tools like Hadoop.

"Hadoop has become the de facto standard tool for cost-effectively processing Big Data," says Raymie Stata, who served as chief technology officer at Yahoo before eventually starting his own Hadoop-focused start-up, Altiscale. And the significance of being able to cheaply process Big Data, to accurately measure what your users are doing, he added, is a big deal.
