Big Data, Politics and My Favorite Tech School
If you didn’t know, the Internet itself was born out of DARPA (then called the Advanced Research Projects Agency, or ARPA, within the DoD), when the US government funded a project on a really fun topic called “packet switching.” The old telephone system used circuit switching for end-to-end communications; packet switching meant data could be grouped into (digital) packets and relayed by any connected node, which would allow universities and research labs to share information quickly and efficiently. Massive computers had already been punching data onto Hollerith cards, but “big data” was a term not yet in use.
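For the curious, here is a toy sketch of the packet idea in Python. It is only an illustration, not how ARPANET worked under the hood: the function names, the 8-character packet size, and the shuffle that stands in for packets taking different routes are all my own assumptions.

```python
import random

def to_packets(message: str, size: int = 8) -> list[tuple[int, str]]:
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))

message = "Universities and research labs share information."
packets = to_packets(message)

# Assumption: a shuffle stands in for packets arriving out of order
# after being relayed through different nodes.
rng = random.Random(0)
arrived = packets[:]
rng.shuffle(arrived)

restored = reassemble(arrived)
print(restored == message)
```

The sequence number is what lets the receiver put things back in order no matter which node relayed which packet, something a circuit-switched call never had to worry about.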
The original Request For Quote (RFQ) was sent to 140 bidders, of which only 12 responded (wiki), much like Ga Tech here. Four were chosen, and the one award went to BBN (acquired by Raytheon in 2009). The project ended successfully with several computers (gateways) relaying those packets. It subsequently made this message, endless e-mails, Cyber Mondays, electronic payments, mobile banking, social networking, peer-to-peer sharing, and many more great and awesome things possible.
Just one caveat: we are amassing data at exponential rates, and data can help us understand some of our core basics. Isn’t the amount of knowledge you now receive reshaping your world? Think on the smallest scale: how many images of ownerless puppies have changed your ideas about dog breeding? Is the world around you a much larger place than the radius that surrounded you before you were socially networked with your friends, your friends’ friends, and subsequently the world? We’re talking self-optimization, breakthrough medical technology, insight into building a better government that supports focus areas identified by data attributes and facts, and ideas and concepts that we haven’t even considered.
Enter the Georgia Institute of Technology and its little $2.7 million pot of money to work with DARPA on figuring out how to handle “big data.” Big data doesn’t mean all those books on your electronic device. Nor does it mean the last 12 years of your bank account. But suppose you took 5 million subscribers’ bank data and crossed it with every US census ever taken, anonymous medical records with the age, gender, race, job type, etc. of 311,591,917 US citizens, mortality data, overlaid GIS data, top US cities data, home ownership data from public sites, traffic and transit data, climate… That exhausts the brain, much less the underlying infrastructure, which must not only contend with storage but also be able to analyze, visualize, and show correlations, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.” “Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research.” Look, there is a new word: connectomics. It comes from neuroscience, but it feels so similar to data clustering that I think I’ll recycle/reuse it.
But if the current administration is cutting back on DoD spending, then what’s the story? Obama is a nerd at heart. Okay, maybe not, but he knows the power of big data, and he also knows what happens when you toss a bunch of data heads into a room: “The magic tricks that opened wallets were then repurposed to turn out votes. The analytics team used four streams of polling data to build a detailed picture of voters in key states.” And: “Data helped drive the campaign’s ad … rather than rely on outside media consultants to decide where ads should run, [they based their] purchases on the massive internal data sets. ‘We were able to put our target voters through some complicated modeling, to say, O.K., if Miami-Dade women under 35 are the targets, [here is] how to reach them,’ said one official. As a result, the campaign bought ads to air during unconventional programming, like Sons of Anarchy [and] The Walking Dead.” Now you see? Obama won the election, and Nate Silver knew he would. How did Nate know that Obama would win? He’s a nerd too: “But here is the absolute, undoubted winner of this election: Nate Silver and his running mate, big data.” So, limited government spending or not, our president knows the value of big data. The DoD understands that big data can bring causal analysis to bear across the board. Imagine analyzing an entire war-fighting force at its fundamental levels, physiological and neurological. What events and situations lead to a better soldier?
“Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 quintillion bytes of data were created.” Just imagine not only the small pieces of data your cell phone collects but also the places you go that collect data about you. Imagine being able to say credibly which environments promote the healthiest lifestyle. That could debunk myths and give medical practitioners better insight into not only the human in front of them but also the attributes that surround the patient, helping them understand the whole picture. What will data say about our generation? How will data about us now provide better environments for our successors?
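To get a feel for what “doubled every 40 months” means, here is a quick back-of-the-envelope sketch in Python. The function name and the ten-year horizon are my own picks for illustration; the 40-month doubling period and the 2.5 quintillion bytes per day come from the quote above.

```python
def growth_factor(months: float, doubling_months: float = 40.0) -> float:
    """Multiplier on capacity after `months`, if it doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)

# Ten years is 120 months, which is three doubling periods.
ten_year_factor = growth_factor(120)

# 2.5 quintillion bytes is 2.5 * 10**18 bytes; an exabyte is 10**18 bytes.
DAILY_BYTES_2012 = 2.5e18
daily_exabytes = DAILY_BYTES_2012 / 1e18

print(ten_year_factor, daily_exabytes)
```

Three doublings in a decade is an eightfold increase per person, and that is storage capacity alone, before you ask what it takes to analyze any of it.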
Why are suicides exceeding motor vehicle crashes as the leading cause of injury death in the U.S.? Why was there an increase in hanging and suffocation among middle-aged adults? I’m using this as an example of the idea: “Recognition of the changes in suicide methods is a critical precursor to developing prevention programs and services” (Johns Hopkins University). But the only way to begin to analyze all the clues is to have all the data available and to put it together, the way a jigsaw puzzle slowly reveals an image (visualization), so that you can understand causation, relational factors, physiology, proximity, precursors… What I’m saying is: big data!
Now GaTech will be able to “focus on producing novel machine-learning approaches capable of analyzing very large-scale data [and] pursue development of distributed computing methods that can process data-analytics algorithms very rapidly by simultaneously utilizing a variety of systems, including supercomputers, parallel-processing environments, and networked distributed computing systems.” All this with open source technology, a component of next-generation U.S. Government critical acquisition plans as defined by Frank Kendall, the Under Secretary of Defense for Acquisition, Technology & Logistics. [page 6]
“The Georgia Tech XDATA effort will build upon [a previous] research initiative, a 17-university program led by Georgia Tech and funded by the National Science Foundation and the Department of Homeland Security … involves enabling these algorithms to run on a networked distributed computing system … configuring the software so that it operates on multiple processors simultaneously … ensure that the algorithms solve problems very rapidly – a requirement of the DARPA award.”
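What does “operates on multiple processors simultaneously” look like in miniature? Here is a rough Python sketch of the split, process, and merge pattern. To be clear, this is my own toy example and not anything from the XDATA work: the word-count task, the function names, and the use of threads as a stand-in for separate processors or machines are all assumptions. In standard Python, threads will not actually speed up number crunching like this; a real system would use separate processes or a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk: list[str]) -> Counter:
    """Count the words in one chunk of the data set."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines: list[str], workers: int = 4) -> Counter:
    """Split lines into chunks, count each chunk concurrently, merge the results."""
    chunks = [lines[i::workers] for i in range(workers)]
    total = Counter()
    # Assumption: worker threads stand in for separate processors or nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total

lines = ["big data big ideas", "data about data", "big questions"]
totals = parallel_word_count(lines)
print(totals["big"], totals["data"])
```

The point is the shape of the thing: each worker only ever sees its own slice of the data, and the merge step at the end is cheap. That is what lets the same algorithm spread across many machines.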
What will the $2.7 million seed ultimately generate, and what technology will it return to us, much as the original 1968 ARPANET project did? I’m excited about the possibilities, and you should be too!