Computers have proven competence—no, fluency—in yet another aspect of human life: writing. Narrative Science, a Chicago-based startup, has developed an innovative platform that writes reported articles in eerily humanlike cadence. Wherever there is data, Narrative Science founders say, their software can generate a prose analysis that's robust, reliable, and readable. They claim their technology will reshape our relationship to data, media, and the way we consume information.
Birnbaum and Hammond, both Yale-educated professors of computer science, have academic backgrounds in linguistic systems—and their serious interest in the science of story arc is apparent at Narrative Science. Here, because they each contribute such valuable work, writers and coders inhabit the same hierarchical plane. Programmers are crucial because they maintain and improve the robust authoring platform that is the company's foundation. This foundation is enormously powerful. "We've created a horizontal platform that's vertically agnostic, industry agnostic," CEO Stuart Frankel told me. "We can write just about any kind of content, using any kind of data." But clients differ in more than their rules (house style, publication tone, specialized vocabulary). They also tell different kinds of stories. That's why Narrative Science needs journalists.
"Data is tremendously valuable," he told me. "It's unbelievably valuable. But it's not valuable as a spreadsheet of numbers. It's valuable based on the insights that you can glean from it." We're swimming in numeric data, he insists, almost drowning in it—which strikes him as odd because most people don't actually like numbers very much. Spreadsheets confound us because human beings think in stories. So, in Hammond's view, wherever there are numbers, we should have stories instead--and that's where Narrative Science comes in. "In the long run," he said, "our technology ends up being the mediator between data and the human experience."When I ask him what this means for human writers, he points out that his work has long been a collaboration between computer scientists and journalists. In his ongoing work at Northwestern'sIntelligent Information Laboratory, which he co-chairs with Narrative Science Chief Scientific Advisor Larry Birnbaum, he routinely partners with students and faculty at the University's Medill School of Journalism to create from "cross-functional teams" of writers and coders. (This itself is a pioneering move, as journalists and computer scientists tend not to cross paths in scholarship or public life.)
When Narrative Science inks a deal with a new client, its writers begin customizing the existing platform within a configuration layer. House style—how to format names and dates, when to italicize, and so on—is the easy part. What takes more time is establishing the facts and inferences that will conceivably be drawn from client data, as well as a "constellation" of possible story angles through which the data might be presented. In the case of baseball, this means "all the scenarios that might be derived from the raw data of a box score": the slugfest, the shutout, the pitchers' duel, the back-and-forth, the game postponed by rain, on and on.
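To make the idea of a "constellation" of angles concrete, here is a minimal sketch of how a handful of rules might map a box score onto the angles named above. It is an illustration only, not Narrative Science's platform: the `BoxScore` type, the function names, and every numeric threshold are assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoxScore:
    """Hypothetical box score: runs scored per inning by each team."""
    home_by_inning: List[int]
    away_by_inning: List[int]
    postponed_by_rain: bool = False


def lead_changes(box: BoxScore) -> int:
    """Count how often the lead passed from one team to the other,
    checking the running score at the end of each full inning."""
    changes = 0
    leader = 0  # +1 home has led most recently, -1 away, 0 nobody yet
    home = away = 0
    for h, a in zip(box.home_by_inning, box.away_by_inning):
        home += h
        away += a
        current = (home > away) - (home < away)
        if current != 0:
            if leader != 0 and current != leader:
                changes += 1
            leader = current
    return changes


def candidate_angles(box: BoxScore) -> List[str]:
    """Return every story angle whose rule matches this game.
    All thresholds are illustrative guesses, not real editorial rules."""
    if box.postponed_by_rain:
        return ["postponed by rain"]
    home, away = sum(box.home_by_inning), sum(box.away_by_inning)
    angles = []
    if min(home, away) == 0 and max(home, away) > 0:
        angles.append("shutout")
    if home + away >= 15:  # assumed cutoff for a high-scoring game
        angles.append("slugfest")
    if home + away <= 3:  # assumed cutoff for a low-scoring game
        angles.append("pitchers' duel")
    if lead_changes(box) >= 3:  # assumed cutoff for a seesaw game
        angles.append("back-and-forth")
    return angles


# A 1-0 game matches two angles at once.
tight_game = BoxScore([0, 0, 0, 0, 0, 0, 0, 1, 0], [0] * 9)
print(candidate_angles(tight_game))  # ['shutout', "pitchers' duel"]
```

A single game can satisfy several rules at once, which is why the sketch returns a list of candidates rather than one label. Choosing among them and turning the choice into prose is the part this example leaves out.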
In this way, Narrative Science writers don't think about specific stories as much as they outline a web of story possibilities. "They know how to configure our technology to allow them to become what are essentially meta-journalists," Frankel told me, "people who can write millions of stories [as] opposed to a single story at a time." As the technology progresses, we may see more and more writers working on this macro level.
But using Narrative Science to write baseball games is a little like hammering a nail with an atom bomb. The platform's inference engine, Hammond says, is supported by "hardcore data analytics"—it can handle vast, truly complex information, data sets that would boggle any human mind. In this regard, the platform may one day serve as a kind of all-star assistant for human journalists.
Imagine, for instance, the prospect of deducing how Twitter users feel about the Republican presidential candidates on a particular day. A human journalist simply couldn't do it—trying to monitor any significant sample size would be impossible. Twitter moves so fast, and at such a high volume, that it eludes us. "The problem with social media," Hammond writes on his blog, "is that there's so damned much of it."
Now consider how valuable this kind of data-combing could be for investigative journalists. In his novel The Pale King, the late David Foster Wallace argued that the era of secrecy is over: The post-Watergate government hides secrets in plain sight, obscuring them in an unchartable morass of freely available information. The result? We lose interest, civilians and journalists and activists alike.
"One of the great and terrible PR discoveries in modern democracy," the book's narrator tells us, "is that if sensitive issues of governance can be made sufficiently dull and arcane, there will be no need for officials to hide or dissemble, because no one not directly involved will pay enough attention to cause trouble. No one will pay attention because no one will be interested...we recoil from the dull."
Both Hammond and Frankel insisted that, while Narrative Science will certainly replace some types of human-generated writing, the stories they're most excited about are the ones journalists rarely cover. Because of readership expectations, no journalist would write a story with relevance to only one person, or a few—sports writers, for instance, don't write about Little League games in the first place. That's why the company's putting special effort into what they call "audience of one" applications—narratives that bring professional-caliber prose insight where right now we only have confusing data.
CHARTING THE SEAS OF BIG DATA
Though computer authoring will almost certainly reshape our relationship to content, Narrative Science will also have a huge impact in the growing arena of corporate data collection and management. "We're looking at a situation where every company worth its salt right now is metering and monitoring their business processes, and amassing huge databases of information," Hammond told me. Cost, production, sales, and earnings figures are scrupulously measured over an ever-broadening array of categories, with increasingly sharp detail. Frankel says the standard business mindset is to collect "as much data as possible" in order to become more competitive and profitable.
But here's the strange thing about our current moment: Though companies invest heavily in data collection, they can put their findings to work in only very limited ways. Since there's a deluge of information, much of it radically new, many data collectors simply throw up their hands. "It's painful to see how much data has gone fallow," Hammond says. Gleaning understanding from these lodes of information—what Hammond calls "big data"—is a primary focus for Narrative Science.
That's why Hammond yearns to improve the software until it can look for insights that haven't yet occurred to its creators—the Rumsfeldian "unknown unknowns" that continually elude us. "We can't do it now," he told me, "but the entire notion behind the platform is to get there." "As the system becomes smarter and smarter," Frankel predicted, "it will be able to draw on data that it analyzes to make its own conclusions." Eventually, he said, the platform will be able to "draw some conclusions without even first knowing what the subject matter is."
Frankel told me that the platform already works with some "unstructured data"—it can understand the driving "sentiment" within a Tweet or blog comment, for instance. But further developments in computer understanding of human language could blow the current technology open. When Narrative Science can scan written documents with the same comprehension it brings to number sets, its viability increases dramatically.
Taken together, these two advancements—the ability to draw unique conclusions and the ability to work with difficult unstructured data—would render an astoundingly human authoring platform. One that could read, say, in a single afternoon, all the artificial intelligence scholarship ever written, and then use it to make original claims.
As a journalist and fiction writer, I of course couldn't help thinking about what all of this means for what I do. I arrived at the Chicago office prepared to have my own biases confirmed—that the human mind is a sacred mystery, that our relationship to words is unique and profound, that no automaton could ever replicate the writerly experience. But speaking with Hammond, I realized how much of the writing process—what I tend to think of as unpredictable, even baffling—can be quantified and modeled. When I write a short story, I'm doing exactly what the authoring platform does—using a wealth of data (my life experiences) to make inferences about the world, providing those inferences with an angle (or theme), then creating a suitable structure (based on possible outcomes I've internalized from reading and observing and taking creative writing classes).
Besides, the best journalism is always about people in the end—remarkable individuals and their ideas and ideals, our ongoing, ever-changing human experience. In this, Frankel agrees. "If a story can be written by a machine from data, it's going to be. It's really just a matter of time at this point," he said. "But there are so many stories to be told that are not data-driven. That's what journalists should focus on, right?"
And we will, we'll have to, because even our simplest moments are awash in data that machines will never quantify—the way it feels to take a breath, a step, the way the sun cuts through the trees. How, then, could any machine begin to understand the ways we love and hunger and hurt? The net contributions of science and art, history and philosophy, can't parse the full complexity of a human instant, let alone a life. For as long as this is true, we'll still have a role in writing.