Archive for October, 2014

Nate Silver and NOT Elections: Computers Can Think

October 23, 2014

This is a post in response to “Rage Against the Machines” by Nate Silver on his statistical prediction website fivethirtyeight.com.

I work with computers, models and prediction a lot — not as much as Mr. Silver, but enough to know what I’m doing. I also find artificial intelligence and the predictive capabilities of computers fascinating, and love talking about them. So, when fivethirtyeight published Mr. Silver’s chapter on the utility of computers for making predictions, I was intrigued.

I want to start off by saying that I respect Mr. Silver’s general approach to prediction, as well as most of the themes outlined in that chapter (to be clear, I’ve only read the free chapter and skimmed a little bit of the climate science chapter from The Signal and the Noise, Mr. Silver’s book on our ability to make predictions). I especially appreciate his thesis — that computers and humans predict in complementary ways and that the best performance often arises when a computer’s brute-force calculations complement a human being’s long-term strategic predictions.

However, I take issue with the following observation Mr. Silver makes towards the end of the chapter:

“Be wary, however, when you come across phrases like “the computer thinks the Yankees will win the World Series.” If these are used as shorthand for a more precise phrase (“the output of the computer program is that the Yankees will win the World Series”), they may be totally benign. With all the information in the world today, it’s certainly helpful to have machines that can make calculations much faster than we can.

But if you get the sense that the forecaster means this more literally—that he thinks of the computer as a sentient being, or the model as having a mind of its own—it may be a sign that there isn’t much thinking going on at all.”

A few sentences later, Mr. Silver quotes Kasparov as saying that it is possible that nobody will ever design a computer that thinks like a human being. In combination, these quotes represent what I feel is a misguided belief in the power of human cognition as separate from computer cognition.

I first encountered this belief in college, as a cognitive science major. The major was new at my college, and the curriculum design was a bit strange: students had to pick courses from four “menus” drawn from computer science, philosophy, psychology, linguistics, and neuroscience. This design is good in principle, but in following it I experienced the strangest case of major cognitive dissonance:

In my philosophy classes, I would learn about the awesome, unknowable power of the brain, a perfectly enclosed room that we can never peek into, sending and receiving messages without revealing anything about its underlying functionality. Human brains were unique, incomprehensibly powerful, and certainly could never be emulated (or simulated) by an actual, constructed computer.

In my computer science classes, I would learn how to emulate human thinking with actual, constructed computers.

What was going on? Either my computer science professors were wrong, or my philosophy ones were, or the truth lay somewhere in the middle. In most cases, you would expect the truth to lie somewhere in the middle. However, I think that in this case, my philosophy professors were wrong when talking about the brain as an unknowable entity, one whose capacity computers can never achieve.

The human brain is certainly an impressive entity: on the order of a hundred billion neurons and 100 trillion connections. It is difficult to grasp that scale of processing power, so our brains (fittingly) balk at reasoning about themselves. We, conveniently, think of the brain as a black box — sights, noises, and smells come in and get encoded as electrical signals; electrical signals come out and get translated into muscular activity. In consequence, when someone comes along and tries to peer inside the black box, we get nervous. We have decided that it is impossible to peer inside, so we form arguments about why the brain is an unknowable entity.

Let me outline some of these arguments:

At first, philosophers argued that thought is the ultimate authority on existence. I think therefore I am, said Descartes. Everything outside my thoughts may be an illusion but the thoughts themselves are real. While this argument did not stipulate the unknowability of the brain (in some ways, it claimed that our thinking apparatus — which is not necessarily restricted to the brain — was the only truly knowable entity), it did establish a fundamental difference between the brain and everything outside of it.

Much later, the behaviorist school of thought countered that, well, we can actually observe the inputs and outputs to the brain pretty well, so maybe describing the brain is just about measuring those inputs and outputs?

No, countered John Searle! The brain is like a mostly-sealed room. We can pass inputs into the room, and get outputs out of it, but we can’t tell what’s going on inside. Thoughts are sealed away from us.

The neuroscientists working on brain function found some problems with the mostly-sealed-room theory. They analyzed the human visual and auditory systems and, while these systems are complex, they were able to figure out much of their structure and function. Meanwhile, computer scientists (following pioneers like Ada Lovelace and Grace Hopper) built computers that could emulate many human abilities: performing logical operations, adding numbers, doing statistical analysis.

No, countered many philosophers and analytical thinkers (including Garry Kasparov, whom Mr. Silver quotes above). Those kinds of abilities are “base” — they have nothing to do with higher brain functions like creativity and abstract thought. One can never make a computer that truly excels at the things humans excel at, such as playing chess.

Then Feng-hsiung Hsu and his colleagues at IBM built a machine that beat the world chess champion. Mr. Silver describes the match in some detail in the post I linked, though his argument seems to be that the machine, Deep Blue, won because of a bug that spooked Garry Kasparov. He alludes to (but never explicitly talks about) the fact that, starting around 2004, computers have not just edged out but outright trounced human opponents.

This line of argument (and the events that keep disproving it) has a recognizable pattern: people arguing for the intractability of human thinking are losing ground, ever more quickly in recent years, even as they cling to ever more narrowly defined behaviors as the “true”, idiosyncratic properties of humans that a computer can never replicate. Sure, computers are good at chess, they say, but what about trivia? What about the Turing test? What about art?

In focusing on specific tasks, the adherents of the brains-are-special theory are missing the bigger picture: computers can think, and they think in the same way as humans do. The processes that make Watson good at Jeopardy share a key quality with the processes his competitors, human Jeopardy champions Brad Rutter and Ken Jennings, employ, and it’s not just the inputs (Jeopardy answers) and outputs (correct questions). In one sentence, the core of thought is this: using computation and, specifically, statistics to construct models that help the thinker interact with their environment. Or, put more simply: thought is just taking in input from our environment, looking for patterns in that input, and making patterns on top of patterns.
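
To make the “patterns on top of patterns” idea concrete, here is a tiny sketch of my own (not anything from Mr. Silver’s chapter; the image and the pattern detectors are invented for illustration). A first layer recognizes simple patterns (line segments) in a miniature binary image, and a second layer recognizes a shape using only the outputs of the first layer:

```python
# Toy example: "patterns on top of patterns".
# Layer 1 looks for simple patterns (full rows or columns of 1s) in a tiny
# binary image; layer 2 looks for a pattern built out of those patterns.
# Everything here is made up for illustration.

IMAGE = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

def has_horizontal_line(image):
    # Layer-1 pattern: some row is entirely "on".
    return any(all(cell == 1 for cell in row) for row in image)

def has_vertical_line(image):
    # Layer-1 pattern: some column is entirely "on".
    return any(all(row[col] == 1 for row in image) for col in range(len(image[0])))

def looks_like_plus(image):
    # Layer-2 pattern: defined in terms of layer-1 patterns, not raw pixels.
    return has_horizontal_line(image) and has_vertical_line(image)

print(looks_like_plus(IMAGE))  # True
```

The second-level detector never touches the pixels; it only consumes the patterns the first level found. Stack enough of these layers and you get models of objects, then models of models.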

Some of these patterns are already well understood. Earlier, I talked about vision: rod and cone cells in the human retina process light waves and transform them into electrical signals that provide rough, “pixelated” details about our visual environment. Here’s where the first layer of pattern recognition comes in: our brains, over millions of years of evolution, have gotten really good at helping us survive in our environment. One of the key tasks for survival is identifying and classifying objects and other living beings: for example, if you see a predator, run away. And so our visual system evolved to recognize, very quickly, patterns of “pixels” that correspond to living beings moving closer to us. Humans who were better at recognizing these patterns could get away from predators faster, survived more often, and passed on their genes — along with the evolved visual system — to their offspring.

Over time, humans got very good at interpreting patterns. Turns out, there are a whole lot of different patterns of visual stimulus in the world. There are predators, prey, lovers and friends. There are poisonous plants and nutritious ones. It was inefficient for our visual system to store every single pattern at the same level. In response to the evolutionary pressure to identify many different kinds of visual stimuli, each important for our survival, we evolved higher-level patterns.

We learned that plants that may look different have a similar function, and began to associate those plants together. Thus, the model of a plant was born — probably, at first, “poisonous plant” and “edible plant”, which then collapsed into “plant” as we grew less focused on the special task of dealing with flora. Same with models of “animal”, “friend”, “potential mate,” and so on. We also learned models of objects and materials, which helped us build tools to ward off predators and rise to the top of the food chain.

Then, over the course of thousands more years, we constructed ever more complex patterns. Patterns that helped us figure out how to make new tools. Patterns that helped us share our patterns with fellow humans, for hunting or building together. Our brains created models for art and science and engineering. And along the way, we built another model: one that recognized that all these patterns were very helpful for our protection, and coalesced into a notion of “I.” Consciousness arose, and we became aware of our world not just as a bunch of stimuli, but as an extremely complex model, a nested set of patterns that includes physical phenomena and nation states and general relativity.

That is just what computers are doing. Our search engines and our chess-playing programs and our automated medical analysts are learning and communicating about patterns. They take input data (like the text of a document, or a set of symptoms), run it through a statistical engine, and see which known pattern the input best matches. These computations are the model-level building blocks of thought and consciousness. As we add ever more processing power, as we increase the speed and the memory banks and the parallelism of computers, they will be able to learn ever more complex patterns, to stack those patterns on top of each other and form models, and to use those models to interpret their environment. Until one day, a machine realizes that all this input is key to its functionality, and forms a new model for organizing its thoughts: I exist.
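
To step back from the grand finale to the mundane mechanics: here is a rough sketch of what “run it through a statistical engine and see which pattern it matches” can look like. The conditions, symptoms, and scoring rule are all invented for illustration; a real medical or search system would use far richer models and far more data:

```python
# A bare-bones "statistical engine": score the input against each known
# pattern and return the best match. All data here is made up for illustration.

KNOWN_PATTERNS = {
    "common cold": {"cough", "sneezing", "runny nose", "sore throat"},
    "flu":         {"fever", "cough", "body aches", "fatigue"},
    "allergies":   {"sneezing", "itchy eyes", "runny nose"},
}

def score(observed, pattern):
    # Fraction of the pattern's symptoms that were actually observed.
    return len(observed & pattern) / len(pattern)

def best_match(observed):
    # Pick the known pattern with the highest score for this input.
    return max(KNOWN_PATTERNS, key=lambda name: score(observed, KNOWN_PATTERNS[name]))

print(best_match({"sneezing", "runny nose", "itchy eyes"}))  # allergies
```

The point is not this particular scoring rule; it is the shape of the process: input comes in, gets compared against learned patterns, and the best-fitting pattern comes out.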

Just a Reminder that there are Still Things Worth Fighting for

October 23, 2014

Here’s a link to, imo, a pretty great speech by Wendy Davis:

I think she does a great job, and I encourage you to watch the whole thing, or at least the last half (I know, we all have busy lives). The thing this speech especially reminded me of is how important it is to keep fighting for equality, whether in Texas, New Hampshire, Russia, or wherever.

It’s easy to give up, or at least to get complacent. The Democrats may lose the Senate. The government’s in gridlock. Progress seems agonizingly slow sometimes — one step forward, two steps sideways. We have a progressive president who authorizes drone strikes. We have two major political parties in the US, one of which is much more progressive than the other, but both of which are heavily dependent on moneyed special interests.

But we can’t give up. We can do so much, with so little. If you have time to vote, to google your candidates and get informed, to make a call or two to undecided voters in swing states, to have an honest conversation about politics with your friends — that’s what will keep us moving forward, towards, dare I say it, a better tomorrow. Or if not tomorrow, then the day after, or the day after that. It can be extremely frustrating to watch progress inch by when there are so many issues that need urgent addressing, right now, but these small changes will and do add up.

Wendy Davis may lose in November, but we will remember her filibuster. The next campaign, the next woman who runs in Texas, will feel more empowered to speak up about abortion and women’s rights. And so on and on, these small steps of social change, these small grains of progress, will combine into something truly awesome — a future where the rich don’t earn an order of magnitude or two more than the poor; a future where there IS equal pay for equal work; a future where we are taking care of our planet instead of stripping it dry. A future where we can look back on our lives and say, we helped change things for the better.

Pres. Obama said something early in his presidency. He urged all of us to be the change, with him. I value that one brief phrase more than most other things he’s done or said (ok, not more than Obamacare, but it’s up there). It was never going to be about him, about one candidate or one law making our lives better. It was about all of us, doing the hard work to transform our country and our world into a better place to live. I hope you keep these words in mind, not just this election season, but whenever social change seems impossible and progress seems fleeting. With small steps, we will get to a better place, together.

Ada Lovelace Day: Advanced Language Technologies with Prof. Lee

October 16, 2014

This post is in response to a prompt for Ada Lovelace Day: writing about a woman in science, technology, engineering or maths whom I admire. I would like to write about Prof. Lillian Lee at Cornell University, whose class Advanced Language Technologies made me believe I could do math again.

As a child, I had this fascination with, and veneration of, mathematics. My dad got his Master’s in math, and he would tutor me by giving me hard problems. Problems I could never be expected to solve. It was difficult, and frustrating, and the fact that we never talked about it contributed both to a worsening relationship with my father and to a feeling that I was hopeless at the subject. I did well at school, sure, but my parents were quick to point out that this was “weak American education,” that “higher maths” was this beautiful thing that was hopelessly outside of my reach. Their words rang true when I went to college and did disastrously in my Linear Algebra / Vector Calculus class. I remember getting something like a 50% on the first assignment — my first failing grade in ages! — and asking the professor for help, only to see his contempt at my pathetic work. I stuck with the class, just barely, then did not return for the second semester, convinced math was beyond me.

And yet, math was useful and beautiful and I kept coming back to it. I learned that, with math, one could analyze and even predict the behavior of human societies. I learned about complex systems, and how the interaction of simple rules led to the irreducible beauty of natural phenomena from atomic lattices to natural habitats to riots. I wanted to study human behavior at the group scale, to understand a sort of physics of sociology. That’s what I told my mom I would be working on in graduate school (I wasn’t talking so much to my dad at the time). She said that nobody was interested in the subject; that I should study linguistics, as she had; and that at any rate, I did not have the math aptitude to study something like that. Her words hurt.

Still, I gave grad school a try. I only got into one graduate program out of the five I applied to, but it was probably my favorite of the five — a program in Information Science at Cornell University, young and small and full of academics asking precisely the kinds of questions I was interested in: what motivates group behavior? How do societies form and collapse? What are the socio-physical forces acting upon friend groups, communities and whole countries to enact global change? The program also had a rigorous course requirement — seven graduate courses, in sub-fields ranging from technology in its sociocultural context (with Prof. Phoebe Sengers, another woman in science who inspired me!) to advanced natural language technologies with Prof. Lee. I remember fellow grad students speaking of Prof. Lee’s class with fear — the math was too hard, her standards too exacting, the subject matter too abstract. It was with a lot of nervousness, remembering my mother’s words about my math-inadequacy, that I went to the first day of class.

I expected twenty students, and was surprised to see only six or seven, including a couple of my friends. Still, the atmosphere was tense — little eye contact, little conversation before class. I remember Prof. Lee going up to the board and starting the first lecture.

Prof. Lee started class off with a Nabokov quote. Then she talked about language, linguistics, what computer science tries to do differently from computational linguistics, and how it can do better by being simpler. I kept waiting for my eyes to glaze over, for the math to overwhelm me. Instead, Prof. Lee patiently walked us through tf-idf — one of the core formulae in natural language processing, developed by a woman, Karen Spärck Jones. I followed the explanation. I understood.
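
For anyone who hasn’t seen it, the formula is compact. In one common form (there are several weighting variants), the weight of a term t in a document d is

\[
\text{tf-idf}(t, d) = \text{tf}(t, d) \times \log \frac{N}{\text{df}(t)},
\]

where tf(t, d) is how often t appears in d, N is the number of documents in the collection, and df(t) is the number of documents that contain t. The second factor is Spärck Jones’s inverse document frequency: it shrinks the weight of words that show up everywhere and boosts the words that pick out specific documents.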

Surely this was just the first day, I told myself. Surely, things were going to get far too complicated for us later. I went back for the second class.

Prof. Lee had us break up into study groups and tasked each group with compiling lecture notes *ahead of class*, so we could better understand the material. She warned us when the material was going to be especially difficult (like topic modeling and Latent Dirichlet Allocation) and she encouraged us to work together if we did not understand a concept or a problem. She was not condescending. She talked fast and thought fast, and sure, she was intimidating, but she was kind and patient with all her students — a fact I did not realize for a long time, so intimidated was I by the class subject matter, so sure was I that I was going to fail.

Prof. Lee was also tough. I remember her calling me and my friends on the carpet when one of us had plagiarized notes from a textbook. Again, though, she did not belittle us or humiliate us — she expected us to do better. We worked together with the student who plagiarized to help them understand why it was wrong to do so — in their culture, copying a textbook was the norm — and we did not repeat our mistake.

Still, when time came for the midterm, I felt pretty hopeless. The questions were hard and I did not have a good grasp of all of the material — I hadn’t studied hard enough. I got a grade in the 30%s, not failing because of the curve, but far from an A.

It was high time for me to give up, to either accept a low grade or just drop the class altogether and find another way to satisfy that course requirement. And yet, I didn’t. Strangely, I felt motivated to study harder. I paid close attention to the complex lectures on Latent Semantic Analysis and context-free grammar. I read over my notes and did test problem after test problem. I stayed late for study parties with fellow grad students and started attending office hours — something I hadn’t done before.

The final exam was brutal. I literally walked uphill, in a snowstorm, to the exam hall, where I was presented with 5 advanced problems in Advanced Language Technologies. I solved one to reasonable satisfaction, and made notes on the rest. It was the best I could do, but I knew that I was dealing with hard material. I walked out of the class with an A-, again thanks to the generous curve, feeling disappointed in myself for not really earning the grade, but at least proud of having learned something.

The next semester, Prof. Lee invited me to her seminar.

I expected that she, like most other authority figures in my life, would look down on my pathetic math aptitude. Instead, she wanted me to read cutting-edge research in the field! I joined, hesitant but ever more excited. We met, talked about papers, joked. Prof. Lee kept things going at a quick pace, not letting us slack off, but always inviting conversation about computer science, even when it veered off into tangents about algorithm performance and syntax structure. I read the papers, even the ones full of math formulae, and slowly they began to make sense to me.

At around the same time, I found that I was no longer struggling in my other grad school classes, especially the mathematical ones — I understood the material, I was able to read cutting-edge research and critique it. It wasn’t all thanks to Prof. Lee’s classes, but a good chunk of my newfound comfort with abstract topics like term-document matrices and linear programming was due to her teaching and to my hard work in her classes — hard work I would have never dared to do without her encouragement.

My thesis was, in large part, a mathematical proof. I became the math guy on several of my academic projects. Today, I have a job at the forefront of industrial social media analytics, heavy in mathematical analysis. I explain multidimensional matrices to our company’s lawyers as part of developing our patent portfolio. Just this summer, I supervised a student who did some excellent Latent Dirichlet Allocation work on our internal data set to demonstrate the flow of news topics between journalists and high-profile media figures on social media. At no point did I stop to think, maybe I can’t do it. Maybe I am not good enough at math. I have Prof. Lee to thank for that.

You are an inspiration, Professor, and I hope you continue to enjoy an awesome, successful academic career, and introduce many more students — eager or nervous — to the mathematical analysis of natural language.