Zeynep Tufekci

I started my first job as a computer programmer in my very first year of college, basically as a teenager.
Soon after I started working, writing software in a company, a manager who worked at the company came down to where I was, and he whispered to me, “Can he tell if I’m lying?” There was nobody else in the room.
“Can who tell if you’re lying? And why are we whispering?”
The manager pointed at the computer in the room. “Can he tell if I’m lying?”
That manager was having an affair with the receptionist.
And I was still a teenager. So I whisper-shouted back to him, “Yes, the computer can tell if you’re lying.”
I laughed, but actually, the laugh’s on me. Nowadays, there are computational systems that can suss out emotional states and even lying from processing human faces.
Advertisers and even governments are very interested.
I had become a computer programmer because I was one of those kids crazy about math and science. But somewhere along the line I’d learned about nuclear weapons, and I’d gotten really concerned with the ethics of science. I was troubled. However, because of family circumstances, I also needed to start working as soon as possible. So I thought to myself, hey, let me pick a technical field where I can get a job easily and where I don’t have to deal with any troublesome questions of ethics. So I picked computers.
Well, ha, ha, ha! All the laughs are on me. Nowadays, computer scientists are building platforms that control what a billion people see every day. They’re developing cars that could decide who to run over. They’re even building machines, weapons, that might kill human beings in war. It’s ethics all the way down.
Machine intelligence is here. We’re now using computation to make all sorts of decisions, but also new kinds of decisions. We’re asking computation questions that have no single right answers, that are subjective and open-ended and value-laden.
We’re asking questions like, “Who should the company hire?” “Which update from which friend should you be shown?” “Which convict is more likely to reoffend?” “Which news item or movie should be recommended to people?”
Look, yes, we’ve been using computers for a while, but this is different. This is a historical twist, because we cannot anchor computation for such subjective decisions the way we can anchor computation for flying airplanes, building bridges, going to the moon. Are airplanes safer? Did the bridge sway and fall? There, we have agreed-upon, fairly clear benchmarks, and we have laws of nature to guide us. We have no such anchors and benchmarks for decisions in messy human affairs.
To make things more complicated, our software is getting more powerful, but it’s also getting less transparent and more complex. Recently, in the past decade, complex algorithms have made great strides. They can recognize human faces. They can decipher handwriting. They can detect credit card fraud and block spam and they can translate between languages. They can detect tumors in medical imaging. They can beat humans in chess and Go.
Much of this progress comes from a method called “machine learning.” Machine learning is different than traditional programming, where you give the computer detailed, exact, painstaking instructions. It’s more like you take the system and you feed it lots of data, including unstructured data, like the kind we generate in our digital lives. And the system learns by churning through this data. And also, crucially, these systems don’t operate under a single-answer logic. They don’t produce a simple answer; it’s more probabilistic: “This one is probably more like what you’re looking for.”
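To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the spam example, the "number of links" feature, the function names and the toy data. Logistic regression is also just one simple learning method among many. The first function is a rule a person wrote by hand. The second fits its own rule from labeled examples and, instead of a single yes/no answer, returns a probability.

```python
import math


def rule_based_is_spam(num_links: int) -> bool:
    """Traditional programming: a person writes the exact rule and threshold by hand."""
    return num_links > 3


def sigmoid(z: float) -> float:
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.05, epochs=3000):
    """Machine learning: fit a weight and a bias to labeled examples
    by stochastic gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical toy data: links per message, and whether a person marked it spam (1) or not (0).
# The labels overlap (a spam message with 2 links, a normal one with 4), as real data tends to.
links = [0, 1, 1, 2, 2, 3, 4, 4, 5, 6, 8]
is_spam = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(links, is_spam)
for n in (0, 3, 7):
    # The hand-written rule answers yes/no; the learned model answers "how likely".
    print(n, rule_based_is_spam(n), round(sigmoid(w * n + b), 2))
```

Even in this tiny case, what the model "learned" is just two numbers, `w` and `b`, that nobody wrote down. In large systems there can be millions of such numbers, which is part of why it is hard to say what the system has actually learned.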
Now, the upside is: this method is really powerful. The head of Google’s AI systems called it, “the unreasonable effectiveness of data.” The downside is, we don’t really understand what the system learned. In fact, that’s its power. This is less like giving instructions to a computer; it’s more like training a puppy-machine-creature we don’t really understand or control. So this is our problem. It’s a problem when this artificial intelligence system gets things wrong. It’s also a problem when it gets things right, because we don’t even know which is which when it’s a subjective problem. We don’t know what this thing is thinking.
So, consider a hiring algorithm: a system used to hire people, using machine-learning systems. Such a system would have been trained on previous employees’ data and instructed to find and hire people like the existing high performers in the company. Sounds good. I once attended a conference that brought together human resources managers and executives, high-level people, using such systems in hiring. They were super excited. They thought that this would make hiring more objective, less biased, and give women and minorities a better shot against biased human managers.
And look — human hiring is biased. I know. I mean, in one of my early jobs as a programmer, my immediate manager would sometimes come down to where I was really early in the morning or really late in the afternoon, and she’d say, “Zeynep, let’s go to lunch!” I’d be puzzled by the weird timing. It’s 4pm. Lunch? I was broke, so free lunch. I always went. I later realized what was happening. My immediate managers had not confessed to their higher-ups that the programmer they hired for a serious job was a teen girl who wore jeans and sneakers to work. I was doing a good job, I just looked wrong and was the wrong age and gender.
So hiring in a gender- and race-blind way certainly sounds good to me. But with these systems, it is more complicated, and here’s why: Currently, computational systems can infer all sorts of things about you from your digital crumbs, even if you have not disclosed those things. They can infer your sexual orientation, your personality traits, your political leanings. They have predictive power with high levels of accuracy. Remember — for things you haven’t even disclosed. This is inference.
I have a friend who developed such computational systems to predict the likelihood of clinical or postpartum depression from social media data. The results are impressive. Her system can predict the likelihood of depression months before the onset of any symptoms — months before. No symptoms, there’s prediction. She hopes it will be used for early intervention. Great! But now put this in the context of hiring.
So at this human resources managers conference, I approached a high-level manager in a very large company, and I said to her, “Look, what if, unbeknownst to you, your system is weeding out people with high future likelihood of depression? They’re not depressed now, just maybe in the future, more likely. What if it’s weeding out women who are more likely to be pregnant in the next year or two but aren’t pregnant now? What if it’s hiring aggressive people because that’s your workplace culture?” You can’t tell this by looking at gender breakdowns. Those may be balanced. And since this is machine learning, not traditional coding, there is no variable there labeled “higher risk of depression,” “higher risk of pregnancy,” “aggressive guy scale.” Not only do you not know what your system is selecting on, you don’t even know where to begin to look. It’s a black box. It has predictive power, but you don’t understand it.
“What safeguards,” I asked, “do you have to make sure that your black box isn’t doing something shady?” She looked at me as if I had just stepped on 10 puppy tails.
She stared at me and she said, “I don’t want to hear another word about this.” And she turned around and walked away. Mind you — she wasn’t rude. It was clearly: what I don’t know isn’t my problem, go away, death stare.
Look, such a system may even be less biased than human managers in some ways. And it could make monetary sense. But it could also lead to a steady but stealthy shutting-out of people with higher risk of depression from the job market. Is this the kind of society we want to build, without even knowing we’ve done this, because we turned decision-making over to machines we don’t totally understand?
Another problem is this: these systems are often trained on data generated by our actions, human imprints. Well, they could just be reflecting our biases, and these systems could be picking up on our biases and amplifying them and showing them back to us, while we’re telling ourselves, “We’re just doing objective, neutral computation.”
Researchers found that on Google, women are less likely than men to be shown job ads for high-paying jobs. And searching for African-American names is more likely to bring up ads suggesting criminal history, even when there is none. Such hidden biases and black-box algorithms, which researchers sometimes uncover but which sometimes go unnoticed, can have life-altering consequences.
In Wisconsin, a defendant was sentenced to six years in prison for evading the police. You may not know this, but algorithms are increasingly used in parole and sentencing decisions. He wanted to know: How is this score calculated? It’s a commercial black box. The company refused to let its algorithm be challenged in open court. But ProPublica, an investigative nonprofit, audited that very algorithm with what public data they could find, and found that its outcomes were biased and its predictive power was dismal, barely better than chance, and it was wrongly labeling black defendants as future criminals at twice the rate of white defendants.
So, consider this case: This woman was late picking up her godsister from a school in Broward County, Florida, running down the street with a friend of hers. They spotted an unlocked kid’s bike and a scooter on a porch and foolishly jumped on it. As they were speeding off, a woman came out and said, “Hey! That’s my kid’s bike!” They dropped it, they walked away, but they were arrested.
She was wrong, she was foolish, but she was also just 18. She had a couple of juvenile misdemeanors. Meanwhile, a man had been arrested for shoplifting at Home Depot: 85 dollars’ worth of stuff, a similar petty crime. But he had two prior armed robbery convictions. Yet the algorithm scored her as high risk, and not him. Two years later, ProPublica found that she had not reoffended. With her record, it was just hard for her to get a job. He, on the other hand, did reoffend and is now serving an eight-year prison term for a later crime. Clearly, we need to audit our black boxes and not let them have this kind of unchecked power.
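ProPublica’s finding that the tool was wrongly labeling black defendants as future criminals at twice the rate of white defendants is a comparison of false positive rates. The question it asks is: among people who did not go on to reoffend, what share had been labeled high risk? The minimal Python sketch below shows that kind of audit. It is not ProPublica’s code or data. The record format, the group names "A" and "B", and every record are made up so that the arithmetic is easy to check.

```python
from collections import defaultdict

# Hypothetical records, not real data:
# (group, labeled_high_risk_by_the_tool, actually_reoffended)
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", False, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", False, True),
]


def false_positive_rates(records):
    """Per group: the share of people who did NOT reoffend but were still labeled high risk."""
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, high_risk, reoffended in records:
        if not reoffended:
            did_not_reoffend[group] += 1
            if high_risk:
                flagged[group] += 1
    return {g: flagged[g] / did_not_reoffend[g] for g in did_not_reoffend}


rates = false_positive_rates(records)
print(rates)                      # wrongful "high risk" labels per group
print(rates["A"] / rates["B"])    # how many times more often group A is wrongly flagged
```

An audit like this needs only the tool’s outputs and what actually happened afterward. It does not need access to the vendor’s formula, which is why it can be done even when the algorithm itself stays a commercial black box.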
Audits are great and important, but they don’t solve all our problems. Take Facebook’s powerful news feed algorithm, you know, the one that ranks everything and decides what to show you from all the friends and pages you follow. Should you be shown another baby picture? A sullen note from an acquaintance? An important but difficult news item? There’s no right answer. Facebook optimizes for engagement on the site: likes, shares, comments.
In August of 2014, protests broke out in Ferguson, Missouri, after the killing of an African-American teenager by a white police officer, under murky circumstances. The news of the protests was all over my algorithmically unfiltered Twitter feed, but nowhere on my Facebook. Was it my Facebook friends? I disabled Facebook’s algorithm, which is hard because Facebook keeps wanting to make you come under the algorithm’s control, and saw that my friends were talking about it. It’s just that the algorithm wasn’t showing it to me. I researched this and found this was a widespread problem.
The story of Ferguson wasn’t algorithm-friendly. It’s not “likable.” Who’s going to click on “like”? It’s not even easy to comment on. Without likes and comments, the algorithm was likely showing it to even fewer people, so we didn’t get to see this. Instead, that week, Facebook’s algorithm highlighted the ALS Ice Bucket Challenge. Worthy cause; dump ice water, donate to charity, fine. But it was super algorithm-friendly. The machine made this decision for us.
A very important but difficult conversation might have been smothered, had Facebook been the only channel.
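The mechanism described above, ranking by likes, shares and comments, can be sketched as a toy model in Python. This is not Facebook’s algorithm, whose details are not public. The `Post` fields, the weights in `engagement_score`, the 10% growth per round and all of the numbers are hypothetical. The point is structural: a story that is important but hard to “like” ranks low, so it is not shown, so it never collects the reactions that would raise its rank.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    likes: int
    shares: int
    comments: int
    newsworthy: bool  # something we care about, but the ranker never looks at it


def engagement_score(post: Post) -> int:
    """Hypothetical engagement score: a weighted count of likes, shares and comments."""
    return post.likes + 2 * post.shares + 3 * post.comments


def rank_feed(posts: list[Post], slots: int) -> list[Post]:
    """Show only the `slots` posts with the highest engagement score."""
    return sorted(posts, key=engagement_score, reverse=True)[:slots]


def simulate(posts: list[Post], rounds: int, slots: int) -> list[Post]:
    """Feedback loop: posts that get shown collect more reactions; hidden posts collect none."""
    for _ in range(rounds):
        for post in rank_feed(posts, slots):
            post.likes += post.likes // 10        # assumed ~10% more likes per round of exposure
            post.comments += post.comments // 10  # assumed ~10% more comments per round
    return rank_feed(posts, slots)


# Made-up starting numbers for three posts competing for two feed slots.
posts = [
    Post("Ice bucket challenge video", likes=900, shares=300, comments=200, newsworthy=False),
    Post("Baby picture", likes=400, shares=20, comments=80, newsworthy=False),
    Post("Report from the protests", likes=40, shares=15, comments=10, newsworthy=True),
]

feed = simulate(posts, rounds=5, slots=2)
print([p.title for p in feed])
print({p.title: engagement_score(p) for p in posts})
```

Note that `newsworthy` exists only so we can see what the ranker ignores; nothing in `engagement_score` reads it. Changing the weights changes which posts win, but no choice of weights over likes, shares and comments alone can tell an important story from a merely popular one.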
Now, finally, these systems can also be wrong in ways that don’t resemble human systems. Do you guys remember Watson, IBM’s machine-intelligence system that wiped the floor with human contestants on Jeopardy? It was a great player. But then, for Final Jeopardy, Watson was asked this question: “Its largest airport is named for a World War II hero, its second-largest for a World War II battle.”
(Hums Final Jeopardy music)
Chicago. The two humans got it right. Watson, on the other hand, answered “Toronto” — for a US city category! The impressive system also made an error that a human would never make, a second-grader wouldn’t make.
Our machine intelligence can fail in ways that don’t fit error patterns of humans, in ways we won’t expect and be prepared for. It’d be lousy not to get a job one is qualified for, but it would triple suck if it was because of stack overflow in some subroutine.
In May of 2010, a flash crash on Wall Street fueled by a feedback loop in Wall Street’s “sell” algorithm wiped out a trillion dollars of value in 36 minutes. I don’t even want to think what “error” means in the context of lethal autonomous weapons.
So yes, humans have always had biases. Decision makers and gatekeepers, in courts, in news, in war … they make mistakes; but that’s exactly my point. We cannot escape these difficult questions. We cannot outsource our responsibilities to machines.
Artificial intelligence does not give us a “Get out of ethics free” card.
Data scientist Fred Benenson calls this math-washing. We need the opposite. We need to cultivate algorithm suspicion, scrutiny and investigation. We need to make sure we have algorithmic accountability, auditing and meaningful transparency. We need to accept that bringing math and computation to messy, value-laden human affairs does not bring objectivity; rather, the complexity of human affairs invades the algorithms. Yes, we can and we should use computation to help us make better decisions. But we have to own up to our moral responsibility to judgment, and use algorithms within that framework, not as a means to abdicate and outsource our responsibilities to one another as human to human.
Machine intelligence is here. That means we must hold on ever tighter to human values and human ethics.
Tufekci is a contributing opinion writer at the New York Times, an associate professor at the School of Information and Library Science at University of North Carolina, Chapel Hill, and a faculty associate at Harvard’s Berkman Klein Center for Internet and Society.
Her book, Twitter and Tear Gas: The Power and Fragility of Networked Protest, will be published in 2017 by Yale University Press. Her next book, from Penguin Random House, will be about algorithms that watch, judge and nudge us.