Seth Stephens-Davidowitz, a data scientist and bestselling author, is known for his brilliant use of data to upend conventional wisdom, often with humorous, surprising, and occasionally shocking results. His latest book, Who Makes the NBA?, uses data to interrogate some of basketball’s biggest questions, consistently yielding unexpected insights. Here’s the kicker: he wrote the entire book in just 30 days after discovering Code Interpreter.
Unsurprisingly for a former quant, I had a blast chatting to Seth. Topics discussed include why so many NBA players are called Chris, whether basketball is due for a Moneyball moment, and why so many of us misunderstand the rags-to-riches story.
I hope you enjoy this conversation as much as I did. We’ve shared some highlights below, together with links & a full transcript. As always, if you like what you hear/read, please leave a comment or drop us a review on your provider of choice.
Highlights
Becoming an NBA Player is Weirdly Easy… If You’re Seven Feet Tall
“If you're under six-foot tall, you have less than a one in a million chance of making the NBA. If you're seven feet or above, you have a one in seven chance of making the NBA, which is just wild. There's no other trait that gives you such great odds of becoming a multi-multi-millionaire just with that genetic trait. I don't think there's anything comparable. And then another thing I kind of started... I found that, and then I talked to my friend and he's like, one of the things that means is if you're really, really tall, you don't have to be a great athlete to make the NBA, because if one in seven seven-footers are in the NBA, you don't have to be that good. You just have to be a one in seven athletic talent.”
Use Your Unfair Advantage
“Sometimes the story of someone who overcame the odds is so compelling that we forget the reason that it's so compelling is because it's not normal. It's hard. It's difficult. I think also people sometimes make the wrong career decisions based on this. They learn about a story that is so compelling that they forget that it's not normal and they end up thinking, making decisions. There's a line in business, use your unfair advantage. So you don't want to do things that are necessarily really hard for you or really lean into your disadvantage even if it would make a better story, you want to pick something where you have an unfair advantage where you have, whether it's capital or your network or the university you went to or your expertise.”
Revealed vs Stated Preferences
“So one of my favorite examples: I compare social media and search. Social media is the ultimate stated preference. When you're on social media, all your friends see it. It's very public. And search is kind of revealed preference. Nobody's seeing it. So I compare how people describe their husbands on the different sources. On social media, when they're posting "my husband is...", it's "My husband is the best, the greatest, so cute, adorable." And then on search, when it's private, it's "My husband is a jerk, cheating on me. I can't stand my husband. He's so annoying." And I think that's a great example of how the incentives you give people shape what kind of data you get.”
The Path to Happiness is More Obvious Than You Think
“Do the obvious things that make you happy, that make people happy. And don't trust your gut. I talk about all the research on happiness and the things they find make people happy, being with friends, being near a beautiful lake, hunting, fishing, gardening, things that we've kind of done since our hunter-gatherer days. And I think a lot of people just don't do those things. And you live in a city, you work 80-hour weeks, and you're not seeing your friends very much. And so I definitely encourage people to read the research on happiness. And then if you're not happy, just how much of your week are you spending doing the very obvious things that tend to make people happy?”
Books & Articles Mentioned
Who Makes the NBA?: Data-Driven Answers to Basketball's Biggest Questions; by Seth Stephens-Davidowitz
Don't Trust Your Gut: Using Data to Get What You Really Want in Life; by Seth Stephens-Davidowitz
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are; by Seth Stephens-Davidowitz
What Works on Wall Street: A Guide to the Best-Performing Investment Strategies of All Time; by Jim O’Shaughnessy
Transcript & Links
Jim O’Shaughnessy:
All right. Well, hello everybody. It's Jim O'Shaughnessy with yet another Infinite Loops. I am so excited by today's guest, Seth Stephens-Davidowitz. Did I get that right?
Seth Stephens-Davidowitz:
Close enough. Yeah, close enough.
Jim O’Shaughnessy:
Okay. Coming from a guy who gets “O’Whatassey,” I get that all the time, so I'm used to it. You are one of my favorite people for so many reasons, but I think one of the main ones is we both love data, data, data, data, more data. It can unlock so many things that you do not intuitively get. Listen, you've got a BA in philosophy from Stanford, you were Phi Beta Kappa, and you have a PhD in economics from Harvard. Wow. You are quite the overachiever, my friend: data scientist, bestselling author, sought-after keynote speaker. Also, and this is the part that I love, a graduate of the American Comedy Institute.
Seth Stephens-Davidowitz:
Yeah, that is true. Yeah.
Jim O’Shaughnessy:
What a talent stack. Welcome, Seth.
Seth Stephens-Davidowitz:
Well, it was more just, I was so burnt out. I was doing a PhD in economics and I was just so burnt out one summer, I'm like, I can't read another economics paper. I need to do something creative, more creative or different. So I googled stand-up comedy classes and I found the American Comedy Institute, and yeah, it was really fun.
Jim O’Shaughnessy:
I love it.
Seth Stephens-Davidowitz:
It ended with a show in New York City. I kind of rigged the audience. I had all my friends and family there, so everyone just cheered. Even when my jokes sucked, everyone just cheered anyways. But yeah.
Jim O’Shaughnessy:
Yes, I always try to rig things as best I can as well, so we're simpatico there. I have a funny story. I have a degree in economics, useless as it turns out, just a BA, which frankly I barely got, and only because my mother wanted me to. But I was taking a graduate-level course in economics and the guy was going on and on about the model does this, the model does that. And it was typical; I'm 64, so it was when Keynes was still kind of semi-important. It was just the dawn of the rational expectations guys. But anyway, I raised my hand in the class as he's droning on and on about the model this, the model that, and I said, "I'm sorry, professor, and I don't mean to be rude, but have you ever looked at what actual humans do?" And he goes, "Maybe you'd better try comedy instead of economics, Mr. O'Shaughnessy."
Well, what I love, I love all your books. I recommend them to people: Who Makes the NBA?, Don't Trust Your Gut, Everybody Lies. They are just packed full of really counterintuitive truths, and I hope we're going to get a chance to talk about all of them today. But I want to start with a comment that you made when you discovered Code Interpreter. You were like, holy shit, this stuff rocks. And then you made yourself a promise: I'm going to write a book in 30 days using AI and Code Interpreter. And you did it, and it's a great book. So let's start there. Was it kind of like a satori or eureka moment when you found it?
Seth Stephens-Davidowitz:
Yeah, so I had written these two books, Everybody Lies and Don't Trust Your Gut. And I was really happy and proud of them, and they led to all these great opportunities; they were bestsellers, and I was traveling the world, talking and consulting and everything. So I'm like, this is great, but writing a book is just such a pain in the ass, and I don't know if I can do this again. I'm just a perfectionist. I kind of disappear for years writing these books, and I do a lot of original research and data analysis. And then I came across Code Interpreter and I'm just like, oh my God, things that were taking me four months are all of a sudden taking me four hours. It's crazy. And it was doing all the things I don't like doing. I don't like cleaning a data set, merging data sets, all these annoying rote tasks that are such a big part of a data scientist's life, and Code Interpreter is just doing them in seconds.
And I'm just like, wow, that just frees me to do the things I actually really like, which is coming up with ideas, testing them, writing them out, explaining them. So the 30-day thing just gave me motivation and was kind of a fun way to do it. And yeah, I say it was the best 30 days of my life, which I was explaining to a friend recently, and I said it like it was a normal thing. And they're like, Seth, that's kind of odd. Most people, when they describe the best 30 days of their life, it's their honeymoon, when their child was born, second semester of senior year. Not "I was working on a book for 30 days." That probably says a lot about me.
Jim O’Shaughnessy:
Oh, you must be just a... But here's the thing, Seth, you are a blast to talk with at a cocktail party and all of those things. I've been saying a lot recently that the geek shall truly inherit the earth because of the power that we now have in our hands with things like Code Interpreter, AI models in general, et cetera. And having written four books, yeah, I completely agree with you. But man, I have been dreaming of AI and that type of stuff. I keep journals, and although I didn't call it AI, one of the earliest references I have is from 1982: wouldn't it be fucking great if we had a machine that could do all this?
And that's the part about it that really excites me. Your book about the NBA, which we're going to get into in a minute, is really fun because it's about the NBA, right? What AI is really, really good at is getting rid of all of that laborious, boring, time-consuming stuff. One of my big things when I was at OSAM, which we subsequently sold to Franklin Templeton: we came out of Bear Stearns into the great financial crisis, and I sat down with my team and said, well, we're probably not going to sell another long-only portfolio for the next three years, so here's what I want you to do. And the thing that they rebelled against most was cleaning the data, because, having done it for my book What Works on Wall Street, oh my God. When you start dreaming of the numbers going down a screen and you can see the wrong one in your dream, you know that things are pretty dire. But data is dirty. A lot of people don't get that.
Especially financial data. And to get them a little more revved up, I put, I think it was Apple, up on the screen in the conference room, and I said, "What's the P/E of Apple?" And the traders sort of shouted it out, and some of the data guys said a different number. And then I pulled up all the various sources: FactSet had one P/E, Reuters another, the Financial Times a third. None of the P/Es matched. I said, "Guys, Apple is one of the biggest companies in the world and none of this is cohesive at all." When you start-
Seth Stephens-Davidowitz:
Yeah, I think AI is going to really fix a lot of this, revolutionize a lot of it, get the right numbers, clean the numbers, fix the numbers, explain when the numbers are off, why they might be off. Quantitative finance is definitely an area where AI is coming big time and going to improve things big time, but there're just all kinds of areas like, Code Interpreter was just kind of, yeah, my eureka moment where I'm just, it was like, oh my God, this changes everything. I thought about AI up until that point as people are talking about it, I kind of thought it was cool. I used it a couple times. I wrote some poems for my girlfriend using ChatGPT and they were okay poems, but then Code Interpreter was the first time I'm like, holy cow, this is just going to change absolutely everything in the areas I know. I can just guarantee this is a total revolution in the creative process.
Jim O’Shaughnessy:
And I think a lot of things get overhyped. Honestly, much of the hype is bullshit in my opinion, for reasons that would be a long conversation, but on the things that you've just mentioned, it is very difficult to overhype what it will be able to do across every industry. Now, certain industries are going to be affected more than others. You make the point that if you really want to clean up, find an industry that is really bad at collecting data and using data, and if you glance at OSV's website, you're going to see a lot of verticals there that are those industries.
And obviously with my quant finance background, it is ever present in my mind. But let's talk about the NBA book because it was such a joy. You talked to a group of OSVers and guests and people talked about your talk endlessly after it because what you found in the book was really many times counterintuitive. I'm not a sports guy, but many of the people who work for me are, and some of them were really reacting kind of like, I can't believe the thing about the names for example, we'll get to names in a minute, but let's start with the obvious one, height. Tell us about what you found about the correlations of height and the NBA.
Seth Stephens-Davidowitz:
I mean, I think you'd have to be blind not to notice that tall people are advantaged in basketball. Obviously height plays a role, but when you do the math, do the data analysis, I didn't realize quite how it plays out: each inch roughly doubles your chances of becoming an NBA player, which is just wild, throughout the height distribution. So if you're 5'11", you have twice the chance of making the NBA compared to someone who's 5'10". And if you're 6'5", you have twice the chance of making the NBA compared to someone who's 6'4", all the way out as far as we can measure. So if you're seven feet tall, you have twice the chance compared to someone who's 6'11". It's basically a perfect log-linear relationship. And what that means is if you're under six-foot tall, you have less than a one in a million chance of making the NBA.
If you're seven feet or above, you have a one in seven chance of making the NBA, which is just wild. There's no other trait that gives you such great odds of becoming a multi-multi-millionaire just with that genetic trait. I don't think there's anything comparable. And then another thing I kind of started... I found that, and then I talked to my friend and he's like, one of the things that means is if you're really, really tall, you don't have to be a great athlete to make the NBA, because if one in seven seven-footers are in the NBA, you don't have to be that good. You just have to be a one in seven athletic talent.
So one of the things you also see in the data is that the tallest NBA players are just not particularly good athletes. They jump at about the level of an average high schooler. They run at about the speed of an average high school track athlete. I could shoot free throws better than many tall Hall of Fame NBA players. Height is such an advantage that everything else just kind of isn't that important if you're so tall. So yeah, that was really cool to see in the data.
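Seth's "each inch roughly doubles your chances" describes a log-linear relationship: the logarithm of the probability rises by a constant amount per inch. The Python sketch below illustrates only that shape, under loudly stated assumptions: a pure doubling per inch, a hypothetical anchor of exactly 1 in 7 at 7'0", and a cap at probability 1. The anchor, the constant names, and the function `implied_probability` are illustrative inventions, not the book's model; the figures quoted in the conversation come from Seth's data analysis, not from this toy formula.

```python
# A toy log-linear model of NBA odds by height (illustrative assumptions only).
# Assumption: odds double with each extra inch, anchored so that a 7'0"
# (84-inch) man has a 1-in-7 chance. This is NOT the book's actual model.
ANCHOR_HEIGHT_IN = 84  # hypothetical anchor height: 7'0"
ANCHOR_PROB = 1 / 7    # hypothetical anchor probability at that height


def implied_probability(height_in: int) -> float:
    """Chance of making the NBA under a pure doubling-per-inch model.

    log2(p) is linear in height: log2(p) = log2(ANCHOR_PROB) + (height_in - 84).
    The result is capped at 1.0 because a probability cannot exceed 1.
    """
    return min(1.0, ANCHOR_PROB * 2.0 ** (height_in - ANCHOR_HEIGHT_IN))


# Print the implied odds for a few heights.
for inches in (72, 78, 83, 84):
    feet, rem = divmod(inches, 12)
    p = implied_probability(inches)
    print(f"{feet}'{rem}\": about 1 in {round(1 / p):,}")
```

Under these assumptions it prints about 1 in 28,672 at 6'0", 1 in 448 at 6'6", 1 in 14 at 6'11" and 1 in 7 at 7'0": twelve doublings separate 6'0" from 7'0", a factor of 2^12 = 4,096. A real estimate like the book's also depends on how many men there are at each height, which this sketch ignores.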
Jim O’Shaughnessy:
And it led you to develop a very cool new stat. You called it the Muggsies stat. Talk to us a little bit about that.
Seth Stephens-Davidowitz:
Muggsies is just how good every player would be if they were all the same height. And there's a whole bunch of math backing it up, which I won't bore people with. But one of the things I like about it: the player ranked number one on Muggsies was Muggsy Bogues, who was five foot three and played in the NBA for 14 seasons. Kind of an insane accomplishment to be five foot three and a competent NBA player for 14 years. So he's number one, so I called the stat Muggsies after him.
And I'm like, it'd be really interesting if that were an acronym for something. Muggsies should stand for something. But I couldn't think what Muggsies should stand for. So I asked ChatGPT, what does Muggsies stand for? And it came up with "Metric for Understanding Game-Given Sporting Individuals' Effectiveness and Size," which is just so, so good. And I think it also shows the power of these AI tools: I don't think I ever would've come up with an acronym like that, and ChatGPT was able to do it. So these things that we wouldn't be able to do otherwise, we are able to do thanks to AI.
Jim O’Shaughnessy:
And it just brings up so many questions, like: is basketball ready for its Moneyball moment through data analytics? And then a side question that doesn't have a lot to do with basketball: is there a height equivalent for a data scientist? In other words, seven foot tall, one in seven chance to get into the NBA, right? Not height, but is there a similar metric for finding that fabulous data scientist?
Seth Stephens-Davidowitz:
It's a good question. I don't think anything gives you a one in seven chance of being a multi-multi-millionaire. Being reasonably good at math, you probably have a one in seven chance of having a pretty good job, I would say, but I wouldn't say a one in seven chance of earning a $10 million-a-year salary with all the other perks of NBA stardom, or of reaching the very, very top. I don't think other fields are so dependent on one thing in the same way that NBA basketball is. There are so many skills that tend to contribute to someone being good, even in other sports: what makes a great soccer player? I don't think there's something like height that is so important relative to everything else. I think there are lots of traits that come into play.
Jim O’Shaughnessy:
And on that note, do you think then for the first part of the question, is there a way you can Moneyball basketball?
Seth Stephens-Davidowitz:
Yeah, I think there definitely are ways. It's a little bit more complicated. I've actually been talking to people; I'm trying to figure out other things I want to do with my life, and I'm like, maybe I should work in basketball because I love the game so much, and what would that be like? So I talked to a whole bunch of teams trying to figure out if there's a way to get into the field, and it's a little complicated. I found out some interesting things. One is that, for the stats people, so much of picking the players just has to do with arcane and very formulaic salary cap rules. So they're not necessarily trying to find the massive inefficiencies in the market. They're trying to find out who fits under the weird salary cap we have. So it is not as Moneyballed in some ways because of that, which is kind of interesting, I think. Another thing: I talked to an assistant general manager of the Denver Nuggets. If you're not a sports fan, you might not know, but they drafted Nikola Jokić, one of the greatest players, in the second round.
Jim O’Shaughnessy:
Even I know him.
Seth Stephens-Davidowitz:
Even you know. They drafted him in the second round, and it's very rare to get a great player in the second round. So I'm like, how did you come up with this idea to draft Jokić? And he's like, "It was coming to our pick and we had six players we were looking at, and we had them ranked. This is our favorite, this is our second favorite, this is our third favorite.
This is our fourth favorite, this is our fifth favorite, sixth favorite. The first five were drafted before our pick, and that left us with number six, who was Jokić, and we picked him." So it was entirely luck. There was no great genius that led to that pick. If another team hadn't picked one of the players they had ranked higher, they would've just picked that player. So there's a lot of luck even in the great picks in basketball.
Jim O’Shaughnessy:
And outside of sports, there are many other industries that kind of fall into the same sunk cost fallacy. And by that I mean no, this is the way we've always done it and we love it this way and there's a lot of tradition around it and it becomes this social norm. And social norms are much more restrictive in my opinion, than many people believe they are.
Like the data you're finding here: I think one of the reasons why so many of my colleagues were really excited by your talk was because it had them scratching their heads. And that kind of leads to another thing you found, which is about the names of players, specifically black players in the NBA, who are the majority. The common wisdom is that they have unusual names, strikingly unusual in many instances, but you found that was not true.
Seth Stephens-Davidowitz:
Well, there's been an idea for a while that NBA players come from more troubled backgrounds and there are examples. So LeBron James, one of the greatest NBA players of all time, the greatest player of the last 20 years, he was born in poverty, Akron, Ohio, single mother. And there are lots of examples of players, NBA players who come from similarly troubled difficult backgrounds.
And there's an idea that that gave them the drive that, compare that to a kid who grew up, who was raised by a doctor and a lawyer in the suburbs, and they have so many options, are they going to spend every afternoon at the basketball court? Someone from a difficult background, their only escape is through basketball, so they're going to do whatever it takes. They're just hungrier than everyone else.
And that's never been true. If you look at the data, both African-American and Caucasian males are much more likely to reach the NBA if they come from a two-parent home, if they come from a middle-class background rather than a poor background. Wealth is a huge advantage in reaching the NBA. And the place you see it most strikingly is in the first names of black NBA players. So the title for the chapter was, why are so many NBA players named Chris?
And it turns out that black NBA players are much more likely to have common names, names that are given to many people, like Chris and Paul and Kevin. Those names are much more frequently given to NBA players than unique names, names that aren't given to anybody else, names like Shaniqua or things like that.
Those are much less likely. So compared to the average African-American, black NBA players are about twice as likely to have common names. And the reason for that is that your name can give away your socioeconomic background: wealthier African-Americans, upper-middle-class African-Americans, are much more likely to have common names, to be given names like Chris and Paul.
The fact that when you listen to an NBA game, you hear that Chris is passing the ball to Paul, or Kevin just blocks the shot of James, is a clue to the socioeconomics of the guys on the court, who are much more likely to be middle-class, upper-middle-class, even wealthy.
Jim O’Shaughnessy:
And I think one of the reasons why people were struck by that is because elsewhere you write that a huge percentage of stories fit into six structures, and one of them is rags to riches. And we just love those stories. Actually, our Infinite Books division is writing a book about David Rhoiney, who is a surgeon; he came from a very disadvantaged background, got a scholarship to Annapolis because of basketball, and was driven in all of those things.
Those stories capture our imaginations. David's story is amazing, and that's why we're writing a book about it, what he overcame. He's an extraordinary guy, but I think one of the reasons it appeals is because of what he had to overcome to go on to be so successful. And so that makes a lot of sense when you think about it kind of logically.
Well yeah, of course, but we're so drawn by that archetype of that story, rags to riches or overcoming obstacles and adversity, et cetera. It just makes for a more interesting story for people to get involved with emotionally. But you also point out that location, where the player grows up, very, very important. The father, super important to the player as well. And these things are not intuitive to many people, right?
Seth Stephens-Davidowitz:
Yeah. In some ways they should be intuitive. I think you're right, sometimes the story of someone who overcame the odds is so compelling that we forget the reason that it's so compelling is because it's not normal. It's hard. It's difficult. I think also people sometimes make the wrong career decisions based on this. They learn about a story that is so compelling that they forget that it's not normal and they end up thinking, making decisions.
There's a line in business: use your unfair advantage. So you don't want to do things that are necessarily really hard for you or really lean into your disadvantage even if it would make a better story; you want to pick something where you have an unfair advantage, whether it's capital or your network or the university you went to or your expertise. In one of my books, I talk about what makes a successful entrepreneur, and they tend to be older. The average successful entrepreneur is 45 years old, and the chances of creating a successful business increase until the age of 60. And they tend to be insiders; they know their business really well, and they tend to have already been good employees. They were good at their job before they left to go out on their own as entrepreneurs.
And it makes perfect sense, but it goes against some of these kind of wacky stories that we hear. We love the story of someone who came out of totally nowhere. They were disgruntled, they hated their job, nothing worked. And they're just like, well, I'm going to create some wild business. I talk about a woman who created Poo-Pourri, a product that basically gets rid of the smell of poop, despite having no background in chemistry or anything in that area, and everyone just loves this story.
They're like, what an amazing story. I want to do that too. But if you look at the data, it's not a smart gamble to try to start a business in something you know nothing about compared to something you've been studying for 10, 15, 20 years. You're much more likely to have success in something you're an insider at.
Jim O’Shaughnessy:
Well, and that might be consistent with how we evolved, too. I think evolution made us really cognizant of novel things, especially dangers. We pay really close attention to novel dangers, while dangers that have been around for a long, long time, we're like, eh, we know about that.
And as I was listening to you, of course, intuitively one would get it. At least if you ask me who's probably going to have a higher chance of doing well, some individual who grew up in a two-parent household that was relatively well-off, well-educated, et cetera, or somebody who had a very dysfunctional childhood, single parent, maybe even foster care.
I'm going to always pick this one over here. I'm going to say on a big probability basis, this individual is going to probably go on to do better. And I think that's part of the appeal. That's not novel. That's kind of table stakes. Over here, we love the novelty of the fact that good Lord look at what they overcame or look at, they invented something they had no expertise in.
It's the vividness of those stories. One of my favorite sports movies is Rudy. Now I like Notre Dame and I had two kids graduate from there. But again, you're five foot nothing, you weigh a buck nothing. And Rudy didn't go on to be a big sports star, but the whole story was just about the fact that he gave so much to the Notre Dame team that they put him on the field so he could be included in the picture of that particular class of football players.
And I think that it makes sense that the vivid, unexpected, unusual is going to grab our attention much more than the more pedestrian, mundane data. But you're probably better off if you're not like Rudy or one of those people. As you say, maybe don't look at some inspiring story of a one-in-a-million type guy and say, I'm going to be the second guy.
Seth Stephens-Davidowitz:
And it's dangerous. In one of my books, Don't Trust Your Gut, I talk about how, after The Social Network came out, the movie about Mark Zuckerberg creating Facebook, there was just a huge rise in 18-year-old entrepreneurs, people dropping out of college to start their businesses. And the failure rate of 18-year-old entrepreneurs is just insane, through the roof.
It's very, very hard to start a business at the age of 18. And Mark Zuckerberg is that one in ... he's a one in a billion outcome. And I think it is dangerous to build your life around these one in a billion outcomes rather than studying the people who had higher-probability successes, of which there are many examples.
And again, use your unfair advantage. If you're seven feet tall, go into basketball. If you have been studying a field for 10, 15 years, use your network to start a business there. There are all these areas where you can really have a great probability of a big success, but they may not be as exciting as gambling it all on some one-off thing. They're not necessarily going to make a movie about you if you do that, right?
Jim O’Shaughnessy:
Right. Yeah. And that's the part that intrigues me to no end. Movies are entertainment, and what's entertaining? What's entertaining is not watching a guy go build a car dealership in Akron, Ohio. But it also focuses society's hive mind in ways that, as you point out, can lead to really bad decisions down the road.
Because if you are going the route of, I am going to take this incredibly rare individual who I have great admiration for and I'm going to do that, you are stacking the deck against yourself, not the other way around. When I was early on in factor investing, which is quant investing, the story I would tell was: would you go to a doctor who says he's going to wing your treatment?
You walk in and you're like, oh my God, I've got this horrible pain here in the right side of my belly. And he looks at you and says, a pharmaceutical rep just came in and handed me these little yellow pills, and he said they were great. Why don't you try them? You're going to run like hell. You're going to want to go to a doctor that says, here's what you have.
We've done massive meta studies on the medications that work in this particular case, therefore this one is going to be what's right for you. But even when I would tell people that I used to say, I tell stories to explain to you why you shouldn't pay attention to stories when making your investment decisions, but it seems to me that we humans, our OS runs on stories. And so, one of the things that I love about you is that you are really good at telling a fun and compelling story, but basing those fun and compelling stories on data.
Seth Stephens-Davidowitz:
Yeah, that's right. That's something I try to do. And it is a challenge. It's always a challenge. How do you explain to people that the average successful entrepreneur owns a beverage distribution company? Yeah, it's not as fun as the stories about the woman who cured poop or Zuckerberg being an entrepreneur at 25 years old.
So it's a challenge, but I think people do, stories do stick in people's minds. So you can't just necessarily just show the charts and the data. You have to add the story and add, I talk about someone I talked to who runs a beverage distribution company and explained what his life is like and try to make it more compelling to people.
And I think for me, even just charts and graphs and data, there is something inherently ... I see the stories in that. In some ways they're better stories because there are millions of people behind that. There are millions of stories that are leading to that incredible clear pattern in the data. So I think in some ways data can be better for storytelling than other methods of telling stories because they're the stories of the masses, they're the stories of how society works in some fundamental way.
And that is something I do put a lot of work into. How do you show the data, show the numbers, allow people to really understand how the world works, but also make it compelling to people?
Jim O’Shaughnessy:
Yeah, I think that was something I learned the hard way, and it changed the way I presented what I was doing to people. Because the first time around I was a bit geeky about it, to be honest. I showed a lot of numbers and a lot of graphs, and I was very disappointed after one of my first really big presentations: people were just kind of like, eh. And then I realized, well, that's because you didn't tell any stories.
Seth Stephens-Davidowitz:
Yeah, yes and no, because one of my favorite pieces of data, I think this one was also in Don't Trust Your Gut, is about dating: who are the most successful daters, the ones who get the most matches on online dating sites? The most successful daters are kind of the obvious people you'd expect. They're just incredibly good-looking.
They look like Natalie Portman or Brad Pitt, and everyone wants to date them. We know that. I'm not in that group of people. Most people aren't in that group, so that's not going to work for you. So who else is successful in dating? They found that the most successful daters are really polarizing. It's women who shaved their heads or dyed their hair blue; they get a lot of really terrible ratings and a lot of really great ratings, but they have an audience who loves them.
And I think there's something similar in business sometimes, where even if you are a total geek, maybe you just need to sell to the total geeks. If you came and gave a presentation that was all numbers and graphs, and I had money to invest, I'd be like, okay, Jim's my guy. So you don't always need to.
It depends on how many people you need to appeal to and how strongly you need to appeal to them. Sometimes it's okay if a bunch of people in the room are falling asleep or hate you or don't understand you, as long as a few people in the room really love you and think you're unique and special.
Jim O’Shaughnessy:
Yeah. And not surprisingly, the group that I did extremely well with were engineers in the Bay Area. They were just like, just the numbers, man, just show me the numbers.
Seth Stephens-Davidowitz:
I think in dating and business and life, a lot of it is also finding your audience: leaning into who you are and then finding your audience. I think some people make a mistake: they give a presentation, nobody likes it, and they think the problem is the presentation.
The problem may have been the audience; you haven't found your crowd yet. Similarly, if you're getting rejected over and over in dating, it doesn't necessarily mean that you're doing something wrong. You may just not have found the right people to go on dates with.
Jim O’Shaughnessy:
Yeah, and that also brings up your observation that people, with regularity, choose the wrong metric: beauty, career. Are they rich? Are they beautiful? Those are pretty poor metrics, actually, if you really want to go on a date. You point out that the predictive power belongs to psychological traits and, as you just mentioned, to being somewhat polarizing.
Seth Stephens-Davidowitz:
There are studies of what leads to a successful marriage. They've studied tens of thousands of couples and how happy they are, and, speaking of AI, they've used machine learning to build models using every possible trait of the two partners. What predicts that they're going to get along and have a happy, lasting relationship?
And the overwhelming evidence is that, when it comes to which partner is most likely to make you happy in the long term, the things that don't matter are all the superficial things: Are they beautiful? What career are they in? How tall are they? The things that do make you happy are various psychological traits: growth mindset, conscientiousness, something called a secure attachment style, satisfaction with life.
Basically a happy, well-adjusted, nice person who's conscientious. That's what really matters in the long run, according to the data. So that's another area where shininess tricks us. If two people show up for a date, and one of them is just stunning but doesn't really have their stuff together, while the other is a little plain-looking but gets the job done, a lot of us, evolutionarily, are going to be drawn to that stunning face and stunning body, but it's not necessarily what leads to long-term happiness.
Jim O’Shaughnessy:
And one of the reasons it intrigues me so much is that it seems to be universal across whatever you're looking at. Take stocks: glamour stocks. They even call them glamour stocks. Generally speaking, at least historically, glamour stocks have not been great performers at all. (We've now had a long period where this has not been true, so we'll have to wait and see as the data comes in.)
We often would do a study where we took something intuitive. For example, what if we just bought the 50 stocks with the greatest percentage increase in revenues over the previous one-, three- and five-year periods? That sounds like, oh, they're doing something right; their revenues are doing really, really well. And then I found a little sub-period where that single-line strategy, buy the 50 stocks with the highest revenue gain, roughly tripled the S&P 500 over a five-year period.
And I would put that chart up and tell the story: does this look like a really cool way to invest? Really simple. You go to any stock screener, and most of them are free, and you could find this. What do you think? And people were like, yeah, that's a great idea, blah, blah, blah. Then I would show them the results using all of the data that we have.
In this instance, it went back to the mid-1950s, and that strategy underperformed T-bills. So trying to get them to focus on the larger picture as opposed to the smaller one was very helpful. But we can't do that everywhere. It's like the famous story about the Brits trying to get rid of rats in colonial India: they said, yeah, we'll give you a bounty for a rat's tail.
And of course what happened was that people started breeding rats so they could cut off the tails. Are there other metrics you've seen where you thought, God, everybody looks at this and it has no predictive power at all?
Seth Stephens-Davidowitz:
Yeah, I mean, there are definitely lots of examples of that. I'm trying to think what the best one would be. But one of the things that stands out is the danger of glamour. I think you're right that it's more universal.
I also did a study of how you can predict how long a business is going to last, how quickly it's going to fold, and one of the biggest predictors is basically just how sexy the business is. The worst businesses are things like record stores. The single worst historically has been a record store; then a candy store, a toy store, a beauty store, all these things that sound really cool and fun and might have movies about them. They have a massive chance of going out of business very quickly. So glamour in general, I've found, tends to be a negative predictor. We think it's a positive predictor, and if anything, over the long term, it seems to be a negative predictor. So that's kind of a universal pattern I've seen in many areas. Yeah.
Jim O’Shaughnessy:
And one of the things that is also difficult, and I wonder if you've run into this as well: sometimes a pattern, and again I'm thinking in terms of stock selection models, really goes against intuition, even if it works beautifully both in back tests and in real time. Right. So you've got 50 years of data showing it works, all of that sort of stuff, and you've got 10 years of it doing really well in real time, and people still would be like, it really bothers me that you've got this particular variable in there. I remember one time I got very far along in a huge assignment for a small- to mid-cap growth placement from a group in London. Then we got on the final call, and the guy was very polite. He's like, "We love your stuff. We've really tried to break it in as many ways as we can, but we're going to pass." And I'm like, "Okay, why?" And he said, "It's just too simple."
Seth Stephens-Davidowitz:
Yeah.
Jim O’Shaughnessy:
Do you hear that often?
Seth Stephens-Davidowitz:
Yeah, I definitely think there is. I mean, that was a problem I had in academia as well. I never really loved academia because I think they glorify complexity for complexity's sake. You need a really wild math model and crazy simulations. Sometimes the best ideas I've found are just very, very simple and very, very clear, and you could explain them to your mom, your grandma, your siblings. That's definitely a thing. Another thing, thinking about things that don't predict outcomes: you have to be a little careful about that. Google famously did a study where they found that applicants' GPA had no correlation with how they performed as employees. So if one person has a 4.0 and another has a 2.0, they're just as likely to be good Google employees. But there's a little danger in going from that to, we're not going to care about GPA at all.
Because a lot of the people with 2.0 GPAs had something else that made Google want to hire them despite the 2.0 GPA, right? Maybe they won a coding competition, or they had amazing references. There are lots of other things that led to their being hired without a great GPA. So going forward, you have to be a little careful about the reason for a correlation, or a lack of correlation. Something else could be driving it, and you have to be cautious about using past correlations to make decisions in the future.
Jim O’Shaughnessy:
Yeah, obviously. But I would argue that, while you wouldn't want to ignore GPA entirely, you wouldn't want to make it your centerpiece either. You wouldn't want a model that says everyone with a GPA below 3.8 is excluded, right? Because then you're going to miss all of those extra special somethings.
Seth Stephens-Davidowitz:
For sure. I'm a big supporter of Bayes' rule: you're constantly updating your understanding of the world based on new information. So anytime you find a surprising correlation, you're going to make slight adjustments. If you were expecting a strong positive relationship between GPA and how an employee performed, and you find there's no correlation, you should be using GPA less in your analysis than you otherwise would have, because you now have new information that surprised you and that changes your model of how the world works. So I'm always adjusting my understanding of the world based on the data I come across.
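Seth's point about Bayes' rule can be sketched with a single update. The numbers are purely illustrative assumptions: you start 80% confident that GPA strongly predicts performance, then see a study finding no correlation, a result you judge to have a 10% chance of appearing if GPA really is predictive and a 60% chance if it isn't.

```python
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior P(hypothesis | observation) via Bayes' rule for a yes/no hypothesis."""
    numerator = prior * p_obs_if_true
    evidence = numerator + (1 - prior) * p_obs_if_false  # total probability of the observation
    return numerator / evidence


# Hypothesis (illustrative): "GPA strongly predicts job performance."
posterior = bayes_update(prior=0.8, p_obs_if_true=0.1, p_obs_if_false=0.6)
print(f"prior 0.80 -> posterior {posterior:.2f}")  # prior 0.80 -> posterior 0.40
```

Confidence falls from 0.80 to 0.40: a large revision, but not to zero, which fits both "use GPA less" and "don't ignore it entirely."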
Jim O’Shaughnessy:
And my personal experience here is sort of interesting, because I look at the world the same way you do. Another thing that was found not to add anything, and in certain studies to subtract, is the standard interview. Where do you see yourself in five years? What is your greatest weakness? I just care too much, right? And yet,-
Seth Stephens-Davidowitz:
I work too hard sometimes.
Jim O’Shaughnessy:
I work too hard, I care too much about the company I'm working for. It really kills me. I just care too much. But the reason I bring it up is that I stopped doing traditional interviews, and luckily, for most of my career I've had my own company, so I could do what I wanted. For a while, I worked for Bear Stearns, running their quantitative investing group, and the pushback I got from people about abandoning the traditional interview process was really extreme. It goes back to what I mentioned earlier: I was violating a well-established cultural norm, and people really don't like it when you do that. So I'll tell you a funny story. They let me once, and you'll learn why it was only once, interview candidates for an internship at Bear Stearns. And I recommended only one person, who, by the way, got recommendations from every other big investment bank and chose to go, I think, with Goldman Sachs.
What I did in the interview process was try to see how they thought. The question I gave all of them was this: the S&P 500 is a total return index. In other words, it reinvests dividends and gives you the total return, had you reinvested dividends, over the period from 1926 or '27 through today. That itself isn't quite right, because it isn't really the S&P 500; it's a proxy for it, etc.
And then I would say the Dow Jones Industrial Average is only 30 stocks, it doesn't include the reinvestment of dividends, and it's not cap-weighted like the S&P 500 is. Then I asked: if the Dow were cap-weighted and reinvested dividends, what would it be at today? Literally, I knew they weren't going to come up with the right answer. I didn't come up with the right answer when I tried to do it. What I was looking for was how they do things. How do they go about solving problems? Right.
And that sorting mechanism got us this guy. But again, they were mad at me. They were really mad at me. I think one of the reasons was that when a math major sat down, I'd say, "Oh, math major. Cool. What's 17% of 72?" I didn't care about the answer. What I wanted to see was the reaction. How do they react to that? Do you see that as a pre-existing cultural reticence to be too oriented toward data analytics, et cetera? Or did I just stumble into the wrong interview process?
Seth Stephens-Davidowitz:
No, I think one of the frustrations I have with the world, and definitely with business, is that people just don't collect enough data. It's really frustrating to me that they've been doing all these interviews and they haven't correlated how they rated people in interviews with how happy they are with the employee down the road. Nobody does that. People just do things because they've always done them. And even if you have a different idea, I recommend you collect data on your new idea. Maybe you're just so pissed at the old idea that you come up with a new one that actually doesn't work, and then you have your own biases in that direction. So I don't know. I think there's just not enough data on these things. These are answerable questions, in my opinion. Do interviews predict long-term performance? Do questions that are more about how you think than about the answer lead to better performance?
I think all these questions do have answers, but for whatever reason, people just aren't in the habit of testing their hypotheses here. And, speaking to the idea that one shiny example biases us, if there's one time someone answered a question really well in an interview and turned out to be a great performer, everyone's going to be like, that's the question we need to ask, when it's really only one example. So I think we're sometimes not very systematic in these decisions. We all have our biases and our ideas and our anger about the way our old boss did it or the way we were passed over, and those lead to theories that aren't necessarily data-driven.
Jim O’Shaughnessy:
And by the way, they didn't call my group at Bear Stearns quantitative equity. They called it systematic equity.
Seth Stephens-Davidowitz:
Right.
Jim O’Shaughnessy:
You just used it. Yeah. Right.
Seth Stephens-Davidowitz:
Yeah. Yeah.
Jim O’Shaughnessy:
And one of the things I have always believed is that I'm probably wrong, and when you have that as one of your mental models, it leads you to seek tons of data. Right. I learned early on, I trained myself, to be interested in those one-in-a-million stories because they are interesting. But I also trained myself to say they're not likely. They're possible; they're not probable. And I developed some pretty good stories, honestly, using data to compel people to pay more attention to it. But I did notice, and continue to notice, a reticence to, as the old sports line goes, "go to the videotape." I used to say that all the time, and people would get that.
But I also think A/B testing is one of the most important things in the world to do. Right? Who knows whether an idea I think is great really is great, or whether it's really going to attract the type of audience I'm interested in? And then there's what you call a doppelganger search, or nearest neighbors. That's how Facebook makes all of its money. People are like, well, how is Facebook still surviving when the average age of its users is like 60 these days? How are they doing it? Well, they're doing it with a lot of A/B testing in silico. Talk about that a little bit, because that's another love that we share.
Seth Stephens-Davidowitz:
Yeah. So basically, what tech companies realized is that the gold standard for seeing whether something works, as we know from medicine, is the randomized controlled trial. If you want to know whether a pill works, you create two groups: the control group gets a placebo, the treatment group gets the medicine. Then you measure their outcomes and see whether the people in the treatment group are less likely to have the disease after however many months. It's the standard for doing things, and increasingly other areas of social science have used randomized controlled trials, but randomized controlled trials take forever.
They're expensive. You have to raise all this money, you have ethical issues, you have to get them through an IRB. And what Silicon Valley companies realized is that they have all this amazing data. You can create a randomized controlled trial with a couple of lines of code: you show different groups of your users different versions of the website and see which one leads to more clicks, more advertising spending, more time on your site, whatever metric you're interested in. I think Google was one of the first to really use this. Initially they were doing a few experiments; then they realized they could do way more. They could do thousands of experiments. I think Facebook does more experiments in a day than the FDA approves in a year. It's insane how many experiments these companies run, and all the top companies have used this.
Netflix has brought data to entertainment, which used to be an area where people kind of winged it and relied on their intuition. And it's a really powerful tool for understanding what makes people want to stay on your website. Say the button is red versus green: the red button may lead to an extra 10 minutes a month on your website, so you're going to use the red button. With all these different examples, it's a little creepy. One of the reasons people are so addicted to these sites is that the data analysis, the A/B testing, is so powerful. They've figured out exactly what maximizes the chances that you do what they want you to do, which is spend more time on the site and click on more ads.
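The "couple of lines of code" Seth mentions usually come down to two pieces: assigning each user at random to a version, and checking whether the difference in outcomes is bigger than chance would produce. A minimal sketch, assuming a click-through metric; hash-based assignment and the two-proportion z-test are common textbook choices, not any particular company's system, and the click counts are made up.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str) -> str:
    """Split users roughly 50/50; the same user always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "B" if digest[0] % 2 else "A"


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """z statistic for the difference in click-through rate between B and A."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # rate assuming "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical results: green button (A) vs. red button (B), 10,000 users each.
z = two_proportion_z(clicks_a=500, n_a=10_000, clicks_b=580, n_b=10_000)
print(f"z = {z:.2f}")  # about 2.5; |z| > 1.96 is the usual 5% two-sided cut-off
```

Hashing the user ID together with the experiment name, rather than flipping a coin on every visit, keeps each person's experience consistent and gives each experiment its own split, so many tests can run at once.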
Jim O’Shaughnessy:
Yeah. But just to make the opposite argument: in all the verticals we have, books, entertainment, podcasts, et cetera, we also look at it as finding the right fit with the audience that wants that kind of content. Right. I hate that word, by the way; I've got to figure out a new word for content, because I don't like it. But with books, for example, there's a reason Amazon recommends books that you end up reading and saying, God, I really love this book. It's because they looked at thousands and thousands of people who liked books similar to the ones you liked. Right. And then they recommend the ones you haven't read yet.
And I definitely think, dating for example, I'm seeing a lot of new things. We're getting pitched on dating apps that match people only on their literary interests or their other hobbies, et cetera. And the data, especially on the literary side, is pretty compelling: people who love the same authors and the same genres tend to get along with each other a lot better than pairing "I only read nonfiction about quantum physics" with "I love romance." Right. Those might not go together well.
Seth Stephens-Davidowitz:
Yeah, so that's another example of scale. One advantage of scale at these big tech companies is that they can just do a lot of experiments, because it's very easy to get 1,000 people for treatment and control groups. Another advantage of scale is that they can find someone just like you, what I call a doppelganger. With Amazon or Facebook or Google, we think we're so unique, but there are probably 1,000 people who have pretty similar tastes to you on many dimensions: similar political views, similar taste in books, similar taste in music. And if they know that everybody with these tastes has really liked this new show, they can tell you about this new show. That is really, really powerful. A/B testing, I'm not sure about. It feels a little creepy to me, and they may be getting you addicted to things and getting you to spend too much time on these websites.
So I think the ethical issues around A/B testing are real, but I don't think there are necessarily ethical issues with doppelganger searches. I think that's just a cool thing: finding exactly the people who are like you and what they like, and then telling you about it. It's a really powerful tool, and all these firms, Amazon, Netflix, Facebook, Google, YouTube, are using it.
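A doppelganger search is, at heart, a nearest-neighbor lookup. The sketch below is a toy user-based version under stated assumptions: ratings on a 1 to 5 scale, cosine similarity (with unrated items treated as zero) as the measure of likeness, and a made-up handful of readers. Production recommenders at the companies Seth names are far more elaborate, and nothing here describes their actual systems.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two users' ratings; unrated items count as 0."""
    dot = sum(u[item] * v[item] for item in u.keys() & v.keys())
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target: dict, others: dict, k: int = 2) -> list:
    """Suggest items the k most similar users rated that the target hasn't rated."""
    doppelgangers = sorted(others.values(), key=lambda r: cosine(target, r), reverse=True)[:k]
    scores = {}
    for ratings in doppelgangers:
        for item, rating in ratings.items():
            if item not in target:
                scores[item] = scores.get(item, 0) + rating
    return sorted(scores, key=scores.get, reverse=True)


# Made-up ratings for illustration only.
me = {"Dune": 5, "Foundation": 4, "Emma": 1}
readers = {
    "ana": {"Dune": 5, "Foundation": 5, "Hyperion": 5, "Emma": 1},
    "ben": {"Emma": 5, "Persuasion": 5, "Dune": 1},
    "cy": {"Dune": 4, "Foundation": 4, "Neuromancer": 4},
}
print(recommend(me, readers))  # ['Hyperion', 'Neuromancer']
```

The reader whose tastes run the other way ("ben") is not among the two nearest neighbors, so "Persuasion" is never suggested. With millions of users instead of three, the same idea is what lets a service find the people who resemble you.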
Jim O’Shaughnessy:
I agree, by the way, about A/B testing done just to addict you to a particular type of behavior or time on a site or anything like that. But I think the idea of nearest neighbors, the idea of doppelgangers, is going to unlock so much knowledge, in my opinion, in medicine. Right? How cool will it be to have me in silico, and then do all sorts of horrible things to my silicon doppelganger that you could not ethically do to me as a human, and find that, yeah, for this particular individual or group of individuals, this composition of the meds works best, et cetera.
I have a relative who's like you, highly degreed. He's also got an MD, but I think he's got a PhD in data science as well, and that's what he does. He's brought in by big pharmaceutical companies to interrogate the data on drugs that failed, for whatever reason, at what the pharmaceutical company hoped they would be efficacious for. And then Grant will go in and say, "Oh, well, you've got a drug here that you wouldn't want to prescribe broadly, but its efficacy in post-menopausal women who are slightly overweight, not obese, is much, much better." Do you see those kinds of things getting unlocked across various disciplines and industries?
Seth Stephens-Davidowitz:
For sure. But you have to be careful. I don't want to get too technical, but obviously overfitting is a huge danger here.
Jim O’Shaughnessy:
Right.
Seth Stephens-Davidowitz:
If the group is so small that,-
Jim O’Shaughnessy:
Yeah. Using the story to illustrate the idea of the use. I agree.
Seth Stephens-Davidowitz:
Yeah. Yeah. Yeah.
Jim O’Shaughnessy:
You don't want to overfit.
Seth Stephens-Davidowitz:
You don't want to overfit and think you've found something. With these tiny groups, you have to be very careful: is that a real effect, or just noise in the data? I'd say the first time we really saw the power of doppelganger search was actually in sports, in baseball, where one of the ways they figured out whether a player was still going to be good or was shot was just to compare him with all the other players who had a similar profile and see what happened to them. Then it came to recommending content at the big companies. And I think it is coming more and more to health.
One of the huge frustrations in health is that the data is all over the place. There's so much data, and it seems clear that you could cure all kinds of diseases at this point if someone just had access to all the data siloed in different companies and different nonprofits, in file drawers and doctors' notes. But it's hard to get hold of it and to get the huge data sets that these tools, A/B testing and doppelganger searches, really rely on. You can't do it by surveying a small sample of 1,000 patients or anything. You're not going to find the middle-aged, overweight-but-not-obese, post-menopausal woman that way. You need an enormous data set to find her.
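The worry about tiny groups is easy to demonstrate. A minimal simulation with invented sizes: a treatment that does nothing at all is tested in 40 subgroups of 15 patients per arm. Averaged over every subgroup the effect is close to zero, but the best-looking subgroup typically shows a sizeable "effect" that is pure noise, which is why a subgroup finding needs a large data set, or a fresh sample, before anyone trusts it.

```python
import random
from statistics import mean


def subgroup_effects(n_subgroups=40, per_arm=15, seed=1):
    """Apparent treatment effect in each subgroup when the true effect is zero."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_subgroups):
        # Treated and control outcomes come from the SAME distribution: the drug does nothing.
        treated = [rng.gauss(0, 1) for _ in range(per_arm)]
        control = [rng.gauss(0, 1) for _ in range(per_arm)]
        effects.append(mean(treated) - mean(control))
    return effects


effects = subgroup_effects()
print(f"average effect: {mean(effects):+.2f}")
print(f"best subgroup:  {max(effects):+.2f}  (noise only)")
```

Picking the best of many small slices after the fact is exactly the overfitting trap: the more ways you cut the data, the more impressive the luckiest cut will look.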
Jim O’Shaughnessy:
Yeah. And obviously, as you point out, the quality of the data and the availability of the data are critical. I've invested in a lot of AI companies, and one of the first things we found was that issue. Right? Companies were absolutely terrified of letting a large language model behind their firewall because they did not want their data escaping into the wild. This seems to be a really pernicious problem. Do you have any ideas for how it could be solved?
Seth Stephens-Davidowitz:
I don't totally know. There's this example I talk about in Everybody Lies, my first book, which was all about the things we could learn about people from search data. There's a study that followed the searches, over time, of people who were eventually diagnosed with pancreatic cancer. With pancreatic cancer, the earlier you find out about it, the more likely you are to survive. It's a terrible disease with a very low survival rate, but you can double or triple the survival rate, from five to 15%, if you learn about it early enough.
They basically followed people's searches on the internet: what symptoms did they search for before they got diagnosed with pancreatic cancer? And they found these really subtle patterns. Indigestion followed by abdominal pain is a risk factor for eventually getting diagnosed with pancreatic cancer, whereas abdominal pain followed by indigestion, or abdominal pain by itself, or indigestion by itself, are not risk factors. It's the precise order: indigestion first, then abdominal pain. And that is just not known by the medical community. Nobody has data sets that big with time series of symptoms. Microsoft did this study, but I don't think Google's done a study like this, and it kind of angered me. I'm just like, we could do this study. I don't see the risk here. You're not talking about any individual patient; you're just talking about patterns writ large. And there's a question: was this study ethical? Was it ethical to do this study? I think it's unethical not to do the study.
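The order-specific pattern described here depends on treating each person's searches as a time series rather than a bag of symptoms. A minimal sketch of that one feature, assuming a chronologically sorted list of query strings per user and simple substring matching; the function is hypothetical and is not how the study itself was implemented.

```python
def searched_in_order(queries, first, then):
    """True if a query mentioning `first` comes before a later query mentioning `then`."""
    seen_first = False
    for query in queries:  # assumed to be sorted oldest -> newest
        text = query.lower()
        if seen_first and then in text:
            return True
        if first in text:
            seen_first = True
    return False


history = ["indigestion after eating", "antacid brands", "abdominal pain upper left"]
print(searched_in_order(history, "indigestion", "abdominal pain"))  # True
print(searched_in_order(history, "abdominal pain", "indigestion"))  # False
```

Comparing how often this flag appears among people later diagnosed versus everyone else is where the enormous data set comes in; in a small sample, an order effect like this would be indistinguishable from noise.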
Jim O’Shaughnessy:
Not to do it.
Seth Stephens-Davidowitz:
Yeah, it pisses me off that all these sites have all this information about people's health that could potentially save lives, and they're not doing studies on it. So I think one of the biggest things is just changing the idea of what's ethical and what's not. And I feel there is a bit of paranoia around some of these issues. Why do we care that a big company is mining data on people's symptoms to see what predicts disease? Isn't that unambiguously a good thing? Isn't that what we want them to be doing? But when they do a study like that, there's a question of, well, this is such personal information.
So I think there's, yeah, I don't know. Maybe we're not explaining the positive case enough, just how powerful these tools can be and how much good they could do. That's my main thought. Because I think the data is all there. Google has data on all people's search symptoms and their diseases, and they could work with other companies who also have data. So the data's there. It's just a matter of explaining why we'd want to analyze and understand some of this data.
Jim O’Shaughnessy:
Yeah, I think a big factor here is cultural lag: in other words, the time it takes a culture to change and adopt some of these new tools. If you're an entrepreneur, it can be a great advantage to you if you do non-consensus things before other people. But the-
Seth Stephens-Davidowitz:
[inaudible 01:16:26].
Jim O’Shaughnessy:
Go ahead.
Seth Stephens-Davidowitz:
Yeah. Well, just one thing I wanted to say is, I don't want to minimize that there are issues with companies knowing so much about us. So you could imagine-
Jim O’Shaughnessy:
Oh, of course.
Seth Stephens-Davidowitz:
... a company knowing that you're at risk of pancreatic cancer. Well, are they going to charge you higher insurance rates? There are all kinds of... I understand why we have fears of companies having that much knowledge about us. So yeah, I don't want to say it's unambiguously good for companies to do this, or that we don't need regulations in other areas. But I am sure that there are diseases that could be cured, and many, many lives that could be saved, with existing data that we should be able to study without harming anybody's privacy.
Jim O’Shaughnessy:
And I think that caveat is absolutely correct and true, and I agree with you entirely. But I think it's sometimes invoked for other reasons, because, again, culturally we feel really weird about the things these models can unlock and what they know about us and everything else. So I get it, but I'd put it this way: raise your hand if you think it's a bad idea to take all of this data and possibly come up with a protocol that saves a ton of lives by diagnosing pancreatic cancer earlier, because that's where the data leads. It's obviously a very complex issue, but... it links to my next question for you.
I was a Google fanatic because I'm a research junkie and I just love doing that stuff, but I have pretty much entirely switched my searches to Perplexity and some other large language models. Google was a gravity well, right? Everything went in there. As search becomes more fragmented, with fewer people using Google for all their searches and more using large language models like Perplexity, how does that play out?
Seth Stephens-Davidowitz:
Yeah, if I were Google, I'd be pretty paranoid, because I'm definitely using Perplexity and large language models. Even just ChatGPT: say I'm going to St. Barts. I used to Google "things to do in St. Barts." Now I just go to ChatGPT, "things to do in St. Barts." Then I can say, "Well, I have seven days," and, "What would be good to do on a weekend?" and plan a whole itinerary, and do all these things you couldn't do with Google.
So I'd definitely be worried if I were Google, because they can't seem to keep up with some of these smaller AI companies in the AI products they're offering. And it does seem like there's a change in behavior. I think these changes generally start with people like you and me. I'm guessing, based on everything I know about you and about myself, that we're the type who get very excited by new ideas. Just because we've been using Google for 10 years, I guess 25 years at this point, we're not like, oh, well, now I don't want to learn how to use Perplexity. As soon as we see Perplexity, as soon as we see ChatGPT, our first instinct is, oh, this can change everything. How can I replace all my ways of doing things with these new tools?
So I don't think it's led to large scale changes yet where it's really hurting Google's business, but I would imagine it could down the road for sure.
Jim O’Shaughnessy:
Yeah, it's something I've been thinking a lot about recently because of what we've been discussing, right? Google was kind of the ideal data source for so many things because so many people used it.
Let's use that as a segue to talk about some of the fun stuff, and there's a lot of fun stuff, in Everybody Lies. I'll guess which examples you're going to give, and they probably won't surprise too many people. But there's this whole idea of stated preferences versus revealed preferences. Often there's a chasm between the two, and how better to reveal preferences than to look at what people search for? And you have a very amusing and fun look at the difference between what women search for and what men search for. Could you give our listeners and viewers some examples of those?
Seth Stephens-Davidowitz:
Oh, yeah. Well, there are lots of different examples. So one of my favorite examples, I compare social media and search. So social media is the ultimate stated preference. So when you're on social media, all your friends see it. It's very public. And search is kind of revealed preference. Nobody's seeing it.
So I compare how people describe their husbands on the different sources. On social media, when they're posting, "my husband is" is followed by "the best," "the greatest," "so cute," "adorable." And then on search, when it's private, it's "my husband is a jerk," "cheating on me," "I can't stand my husband," "he's so annoying." And I think that's a great example of how the incentives you give people shape the kind of data you get.
It also shows the difference in what's really going on. I think a lot of depression these days, particularly among teenagers, comes from social media. It makes it seem like everybody else has such an amazing life: they're always on a great vacation, their family life is perfect, their relationship is perfect. And I think it's good to keep in mind that that's what people choose to state publicly, and it's not necessarily the full picture. You definitely see more of the full picture in people's Google searches.
Jim O’Shaughnessy:
I have long said that social media is the ultimate Potemkin village.
Seth Stephens-Davidowitz:
Yeah, yeah, I definitely agree with that. Although sometimes there's a darkness to what you see in the search data. I think I start the book talking about racism. That's an area where, if you ask people in a survey, "Are you racist?", everybody says, "No. No, of course not." But I was shocked by how many people make really, really explicit racist searches on Google. And in the areas of the country where people were making those racist searches, African-Americans had worse outcomes: they were more likely to be stopped by police, and they had worse health outcomes and lower wages. So some of the things you learn are just amusing, and funny, and cute. And some of the things can be really dark, because there is a darkness in the underbelly of society that isn't always talked about, and that can be dangerous too. It can lead to riots, and wars, and all kinds of problems.
Jim O’Shaughnessy:
Yeah. And it should not be ignored. And it brings us back to that point, right? I agree with you on the sensitivity of the data: how do you use it ethically? But by the same token, if we could have an early warning system for dark things like racism, the likelihood of teen suicide, all those kinds of things, wow, it would be really hard for me to argue against having that available to people. And potentially it changes their minds.
Seth Stephens-Davidowitz:
Yeah, I think it was a public service. A lot of the data I analyze in Everybody Lies is Google Trends data, and they make all this anonymous, area-level data available. I think part of it is PR, and Google's in the news a lot, though Google doesn't necessarily need PR; everyone knows who Google is. I think part of it is that Google is such a research-driven organization, with tons of scientists and researchers, and I think they did feel like this was somewhat of a public service that could be of use to the research community. And there have been examples of researchers using it to predict flu outbreaks and other disease outbreaks. And there are so many more things that can be done with this type of data to help improve society in various ways.
Jim O’Shaughnessy:
The other thing that we've touched on throughout our conversation, but haven't explicitly called out, is the seeming preference of many people for small data sets drawn from their own experience, like a doctor saying, "Of the hundred patients that I've seen, you seem to fit into this particular category." And small samples always have the craziest distributions. The classic example is the question you ask people: "If I told you that there were two hospitals, one in a city of 8 million people and one in a town of 80,000, and that nine boys and one girl were born in one of them, which hospital do you think it is, the big city one or the one in the small town?" Most people guess the big city hospital, for what I think are somewhat obvious reasons, like, "Well, there's so much more," and blah, blah, blah, but that's wrong, of course. The small one is far more likely to have the unusual outcomes.
What are some of the other aspects of big data that make it really unique for doing these kinds of research studies and learning so many new things that people should consider?
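To make the hospital intuition concrete, here is a minimal Python sketch. It assumes each birth is independently a boy with probability 0.5 (a simplification; the real-world ratio is slightly above that) and uses hypothetical daily birth counts of 10, 100, and 1,000. The function name `prob_lopsided_day` is purely illustrative.

```python
from math import comb


def prob_lopsided_day(n_births: int, min_boy_pct: int = 90) -> float:
    """Chance that at least min_boy_pct percent of n_births are boys.

    Hypothetical helper; assumes each birth is independently a boy with
    probability 0.5, so every boy/girl sequence has weight 1 / 2**n_births.
    """
    # Smallest number of boys that reaches the threshold (integer ceiling,
    # avoiding floating-point rounding in n_births * min_boy_pct / 100).
    k_min = (n_births * min_boy_pct + 99) // 100
    # Count the sequences with at least k_min boys.
    favorable = sum(comb(n_births, k) for k in range(k_min, n_births + 1))
    return favorable / 2**n_births


if __name__ == "__main__":
    for n in (10, 100, 1000):  # hypothetical births per day
        print(f"{n:>5} births: P(at least 90% boys) = {prob_lopsided_day(n):.3g}")
```

With 10 births, 11 of the 1,024 equally likely sequences have at least nine boys, roughly a 1 in 93 chance. With 100 or 1,000 births, a 90% skew becomes vanishingly unlikely. That is the small-sample effect Jim describes: the fewer the observations, the wider the swings in the observed proportion.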
Seth Stephens-Davidowitz:
Yeah, I think a lot of it is slicing and dicing the data set. Take the example of post-menopausal, overweight, but not obese women. No doctor who's seen a hundred patients is going to be able to pick up a pattern like that; it's a pattern that needs an enormous data set to be uncovered. And all kinds of things, like seeing what's happened in one town versus another town. If you do a survey of 2,000 people, you're going to have 10 people from all of Connecticut. Whereas if you have an enormous data set, you're going to have millions of people from Connecticut, and you can really see how people in Connecticut differ from people in New Jersey and from people in California. Zooming in on various patterns like that is, I think, one of the biggest powers of enormous data sets.
Jim O’Shaughnessy:
Yeah, I agree. Are you working on a new book right now?
Seth Stephens-Davidowitz:
I don't know. Maybe. I'm going on the St. Barts trip, and if I get inspired, I might start one. Initially, when I wrote Who Makes the NBA, I was like, man, this is the best month of my life. All I want to do is sit back and write books like this the whole time.
And the problem is, and maybe you can relate to this, I'm just such an obsessive person that when I'm working on a project, I don't know how to work nine to six. I work, I get all consumed, and it's 18-hour days. And I'm just so obsessed with work. So I have other things going on in my life. I have consulting projects, I have a relationship. I am trying to spend more time with my parents for various reasons. I'm a little wary to just go deep in another project right now, but I probably will pretty soon. So I have a whole bunch of ideas cooking.
Jim O’Shaughnessy:
I definitely hear you on that obsessive pull. I'm very much like you in that respect. And, I guess we could say, I've always got to make the pitch for Infinite Books: maybe you should spend even more time in St. Barts and do the work there. Just think about the beautiful environment. You'd be able to do all of that, and then we'll give you a lot of help at Infinite Books if you don't want to self-publish or go with another publisher.
Seth Stephens-Davidowitz:
Yeah, we've been talking about maybe working together on something, which could be fun.
Jim O’Shaughnessy:
We would love to do that because we are very, very simpatico in the way that we look at the world.
Well, this has been absolutely fascinating, Seth. I always ask my guests at the end of our conversation, we're going to make you the emperor of the world. You can't kill anyone. You can't put anyone in a reeducation camp. But what you can do is we're going to hand you a magical microphone and you can speak two things into it that are going to incept the entire population of the world. They're going to wake up whenever their next morning is and say, "You know what? I just had two of the greatest ideas. And unlike all the other times, I'm going to actually act on these two ideas starting today." What are you going to incept in the world's population?
Seth Stephens-Davidowitz:
Wait, so are they just going to hear the idea and decide whether to act or they're going to hear the idea and just do it?
Jim O’Shaughnessy:
No, they're going to think they came up with the idea, thus inception, and then they're going to act on it.
Seth Stephens-Davidowitz:
Okay. Well, I guess because it's me, it's to record your data, to just get more data on everything about you. So I'm big into the quantified self movement. And just like stop winging it in your life. Record all your data. Record when you feel good, when you have high energy, when you are happy, and notice patterns in it. Detect patterns in it. What led to that? I think that could be... I think a lot of people... This is maybe more than one sentence, sorry, but-
Jim O’Shaughnessy:
That's okay.
Seth Stephens-Davidowitz:
... a lot of people, I think, just have very obvious things holding them back that they'd stop doing if they just saw the data. Like, they have a drink before sleeping and it ruins their sleep every time they do that. And then they're in a cranky mood and then they're not getting work done. And they've been doing that for three years, for 30 years, and they'd be way better at functioning on every level if they just didn't do that. So really, I think recording your data would be what I'd insist upon.
Jim O’Shaughnessy:
Okay. That just counts as one. You got another one.
Seth Stephens-Davidowitz:
Another one would be to do the obvious things that make you happy, that make people happy. And don't trust your gut. I talk about the research on happiness and the things researchers find make people happy: being with friends, being near a beautiful lake, hunting, fishing, gardening, things that we've kind of done since our hunter-gatherer days. And I think a lot of people just don't do those things. You live in a city, you work 80-hour weeks, and you're not seeing your friends very much. So I definitely encourage people to read the research on happiness. And then, if you're not happy, ask yourself how much of your week you're spending doing the very obvious things that tend to make people happy. Ask yourself that question every Sunday or something and then reevaluate your next week, I would say.
Jim O’Shaughnessy:
I love both of those. Data can definitely set you free if you use it the right way and ask the right questions. It can also enslave you, as we talked about, when people know so much about how to keep you glued to doom scrolling or staying on that particular website. That's not such a great outcome. But I think we shouldn't let that besmirch the absolute power of big data married to AI, which I just think is going to cause a revolution in discovery and a bunch of innovations, and obviously a bunch of bad stuff too. So we have to be very careful about that, always be aware of it, and not be Panglossian in our attitude: "Oh, this is just going to work out perfectly."
Seth Stephens-Davidowitz:
Totally, totally agree.
Jim O’Shaughnessy:
Where can people find you? We'll have it in the show notes.
Seth Stephens-Davidowitz:
Oh, I guess I'm on X, although I don't post as much as I probably should. SethS_D on X. That's probably the best place.
Jim O’Shaughnessy:
Perfect. All right, my friend, this has been great. I really appreciate you joining me, and I can't wait to get together and hear about that possible next book.
Seth Stephens-Davidowitz:
Thanks so much, Jim.