Sunday, June 26, 2022

Why Statisticians LOVE Odds #Statistics

 



Probability took center stage in your introductory statistics class. Odds were designed for gamblers. 

However, it turns out that odds play an important role in statistics as well. The two have a straightforward relationship.

To estimate the likelihood that event "A" will occur, divide the number of times A occurs by the total number of events, A and not A (~A), or

A / (A + ~A)


The odds of A happening are calculated by dropping A from the denominator: A / ~A.
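A minimal sketch of the two formulas, and of converting between them. The function names and the counts (A occurs 2 times, ~A occurs 8 times) are made up for illustration.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability in [0, 1) to odds in favor."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def odds_to_prob(odds: float) -> float:
    """Convert non-negative odds in favor back to a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Hypothetical counts: A occurs 2 times, ~A occurs 8 times.
a, not_a = 2, 8
probability = a / (a + not_a)  # A / (A + ~A)
odds = a / not_a               # A / ~A

print(probability, odds)
print(prob_to_odds(probability), odds_to_prob(odds))
```

For a small probability such as 0.01, `prob_to_odds` gives roughly 0.0101, so the two measures barely differ at that end of the scale.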


  • With low probabilities of A, the two are nearly identical, because ~A then makes up almost all of the denominator. 
  • So why use odds instead of probabilities? Two important applications come to mind.


1. Logistic Regression Makes Use of Odds


After linear regression, logistic regression is the most helpful member of the linear model family. It has been a powerful and efficient model for predicting the likelihood of an outcome and understanding relationships between predictor variables and that outcome for decades.


Unfortunately, a linear model is poorly suited to predicting a probability directly. A probability must be between 0 and 1 (you cannot have more than a 100 percent chance of something), but the output of a linear model keeps increasing as predictor values increase. 


Odds are not as limited. Odds can be any positive value (for example, a probability of 2/3 is the same as odds of 2/1). If we model the odds instead (actually the log of the odds, or logit, which can take any real value), we can use a linear model.
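Here is a small sketch of the logit idea. The intercept and slope are invented numbers, not fitted to any data; the point is only that a linear expression on the log-odds scale always maps back to a probability strictly between 0 and 1.

```python
import math


def logit(p: float) -> float:
    """Log of the odds for a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inverse_logit(log_odds: float) -> float:
    """Map any real-valued log-odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical coefficients for a single predictor x (illustration only).
intercept, slope = -3.0, 0.5

for x in (0, 6, 20):
    log_odds = intercept + slope * x  # linear, and unbounded
    print(x, log_odds, inverse_logit(log_odds))  # probability stays inside (0, 1)
```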


2. Use of Odds in Retrospective Medical Studies 

A prospective study is an important tool in medical research. In a prospective study, subjects are tracked over time to see if they develop a disease. 

In most cases, we want to compare the risk (probability) of getting the disease for subjects with some attribute or condition to the risk for those who do not have that condition. 

For example, we could compare the risk of developing heart disease between smokers and nonsmokers. 

This comparison is typically expressed as a ratio: the risk of heart disease for smokers divided by the risk of heart disease for nonsmokers. 


This ratio (the "relative risk") indicates how much additional risk you incur by smoking. 
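As a quick illustration of the arithmetic, here is relative risk computed from a prospective study. The counts are invented for the example and are not real heart disease data.

```python
from fractions import Fraction

# Hypothetical prospective study (made-up counts).
smokers_total, smokers_with_disease = 1000, 60
nonsmokers_total, nonsmokers_with_disease = 1000, 30

# Risk = probability of developing the disease within each group.
risk_smokers = Fraction(smokers_with_disease, smokers_total)
risk_nonsmokers = Fraction(nonsmokers_with_disease, nonsmokers_total)

relative_risk = risk_smokers / risk_nonsmokers
print(float(risk_smokers), float(risk_nonsmokers), float(relative_risk))
```

With these invented counts the smokers' risk is twice that of the nonsmokers, so the relative risk is 2.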

Unfortunately, prospective studies are time-consuming because the subjects must be given enough time to contract the disease.

A case-control retrospective study compares people who already have the disease to matched controls who do not, and asks whether they have the condition of interest. 

Consider looking at patients with heart disease as well as a group of matched controls and asking whether or not they smoke.


Because an equal number of heart disease patients and healthy patients were purposefully chosen for the study, this design cannot estimate the likelihood of contracting the disease. Instead, the odds of the antecedent (being a smoker) are calculated, not the outcome (heart disease). 


For example, we learn the chances of a heart disease patient being a smoker, not the chances of a smoker getting heart disease. The odds ratio for this risk factor can be calculated by dividing the odds of smoking among the heart disease patients by the odds of smoking among the controls.
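A sketch of the odds ratio calculation for a case-control design, taking the odds ratio as the odds of smoking among cases divided by the odds of smoking among controls. All counts below are hypothetical.

```python
from fractions import Fraction

# Hypothetical case-control study (made-up counts).
cases_smokers, cases_nonsmokers = 60, 40        # heart disease patients
controls_smokers, controls_nonsmokers = 30, 70  # matched controls

# Odds of the antecedent (smoking) within each group.
odds_smoking_cases = Fraction(cases_smokers, cases_nonsmokers)
odds_smoking_controls = Fraction(controls_smokers, controls_nonsmokers)

odds_ratio = odds_smoking_cases / odds_smoking_controls
print(float(odds_smoking_cases), float(odds_smoking_controls), float(odds_ratio))
```

Note that the group sizes (100 cases, 100 controls) were fixed by the study design in this example, which is why the disease risk itself cannot be read off the table.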


This design has the disadvantage of not providing information in terms most relevant to patients - the increased likelihood of getting a disease as a result of exposing yourself to some risk factor. It does, however, have the advantage of being able to use existing data and produce results immediately after analysis.


Because they are convenient and efficient, retrospective studies are popular. Consequently, odds ratios are well known among medical professionals and the statisticians who analyze their data.


3. Odds, Gambling, and Behavioral Research


Let's look at gambling, or more specifically, betting. Odds are essential to betting; they are the means by which gamblers and gambling establishments convert estimated probabilities into bets.


Betting odds are expressed in terms of the payoff that a bet will produce. On a 36-number roulette wheel (excluding the house's 0 or 0s), betting on 9 of the 36 numbers yields 3 to 1 odds. 


This means that a $1 bet on the roulette wheel will win you $3 if your numbers come up (you also get your original $1 stake back, so you walk away with $4).


This translates to a 1/4 or 25% chance that the wheel will land on one of your numbers. If your play wins 25% of the time (gaining you $3) and loses 75% of the time (losing you $1), then the 3 to 1 odds are a fair bet, meaning that you will come out even in the long run. 
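The fair-bet arithmetic can be checked in a few lines. This sketch uses exact fractions; the variable names are just for illustration.

```python
from fractions import Fraction

numbers_covered = 9
numbers_on_wheel = 36  # house zeros excluded, as in the text

p_win = Fraction(numbers_covered, numbers_on_wheel)  # 1/4
p_lose = 1 - p_win                                   # 3/4

# Fair betting odds are the odds against winning: 3, i.e. "3 to 1".
fair_odds_against = p_lose / p_win

# Expected profit per $1 staked: win $3 a quarter of the time, lose $1 otherwise.
profit_if_win, loss_if_lose = 3, 1
expected_profit = p_win * profit_if_win - p_lose * loss_if_lose

print(fair_odds_against, expected_profit)
```

An expected profit of exactly zero is what "coming out even in the long run" means here.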


To earn reliable profits in the long run, gambling houses set betting odds that are slightly more advantageous to themselves than a fair bet. To put it another way, the casino wants the probabilities implied by the odds, summed over all possible outcomes, to add up to more than one. 


This is accomplished in roulette by adding a 0 to the 36 numbers (Europe) or two 0s (United States) that belong to the house: if the wheel lands on 0, no bettor wins and the house collects. This house margin is thin in competitive and transparent gambling games: make it too large, and you open the door to a competitor offering better odds.
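To see how thin that margin is, here is a rough sketch of the expected profit on the 9-number bet once the house pockets are added. It assumes the payout stays at 3 to 1, as in the example above; the function name is made up.

```python
from fractions import Fraction


def expected_profit(numbers_covered, pockets, payout_to_one=3, stake=1):
    """Expected profit on one bet, given the total number of pockets on the wheel."""
    p_win = Fraction(numbers_covered, pockets)
    return p_win * payout_to_one * stake - (1 - p_win) * stake


fair = expected_profit(9, 36)      # no house pocket
european = expected_profit(9, 37)  # one 0
american = expected_profit(9, 38)  # 0 and 00

for label, value in (("fair", fair), ("european", european), ("american", american)):
    print(label, value, round(float(value) * 100, 2), "% of stake")
```

Under these assumptions the bettor loses about 2.7% of each stake on a single-zero wheel and about 5.3% on a double-zero wheel, on average.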


The potential reach of betting extends far beyond sports betting and casino play. Hypothetical bets are an effective experimental tool in social psychology. In his research and book Thinking, Fast and Slow, Daniel Kahneman popularized this technique. 


He proposed the following choices:

Scenario A: You've been handed a $1,000 check. You must now select between

  • A 50% chance of winning $1000 (and a 50% chance of winning nothing)
  • A sure gain of $500

Scenario B: You have received $2000. You must now choose between 

  • A 50% chance of losing $1000 (and a 50% chance of losing nothing)
  • A sure loss of $500


The expected final wealth is the same in both scenarios, and the same for every option: $1500. Nonetheless, almost everyone chooses the sure thing in Scenario A and the gamble in Scenario B. This demonstrates the significance of a reference point, here the $1000 or $2000 endowment. Starting from $1000, you can only gain money, and the sure $500 locks in that gain. Starting from $2000, you can only lose money, and the gamble offers a good chance of not losing any. 
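The equality of the final positions is easy to verify. This sketch reads each gamble as a 50/50 split between the stated win or loss and no change at all, which is an assumption about the intended setup.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Expected final wealth from a list of (probability, final_wealth) pairs."""
    return sum(p * wealth for p, wealth in outcomes)


half = Fraction(1, 2)

# Scenario A: start with $1000.
a_gamble = expected_value([(half, 1000 + 1000), (half, 1000)])
a_sure = 1000 + 500

# Scenario B: start with $2000.
b_gamble = expected_value([(half, 2000 - 1000), (half, 2000)])
b_sure = 2000 - 500

print(a_gamble, a_sure, b_gamble, b_sure)
```

All four options come out to an expected final wealth of $1500, so any systematic preference between them has to come from how the choice is framed.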


Reference points are an important component of Kahneman and Tversky's prospect theory. Prospect theory supplements older utility theories of how people make choices. Kahneman and Tversky used bets and games of chance, such as the ones mentioned above, to demonstrate how actual human behavior differs from what utility theory predicts.


Kahneman's bets were purely hypothetical in order to demonstrate academic theories; no actual money was exchanged. Political scientists have attempted to go even further by establishing markets with actual payouts. The goal is to learn about upcoming political events. These "prediction markets" are thought to represent the totality of available knowledge on a subject (see The Wisdom of Crowds). 


Bets in a prediction market perform a similar function to bid and offer prices in an economic market. For a variety of reasons, such political forecasting efforts have had limited success. 


The US Defense Department's 2003 Policy Analysis Market, for example, was abandoned after it was suggested that future terrorist attacks might be an appropriate topic. There was widespread criticism that the US government should not be facilitating a market in which those with inside knowledge of impending attacks could profit.


Conclusion


Proponents of betting see a lot of potential for it beyond the casino and racetrack. They are drawn to it as a tool for honing estimation and decision-making skills. 


However, as Kahneman and Tversky demonstrated, betting runs up against a widespread aversion to the strict probabilistic thinking that efficient betting requires.


Indeed, as I read Thinking, Fast and Slow, I wondered how often Kahneman actually got people to take his bets, even if only hypothetically. 


People are often happy to express their opinions with qualifiers such as "probably," "rarely," "a lot," or something similar, but moving from there to odds, probabilities, and bets is usually a bridge too far.












