# Statistics

Statistics is a relatively simple concept:

1. Find many similar things to measure (e.g., a population of people).
2. There's usually a *ton* of those things, so figure out how many of them to measure to give the result [scientific rigor](science.md) without wasting too much time or money on it.
3. Pile up that information until it shows a bell curve across a numbered range.
4. Calculate the center of that bell curve, then determine how far most of those things sit from that center. For a bell-shaped (normal) distribution, about 68% of everything falls within 1 standard deviation of the center, and about 95% falls within 2.

Linearity vs. nonlinearity is one of the central distinctions in mathematics. Nonlinear thinking means which way you should go depends on where you already are. The relation between eating and health isn't linear but curved, with bad outcomes at both ends.

The Laffer curve: the horizontal axis is the level of taxation, and the vertical axis is the amount of revenue the government takes in from taxpayers. On the left edge of the graph, the tax rate is 0%, so the government collects no revenue at all.

Linear reasoning is everywhere. You're doing it every time you say that if something is good to have, having more of it is even better.

Statistics are essential for reasoning about a huge variety of problems.

- but ONLY if the person interpreting them has legitimate experience in that domain or something relatively similar.

Statistical analysis is the WORST thing for intuition, because it works against how our minds and biases operate:

- something that seems likely to us isn't always statistically likely, because we favor the one odd experience we remember and forget all the other times
- something that seems unlikely to us may simply be statistical randomness, and we were just lucky/unlucky

QUIZ: A test for a disease has a 5% false-positive rate. The disease strikes 1/1000 of the population. People are tested at random, regardless of whether they are suspected of having the disease. A patient's test is positive.
What is the probability that the patient has the disease? Most doctors say 95%! The true answer is about 1/50 (2%), assuming the test never misses a real case. Out of 1,000 people tested, about 1 actually has the disease, while about 50 of the 999 healthy people test positive anyway, so only about 1 in every 51 positives is real. The positive reading made it about 20 times more likely that the patient has the disease: a 1/1000 chance became roughly a 1/50 chance.

Like actors rising to stardom, people patronize what other people like to do. Forcing rational dynamics on the process would be impossible. This is called a "path-dependent outcome," and it has thwarted many mathematical attempts at modeling behavior.

Detecting covariation (or correlation): you have to pay attention to all four cells of the table (disease or no disease, symptom or no symptom) to answer even the simple question of association. First compute the ratio of people who have the disease and have the symptom to people who have the disease but don't have the symptom. Then compute the ratio of people who don't have the disease but do have the symptom to people who don't have the disease and don't have the symptom. If the two ratios are the same, the symptom is not associated with the disease.

Often a given correlation is so consistent with plausible ideas about causation that we tacitly accept that the correlation establishes a causal relation. Causal inferences are often irresistible. If I tell you that people who eat more chocolate have more acne, it's hard to resist the assumption that something about chocolate causes acne. (It doesn't, so far as is known.)

Inquiry is fatal to certainty. - Will Durant, philosopher

There is a fairly plausible hypothesis called the "germ exposure theory" that could account for the correlational and natural-experiment evidence. Now are you ready to get your baby dirty? Personally, I'm not sure I would be. Natural experiments, correlational evidence, and plausible theories are all well and good. But I would want to see a true experiment of the double-blind, randomized-control sort, with babies assigned by the proverbial flip of a coin to an experimental high-bacteria exposure condition versus a control, low-bacteria condition.
Both the experimenter and the participants (the mothers in this case) should be ignorant of (blind to) the condition the babies were assigned to. This double-blind design rules out the possibility that the results were influenced by either the experimenter or the participant knowing which condition the participant was in.

- this almost never happens in real life, because running that kind of experiment is often [taboo]

Studies that rely on correlations to establish a scientific fact can be hopelessly misleading. Does classroom size affect learning? Are multivitamins good for your health? Is there employer prejudice against the long-term unemployed simply because they've been out of a job for a long time? All of these questions ask whether some independent or predictor variable (an input or a presumed cause) affects some dependent or outcome variable (an output or an effect).

The smaller the sample size, the greater the variation.

The law of averages is not very well named, because laws should be true, and this one is false. Coins have no memory. So the next coin you flip has a 50-50 chance of coming up heads, the same as any other. The way the overall proportion settles down to 50% isn't that fate favors tails to compensate for the heads that have already landed; it's that those early flips become less and less important the more flips we make.

Don't talk about percentages of numbers when the numbers might be negative.

The more chances you give yourself to be surprised, the higher your threshold for surprise had better be.

The null hypothesis significance test, in executive bullet-point form:

- Run an experiment.
- Suppose the null hypothesis is true, and let p be the probability (under that hypothesis) of getting results as extreme as those observed. The number p is called the p-value.
- If it is very small, rejoice; you get to say your results are statistically significant.
- If it is large, concede that the null hypothesis has not been ruled out.
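To make the p-value recipe concrete, here is a minimal Python sketch of a one-sided exact binomial test for a coin suspected of favoring heads. The flip counts are hypothetical illustration values, and the helper name `upper_tail_p_value` is made up for this example. It also shows why sample size matters: the same 80% heads rate is unremarkable in 10 flips but overwhelming in 100.

```python
from math import comb


def upper_tail_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null).

    This is the probability, assuming the null hypothesis (a fair coin by
    default), of a result at least as extreme as the one observed.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


ALPHA = 0.05  # Fisher's conventional threshold

for heads, flips in [(8, 10), (80, 100)]:
    p = upper_tail_p_value(heads, flips)
    verdict = "statistically significant" if p < ALPHA else "null not ruled out"
    print(f"{heads}/{flips} heads: p = {p:.3g} -> {verdict}")
```

The test is one-sided because the hypothetical suspicion is "too many heads"; a two-sided version would also count results just as extreme in the tails direction.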
The lexical double-booking (double meaning) of the word "significance" has consequences. In everyday speech "significant" means important; in statistics it only means the result would be unlikely if the null hypothesis were true. Twice a tiny number is a tiny number: a "doubled risk" of a very rare event still leaves both numbers more or less zero.

"Statistically noticeable" or "statistically detectable" would be a better term than "statistically significant." That would be truer to the meaning of the method, which merely counsels us about the existence of an effect but is silent about its size or importance. But it's too late for that. We have the language we have.

A statistical study that's not refined enough to detect a phenomenon of the expected size is called underpowered: the equivalent of looking at the planets with binoculars to see whether they have moons. Moons or no moons, you get the same result, so you might as well not have bothered.

Players who had just made a shot were more likely to take a more difficult shot on their next attempt. So the "hot hand" in basketball might "cancel itself out": players, believing themselves to be hot, get overconfident and take shots they shouldn't.

The classic proof by contradiction:

- Suppose the hypothesis H is true.
- It follows from H that a certain fact F cannot be the case.
- But F is the case.
- Therefore, H is false.

Its statistical analogue:

- Suppose the null hypothesis H is true.
- It follows from H that a certain outcome O is very improbable (say, less than Fisher's 0.05 threshold).
- But O was actually observed.
- Therefore, H is very improbable.

Impossible things never happen. But improbable things happen a lot.

Scientists, under intense pressure to publish lest they perish, are not immune to temptation. It takes a lot of mental strength to stuff years of work in the file drawer. Scientists may "torture the data until it confesses."

The purpose of statistics isn't to tell us what to believe, but to tell us what to do. Statistics is about making decisions, not answering questions.
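The idea of an underpowered study can be made concrete with an exact one-sided binomial test. The sketch below assumes, hypothetically, a coin that truly lands heads 60% of the time, and computes the power: the probability that a test at the 0.05 level actually rejects the fair-coin null. The function names and the 60% figure are illustrative choices, not from the text.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); an empty sum (k > n) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_heads(n: int, alpha: float = 0.05) -> int:
    """Smallest head count whose one-sided p-value under a fair coin is <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, 0.5) <= alpha:
            return k
    return n + 1  # no possible outcome would count as significant


def power(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Probability the test rejects the fair-coin null when heads truly has probability p_true."""
    return upper_tail(critical_heads(n, alpha), n, p_true)


for n in (20, 50, 200):
    print(f"n = {n:>3}: reject at >= {critical_heads(n)} heads, power = {power(n, 0.6):.2f}")
```

With 20 flips the biased coin is flagged only rarely, so "no significant result" says almost nothing about whether the bias exists; with 200 flips the same bias is usually detected.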
The average value of a large collection of measurements is about the same as the average value of a small collection, whereas the extreme value of a large collection is considerably more extreme than that of a small collection. The average scientist in tiny Belgium will be comparable to the average scientist in the United States, even though the best scientist in the United States will in general be better than Belgium's best.

Questions arise naturally when one transcends one's self, family, and friends. How many? How long ago? How far away? How fast? What links this to that? Which is more likely?

Our innate desire for meaning and pattern can lead us astray if we don't remind ourselves of the ubiquity of coincidence.

Regression to the mean is the natural behavior of any random quantity. Since we tend to punish unusually bad performances and reward unusually good ones, behavior is most likely to improve after punishment and to deteriorate after reward, even if neither had any effect. The sequel to a great movie is usually not as good as the original. The same can be said of the novel after the best-seller or the album that follows the gold record. This is simply another instance of regression to the mean.

Statistics is to probability as engineering is to physics: an applied science based on a more intellectually stimulating foundational discipline.

Capture-recapture method: Assume we want to know how many fish are in a certain lake. We capture one hundred of them, mark them, and then let them go. After allowing them to disperse about the lake, we catch another hundred fish and see what fraction of them are marked. If eight of the hundred we capture are marked, then a reasonable estimate of the fraction of marked fish in the whole lake is 8 percent. Since the marked fish number 100, the whole population is about 100 / 0.08 = 1,250 fish. Of course, care must be taken that the marked fish don't die as a result of the marking, that they're more or less uniformly distributed about the lake, that the marked ones aren't only the slower or more gullible among the fish, etc.
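Turning the 8 percent into a head count is the classic Lincoln-Petersen capture-recapture estimate: if 100 marked fish make up about 8% of the lake, the lake holds about 100 / 0.08 = 1,250 fish. As a sketch, the snippet below computes that estimate and then simulates the whole procedure on a hypothetical lake of exactly 1,250 fish, assuming precisely the conditions the text warns about: marked fish survive, mix uniformly, and are no easier to catch than the rest. The function names are illustrative.

```python
import random
import statistics


def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Population estimate N from the proportion marked / N ~= recaptured / caught."""
    if recaptured == 0:
        raise ValueError("no marked fish recaptured; the estimate is undefined")
    return marked * caught / recaptured


def simulate_estimates(true_population: int, marked: int, caught: int,
                       trials: int, seed: int = 0) -> list[float]:
    """Repeat the experiment on a hypothetical, perfectly mixed lake."""
    rng = random.Random(seed)
    # True stands for a fish tagged in the first catch; False for an untagged fish.
    lake = [True] * marked + [False] * (true_population - marked)
    estimates = []
    for _ in range(trials):
        second_catch = rng.sample(lake, caught)  # every fish equally likely to be caught
        recaptured = sum(second_catch)           # True counts as 1
        if recaptured:                           # skip the rare catch with no tagged fish
            estimates.append(lincoln_petersen(marked, caught, recaptured))
    return estimates


print(lincoln_petersen(marked=100, caught=100, recaptured=8))  # 1250.0

estimates = simulate_estimates(true_population=1250, marked=100, caught=100, trials=2000)
cuts = statistics.quantiles(estimates, n=20)  # 5th, 10th, ..., 95th percentiles
print(f"median {statistics.median(estimates):.0f}, "
      f"90% of estimates between {cuts[0]:.0f} and {cuts[-1]:.0f}")
```

Even under ideal mixing, a single second catch can land well away from the true 1,250, which is why the spread is worth printing; real-world violations of the assumptions add bias the simulation does not model.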
The central limit theorem states that the sum (or the average) of a large number of independent measurements approximately follows a normal curve, even if the individual measurements themselves do not.

Quite often, two quantities are correlated without either one being the cause of the other. Changes in both quantities may be the result of a third factor. Body lice were once considered a cause of good health. When people took sick, their temperatures rose and caused the body lice to seek more hospitable abodes. The lice and good health both departed because of the fever. Similarly, the correlation between the quality of a state's day-care programs and the reported rate of child sex abuse in them is certainly not causal, but merely indicates that better supervision results in more diligent reporting of the incidents that do occur.

A technically correct yet misleading statistic is the fact that heart disease and cancer are the two leading killers of Americans. This is undoubtedly true, but according to the Centers for Disease Control, accidental deaths (in car accidents, poisonings, drownings, falls, fires, and gun mishaps) result in more lost years of potential life, since the average age of these victims is considerably lower than that of the victims of cancer and heart disease.

High-profile psychological studies were more likely to fail to replicate than to stand up. The famous psychological results are famous not because they are the most rigorously demonstrated, but because they're interesting.

Regression toward the mean: in any series of random events, an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one.

The law of small numbers: the mistaken belief that small samples closely resemble the population they are drawn from.
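The central limit theorem, the standard-deviation rule of thumb from the top of these notes, and "the smaller the sample size, the greater the variation" can all be seen in one small simulation. This is a sketch using Python's standard `random` and `statistics` modules; the choice of fair six-sided dice, the sample sizes, and the function names are illustrative assumptions. A single die roll is flat (nothing like a bell curve), yet averages of many rolls cluster in a bell shape whose spread shrinks like 1/√n.

```python
import math
import random
import statistics

DIE_FACES = range(1, 7)
DIE_SD = math.sqrt(35 / 12)  # standard deviation of a single fair die roll (~1.71)


def sample_means(n_rolls: int, trials: int, seed: int = 0) -> list[float]:
    """Average of n_rolls fair dice, repeated `trials` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(DIE_FACES, k=n_rolls)) for _ in range(trials)]


def summarize(n_rolls: int, trials: int = 20_000) -> tuple[float, float, float]:
    """Return (mean of the averages, SD of the averages, share within 1 SD of the mean)."""
    means = sample_means(n_rolls, trials)
    centre = statistics.fmean(means)
    spread = statistics.pstdev(means)
    within_one_sd = sum(abs(m - centre) <= spread for m in means) / trials
    return centre, spread, within_one_sd


for n in (1, 5, 30):
    centre, spread, share = summarize(n)
    print(f"n={n:>2}: mean {centre:.2f}, SD {spread:.3f} "
          f"(theory {DIE_SD / math.sqrt(n):.3f}), within 1 SD: {share:.0%}")
```

The simulated SD should track the theoretical DIE_SD / √n column, and for 30 rolls the share within one SD should come out near the normal curve's 68%.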
[124: Blogofractal - explain xkcd](https://www.explainxkcd.com/wiki/index.php/124:_Blogofractal)
[216: Romantic Drama Equation - explain xkcd](https://www.explainxkcd.com/wiki/index.php/216:_Romantic_Drama_Equation)
[403: Convincing Pickup Line - explain xkcd](https://www.explainxkcd.com/wiki/index.php/403:_Convincing_Pickup_Line)
[5 Ways to Measure Mutual Fund Risk](https://www.investopedia.com/investing/measure-mutual-fund-risk/)
[A New Coefficient of Correlation | Hacker News](https://news.ycombinator.com/item?id=29687613)
[[1909.10140] A new coefficient of correlation](https://arxiv.org/abs/1909.10140)
[Are You Trading or Gambling? | Hacker News](https://news.ycombinator.com/item?id=26283650)
[Are you trading or gambling? - by Khris - Games of Chance](https://gamesofchance.substack.com/p/are-you-trading-or-gambling)
[Normalization of Deviance (2015) | Hacker News](https://news.ycombinator.com/item?id=34791106)
[Normalization of deviance](https://danluu.com/wat/)
[Statistics Done Wrong | Hacker News](https://news.ycombinator.com/item?id=19038374)
[Welcome - Statistics Done Wrong](https://www.statisticsdonewrong.com/)
[The World is Built on Probability (1984) | Hacker News](https://news.ycombinator.com/item?id=35937375)
[The World Is Built On Probability : Lev Tarasov : Free Download, Borrow, and Streaming : Internet Archive](https://archive.org/details/lev-tarasov-the-world-is-built-on-probability-mir-2023)
[Statistical Rethinking (2022 Edition) | Hacker News](https://news.ycombinator.com/item?id=29956390)
[rmcelreath/stat_rethinking_2022: Statistical Rethinking course winter 2022](https://github.com/rmcelreath/stat_rethinking_2022)
[What are the most important statistical ideas of the past 50 years? | Hacker News](https://news.ycombinator.com/item?id=30417811)
[Full article: What are the Most Important Statistical Ideas of the Past 50 Years?](https://www.tandfonline.com/doi/full/10.1080/01621459.2021.1938081)
[552: Correlation - explain xkcd](https://www.explainxkcd.com/wiki/index.php/552)
[Statistics 110: Probability](https://projects.iq.harvard.edu/stat110/home)
[Statistics and its application | Britannica](https://www.britannica.com/summary/statistics)
[A Concrete Introduction to Probability (2018) | Hacker News](https://news.ycombinator.com/item?id=27379366)
[pytudes/ipynb/Probability.ipynb at main · norvig/pytudes](https://github.com/norvig/pytudes/blob/main/ipynb/Probability.ipynb)
[Seeing Theory: A visual introduction to probability and statistics (2017) | Hacker News](https://news.ycombinator.com/item?id=18769099)
[Seeing Theory](https://seeing-theory.brown.edu/index.html)
[Simple Statistics](https://simple-statistics.github.io/)
[Statistics for Beginners - Top Stats Concepts to Know Before Getting into Data Science](https://www.freecodecamp.org/news/top-statistics-concepts-to-know-before-getting-into-data-science)
[An Introduction to Statistical Learning with Applications in Python | Hacker News](https://news.ycombinator.com/item?id=36643999)
[An Introduction to Statistical Learning](https://www.statlearning.com/)
[Introduction to Modern Statistics | Hacker News](https://news.ycombinator.com/item?id=37854846)
[Introduction to Modern Statistics (2nd Ed)](https://openintro-ims2.netlify.app/)
[Learn Statistics for Data Science, Machine Learning, and AI - Full Handbook](https://www.freecodecamp.org/news/statistics-for-data-scientce-machine-learning-and-ai-handbook/)
[Cassie Kozyrkov](https://towardsdatascience.com/stats-gist-list-an-irreverent-statisticians-guide-to-jargon-be8173df090d) (2022) Stats Gist List: An Irreverent Statistician’s Guide to Jargon
Plain-language band-aids to fix gaps in your statistics knowledge

## misc specific domains

[Scope Insensitivity - LessWrong](https://www.lesswrong.com/posts/2ftJ38y9SRBCBsCzy/scope-insensitivity)
[Shut Up and Multiply - LessWrong](https://www.lesswrong.com/tag/shut-up-and-multiply)
[Expected value - Wikipedia](https://en.wikipedia.org/wiki/Expected_value)
[Reference class forecasting - Wikipedia](https://en.wikipedia.org/wiki/Reference_class_forecasting)
[Base rate - Wikipedia](https://en.wikipedia.org/wiki/Base_rate)
[Statistical mechanics - Wikipedia](https://en.wikipedia.org/wiki/Statistical_mechanics)
[Cromwell's rule - Wikipedia](https://en.wikipedia.org/wiki/Cromwell's_rule)
[The Spoiler Effect | The Center for Election Science](https://electionscience.org/library/the-spoiler-effect/)
[What is Statistical Significance? P Value Defined and How to Calculate It](https://www.freecodecamp.org/news/what-is-statistical-significance-p-value-defined-and-how-to-calculate-it)
[What is a Correlation Coefficient? The r Value in Statistics Explained](https://www.freecodecamp.org/news/what-is-a-correlation-coefficient-r-value-in-statistics-explains)
[The Monty Hall Problem](https://www.philosophyexperiments.com/montyhall/Default.aspx)
[Power law - Wikipedia](https://en.wikipedia.org/wiki/Power_law#Lack_of_well-defined_average_value)
[Simpson's paradox | Hacker News](https://news.ycombinator.com/item?id=39673754)
[Simpson's paradox - Wikipedia](https://en.wikipedia.org/wiki/Simpson's_paradox)
[Berkson's paradox - Wikipedia](https://en.wikipedia.org/wiki/Berkson's_paradox)
[Exploring Histograms](https://tinlizzie.org/histograms/)
[Permutation and Combination: The Difference Explained with Formula Examples](https://www.freecodecamp.org/news/permutation-and-combination-the-difference-explained-with-formula-examples)
[Poisson distribution - Wikipedia](https://en.wikipedia.org/wiki/Poisson_distribution)
[Overly analytical guide to escorting | Hacker News](https://news.ycombinator.com/item?id=28924751)
[Becoming A Whorelord: The Overly Analytical Guide To Escorting - Knowingless](https://knowingless.com/2021/10/19/becoming-a-whorelord-the-overly-analytical-guide-to-escorting/)
[What is Stratified Random Sampling? Definition and Python Example](https://www.freecodecamp.org/news/what-is-stratified-random-sampling-definition-and-python-example)
[Viewing Matrices & Probability as Graphs](https://www.math3ma.com/blog/matrices-probability-graphs)
[What is an Outlier? Definition and How to Find Outliers in Statistics](https://www.freecodecamp.org/news/what-is-an-outlier-definition-and-how-to-find-outliers-in-statistics)
[The Birthday Paradox Experiment](https://pudding.cool/2018/04/birthday-paradox/)
[Law of Large Numbers: What It Is, How It's Used, Examples](https://www.investopedia.com/terms/l/lawoflargenumbers.asp)
[How percentile approximation works and why it's more useful than averages | Hacker News](https://news.ycombinator.com/item?id=28526966)
[How Percentile Approximation Works (and Why It's More Useful Than Averages)](https://www.timescale.com/blog/how-percentile-approximation-works-and-why-its-more-useful-than-averages/)
[Possibility Space](https://www.possibilityspace.org/tutorial-generative-possibility-space/)
[Possibility Space](https://www.possibilityspace.org/tutorial-sampling/index.html)
[Prior probability - Wikipedia](https://en.wikipedia.org/wiki/Prior_probability)

## probability

An identical set of winning lottery numbers came up twice in a single week. It seems improbable that the names of medieval rabbis could be hidden in the letters of the Torah. But is it? Improbability is a relative notion, not an absolute one. When we say an outcome is improbable, we are always saying, explicitly or not, that it is improbable under some set of hypotheses we've made about the underlying mechanisms of the world.
To calculate probabilities:

- If two events are independent, in the sense that the outcome of one has no influence on the outcome of the other, then the probability that both occur is computed by multiplying the probabilities of the individual events. The probability of obtaining two heads in two flips of a coin is ½ × ½ = ¼, and the probability of five straight coin flips resulting in heads is (½)⁵ = 1⁄32.
- The probability that an event doesn't occur is equal to 1 minus the probability that it does (a 20 percent chance of rain implies an 80 percent chance of no rain).

Combining the two rules: (5⁄6)⁴ is the probability of not rolling a 6 in four rolls of a die. Subtracting this number from 1 gives the probability that this event (no 6s) doesn't occur or, in other words, that at least one 6 is rolled in the four tries: 1 - (5⁄6)⁴ ≈ .52. Likewise, the probability of rolling at least one 12 in twenty-four rolls of a pair of dice is 1 - (35⁄36)²⁴ ≈ .49.

The binomial probability distribution arises whenever a procedure or trial may result in "success" or "failure" and one is interested in the probability of obtaining R successes in N trials.

Half of the time that twenty-three randomly selected people are gathered together, two or more of them will share a birthday. To see why, start with five people. There are 365⁵ equally likely ways to assign them birthdays (ignoring leap years), and 365 × 364 × 363 × 362 × 361 of those ways give everyone a different birthday. Dividing the latter product by 365⁵ gives the probability that five people chosen at random have no birthday in common. Subtracting this probability from 1 gives the complementary probability that at least two of the five people do share a birthday. A similar calculation using 23 rather than 5 yields about ½, or 50 percent, as the probability that at least two of twenty-three people share a birthday.

It would be very unlikely for unlikely events not to occur. If you don't specify a predicted event precisely, there is an indeterminate number of ways for an event of that general kind to take place.
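The multiplication and complement rules, and the birthday calculation, can be checked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. A minimal sketch; the helper names are made up for illustration, and the birthday function assumes 365 equally likely birthdays, as the text's calculation does.

```python
from fractions import Fraction
from math import prod


def p_all(p: Fraction, times: int) -> Fraction:
    """Multiplication rule: an event of probability p happens on every one of `times` independent trials."""
    return p**times


def p_at_least_once(p: Fraction, times: int) -> Fraction:
    """Complement rule: 1 - P(the event never happens in `times` independent trials)."""
    return 1 - (1 - p) ** times


def p_shared_birthday(people: int, days: int = 365) -> Fraction:
    """P(at least two of `people` share a birthday), assuming uniform birthdays and no leap years."""
    all_different = Fraction(prod(range(days, days - people, -1)), days**people)
    return 1 - all_different


half = Fraction(1, 2)
print(p_all(half, 2), p_all(half, 5))                 # 1/4 1/32
print(float(p_at_least_once(Fraction(1, 6), 4)))      # at least one 6 in four rolls
print(float(p_at_least_once(Fraction(1, 36), 24)))    # at least one 12 in 24 rolls of two dice
for n in (5, 22, 23):
    print(n, round(float(p_shared_birthday(n)), 3))
```

The birthday probability crosses ½ exactly between 22 and 23 people, which is why 23 is the number usually quoted.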
The expected value of a quantity is simply the average of its values, weighted according to their probabilities. For example, if:

* 1⁄4 of the time a quantity equals 2
* 1⁄3 of the time it equals 6
* 1⁄3 of the time it equals 15
* the remaining 1⁄12 of the time it equals 54

then its expected value is 12, since 12 = (2 × 1⁄4) + (6 × 1⁄3) + (15 × 1⁄3) + (54 × 1⁄12).

Consider a home-insurance company. On average, each year:

* one out of every 10,000 of its policies will result in a claim of $200,000
* one out of 1,000 policies will result in a claim of $50,000
* one out of 50 will result in a claim of $2,000
* the remainder will result in a claim of $0.

The insurance company would like to know its average payout per policy written. The answer is the expected value, which in this case is ($200,000 × 1⁄10,000) + ($50,000 × 1⁄1,000) + ($2,000 × 1⁄50) + ($0 × 9,789⁄10,000) = $20 + $50 + $40 = $110.

Chuck-a-luck: You pick a number from 1 to 6 and the operator rolls three dice. If the number you picked comes up on all three dice, the operator pays you $3; if it comes up on two of the three dice, he pays you $2; if it comes up on just one of the three dice, he pays you $1. Only if the number you picked doesn't come up at all do you pay him anything: just $1.

Say you pick the number 4.

* Chances that a 4 comes up on all three dice: 1⁄6 × 1⁄6 × 1⁄6 = 1⁄216.
* Chances that a 4 comes up on exactly two dice: use the binomial probability distribution. The possible patterns are X44, 4X4, and 44X, with X indicating a non-4. The probability of the first is 5⁄6 × 1⁄6 × 1⁄6 = 5⁄216, and the other two patterns have the same probability, so the probability of a 4 coming up on two of the three dice is 3 × 5⁄216 = 15⁄216.
* Chances of exactly one 4: the patterns are 4XX, X4X, and XX4, each with probability 1⁄6 × 5⁄6 × 5⁄6 = 25⁄216. Adding, we get 75⁄216.
* Chances that no 4s come up: find how much probability is left over by subtracting (1⁄216 + 15⁄216 + 75⁄216) from 1, which gives 125⁄216 (equivalently, (5⁄6)³).

The expected value of your winnings is thus ($3 × 1⁄216) + ($2 × 15⁄216) + ($1 × 75⁄216) + (-$1 × 125⁄216) = -$17⁄216 ≈ -$.08. And so, on average, you would lose approximately eight cents every time you played this seemingly attractive game.

The gambler's fallacy is the mistaken belief that because a coin has come up heads several times in a row, it's more likely to come up tails on its next flip.

Consider Peter and Paul, who flip a coin once a day and bet on heads and tails respectively. Whoever is ahead will probably have been ahead almost the whole time. If Peter is ahead at the end, it's considerably more likely that he's been ahead more than 96 percent of the time than that he's been ahead between 48 percent and 52 percent of the time. It can take a long, long time for the lead to switch.

The number of accidents each year at a certain intersection, the number of rainstorms per year in a given desert, and the number of cases of leukemia in a specified county have all been described quite accurately by the so-called Poisson probability distribution. It's necessary first to know roughly how rare the event is. But if you do know, you can use this information along with the Poisson formula to get a quite accurate idea of the percentage of years in which there would be no desert rainstorms, one such storm, two storms, three, and so on. In this sense, even very rare events are quite predictable.

Assume the probability to be one in 10,000 that a particular dream matches, in a few vivid details, some sequence of events in real life. Since (9,999⁄10,000)³⁶⁵ is about .964, we can conclude that about 96.4 percent of the people who dream every night will have only nonmatching dreams during a one-year span. But that means that about 3.6 percent of the people who dream every night will have at least one predictive dream.
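The expected-value examples in this section (the 12, the $110 insurance payout, and chuck-a-luck) can be checked with exact fractions. A sketch under the text's own assumptions (fair dice, the stated claim frequencies); the function names are illustrative. For chuck-a-luck it enumerates all 6 × 6 × 6 = 216 rolls instead of using the binomial shortcut, which independently confirms the 1, 15, 75, and 125 counts.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import product


def expected_value(outcomes):
    """Probability-weighted average of (value, probability) pairs."""
    outcomes = list(outcomes)
    total = sum(p for _, p in outcomes)
    if total != 1:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(value * p for value, p in outcomes)


def chuck_a_luck_probabilities(pick: int = 4) -> dict[int, F]:
    """P(pick shows on exactly k of three fair dice), found by listing all 216 rolls."""
    matches = Counter(roll.count(pick) for roll in product(range(1, 7), repeat=3))
    return {k: F(count, 216) for k, count in matches.items()}


# Player's winnings by number of matching dice: +$3, +$2, +$1, or -$1 on no match.
PAYOUT = {3: 3, 2: 2, 1: 1, 0: -1}

simple = [(2, F(1, 4)), (6, F(1, 3)), (15, F(1, 3)), (54, F(1, 12))]
insurance = [(200_000, F(1, 10_000)), (50_000, F(1, 1_000)),
             (2_000, F(1, 50)), (0, F(9_789, 10_000))]
dice = chuck_a_luck_probabilities()
chuck = [(PAYOUT[k], p) for k, p in dice.items()]

print(expected_value(simple))       # 12
print(expected_value(insurance))    # 110
print({k: int(p * 216) for k, p in sorted(dice.items())})  # {0: 125, 1: 75, 2: 15, 3: 1}
print(expected_value(chuck), float(expected_value(chuck)))  # -17/216 and about -0.079
```

The `total != 1` check catches a forgotten "remainder" case, such as leaving out the 9,789 in 10,000 policies with no claim.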
Consider a collection of incredible coincidences linking two people, whose probability, let's say, is estimated to be one in a trillion (1 divided by 10¹², or 10⁻¹²). Should we be impressed? Not necessarily. Taking the U.S. population as about 2.5 × 10⁸, the multiplication principle gives (2.5 × 10⁸ × 2.5 × 10⁸)/2, or about 3.13 × 10¹⁶, different pairs of people in the United States. Since we're assuming the probability of this collection of coincidences to be about 10⁻¹², the average number of "incredible" linkages we can expect is 3.13 × 10¹⁶ times 10⁻¹², or about 30,000.

Astrology: The gravitational pull of the delivering obstetrician far outweighs that of the planet or planets involved. Does this mean that fat obstetricians deliver babies with one set of personality characteristics, and skinny ones deliver babies with quite different characteristics? There is no correlation between the date of one's birth and scores on any standard personality test. A study of the birth dates of more than 16,000 scientists and 6,000 politicians found that their signs were randomly distributed, uniformly throughout the year. A study of the records of 3,000 married couples found no correlation between their signs and astrologers' predictions about compatible pairs of signs.

Many mundane mistakes in reasoning can be traced to a shaky grasp of the notion of conditional probability. Unless the events A and B are independent, the probability of A is different from the probability of A given that B has occurred. The probability of rolling a pair of dice and getting a 12 is 1⁄36. The conditional probability of getting a 12 when you know you have gotten at least an 11 is 1⁄3, because only three outcomes (5-6, 6-5, and 6-6) total at least 11, and just one of them is a 12.

A confusion between the probability of A given B and the probability of B given A is also quite common.

Imagine a man with three cards. One is black on both sides, one red on both sides, and one black on one side and red on the other. He drops the cards into a hat and asks you to pick one, but only to look at one side; let's assume it's red.
The man notes that the card you picked couldn't possibly be the card that was black on both sides, and therefore it must be one of the other two cards: the red-red card or the red-black card. He offers to bet you even money that it is the red-red card. Is this a fair bet? At first glance, it seems so. There are two cards it could be; he's betting on one, and you're betting on the other. But the rub is that there are two ways he can win (either red side of the red-red card could be the one showing) and only one way you can win. His chances of winning are thus 2⁄3. The conditional probability of the card being red-red given that it's not black-black is ½, but that's not the situation here. We know more than just that the card is not black-black; we also know a red side is showing.

Confusing a conditional statement (if A, then B) with its converse (if B, then A) is a very common mistake. A slightly unusual version of it occurs when people reason that if X cures Y, then lack of X must cause Y. If the drug dopamine, e.g., brings about a decrease in the tremors of Parkinson's disease, then lack of dopamine must cause tremors. If some other drug relieves the symptoms of schizophrenia, then an excess of whatever that drug counteracts must cause schizophrenia. One is not as likely to make this mistake when the situation is more familiar. Not too many people believe that since aspirin cures headaches, lack of aspirin in the bloodstream must cause them.

Imagine four dice, A, B, C, and D, strangely numbered as follows:

* A has 4 on four faces and 0 on two faces
* B has 3 on all six faces
* C has 2 on four faces and 6 on two faces
* D has 5 on three faces and 1 on three faces.

If die A is rolled against die B, die A will win (by showing a higher number) two-thirds of the time. If die B is rolled against die C, B will win two-thirds of the time. If die C is rolled against die D, C will win two-thirds of the time. Nevertheless, and here's the punch line, if die D is rolled against die A, D will win two-thirds of the time.
A beats B beats C beats D beats A, all two-thirds of the time. That die C beats die D may require some explanation. Half of the time, a 1 will turn up on die D, in which case die C will certainly win. The other half of the time, a 5 will turn up on die D, in which case die C will win one-third of the time (whenever it shows a 6). Thus, since C can win in these two different ways, it beats D exactly ½ + (½ × 1⁄3) = 2⁄3 of the time.

A dress whose price has been "slashed" 40 percent and then another 40 percent has been reduced in price by 64 percent, not 80: after the first cut it costs 60 percent of the original price, and after the second it costs 60 percent of that, or 36 percent.

Always ask yourself: "Percentage of what?" If profits are 12 percent, for example, is this 12 percent of costs, of sales, of last year's profits, or of what?

When I hear that something or other is selling at a fraction of its normal cost, I comment that the fraction is probably 4⁄3.

The "broad base" fallacy: quoting the absolute number rather than the probability. "Holiday Carnage Kills 500 Over Four-Day Weekend" (this is about the number killed in any four-day period).

Someone offers you a choice of two envelopes and tells you one has twice as much money in it as the other. You pick envelope A, open it, and find $100. Envelope B must, thus, have either $200 or $50. When the proposer permits you to change your mind, you figure you have $100 to gain and only $50 to lose by switching, so you take envelope B instead. The question is: why didn't you choose B in the first place? No matter what amount of money was in the envelope originally chosen, given permission to change your mind, you would always switch to the other envelope. Without any knowledge of the probability of there being various amounts of money in the envelopes, there is no way out of this impasse. Variations of it account for some of the "grass is always greener" mentality.

A tenfold gap in sexual activity (with one person having sex once a month, and another having sex ten times a month) is far more common than a tenfold gap in income.
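The four nontransitive dice can be verified by brute force: list all 36 face pairings for each matchup and count wins. A minimal sketch (the function name is illustrative); since no matchup among these particular dice can end in a tie, "wins" and "shows a higher number" coincide.

```python
from fractions import Fraction
from itertools import product

# The four nontransitive dice described in the text.
DICE = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (2, 2, 2, 2, 6, 6),
    "D": (5, 5, 5, 1, 1, 1),
}


def win_probability(x: str, y: str) -> Fraction:
    """Exact probability that die x shows a higher number than die y (36 equally likely pairs)."""
    pairs = list(product(DICE[x], DICE[y]))
    wins = sum(a > b for a, b in pairs)
    return Fraction(wins, len(pairs))


for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"{x} beats {y} with probability {win_probability(x, y)}")
```

Every line prints 2/3, so whichever die an opponent picks, there is another die that beats it two-thirds of the time.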
Derren Brown once produced an undoctored film of himself tossing a coin into a bowl and getting heads ten times in a row. Brown later explained the trick: the stunning sequence came only at the end of nine excruciating hours of filming, when the string of ten heads finally materialized. Extraordinary events can happen without extraordinary causes. Random events often look like nonrandom events.

Cicero wrote that "probability is the very guide of life." These three laws, simple as they are, form much of the basis of probability theory:

- The probability that two events will both occur can never be greater than the probability that each will occur individually.
- If two possible events, A and B, are independent, then the probability that both A and B will occur is equal to the product of their individual probabilities.
- If an event can have a number of different and distinct possible outcomes, A, B, C, and so on, then the probability that either A or B will occur is equal to the sum of the individual probabilities of A and B, and the sum of the probabilities of all the possible outcomes (A, B, C, and so on) is 1 (that is, 100 percent).

In short: when you want to know the chances that two independent events, A and B, will both occur, you multiply; if you want to know the chances that either of two mutually exclusive events, A or B, will occur, you add.
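The conjunction, multiplication, and addition rules can all be checked by brute-force enumeration of two fair dice. A minimal sketch: the particular events (first die shows 6, second die is even, the total is 2 or 12) and the helper names are illustrative choices, not from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Exact probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


def both(o):
    return first_is_six(o) and second_is_even(o)


# Conjunction rule: "both" is never more likely than either event alone.
print(prob(both), "<=", min(prob(first_is_six), prob(second_is_even)))

# Multiplication rule: the two dice are independent, so the probabilities multiply.
print(prob(both), "==", prob(first_is_six) * prob(second_is_even))

# Addition rule: mutually exclusive totals add, and all possible totals sum to 1.
totals = {t: prob(lambda o, t=t: sum(o) == t) for t in range(2, 13)}
print(totals[2] + totals[12], "==", prob(lambda o: sum(o) in (2, 12)))
print(sum(totals.values()))
```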
[Probability theory and its application | Britannica](https://www.britannica.com/summary/probability-theory)
[Techniques for probability estimates - LessWrong](https://www.lesswrong.com/posts/r8aAqSBeeeMNRtiYK/techniques-for-probability-estimates)
[Show HN: A Set of Dice That Follows the Gambler's Fallacy | Hacker News](https://news.ycombinator.com/item?id=14805265)
[xori/gamblers-dice: A terrible idea, now real.](https://github.com/xori/gamblers-dice)
[Gambler's Dice](https://xori.github.io/gamblers-dice/)

## linear regression

Linear regression is a marvelous tool: versatile, scalable, and as easy to execute as clicking a button on your spreadsheet. Whenever you want to understand which variables drive which other variables, and in which direction, it's the first thing you reach for. And it works on any data set at all, in the sense that it will always hand you an answer. But, like a table saw, if you use it without paying careful attention to what you're doing, the results can be gruesome.

[The Truth About Linear Regression (2015) | Hacker News](https://news.ycombinator.com/item?id=41111115)
[The Truth About Linear Regression](https://www.stat.cmu.edu/~cshalizi/TALR/)

## axioms

Benford's law: In many real-world collections of numbers, especially ones spanning several orders of magnitude, a given number has roughly a 30% chance of starting with the digit 1 (not the 11% you would expect if all nine leading digits were equally likely).
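Benford's law gives the leading-digit probability as log10(1 + 1/d), which is where the "roughly 30%" for the digit 1 comes from (log10 2 ≈ 0.301). The sketch below compares that formula with the leading digits of 2¹ through 2¹⁰⁰⁰; the choice of powers of two as test data and the helper names are illustrative assumptions, picked because powers of two span hundreds of orders of magnitude.

```python
import math
from collections import Counter


def benford_probability(digit: int) -> float:
    """Benford's law: P(leading digit = d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)


def leading_digit_shares(numbers) -> dict[int, float]:
    """Share of positive integers whose first decimal digit is 1, 2, ..., 9."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Powers of two span hundreds of orders of magnitude: a classic Benford-like data set.
powers_of_two = [2**k for k in range(1, 1001)]
shares = leading_digit_shares(powers_of_two)

for d in range(1, 10):
    print(f"{d}: Benford {benford_probability(d):.3f}  powers of 2 {shares[d]:.3f}")
```

Data confined to a narrow range (adult heights in centimeters, say) will not follow this pattern, which is why the law is stated for numbers spread across several orders of magnitude.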

## randomness

[Elegant six-page proof reveals the emergence of random structure | Hacker News](https://news.ycombinator.com/item?id=31162576)
[Quanta Magazine](https://www.quantamagazine.org/elegant-six-page-proof-reveals-the-emergence-of-random-structure-20220425/)
[random(random(random(random()))) | Hacker News](https://news.ycombinator.com/item?id=35953286)
[random(random(...)) - OpenProcessing](https://openprocessing.org/sketch/1575230/)
[Fair coins tend to land on the same side they started | Hacker News](https://news.ycombinator.com/item?id=37829926)
[[2310.04153] Fair coins tend to land on the same side they started: Evidence from 350,757 flips](https://arxiv.org/abs/2310.04153)
[Fair coin - Wikipedia](https://en.m.wikipedia.org/wiki/Fair_coin#Fair_results_from_a_biased_coin)
[We think this cool study we found is flawed. Help us reproduce it | Hacker News](https://news.ycombinator.com/item?id=31222642)
[We think this cool study we found is flawed. Help us reproduce it.](https://pudding.cool/2022/04/random/)
[221: Random Number - explain xkcd](https://www.explainxkcd.com/wiki/index.php/221:_Random_Number)
[When Random Isn't | Hacker News](https://news.ycombinator.com/item?id=38994817)
[When Random Isn't | orlp.net](https://orlp.net/blog/when-random-isnt/)
[Is the largest root of a random real polynomial more likely real than complex? | Hacker News](https://news.ycombinator.com/item?id=40316788)
[pr.probability - Is the largest root of a random polynomial more likely to be real than complex? - MathOverflow](https://mathoverflow.net/questions/470951/is-the-largest-root-of-a-random-polynomial-more-likely-to-be-real-than-complex)

## bayesian statistics

[Better Bayesian Filtering](https://paulgraham.com/better.html)
[Bayes' theorem - Wikipedia](https://en.wikipedia.org/wiki/Bayes'_theorem)
[Bayesian Optimization Book | Hacker News](https://news.ycombinator.com/item?id=29197908)
[Bayesian Optimization Book | Copyright 2023 Roman Garnett, published by Cambridge University Press](https://bayesoptbook.com)
[Bayesian Statistics: The three cultures | Hacker News](https://news.ycombinator.com/item?id=41080373)
[Bayesian statistics: the three cultures | Statistical Modeling, Causal Inference, and Social Science](https://statmodeling.stat.columbia.edu/2024/07/10/three-cultures-bayes-subjective-objective-pragmatic/)

## t-test

[The t-test was invented at the Guinness brewery | Hacker News](https://news.ycombinator.com/item?id=40485313)
[How the Guinness Brewery Invented the Most Important Statistical Method in Science | Scientific American](https://www.scientificamerican.com/article/how-the-guinness-brewery-invented-the-most-important-statistical-method-in/)

## big o

[Big O Cheat Sheet - Time Complexity Chart](https://www.freecodecamp.org/news/big-o-cheat-sheet-time-complexity-chart)
[How Big O Notation Works - Explained with Cake](https://www.freecodecamp.org/news/big-o-notation)

## galois theory

[Galois Theory | Hacker News](https://news.ycombinator.com/item?id=41255456)
[Galois Theory | The n-Category Café](https://golem.ph.utexas.edu/category/2024/08/galois_theory.html)

## machine learning statistics

[Math Basics for Computer Science and Machine Learning [pdf] | Hacker News](https://news.ycombinator.com/item?id=20570025)
[math-deep.pdf](https://www.cis.upenn.edu/~jean/math-deep.pdf)
[Probabilistic Machine Learning: An Introduction | Hacker News](https://news.ycombinator.com/item?id=25593262)
[probml.github.io/pml-book/book1.html](https://probml.github.io/pml-book/book1.html)
[Statistics for Data Science - a Complete Guide for Aspiring ML Practitioners](https://www.freecodecamp.org/news/statistics-for-data-science)
[The matrix calculus you need for deep learning (2018) | Hacker News](https://news.ycombinator.com/item?id=26676729)
[[1802.01528] The Matrix Calculus You Need For Deep Learning](https://arxiv.org/abs/1802.01528)
[Introduction to Linear Algebra for Applied Machine Learning with Python](https://pabloinsente.github.io/intro-linear-algebra)
[Machines Are Inventing New Math We've Never Seen](https://www.vice.com/en/article/xgzkek/machines-are-inventing-new-math-weve-never-seen)