Now, I just happened to pick California off the top of my head, but let's get a full probability distribution for every state.

First, think about a fair coin. A fair coin has two sides, so there's an equal probability of it landing on heads or on tails when you flip it. Here's some notation we might use: the probability that the coin lands on heads is equal to 0.5, or 1/2. You may encounter other notation elsewhere. Sometimes probability is denoted with a lowercase p with the event in parentheses; sometimes, especially with coins, the event is just shortened to h or t. Sometimes we use a capital P, and sometimes we write the full word Prob for probability, but this is all just notation. In each case we may have some knowledge of the likelihood of the various possible results, but we cannot predict the result with any certainty. For coin flipping, heads and tails are equally likely, which we write as P(H) = P(T) = 1/2. Probability is usually represented by "p" with the event denoted by a capital letter in parentheses, but as seen above there isn't really a standard notation.

Calling .size() on the grouped data gives us the number of flights in each group. So please use all of these CSV files to your advantage so you get a better understanding of the dataset, and let's get started. We can store that result, and we'll call it num flights per state. Total flights is just a single number, and I'm taking each one of these groups and dividing it by total flights.

Well, in that case, we have two possibilities (sides 5 and 6 of the die). Probability is an area of study which involves predicting the relative likelihood of various outcomes.
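The coin and die probabilities above can be checked with a few lines of Python. This is only a minimal sketch, assuming every outcome is equally likely; the helper name prob and the variables coin and die are made up for illustration and are not from the original walkthrough.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely.

    Hypothetical helper: counts the favourable outcomes and divides
    by the total number of possible outcomes.
    """
    favourable = set(event) & set(sample_space)
    return Fraction(len(favourable), len(sample_space))


coin = {"H", "T"}          # a fair coin: heads or tails
die = {1, 2, 3, 4, 5, 6}   # a fair six-sided die

p_heads = prob({"H"}, coin)        # P(H)
p_five_or_six = prob({5, 6}, die)  # two possibilities out of six

print(p_heads, p_five_or_six)  # prints: 1/2 1/3
```

Using Fraction keeps the results exact, so 2/6 shows up as 1/3 rather than as a rounded decimal.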
We can do slightly more advanced problems. For example, what's the likelihood that this die lands on an even number? Three of the six sides are even, so that's 3/6, or 1/2.

And if you go inside, I have a ton of CSV files. The main dataset that we're gonna be working with is flights.csv, and it has a little over half a million US domestic flights from the year 2017. So now we've grouped our flights by state. I want to know the probability that a flight started in New York, in California, in Wyoming, in Texas, and so on. That's all that this is doing. So probability is the likelihood of some event happening.

So there are five out of six, and I end up with 5/6. That's the complement of this event, which is what it means for an event to have a complement.

But let's find out what the max is. So I'm gonna say this divided by total flights here. You can go to pandas.pydata.org and click Documentation. You might end up seeing all of these. We can look at some of the probabilities, and finally find the maximum probability and its corresponding state. It turns out that the state with the maximum probability of being the origin state of a randomly picked flight, out of all 2017 U.S. domestic flights, is in fact California (with a probability of 13%). And that's what we're gonna be working with.

It is applied directly to many practical problems, and several very useful distributions are based on it. Many empirical frequency distributions have the following characteristics: they are approximately symmetrical, and the mode is close to the centre of the distribution.
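The per-state calculation described above (group the flights by state, count each group, divide by the total, then look up the maximum) can be sketched with pandas as follows. This is an illustration under stated assumptions, not the original notebook code: it uses a tiny made-up DataFrame instead of flights.csv, and the column name origin_state is hypothetical, since the real column name is not given here.

```python
import pandas as pd

# Toy stand-in for flights.csv. The data and the column name
# "origin_state" are assumptions made for this example only.
flights = pd.DataFrame(
    {"origin_state": ["CA", "CA", "NY", "TX", "CA", "NY", "WY", "TX"]}
)

# Total number of flights: a single number.
total_flights = len(flights)

# Number of flights in each group (one group per origin state).
num_flights_per_state = flights.groupby("origin_state").size()

# Divide every group count by the total to get a probability per state.
prob_per_state = num_flights_per_state / total_flights

# The state with the largest probability, and that probability.
top_state = prob_per_state.idxmax()
top_prob = prob_per_state.max()

print(prob_per_state)
print(top_state, top_prob)
```

Because every flight belongs to exactly one origin state, the probabilities in prob_per_state add up to 1, which is a quick sanity check that the result is a full probability distribution.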
We're gonna try to see if we can build a naive Bayes classifier that can predict whether a flight is going to arrive late. It's a really cool application of all the probability that we're gonna be learning, but first we have to actually get started learning some of this probability. So I just wanna start off by introducing some concepts in probability and some notation, just so that everyone is on the same page. Okay.