Mean and Variance: Statistical Measures and Dispersion

Mean and Variance

Mean and variance are two important concepts in statistics. They are used to describe a distribution of numbers. The main point of this lecture is to tie those concepts back into sigma notation, which is one of the main uses for summation. We’ll walk through this very slowly and unpack it for you.

Suppose we have a set of numbers X with n numbers in it, $x_1$ to $x_n$; they’re just real numbers. The mean is the average value of all the numbers, while the variance is the average squared deviation from the mean. The mean is calculated by adding up all the values and dividing by how many there are. The variance is calculated by taking each value, subtracting the mean and squaring the result, then adding those squares together and dividing by how many there are.

The mean of that set can be expressed in summation notation as

$$X = \{x_1, \dots, x_n\}, \qquad \mu_X = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and this is called the variance of X:

$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_X)^2$$

The point of this lecture is to understand what that is. And finally, by the way, there’s just plain old sigma, which is the square root of sigma squared. That is the standard deviation:

$$\sigma_X = \sqrt{\sigma_X^2}$$

Like the mean and the variance, the standard deviation is used for describing sets of numbers. Let’s start with a simple example, a set that has three elements. In this case, the set Z has three numbers: 1, 5 and 12. The sum of these three numbers is 18, and the mean can also be written as a weighted average in which every number gets the weight 1/3: 1/3 * 1 + 1/3 * 5 + 1/3 * 12 = 6. If we take the mean of set Z, what that really means is that we add up all of the numbers (1+5+12) and divide by how many numbers we have (3). It’s never a good idea to do arithmetic in public, so obviously we’ve done this in advance: it’s 18 divided by 3, which is 6. That’s the mean. And there’s lots of notation for it.
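The arithmetic for Z can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and `Fraction` is used so the weights of 1/3 stay exact instead of being rounded.

```python
from fractions import Fraction

# The set Z from the lecture.
Z = [1, 5, 12]
n = len(Z)

# Mean as "add everything up, then divide by how many there are".
mean_z = Fraction(sum(Z), n)

# The same mean as a weighted average: every element gets weight 1/n.
weight = Fraction(1, n)
weighted_mean_z = sum(weight * z for z in Z)

print(sum(Z), mean_z, weighted_mean_z)  # 18 6 6
```

Both routes give 6, which is the point: dividing the sum by n is the same as giving every element an equal weight of 1/n.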
The most correct general notation might be the Greek letter mu with the set as a subscript, $\mu_Z$. Sometimes you’ll see it written as M(Z), but often you’ll just see $\mu$ by itself because we know what we’re talking about already.

$$Z = \{1, 5, 12\}, \qquad |Z| = 3, \qquad \mu_Z = \frac{1 + 5 + 12}{3} = \frac{18}{3} = 6$$

That’s a simple example with numbers. Let’s do a slightly harder example with symbols instead. Suppose we have a set Y consisting of four numbers, but I don’t tell you what they are: $y_1$, $y_2$, $y_3$, and $y_4$. Then the mean of Y, mu sub Y, would be 1/4 times $y_1 + y_2 + y_3 + y_4$, and here comes the punchline: let’s express this in sigma notation. This is 1/4 times the sum from i = 1 to 4 of $y_i$.

$$Y = \{y_1, y_2, y_3, y_4\}, \qquad \mu_Y = \frac{1}{4}(y_1 + y_2 + y_3 + y_4) = \frac{1}{4}\sum_{i=1}^{4} y_i$$

And remember, i is a dummy index: any other letter would do the same job. But let’s not get too radical; let’s use i.

We generalize a little bit further. In general, suppose we have a set X consisting of n arbitrary numbers, $x_1$, $x_2$, up to $x_n$. The mean of X is pretty easy to guess now: mu sub X is equal to 1 over n times the summation from i = 1 to n of x sub i. That’s the mean using sigma notation.

By the way, it’s worth thinking a little bit about the two different philosophical functions of i and n. The variable i is called a dummy variable because it only keeps track of which term of the sum we are on; renaming it changes nothing. The variable n, by contrast, is fixed by the set: it is the number of elements, and it tells you where the sum stops. For example, if n is 10, the sum stops at the 10th term. If n is 11, the sum stops at the 11th term, and so on. In the previous example with the set Y, n was 4.
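The two roles of i and n can be made concrete by writing the sigma-notation mean as an explicit loop. This is a minimal sketch, not a library routine: the function name `mean` and the sample values in `Y` are assumptions for illustration, since the lecture deliberately leaves the four y values unspecified.

```python
def mean(xs):
    """Return (1/n) * sum_{i=1}^{n} x_i for a non-empty sequence xs."""
    n = len(xs)  # n is fixed by the set: it says where the sum stops
    if n == 0:
        raise ValueError("the mean of an empty set is undefined")
    total = 0
    for i in range(1, n + 1):  # i is the dummy index, running from 1 to n
        total += xs[i - 1]     # x_i sits at position i - 1 in a 0-based list
    return total / n


Y = [2.0, 4.0, 6.0, 8.0]  # hypothetical values standing in for y_1..y_4
print(mean(Y))            # 5.0
print(mean([1, 5, 12]))   # 6.0
```

Renaming `i` to `j` or `k` inside the loop would not change the result, while changing `n` changes how many terms are added.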
Here is an example of mean-centering data. Here is our friend Z with three elements in it, 1, 5, and 12; previously we computed that the mean of Z equals 6. Over here we see the three elements of Z in blue, 1, 5 and 12, and there’s the mean, 6. Let’s form a new set; let’s call it Z prime. We’ll subtract the mean from every element in Z: 1 - 6, 5 - 6, 12 - 6, which gives -5, -1, 6. There’s Z prime. If we compute the mean of Z prime, we get 0: this is -5 + -1 + 6 divided by 3, which works out to be 0. What we’re actually doing is essentially pretending that the red dot at 6 was zero, so we move it over to zero and shift everything else over with it. In other words, the red dot at 6 slides over to 0, and every other point slides over by the same amount.

[Graph]

The mean is a value representing the average of a set of numbers. The variance measures how spread out the values are around the mean. And when you mean-center data, you produce a new data set which has the same distances between its points but whose mean is 0. We’ll discuss more reasons for doing this later.

[Graph]

The mean of X is

$$\mu_X = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In statistics and data science, the mean is an important measure of central tendency for a data set. Data sets can be large. In this case, large is 30, but large could be 3 million. Statisticians and data scientists often do not like large amounts of numbers or information. They want to summarize sets by small sets of numbers. Summarizing a set by its mean is about the simplest thing you can do, but it gives some information. What we’re going to see here is an example where it obviously doesn’t give complete information.

Here is a set Z, which is 1, 5, 12. We’re getting bored with this; it’s our friend, and we get bored with our friends. Mu sub Z is 6: fine. Here’s another set W: 5, 6 and 7. If you calculate mu sub W, it turns out that’s also 6; you can check for yourself that that’s true. So obviously the mean is not a unique classifier of a set.
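Mean-centering as described above can be sketched in Python as follows. The helper names `mean` and `mean_center` are illustrative choices, not something the lecture defines.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def mean_center(xs):
    """Subtract the mean of xs from every element of xs."""
    mu = mean(xs)
    return [x - mu for x in xs]


Z = [1, 5, 12]
Z_prime = mean_center(Z)

print(Z_prime)        # [-5.0, -1.0, 6.0]
print(mean(Z_prime))  # 0.0

# Shifting does not change the gaps between neighbouring points.
print(Z[1] - Z[0], Z_prime[1] - Z_prime[0])  # 4 4.0
```

The last line illustrates why mean-centering keeps the relationships in the data intact: every point moves by the same amount, so the distances between points are unchanged.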
We have two sets with the same mean. Let’s look at these sets of numbers on the number line and see that the mean is not telling the whole story. Here is 0, here’s the mean at 6, and let’s draw Z in blue: here’s 1, 5, and 12. And let’s say we’ll draw W in yellow: W has a dot right here at 5, a dot right there at 6, and another dot at 7. Blue and yellow are similar in that they both have the same mean, but they look different. As you might say, blue is more spread out than yellow.

[Graph: Z = {1, 5, 12} and W = {5, 6, 7} on the number line, with $\mu_Z = 6$ and $\mu_W = 6$]

Variance is a statistical concept that measures how spread out numerical values are. If X = {$x_1$, $x_2$, $x_3$, …, $x_n$} represents these numbers, then the variance, sigma squared sub X, is equal to 1 over n times the sum from i = 1 to n of ($x_i$ minus mu sub X) squared:

$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_X)^2$$

This equation can be intimidating at first glance, but it simply tells us to ask how far each value $x_i$ is from the mean, square that distance, and then average the squares.

The term inside the square, $x_i - \mu_X$, is referred to as the deviation. The reason we square it is that we don’t really care if you’re to the right of the mean or to the left of the mean; what we care about is how far away from it you are. So for example, if $x_i$ was 1 and the mean was 6, the squared deviation would be a pretty big number, 25. If $x_i$ was 5 and the mean was 6, it would not be a particularly big number, just 1. And then essentially what we’re doing here is taking the average of those squared deviations, which is why we divide by n. That’s essentially what this variance formula does. If we take the sigma of X, which is just the square root of sigma squared, this is called the standard deviation of X:

$$\sigma_X = \sqrt{\sigma_X^2}$$

To understand the concept of variance, it is helpful to work through a simple example.
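The variance formula can be unpacked step by step in code: compute the deviations, square them, and average the squares. The sketch below divides by n, as the lecture does; the function names and the sample data set are assumptions chosen so that the numbers come out round.

```python
import math


def variance(xs):
    """Divide-by-n variance: the mean of the squared deviations from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    deviations = [x - mu for x in xs]   # signed: negative left of the mean
    squared = [d ** 2 for d in deviations]  # squaring drops the sign
    return sum(squared) / n


def std_dev(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))


# Hypothetical data set (not from the lecture) with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0
```

A deviation of -3 and a deviation of +3 contribute the same amount, 9, to the sum, which is exactly the "we only care how far away you are" idea.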
To do this, let’s take two sets whose means we already know, and then see if we can figure out which one has more dispersion. For these two examples, let’s take Z = {1, 5, 12} and W = {5, 6, 7}.

Z is 1, 5, 12 and W is 5, 6, 7, and the mean of Z is 6, which turns out to also be the mean of W because that’s how we cooked it. Let’s call the elements of W $w_1$, $w_2$, $w_3$, and the elements of Z $z_1$, $z_2$, $z_3$. Let’s start with the easy one. The sigma squared of W is going to be, in this case n is 3, so 1 over 3 times the sum from i = 1 to 3 of ($w_i$ minus the mean of W) squared. So now the first one, $w_1$, is 5, and we get (5 - 6) squared + (6 - 6) squared + (7 - 6) squared. And if we work that out, that turns out to be one-third times (-1) squared + 0 squared + 1 squared.

$$\sigma_W^2 = \frac{1}{3}\sum_{i=1}^{3} (w_i - \mu_W)^2 = \frac{1}{3}\left[(5-6)^2 + (6-6)^2 + (7-6)^2\right] = \frac{1}{3}\left[(-1)^2 + 0^2 + 1^2\right] = \frac{2}{3}, \qquad \sigma_W = \sqrt{\tfrac{2}{3}}$$

The variance of W is two-thirds, which means that the standard deviation of W is the square root of two-thirds. If we do the same calculation for Z, we find that the variance of Z is one-third times (1 - 6) squared + (5 - 6) squared + (12 - 6) squared, which is one-third times 25 + 1 + 36, or 62/3.

$$\sigma_Z^2 = \frac{1}{3}\left[(1-6)^2 + (5-6)^2 + (12-6)^2\right] = \frac{1}{3}\left[25 + 1 + 36\right] = \frac{62}{3} \gg \frac{2}{3}$$

The point is that 62/3 is much, much greater than two-thirds, which justifies our intuition that Z and W have the same mean but Z is more spread out, as measured by its variance.
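The comparison of Z and W can be reproduced exactly with Python's `fractions` module, so that 2/3 and 62/3 appear as fractions, not rounded decimals. This is a sketch under the lecture's divide-by-n definition; the helper name `variance` is illustrative, and `statistics.pvariance` from the standard library (which also divides by n) is used only as a cross-check.

```python
from fractions import Fraction
import math
import statistics


def variance(xs):
    """Exact divide-by-n variance of a list of integers, as a Fraction."""
    n = len(xs)
    mu = Fraction(sum(xs), n)
    return sum((x - mu) ** 2 for x in xs) / n


Z = [1, 5, 12]
W = [5, 6, 7]

var_w = variance(W)
var_z = variance(Z)

print(var_w, var_z)   # 2/3 62/3
print(var_z / var_w)  # 31

# Standard deviations are the square roots of the variances.
sigma_w = math.sqrt(var_w)
sigma_z = math.sqrt(var_z)

# Cross-check against the standard library's population variance.
assert math.isclose(statistics.pvariance(W), float(var_w))
assert math.isclose(statistics.pvariance(Z), float(var_z))
```

Both sets have mean 6, yet the variance of Z is 31 times the variance of W, which puts a number on "much, much greater". Note that `statistics.variance` (without the `p`) divides by n - 1 instead of n, so it would give different values from the formula used in this lecture.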

Mean and Variance: Statistical Measures and Dispersion – 2

Mean and Variance