Normal distribution of a random variable and the three-sigma rule. Normal probability distribution function

Probability theory considers a fairly large number of distribution laws. For problems related to the construction of control charts, only a few of them are of interest. The most important is the normal distribution law, which is used to construct control charts for quantitative control, i.e. when we are dealing with a continuous random variable. The normal distribution law occupies a special position among distribution laws. This is explained by the fact that, firstly, it is the one most often encountered in practice and, secondly, it is a limiting law, which other distribution laws approach under very common conditions. As for the second circumstance, it has been proven in probability theory that the sum of a sufficiently large number of independent (or weakly dependent) random variables, subject to any distribution laws (under some very loose restrictions), approximately obeys the normal law, and the more terms are summed, the more accurate the approximation. Most random variables encountered in practice, such as measurement errors, can be represented as the sum of a very large number of relatively small terms, the elementary errors, each of which is caused by a separate cause independent of the others. The normal law appears in cases where a random variable X is the result of a large number of different factors. Each factor separately influences the value of X only slightly, and it is impossible to indicate which one influences it more than the others.
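The limiting behaviour described above can be checked numerically. The following is a minimal sketch, not part of the original text: it sums independent uniform random numbers (the choice of the uniform law, the 50 terms and the seed are illustrative assumptions) and measures what share of the sums lies within one standard deviation of their mean. For a normal law this share is about 0.68.

```python
import random
import statistics


def sum_of_uniforms(n_terms: int, rng: random.Random) -> float:
    """Sum n_terms independent draws from the uniform law on [0, 1)."""
    return sum(rng.random() for _ in range(n_terms))


def share_within_one_sigma(n_terms: int = 50, n_sums: int = 5000, seed: int = 1) -> float:
    """Fraction of simulated sums lying within one standard deviation of their mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sums = [sum_of_uniforms(n_terms, rng) for _ in range(n_sums)]
    mean = statistics.fmean(sums)
    sd = statistics.pstdev(sums)
    inside = sum(1 for s in sums if abs(s - mean) < sd)
    return inside / n_sums


# Each term is far from normal, yet the sum behaves almost like a normal variable.
print(share_within_one_sigma())
```

Increasing `n_terms` should bring the share closer to the theoretical value, in line with the statement that the approximation improves as more terms are summed.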

Normal distribution (Laplace–Gaussian distribution) is the probability distribution of a continuous random variable X whose probability density for −∞ < x < +∞ is

$$f(x)=\frac{1}{s\sqrt{2\pi}}\exp\left(-\frac{(x-m)^2}{2s^2}\right) \qquad (3)$$

That is, the normal distribution is characterized by two parameters m and s, where m is the mathematical expectation; s is the standard deviation of the normal distribution.

The value $s^2$ is the variance of the normal distribution.

The mathematical expectation m characterizes the position of the distribution center, and the standard deviation s (SD) is a characteristic of dispersion (Fig. 3).
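The role of the two parameters can be illustrated with a short sketch. It assumes the usual form of the normal density, $f(x)=\frac{1}{s\sqrt{2\pi}}e^{-(x-m)^2/(2s^2)}$; the function name `normal_pdf` is introduced here only for illustration.

```python
import math


def normal_pdf(x: float, m: float, s: float) -> float:
    """Density of the normal law with expectation m and standard deviation s."""
    if s <= 0:
        raise ValueError("the standard deviation s must be positive")
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))


# Changing m only moves the peak along the abscissa axis.
print(normal_pdf(0.0, 0.0, 1.0), normal_pdf(3.0, 3.0, 1.0))
# Increasing s lowers the peak, since the area under the curve stays equal to one.
print(normal_pdf(0.0, 0.0, 0.5), normal_pdf(0.0, 0.0, 2.0))
```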

Figure 3 – Normal distribution density functions f(x) with: a) different mathematical expectations m; b) different standard deviations s.

Thus, the value of μ (the mathematical expectation m) determines the position of the distribution curve on the abscissa axis. The dimension of μ is the same as that of the random variable X. As the mathematical expectation m grows, both functions (the density and the distribution function) shift parallel to the right. As the variance $s^2$ decreases, the density becomes increasingly concentrated around m, while the distribution function becomes increasingly steep.

The value of σ (the standard deviation s) determines the shape of the distribution curve. Since the area under the distribution curve must always remain equal to one, the curve becomes flatter as σ increases. Figure 3.1 shows three curves for different σ: σ1 = 0.5; σ2 = 1.0; σ3 = 2.0.

Figure 3.1 – Density functions of normal distribution with different standard deviations s.

The distribution function (integral function) has the form (Fig. 4):

$$F(x)=\frac{1}{s\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\frac{(t-m)^2}{2s^2}\right)dt \qquad (4)$$

Figure 4 – Integral (a) and differential (b) normal distribution functions

Particularly important is the linear transformation of a normally distributed random variable X that yields a random variable Z with mathematical expectation 0 and variance 1. This transformation is called normalization:

$$Z=\frac{X-m}{s}$$

It can be carried out for any such random variable. Normalization reduces all possible variants of the normal distribution to one case: m = 0, s = 1.

The normal distribution with m = 0, s = 1 is called the normalized (standardized) normal distribution.

Standard normal distribution (standard Laplace–Gaussian distribution, or normalized normal distribution) is the probability distribution of a standardized normal random variable Z whose distribution density is equal to:

$$\varphi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2} \quad \text{for } -\infty<z<+\infty$$

The values of the function Φ(z) are determined by the formula:

$$\Phi(z)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-t^2/2}\,dt \qquad (7)$$

The values of the function Φ(z) and of the density φ(z) of the normalized normal distribution have been calculated and tabulated. The table is compiled only for positive values of z, because for negative arguments:

$$\Phi(-z)=1-\Phi(z) \qquad (8)$$

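Using these tables, you can determine not only the values of the distribution function and density of the normalized normal distribution for a given z, but also the values of the general normal distribution function and density, since:

$$F(x)=\Phi\left(\frac{x-m}{s}\right); \qquad (9)$$

$$f(x)=\frac{1}{s}\,\varphi\left(\frac{x-m}{s}\right), \qquad (10)$$

where φ is the density of the normalized normal distribution.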

In many problems involving normally distributed random variables, it is necessary to determine the probability that a random variable X, subject to the normal law with parameters m and s, falls within a certain interval. Such an interval could be, for example, the tolerance field for a parameter, from the lower limit L to the upper limit U.

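The probability of falling within the interval from $x_1$ to $x_2$ can be determined by the formula:

$$P(x_1<X<x_2)=\Phi\left(\frac{x_2-m}{s}\right)-\Phi\left(\frac{x_1-m}{s}\right)$$

Thus, the probability that the random variable (parameter value) X falls in the tolerance field is determined by the formula

$$P(L<X<U)=\Phi\left(\frac{U-m}{s}\right)-\Phi\left(\frac{L-m}{s}\right)$$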
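The tolerance-field probability can be computed without tables. The sketch below assumes Python's standard `statistics.NormalDist` class; the function name and the numerical limits (a nominal size of 10 with s = 0.1) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def prob_in_tolerance(lower: float, upper: float, m: float, s: float) -> float:
    """P(lower < X < upper) for X normal with expectation m and standard deviation s."""
    law = NormalDist(mu=m, sigma=s)
    return law.cdf(upper) - law.cdf(lower)


# Hypothetical tolerance field 9.7 ... 10.3 for a process centred at 10 with s = 0.1.
print(prob_in_tolerance(9.7, 10.3, m=10.0, s=0.1))
# The same field when the process centre has drifted to 10.1.
print(prob_in_tolerance(9.7, 10.3, m=10.1, s=0.1))
```

Comparing the two calls shows how a shift of the process centre lowers the share of values inside the same tolerance field.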

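You can also find the probability that the random variable X lies within μ ± kσ. The values obtained for k = 1, 2 and 3 are the following (see also Fig. 5):

$$P(\mu-\sigma<X<\mu+\sigma)\approx 0.6827;$$

$$P(\mu-2\sigma<X<\mu+2\sigma)\approx 0.9545;$$

$$P(\mu-3\sigma<X<\mu+3\sigma)\approx 0.9973.$$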

Thus, if a value appears outside the three-sigma region, which contains 99.73% of all possible values, so that the probability of such an event is very small (about 1 in 370), it should be considered that the value in question was too small or too large not because of random variation, but because of a significant disturbance in the process itself, which can change the nature of the distribution.

The region lying inside the three-sigma boundaries is also called the statistical tolerance region of the corresponding machine or process.
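The coverage of the k-sigma regions, and the odds of a value falling outside them, can be reproduced with a few lines. This is an illustrative sketch using the standard-library `statistics.NormalDist`; the helper name `coverage` is an assumption of this example.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    std = NormalDist()  # normalized law: mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)


for k in (1, 2, 3):
    inside = coverage(k)
    # 1 / (1 - inside) is the "one in N" odds of a value falling outside the region.
    print(k, round(inside, 4), round(1.0 / (1.0 - inside)))
```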

The normal distribution occupies a special place compared to other types of distributions. Its main feature is that, under broad conditions, other distribution laws tend to it as the number of trials grows without bound. How does this distribution come about?

Let's imagine that, having taken a hand-held dynamometer, you are located in the most crowded place in your city. And you offer everyone who passes by to measure their strength by squeezing the dynamometer with their right or left hand. You carefully write down the dynamometer readings. After some time, with a sufficiently large number of tests, you plotted the dynamometer readings on the abscissa axis, and the number of people who “squeezed out” this reading on the ordinate axis. The resulting points were connected by a smooth line. The result is the curve shown in Fig. 9.8. The appearance of this curve will not change much as the experiment time increases. Moreover, from a certain point on, new values ​​will only refine the curve without changing its shape.


Fig. 9.8.

Now let's move our dynamometer to the athletic hall and repeat the experiment. Now the maximum of the curve will shift to the right, the left end will be somewhat tightened, while its right end will be steeper (Fig. 9.9).


Fig. 9.9.

Note that the maximum frequency for the second distribution (point B) will be lower than the maximum frequency for the first distribution (point A). This can be explained by the fact that the total number of people visiting the athletic hall will be less than the number of people who passed near the experimenter in the first case (in the city center in a fairly crowded place). The maximum has shifted to the right, since athletic gyms are attended by physically stronger people compared to the general background.

And finally, we will visit schools, kindergartens and nursing homes with the same goal: to reveal the strength of the hands of visitors to these places. And again the distribution curve will have a similar shape, but now, obviously, its left end will be steeper, and its right end will be more drawn out. And as in the second case, the maximum (point C) will be below point A (Fig. 9.10).


Fig. 9.10.

This remarkable property of the normal distribution, maintaining the shape of the probability density curve (Figs. 9.8–9.10), was noticed and described in 1733 by de Moivre and later studied by Gauss.

In scientific research, in technology, and in mass phenomena or experiments, when random variables are observed repeatedly under constant experimental conditions, the test results are said to undergo random scattering that obeys the law of the normal distribution curve

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-a)^2}{2\sigma^2}}, \qquad (21)$$

where a is the most frequently occurring value and σ is the standard deviation. As a rule, in formula (21) the sample mean $\bar{x}$ is used instead of the parameter a. Moreover, the longer the experimental series, the less this estimate will differ from the mathematical expectation. The area under the curve (Fig. 9.11) is equal to one. The area corresponding to any interval of the x-axis is numerically equal to the probability of a random result falling into this interval.


Fig. 9.11.

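The normal distribution function has the form

$$F(x)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{(t-a)^2}{2\sigma^2}}\,dt, \qquad (22)$$

where a is the center of the distribution and σ is the standard deviation.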

Note that the normal curve (Fig. 9.11) is symmetrical with respect to the straight line x = a and asymptotically approaches the OX axis as x → ±∞.

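Let's calculate the mathematical expectation for the normal law

$$M(X)=\int_{-\infty}^{+\infty}x\,f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}x\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx=a. \qquad (23)$$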

Properties of normal distribution

Let us consider the basic properties of this important distribution.

Property 1. The normal distribution density function (21) is defined on the entire x-axis.

Property 2. The normal distribution density function (21) is greater than zero for any x from the domain of definition (f(x) > 0).

Property 3. As x increases (or decreases) without bound, the density function (21) tends to zero: $\lim_{x\to\pm\infty}f(x)=0$.

Property 4. At x = a the density function given by (21) takes its greatest value, equal to

$$f(a)=\frac{1}{\sigma\sqrt{2\pi}}. \qquad (24)$$

Property 5. The graph of the function f(x) (Fig. 9.11) is symmetrical with respect to the straight line x = a.

Property 6. The graph of the function f(x) (Fig. 9.11) has two inflection points symmetrical with respect to the straight line x = a:

$$x_{1,2}=a\pm\sigma. \qquad (25)$$

Property 7. All odd central moments are zero. Using property 7, the asymmetry of a distribution is measured by the coefficient $A=\mu_3/\sigma^3$, where $\mu_3$ is the third central moment. If A = 0, the distribution under study is concluded to be symmetrical with respect to the straight line x = a. If A > 0, the series is said to be shifted to the right (the right branch of the graph is flatter, or drawn out). If A < 0, the series is considered to be shifted to the left (the left branch of the graph is flatter, Fig. 9.12).


Fig. 9.12.

Property 8. The kurtosis of the normal distribution, $\mu_4/\sigma^4$, is equal to 3. In practice, the excess $E=\mu_4/\sigma^4-3$ is often calculated, and the degree of “compression” or “blurring” of the graph is judged by how close this value is to zero (Fig. 9.13). Since $\mu_4$ is related to σ, the kurtosis ultimately characterizes the degree of dispersion of the data.
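Properties 7 and 8 suggest a simple numerical check of a sample. The sketch below assumes the common moment-based definitions, skewness $\mu_3/\mu_2^{3/2}$ and excess kurtosis $\mu_4/\mu_2^2-3$; the function names and the sample data are illustrative and not taken from the original text.

```python
def central_moment(data: list[float], order: int) -> float:
    """Central moment of the given order (population form, divided by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** order for x in data) / n


def skewness(data: list[float]) -> float:
    """Third central moment divided by the cube of the standard deviation."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data: list[float]) -> float:
    """Fourth central moment over the squared variance, minus 3 (zero for a normal law)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))  # positive: the right branch is drawn out
```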

In many problems related to normally distributed random variables, it is necessary to determine the probability that a random variable X, subject to a normal law with parameters m, σ, falls on the segment from α to β. To calculate this probability we use the general formula

$$P(\alpha<X<\beta)=F(\beta)-F(\alpha), \qquad (6.3.1)$$

where F(x) is the distribution function of the quantity X.

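Let's find the distribution function F(x) of a random variable X distributed according to a normal law with parameters m, σ. The distribution density of X is equal to:

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-m)^2}{2\sigma^2}}.$$

From here we find the distribution function

$$F(x)=\int_{-\infty}^{x}f(x)\,dx=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx. \qquad (6.3.3)$$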

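Let us make the change of variable

$$\frac{x-m}{\sigma}=t$$

in the integral (6.3.3) and bring it to the form:

$$F(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-m}{\sigma}}e^{-\frac{t^2}{2}}\,dt. \qquad (6.3.4)$$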

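The integral (6.3.4) is not expressed through elementary functions, but it can be calculated through a special function expressing a definite integral of the expression $e^{-t^2}$ or $e^{-t^2/2}$ (the so-called probability integral), for which tables have been compiled. There are many varieties of such functions, for example:

$$\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-t^2}\,dt;\qquad \frac{1}{\sqrt{2\pi}}\int_{0}^{x}e^{-\frac{t^2}{2}}\,dt$$

etc. Which of these functions to use is a matter of taste. We will choose as such a function

$$\Phi^*(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{t^2}{2}}\,dt. \qquad (6.3.5)$$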

It is easy to see that this function is nothing more than the distribution function of a normally distributed random variable with parameters m = 0, σ = 1.

Let us agree to call the function $\Phi^*(x)$ the normal distribution function. The appendix (Table 1) contains tables of the values of $\Phi^*(x)$.

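Let us express the distribution function (6.3.3) of the quantity X with parameters m and σ through the normal distribution function $\Phi^*(x)$. Obviously,

$$F(x)=\Phi^*\left(\frac{x-m}{\sigma}\right). \qquad (6.3.6)$$

Now let's find the probability of the random variable X falling on the section from α to β. According to formula (6.3.1),

$$P(\alpha<X<\beta)=\Phi^*\left(\frac{\beta-m}{\sigma}\right)-\Phi^*\left(\frac{\alpha-m}{\sigma}\right). \qquad (6.3.7)$$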

Thus, we have expressed the probability that a random variable X, distributed according to the normal law with any parameters, falls in a given section through the standard distribution function $\Phi^*(x)$, which corresponds to the simplest normal law with parameters 0, 1. Note that the arguments of the function $\Phi^*$ in formula (6.3.7) have a very simple meaning: $\frac{\beta-m}{\sigma}$ is the distance from the right end β of the section to the center of scattering, expressed in standard deviations; $\frac{\alpha-m}{\sigma}$ is the same distance for the left end of the section. This distance is considered positive if the end is located to the right of the center of dispersion, and negative if to the left.

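Like any distribution function, the function $\Phi^*(x)$ has the following properties:

1. $\Phi^*(-\infty)=0$.

2. $\Phi^*(+\infty)=1$.

3. $\Phi^*(x)$ is a non-decreasing function.

In addition, from the symmetry of the normal distribution with parameters m = 0, σ = 1 relative to the origin, it follows that

$$\Phi^*(-x)=1-\Phi^*(x). \qquad (6.3.8)$$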

Using this property, strictly speaking, it would be possible to limit the function tables to only positive argument values, but in order to avoid an unnecessary operation (subtraction from one), Appendix Table 1 provides values ​​for both positive and negative arguments.

In practice, we often encounter the problem of calculating the probability of a normally distributed random variable falling into a section that is symmetrical with respect to the center of scattering m. Let's consider such a section of length 2l (Fig. 6.3.1). Let's calculate the probability of hitting this section using formula (6.3.7):

$$P(m-l<X<m+l)=\Phi^*\left(\frac{l}{\sigma}\right)-\Phi^*\left(-\frac{l}{\sigma}\right). \qquad (6.3.9)$$

Taking into account the property (6.3.8) of the function $\Phi^*(x)$ and giving the left side of formula (6.3.9) a more compact form, we obtain a formula for the probability of a random variable distributed according to the normal law falling into a section symmetrical with respect to the center of scattering:

$$P(|X-m|<l)=2\Phi^*\left(\frac{l}{\sigma}\right)-1. \qquad (6.3.10)$$

Let's solve the following problem. Let us lay off successive segments of length σ from the center of dispersion m (Fig. 6.3.2) and calculate the probability of the random variable X falling into each of them. Since the normal curve is symmetrical, it is enough to lay off such segments in one direction only.

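Using formula (6.3.7) we find:

$$\begin{aligned}
P(m<X<m+\sigma)&=\Phi^*(1)-\Phi^*(0)\approx 0.341;\\
P(m+\sigma<X<m+2\sigma)&=\Phi^*(2)-\Phi^*(1)\approx 0.136;\\
P(m+2\sigma<X<m+3\sigma)&=\Phi^*(3)-\Phi^*(2)\approx 0.021;\\
P(m+3\sigma<X<m+4\sigma)&=\Phi^*(4)-\Phi^*(3)\approx 0.001.
\end{aligned} \qquad (6.3.11)$$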

As can be seen from these data, the probabilities of hitting each of the following segments (fifth, sixth, etc.) with an accuracy of 0.001 are equal to zero.

Rounding the probabilities of getting into segments to 0.01 (to 1%), we get three numbers that are easy to remember:

0.34; 0.14; 0.02.

The sum of these three values is 0.5. This means that for a normally distributed random variable, all dispersion (with an accuracy of fractions of a percent) fits within the section m ± 3σ.

This makes it possible, knowing the standard deviation and mathematical expectation of a random variable, to indicate roughly the range of its practically possible values. This method of estimating the range of possible values of a random variable is known in mathematical statistics as the “three sigma rule.” The three sigma rule also implies an approximate method for determining the standard deviation of a random variable: take the maximum practically possible deviation from the mean and divide it by three. Of course, this rough technique can only be recommended if there are no other, more accurate methods for determining σ.
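The segment probabilities and the rough estimate of the standard deviation described above can be sketched as follows. The example assumes the standard-library `statistics.NormalDist`; the function names and the sample deviation of 6 units are hypothetical.

```python
from statistics import NormalDist


def segment_probabilities(n_segments: int = 4) -> list[float]:
    """Probabilities of falling into successive one-sigma segments to the right of the mean."""
    std = NormalDist()  # normalized law with parameters 0, 1
    return [std.cdf(k + 1) - std.cdf(k) for k in range(n_segments)]


def sigma_from_max_deviation(max_deviation: float) -> float:
    """Rough three-sigma estimate: largest practically possible deviation divided by three."""
    return max_deviation / 3.0


print([round(p, 3) for p in segment_probabilities()])
# If deviations from the mean practically never exceed 6 units, sigma is roughly 2.
print(sigma_from_max_deviation(6.0))
```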

Example 1. A random variable distributed according to a normal law represents an error in measuring a certain distance. When measuring, a systematic error is allowed in the direction of overestimation by 1.2 (m); The standard deviation of the measurement error is 0.8 (m). Find the probability that the deviation of the measured value from the true value will not exceed 1.6 (m) in absolute value.

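Solution. The measurement error is a random variable X subject to the normal law with parameters m = 1.2 and σ = 0.8. We need to find the probability of this quantity falling on the section from α = −1.6 to β = +1.6. According to formula (6.3.7) we have:

$$P(-1.6<X<1.6)=\Phi^*\left(\frac{1.6-1.2}{0.8}\right)-\Phi^*\left(\frac{-1.6-1.2}{0.8}\right)=\Phi^*(0.5)-\Phi^*(-3.5).$$

Using the function tables (Appendix, Table 1), we find:

$$\Phi^*(0.5)=0.6915;\quad \Phi^*(-3.5)=0.0002,$$

from which

$$P(-1.6<X<1.6)=0.6915-0.0002=0.6913\approx 0.691.$$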

Example 2. Find the same probability as in the previous example, but provided that there is no systematic error.

Solution. Using formula (6.3.10), assuming l = 1.6, we find:

$$P(|X|<1.6)=2\Phi^*\left(\frac{1.6}{0.8}\right)-1=2\Phi^*(2)-1\approx 0.955.$$
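Examples 1 and 2 can be checked numerically. The sketch below uses the data stated in the examples (systematic error 1.2 m, standard deviation 0.8 m, limit 1.6 m) and the standard-library `statistics.NormalDist` in place of the tables; the function name is introduced only for this illustration.

```python
from statistics import NormalDist


def prob_abs_error_below(limit: float, bias: float, sigma: float) -> float:
    """P(|X| < limit) for a measurement error X with systematic part `bias`."""
    error = NormalDist(mu=bias, sigma=sigma)
    return error.cdf(limit) - error.cdf(-limit)


# Example 1: systematic error of 1.2 m, standard deviation 0.8 m.
p_with_bias = prob_abs_error_below(1.6, bias=1.2, sigma=0.8)
# Example 2: the same measurement without systematic error.
p_without_bias = prob_abs_error_below(1.6, bias=0.0, sigma=0.8)
print(round(p_with_bias, 3), round(p_without_bias, 3))
```

The comparison shows how much a systematic error of this size reduces the probability of staying within the same limits.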

Example 3. A target that looks like a strip (motorway), the width of which is 20 m, is fired in a direction perpendicular to the highway. Aiming is carried out along the center line of the highway. The standard deviation in the shooting direction is equal to m. There is a systematic error in the shooting direction: the undershoot is 3 m. Find the probability of hitting a highway with one shot.

In practice, most random variables that are influenced by a large number of random factors obey the normal probability distribution law. Therefore, in various applications of probability theory, this law is of particular importance.

The random variable $X$ obeys the normal probability distribution law if its probability distribution density has the following form

$$f\left(x\right)=\frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{\left(x-a\right)^2}{2\sigma^2}}$$

The graph of the function $f\left(x\right)$ is shown schematically in the figure and is called “Gaussian curve”. To the right of this graph is the German 10 mark banknote, which was used before the introduction of the euro. If you look closely, you can see on this banknote the Gaussian curve and its discoverer, the greatest mathematician Carl Friedrich Gauss.

Let's return to our density function $f\left(x\right)$ and give some explanations regarding the distribution parameters $a,\ (\sigma )^2$. The parameter $a$ characterizes the center of dispersion of the values ​​of a random variable, that is, it has the meaning of a mathematical expectation. When the parameter $a$ changes and the parameter $(\sigma )^2$ remains unchanged, we can observe a shift in the graph of the function $f\left(x\right)$ along the abscissa, while the density graph itself does not change its shape.

The parameter $(\sigma )^2$ is the variance and characterizes the shape of the density graph curve $f\left(x\right)$. When changing the parameter $(\sigma )^2$ with the parameter $a$ unchanged, we can observe how the density graph changes its shape, compressing or stretching, without moving along the abscissa axis.

Probability of a normally distributed random variable falling into a given interval

As is known, the probability of a random variable $X$ falling into the interval $\left(\alpha ;\ \beta \right)$ can be calculated as $P\left(\alpha< X < \beta \right)=\int^{\beta }_{\alpha }{f\left(x\right)dx}$. For a normally distributed random variable $X$ with parameters $a,\ \sigma $ the following formula holds:

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right)$$

Here the function $\Phi \left(x\right)=\frac{1}{\sqrt{2\pi }}\int^x_0{e^{-t^2/2}dt}$ is the Laplace function. The values of this function are taken from a table. The following properties of the function $\Phi \left(x\right)$ can be noted.

1 . $\Phi \left(-x\right)=-\Phi \left(x\right)$, that is, the function $\Phi \left(x\right)$ is odd.

2 . $\Phi \left(x\right)$ is a monotonically increasing function.

3 . $\lim_{x\to +\infty } \Phi \left(x\right)=0.5$, $\lim_{x\to -\infty } \Phi \left(x\right)=-0.5$.

To calculate the values of the function $\Phi \left(x\right)$, you can also use the $f_x$ function wizard in Excel: $\Phi \left(x\right)=NORMDIST\left(x;0;1;1\right)-0.5$. For example, for $x=2$ this gives $\Phi \left(2\right)\approx 0.4772$.
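Outside Excel, the Laplace function can be expressed through the error function. The sketch below relies on the identity $\Phi(x)=\frac{1}{2}\operatorname{erf}\left(x/\sqrt{2}\right)$, which follows from the integral definition given above; the name `laplace_phi` is an assumption of this example.

```python
import math


def laplace_phi(x: float) -> float:
    """Laplace function: integral of the standard normal density from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


print(round(laplace_phi(2.0), 4))
print(laplace_phi(-1.0) + laplace_phi(1.0))  # the function is odd, so the sum is zero
```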

The probability of a normally distributed random variable $X\in N\left(a;\ (\sigma )^2\right)$ falling into an interval symmetric with respect to the mathematical expectation $a$ can be calculated using the formula

$$P\left(\left|X-a\right|< \delta \right)=2\Phi \left({{\delta }\over {\sigma }}\right).$$

Three sigma rule. It is almost certain that a normally distributed random variable $X$ will fall into the interval $\left(a-3\sigma ;a+3\sigma \right)$.

Example 1 . The random variable $X$ is subject to the normal probability distribution law with parameters $a=2,\ \sigma =3$. Find the probability of $X$ falling into the interval $\left(0.5;1\right)$ and the probability of satisfying the inequality $\left|X-a\right|< 0.2$.

Using formula

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right),$$

we find $P\left(0.5< X < 1\right)=\Phi \left(\frac{1-2}{3}\right)-\Phi \left(\frac{0.5-2}{3}\right)=\Phi \left(-0.33\right)-\Phi \left(-0.5\right)=\Phi \left(0.5\right)-\Phi \left(0.33\right)=0.191-0.129=0.062$.

$$P\left(\left|X-a\right|< 0.2\right)=2\Phi \left({{\delta }\over {\sigma }}\right)=2\Phi \left({{0.2}\over {3}}\right)\approx 2\Phi \left(0.07\right)=2\cdot 0.028=0.056.$$

Example 2 . Suppose that during the year the share price of a certain company is a random variable distributed according to the normal law with a mathematical expectation of 50 conventional monetary units and a standard deviation of 10. What is the probability that on a randomly selected day of the period under discussion the share price will be:

a) more than 70 conventional monetary units?

b) below 50 per share?

c) between 45 and 58 conventional monetary units per share?

Let the random variable $X$ be the share price of the company. By condition, $X$ is subject to a normal distribution with parameters $a=50$ (mathematical expectation) and $\sigma =10$ (standard deviation). The probability $P\left(\alpha< X < \beta \right)$ of $X$ falling into the interval $\left(\alpha ,\ \beta \right)$ will be found by the formula:

$$P\left(\alpha< X < \beta \right)=\Phi \left({{\beta -a}\over {\sigma }}\right)-\Phi \left({{\alpha -a}\over {\sigma }}\right).$$

$$a)\ P\left(X>70\right)=\Phi \left(\frac{\infty -50}{10}\right)-\Phi \left(\frac{70-50}{10}\right)=0.5-\Phi \left(2\right)=0.5-0.4772=0.0228.$$

$$b)\ P\left(X< 50\right)=\Phi \left({{50-50}\over {10}}\right)-\Phi \left({{-\infty -50}\over {10}}\right)=\Phi \left(0\right)+0.5=0+0.5=0.5.$$

$$c)\ P\left(45< X < 58\right)=\Phi \left({{58-50}\over {10}}\right)-\Phi \left({{45-50}\over {10}}\right)=\Phi \left(0.8\right)-\Phi \left(-0.5\right)=\Phi \left(0.8\right)+\Phi \left(0.5\right)=0.2881+0.1915=0.4796.$$
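The three answers of Example 2 can be reproduced without tables. This is a minimal sketch using the standard-library `statistics.NormalDist` with the parameters given in the example (mean 50, standard deviation 10); the variable names are illustrative.

```python
from statistics import NormalDist

# Share price model from Example 2: mean 50, standard deviation 10.
price = NormalDist(mu=50, sigma=10)

p_above_70 = 1.0 - price.cdf(70)                 # a) more than 70
p_below_50 = price.cdf(50)                       # b) below 50
p_between_45_58 = price.cdf(58) - price.cdf(45)  # c) between 45 and 58

print(round(p_above_70, 4), round(p_below_50, 4), round(p_between_45_58, 4))
```

Here `cdf` is the full distribution function, so no 0.5 offset is needed, unlike with the Laplace function.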

The normal distribution law, the so-called Gauss law, is one of the most common laws. It is a fundamental law in probability theory and its applications. The normal distribution is most often encountered in the study of natural and socio-economic phenomena; in other words, many statistical populations in nature and society obey it. Accordingly, populations formed from a large number of large samples can be said to follow the normal distribution. Populations that deviate from the normal distribution can be brought closer to normal by special transformations. It should be remembered that the fundamental feature of this law relative to other distribution laws is that it is the limiting law, which other distribution laws approach under certain (standard) conditions.

It should be noted that the term “normal distribution” is conventional, a term generally accepted in mathematical and statistical literature. The statement that some characteristic of a phenomenon obeys the normal distribution law does not at all mean the inviolability of norms supposedly inherent in the phenomenon under study, and classifying a phenomenon under another type of law does not mean that it is somehow abnormal. In this sense, the term “normal distribution” is not entirely appropriate.

Normal distribution (Gauss-Laplace law) is a type of continuous distribution. De Moivre (1733, France) derived the normal law of probability distribution. The basic ideas of this discovery were first used in the theory of errors by C. F. Gauss (1809, Germany) and P.-S. Laplace (1812, France), who made a significant theoretical contribution to the development of the law itself. In particular, Gauss proceeded from the recognition that the most probable value of a random variable is the arithmetic mean. The general conditions for the emergence of a normal distribution were established by A. M. Lyapunov. He proved that if the characteristic under study is the result of the combined influence of many factors, each of which is only weakly connected with most of the others, and the influence of each factor on the final result is far outweighed by the combined influence of all the other factors, then the distribution becomes close to normal.

The probability distribution of a continuous random variable is called normal if it has the density:

$$f(x;\bar{x},\sigma)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\bar{x})^2}{2\sigma^2}}$$

where $\bar{x}$ is the mathematical expectation or average value. As you can see, the normal distribution is determined by two parameters: $\bar{x}$ and $\sigma$. To define a normal distribution, it is enough to know the mathematical expectation (or the mean) and the standard deviation. These two quantities determine the center of the grouping and the shape of the curve on the graph. The graph of the function $f(x;\bar{x},\sigma)$ is called a normal curve (Gaussian curve) with parameters $\bar{x}$ and $\sigma$ (Fig. 12).

The normal distribution curve has inflection points at $\bar{x}\pm\sigma$. Graphically, between $\bar{x}-\sigma$ and $\bar{x}+\sigma$ lies 0.683 of the entire area under the curve (68.3%). Within the boundaries $\bar{x}\pm 2\sigma$ lies 0.954 of the area (95.4%), and between $\bar{x}-3\sigma$ and $\bar{x}+3\sigma$ lies 0.997 of the entire distribution area (99.7%). Figure 13 illustrates the nature of the normal distribution with one-, two- and three-sigma boundaries.

With a normal distribution, the arithmetic mean, mode and median are equal to each other. A normal curve is a single-peaked symmetrical curve whose branches asymptotically approach the abscissa axis. The largest ordinate of the curve corresponds to a normalized deviation t = 0, that is, to $x=\bar{x}$. At this point on the abscissa axis lies the value of the characteristic equal to the arithmetic mean, mode and median. On both sides of the peak the branches of the curve descend, changing from convex to concave at certain points. These points are symmetrical and correspond to t = ±1, that is, to values of the characteristic whose deviations from the average are numerically equal to the standard deviation. The ordinate that corresponds to the arithmetic mean divides the entire area between the curve and the abscissa axis in half. So, the probabilities that the characteristic under study takes values greater and less than the arithmetic mean are both equal to 0.50, that is, $P(x>\bar{x})=P(x<\bar{x})=0.50$.

Fig. 12. Normal distribution curve (Gaussian curve)

The shape and position of the normal curve are determined by the values of the mean and the standard deviation. It has been mathematically proven that changing the value of the mean (mathematical expectation) does not change the shape of the normal curve, but only shifts it along the abscissa axis. The curve shifts to the right if $\bar{x}$ increases, and to the left if $\bar{x}$ decreases.

Fig. 14. Normal distribution curves with different parameter values

How the shape of the normal curve changes with the standard deviation can be judged from the maximum of the differential normal distribution function, which is equal to $\frac{1}{\sigma\sqrt{2\pi}}$.

As can be seen, as the value of $\sigma$ increases, the maximum ordinate of the curve decreases. Consequently, the normal distribution curve is compressed towards the x-axis and takes on a more flat-topped shape.

And, conversely, as the parameter $\sigma$ decreases, the normal curve stretches in the positive direction of the ordinate axis, and the “bell” becomes more pointed (Fig. 14). Note that, regardless of the values of the parameters $\bar{x}$ and $\sigma$, the area bounded by the abscissa axis and the curve is always equal to one (a property of the distribution density). This is clearly illustrated by the graph (Fig. 13).

The above-mentioned features of the manifestation of “normality” of distribution allow us to identify a number of common properties that normal distribution curves have:

1) any normal curve reaches its maximum at the point $x=\bar{x}$ and decreases continuously to the right and left of it, gradually approaching the x-axis;

2) any normal curve is symmetrical with respect to the straight line that is parallel to the ordinate axis and passes through the maximum point ($x=\bar{x}$); the maximum ordinate is $\frac{1}{\sigma\sqrt{2\pi}}$;

3) any normal curve has a “bell” shape and is convex upward around the maximum point. At the points $\bar{x}-\sigma$ and $\bar{x}+\sigma$ it changes convexity; the smaller $\sigma$, the sharper the “bell”, and the larger $\sigma$, the flatter the top of the “bell” becomes (Fig. 14). A change in the mathematical expectation (with $\sigma$ constant) does not modify the shape of the curve.

When $\bar{x}=0$ and $\sigma=1$, the normal curve is called a normalized curve or a normal distribution in canonical form.

The normalized curve is described by the following formula:

$$f(t)=\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}$$

The construction of a normal curve based on empirical data is carried out using the formula:

$$n_t=\frac{Nh}{\sigma}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}$$

where $n_t$ is the theoretical frequency of each interval (group) of the distribution; N is the sum of frequencies, equal to the size of the population; h is the interval step; π is the ratio of the circumference of a circle to its diameter, approximately 3.1416; e is the base of natural logarithms, equal to 2.71828; t is the normalized deviation.

The second and third factors of the formula, $\frac{1}{\sqrt{2\pi}}e^{-t^2/2}$, form the function f(t) of the normalized deviation, which can be calculated for any value of t. Tables of the values of f(t) are usually called “ordinate tables of the normal curve” (Appendix 3). Using this function, the working formula for the normal distribution takes a simple form:

$$n_t=\frac{Nh}{\sigma}\,f(t)$$

Example. Let's construct a normal curve from data on the distribution of 57 workers by level of daily earnings (Table 42). From Table 42 we find the arithmetic mean:

$$\bar{x}=\frac{\sum xf}{\sum f}=\frac{1654}{57}\approx 29.0$$

We calculate the standard deviation, which gives $\sigma=6.25$.

For each row of the table we find the value of the normalized deviation

$$t=\frac{|x-\bar{x}|}{\sigma}=\frac{12}{6.25}=1.92$$

(for the first interval, and so on).

In column 8 of Table 42 we write down the table value of the function f(t) from the appendix; for example, for the first interval, t = 1.92, we find the entry in row “1.9”, column “2” (0.0632).

To calculate theoretical frequencies, that is, the ordinates of the normal distribution curve, the multiplier is calculated:

$$\frac{Nh}{\sigma}=\frac{Nh}{6.25}=36.5$$

We multiply all the table values of the function f(t) found above by 36.5. So, for the first interval we get 0.0632 × 36.5 = 2.31. Small frequencies (n′ < 5) are combined (in our example, the first two and the last two intervals).
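The calculation of a theoretical frequency can be sketched in code. The example uses N = 57, σ = 6.25 and t = 1.92 from the text; the interval step h = 4 is not stated in the surviving text and is an assumption inferred from the multiplier 36.5. The function names are illustrative.

```python
import math


def ordinate(t: float) -> float:
    """f(t): ordinate of the normalized normal curve at normalized deviation t."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def theoretical_frequency(t: float, total: float, step: float, sigma: float) -> float:
    """Theoretical frequency n_t = (N * h / sigma) * f(t)."""
    return total * step / sigma * ordinate(t)


# First interval of the example: t = 1.92, N = 57, sigma = 6.25.
# The step h = 4 is an assumed value inferred from N * h / sigma = 36.5.
print(round(ordinate(1.92), 4))
print(round(theoretical_frequency(1.92, total=57, step=4, sigma=6.25), 2))
```

The result differs slightly from the 2.31 in the text because the table value 0.0632 and the multiplier 36.5 are rounded.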

If the extreme theoretical frequencies differ significantly from zero, the discrepancy between the sums of the empirical and theoretical frequencies may be significant.

The distribution graph of empirical and theoretical frequencies (normal curve) according to the example under consideration is shown in Figure 15.

Let's consider an example of determining the frequencies of a normal distribution for the case when the extreme intervals contain no frequencies (Table 43). Here the empirical frequency of the first interval is zero. The resulting sum of the unrefined theoretical frequencies is not equal to the sum of the empirical ones (56 ≠ 57). In this case, the theoretical frequency is calculated from the obtained values of the center of the interval, the normalized deviation and its function.

In Table 43, these values are enclosed in a rectangle. When plotting a normal curve, in such cases the theoretical curve is continued. In the case under consideration, the normal curve continues towards negative deviations from the average, since the first unrefined frequency is equal to 5. The calculated (refined) theoretical frequency for the first interval is equal to one. The sum of the refined frequencies then coincides with the sum of the empirical ones.

Notation: t is the normalized deviation, σ is the standard deviation.

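Table 42

Columns: interval; number of units, $n_i$; calculated values, including $(x_i - \bar{x})^2$; normalized deviation, $t = |x_i - \bar{x}| / \sigma$; tabulated value of the function, $f(t)$; theoretical frequency of the normal distribution series, $f(t) \times 36.5$.

Statistical parameters: $\sigma = 6.25$; multiplier 36.5. Value preserved among the calculated values: 1654.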

Table 43

Calculation of frequencies of normal distribution (alignment of empirical frequencies according to the normal law)

Columns: interval; center of the interval, $x_i$; number of units, $n_i$; calculated values $(x_i - \bar{x})^2$ and $(x_i - \bar{x})^2 n_i$; normalized deviation, $t = |x_i - \bar{x}| / \sigma$; tabulated value of the function, $f(t)$; theoretical frequency of the normal distribution series; refined theoretical frequency.

Statistical parameters: $\sigma = 2.41$.

Fig. 15. Empirical distribution (1) and normal curve (2)

A normal distribution curve for the population under study can also be constructed in a different way from the one discussed above. If only an approximate idea of how well the actual distribution corresponds to the normal one is needed, the calculations are carried out in the following sequence. First the maximum ordinate is determined, which corresponds to the mean value of the characteristic. Then, with the standard deviation calculated, the coordinates of the points of the normal distribution curve are computed according to the scheme outlined in Tables 42 and 43. Thus, according to the initial and calculated data in Table 43, the mean is $\bar{x} = 26$. This mean coincides with the center of the fourth interval (25-27), so the frequency of this interval, 20, can be taken as the maximum ordinate when plotting the graph. With the calculated standard deviation ($\sigma = 2.41$ cm, Table 43), we calculate the coordinates of all the necessary points of the normal distribution curve (Tables 44, 45). Using the obtained coordinates, we draw a normal curve (Fig. 16), taking the frequency of the fourth interval as the maximum ordinate.

The consistency of the empirical distribution with the normal one can also be established through simplified calculations. If the ratio of the skewness coefficient ($A_s$) to its mean square error $m_{A_s}$, or the ratio of the kurtosis coefficient ($E_x$) to its mean square error $m_{E_x}$, exceeds 3 in absolute value, the empirical distribution is judged inconsistent with the normal distribution, that is, if

$\frac{|A_s|}{m_{A_s}} > 3$ or $\frac{|E_x|}{m_{E_x}} > 3$.
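The simplified check above can be sketched as follows. This is only an illustration under assumptions that are not in the source: the function name `moment_check` and the sample data are hypothetical, and the mean square errors are taken from the common large-sample approximations sqrt(6/n) for skewness and sqrt(24/n) for kurtosis, since the text does not give its own formulas for them.

```python
import math

def moment_check(values):
    """Return (skewness ratio, kurtosis ratio, consistent-with-normal flag)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # ASSUMPTION: large-sample approximations of the mean square errors.
    se_skewness = math.sqrt(6.0 / n)
    se_kurtosis = math.sqrt(24.0 / n)
    ratio_skewness = abs(skewness) / se_skewness
    ratio_kurtosis = abs(excess_kurtosis) / se_kurtosis
    consistent = ratio_skewness <= 3.0 and ratio_kurtosis <= 3.0
    return ratio_skewness, ratio_kurtosis, consistent

# Hypothetical, symmetric sample used only to exercise the function.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
ratio_skewness, ratio_kurtosis, consistent = moment_check(sample)
print(ratio_skewness, ratio_kurtosis, consistent)
```

For small samples the approximate errors are rough, so a ratio near 3 should be read with caution.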

There are other, less labor-intensive methods for establishing the "normality" of a distribution: a) comparing the arithmetic mean with the mode and the median; b) using Westergaard numbers; c) graphical representation on Turbin's semi-logarithmic grid; d) calculating special goodness-of-fit criteria, etc.

Table 44

Coordinates of seven points of the normal distribution curve

Table 45

Calculation of coordinates of points of a normal distribution curve

$\bar{x} - 1.5\sigma =$

$\bar{x} - \sigma = 23.6$

$\bar{x} - 0.5\sigma = 24.8$

$\bar{x} + 0.5\sigma = 27.2$

$\bar{x} + \sigma = 28.4$

$\bar{x} + 1.5\sigma =$

Fig. 16. Normal distribution curve plotted using seven points
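The seven-point construction can be sketched as follows, using the mean 26, the standard deviation 2.41, and the maximum ordinate 20 from the text. The function name `seven_point_curve` is hypothetical, and the ordinates are scaled from the maximum by exp(-t²/2), which follows from the shape of the normal density; the source's own tabulated coefficients are not reproduced here, so this is an assumption about the scheme rather than a copy of it.

```python
import math

def seven_point_curve(mean, sigma, max_ordinate):
    """Return (x, y) pairs at mean + k*sigma for k = -1.5, -1, ..., 1.5."""
    points = []
    for k in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
        x = mean + k * sigma
        # Ordinate relative to the maximum, which is reached at the mean.
        y = max_ordinate * math.exp(-k * k / 2.0)
        points.append((round(x, 1), round(y, 1)))
    return points

# Mean, standard deviation and maximum ordinate as quoted in the text.
points = seven_point_curve(26.0, 2.41, 20.0)
for x, y in points:
    print(x, y)
```

The abscissas at one and at half a standard deviation from the mean reproduce the values 23.6, 24.8, 27.2 and 28.4 given for Table 45.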

In practice, when studying a population in order to compare its distribution with the normal one, the "three-sigma (3σ) rule" is often used.

It has been proven mathematically that the probability that the deviation from the mean is less than three standard deviations in absolute value equals 0.9973. In other words, the probability that the absolute deviation exceeds three standard deviations is 0.0027, which is very small. Based on the principle that unlikely events are practically impossible, a deviation exceeding 3σ can be considered practically impossible. If a random variable is distributed normally, then the absolute value of its deviation from the mathematical expectation (mean) practically never exceeds three standard deviations.

In practical calculations one proceeds as follows. If the distribution of the random variable under study is unknown and the calculated deviations from the mean turn out to be less than 3σ, then there is reason to believe that the characteristic under study is distributed normally. If a deviation exceeds 3σ, we can assume that the distribution of the variable under study is not consistent with the normal distribution.
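The probabilities quoted for the rule can be checked with a short sketch. The function names `prob_within` and `exceeds_three_sigma` are hypothetical; the probability is obtained from the error function in Python's standard `math` module, using the identity P(|X - mean| < kσ) = erf(k / √2) for a normal variable.

```python
import math

def prob_within(k):
    """P(|X - mean| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))

def exceeds_three_sigma(value, mean, sigma):
    """True if the deviation of value from mean is larger than 3 * sigma."""
    return abs(value - mean) > 3.0 * sigma

p_inside = prob_within(3.0)
p_outside = 1.0 - p_inside
print(round(p_inside, 4), round(p_outside, 4))
```

Applied to the earlier example with mean 26 and σ = 2.41, `exceeds_three_sigma` would flag only values below 18.77 or above 33.23.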

Calculating theoretical frequencies for the empirical distribution series under study is usually called alignment (fitting) of empirical curves according to the normal (or any other) distribution law. This process is of both theoretical and practical significance. Alignment of empirical data reveals a pattern in their distribution that may be obscured by the random form in which it manifests itself. The pattern established in this way can be used to solve a number of practical problems.

Researchers encounter distributions close to normal in various fields of science and areas of practical human activity. In economics, this kind of distribution is less common than, say, in technology or biology. This is due to the very nature of socio-economic phenomena, which are characterized by a great complexity of interrelated and interdependent factors, as well as by a number of conditions that limit the free play of chance. Nevertheless, when analyzing the structure of empirical distributions, an economist should refer to the normal distribution as a kind of standard. Such a comparison makes it possible to clarify the nature of the internal conditions that determine the shape of a given distribution.

The spread of statistical research into the field of socio-economic phenomena has revealed a large number of different types of distribution curves. However, one should not assume that the theoretical concept of the normal distribution curve is of little use in the statistical and mathematical analysis of such phenomena. It may not always be applicable to the analysis of a specific statistical distribution, but in the theory and practice of the sampling method it is of paramount importance.

Let us name the main aspects of the application of the normal distribution in statistical and mathematical analysis.

1. To determine the probability of a specific value of a characteristic. This is necessary when testing hypotheses about the correspondence of a particular empirical distribution to normal.

2. When estimating a number of parameters, for example, means, using the maximum likelihood method. Its essence is to specify the law that the population follows and then to find the parameter estimate that maximizes the likelihood. The best approximation to the population parameters is given by the relation:

$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$

3. To determine the probability of deviations of sample means from the population means.

4. When determining the confidence interval in which the approximate value of the characteristics of the general population is located.
