The concept of a variation series. Types of variation series

The grouping method also makes it possible to measure the variation (variability, fluctuation) of characteristics. When the number of units in a population is relatively small, variation is measured on the basis of a ranked series of the units that make up the population. A series is called ranked if the units are arranged in ascending (or descending) order of the characteristic.

However, ranked series are not sufficiently informative when a comparative characteristic of variation is needed. In addition, in many cases we have to deal with statistical populations consisting of a large number of units, which are difficult in practice to present as a specific series. For this reason, for an initial general acquaintance with statistical data, and especially to facilitate the study of variation in characteristics, the phenomena and processes under study are usually combined into groups, and the grouping results are presented in the form of group tables.

If a group table has only two columns, the groups according to a selected characteristic (variants) and the size of the groups (frequency or relative frequency), it is called a distribution series.

A distribution series is the simplest type of structural grouping based on one characteristic, displayed in a group table with two columns containing the variants and frequencies of the characteristic. In many cases the study of the initial statistical material begins with such a structural grouping, i.e. with the compilation of distribution series.

A structural grouping in the form of a distribution series can be turned into a genuine structural grouping if the selected groups are characterized not only by frequencies, but also by other statistical indicators. The main purpose of distribution series is to study the variation of characteristics. The theory of distribution series is developed in detail by mathematical statistics.

Distribution series are divided into attributive (grouping according to attributive characteristics, for example, dividing the population by gender, nationality, marital status, etc.) and variational (grouping by quantitative characteristics).

A variation series is a group table that contains two columns: the grouping of units according to one quantitative characteristic and the number of units in each group. The intervals in a variation series are usually made equal and closed. An example of a variation series is the following grouping of the Russian population by average per capita monetary income (Table 3.10).

Table 3.10. Distribution of the population of Russia by average per capita income in 2004-2009

Columns: population groups by average per capita cash income, rub./month; population in the group, % of the total.

Rows (income groups): 8,000.1-10,000.0; 10,000.1-15,000.0; 15,000.1-25,000.0; over 25,000.0; whole population.

Variation series, in turn, are divided into discrete and interval. Discrete variation series combine variants of discrete characteristics that vary within narrow limits. An example of a discrete variation series is the distribution of Russian families by the number of children they have.

Interval variation series combine variants of either continuous characteristics or discrete characteristics varying over a wide range. The variation series of the distribution of the Russian population by average per capita monetary income is an interval series.

Discrete variation series are not used very often in practice. Meanwhile, compiling them is not difficult, since the composition of the groups is determined by the specific variants that the studied grouping characteristics actually possess.

Interval variation series are more widespread. When compiling them, a difficult question arises about the number of groups, as well as the size of the intervals that should be established.

The principles for solving this issue are set out in the chapter on the methodology for constructing statistical groupings (see paragraph 3.3).

Variation series are a means of collapsing or compressing diverse information into a compact form; from them one can make a fairly clear judgment about the nature of the variation, and study the differences in the characteristics of the phenomena included in the set under study. But the most important significance of variation series is that on their basis the special generalizing characteristics of variation are calculated (see Chapter 7).

Grouping is the division of a population into groups that are homogeneous according to some characteristic.

Purpose of the service. Using the online calculator you can:

  • build a variation series, build a histogram and polygon;
  • find indicators of variation (average, mode (including graphically), median, range of variation, quartiles, deciles, quartile differentiation coefficient, coefficient of variation and other indicators);

Instructions. To group a series, you must select the type of variation series obtained (discrete or interval) and indicate the amount of data (number of rows). The resulting solution is saved in a Word file (see example of grouping statistical data).

If the grouping has already been carried out and a discrete or interval variation series is given, use the online calculator Variation Indices. The hypothesis about the type of distribution is tested using the service Studying the distribution form.

Types of statistical groupings

Variation series. When a discrete random variable is observed, the same value can occur several times. Such values x_i of the random variable are recorded together with n_i, the number of times the value appears in n observations; n_i is the frequency of this value.
In the case of a continuous random variable, grouping is used in practice.
  1. Typological grouping is the division of the qualitatively heterogeneous population under study into classes, socio-economic types, and homogeneous groups of units. To build this grouping, use the Discrete variation series parameter.
  2. A grouping is called structural if a homogeneous population is divided into groups that characterize its structure according to some varying characteristic. To build this grouping, use the Interval series parameter.
  3. A grouping that reveals the relationships between the phenomena being studied and their characteristics is called an analytical grouping (see analytical grouping of series).

Example No. 1. Based on the data in Table 2, construct distribution series for 40 commercial banks of the Russian Federation. Using the resulting distribution series, determine: profit on average per commercial bank, credit investments on average per commercial bank, modal and median value of profit; quartiles, deciles, range of variation, mean linear deviation, standard deviation, coefficient of variation.

Solution:
In the section "Type of statistical series" select Discrete series. Click Insert from Excel. Number of groups: according to Sturges' formula.

Principles for constructing statistical groupings

A series of observations arranged in ascending order is called a variation series. The grouping characteristic is the characteristic by which a population is divided into separate groups; it is called the basis of the grouping. The grouping can be based on both quantitative and qualitative characteristics.
After determining the basis of the grouping, the question of the number of groups into which the population under study should be divided should be decided.

When using personal computers to process statistical data, grouping of object units is carried out using standard procedures.
One such procedure is based on Sturges' formula for determining the optimal number of groups:

k = 1 + 3.322*log10(N)

where k is the number of groups and N is the number of population units.

The length of the partial intervals is calculated as h = (x_max - x_min)/k

Then the number of observations falling into each of these intervals is counted; these counts are taken as the frequencies n_i. Small frequencies, with values less than 5 (n_i < 5), should be combined; in that case the corresponding intervals must be merged as well.
The midpoints of the intervals, x_i = (c_{i-1} + c_i)/2, are taken as the new values.
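The grouping procedure described above can be sketched in Python. This is only an illustration and not the calculator's actual implementation: the function names `sturges_groups` and `interval_series` are hypothetical, the logarithm is assumed to be base 10, and rounding k to the nearest integer is an assumption (textbooks differ on the rounding rule). Merging of intervals with frequencies below 5 is not shown.

```python
import math

def sturges_groups(n):
    """Number of groups k = 1 + 3.322 * log10(n), rounded to the nearest integer (assumed rule)."""
    return round(1 + 3.322 * math.log10(n))

def interval_series(data, k=None):
    """Return a list of (lower, upper, frequency) tuples for k equal intervals."""
    x_min, x_max = min(data), max(data)
    if k is None:
        k = sturges_groups(len(data))
    h = (x_max - x_min) / k  # length of each partial interval
    if h == 0:
        raise ValueError("all observations are equal; no intervals can be formed")
    counts = [0] * k
    for x in data:
        # The largest observation is placed in the last interval.
        i = min(int((x - x_min) / h), k - 1)
        counts[i] += 1
    return [(x_min + i * h, x_min + (i + 1) * h, counts[i]) for i in range(k)]

def midpoints(series):
    """Midpoint of each interval, x_i = (c_{i-1} + c_i) / 2."""
    return [(lo + hi) / 2 for lo, hi, _ in series]
```

For 40 observations the formula gives 1 + 3.322 * log10(40), which is about 6.3, so six groups under the rounding assumed here.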

Example No. 3. As a result of a 5% random sample, the following distribution of products by moisture content was obtained. Calculate: 1) average percentage of humidity; 2) indicators characterizing humidity variations.
The solution was obtained using a calculator: Example No. 1

Construct a variation series. Based on the found series, construct a distribution polygon, histogram, and cumulate. Determine the mode and median.

Example. According to the results of sample observation (sample A, Appendix):
a) make a variation series;
b) calculate relative frequencies and accumulated relative frequencies;
c) build a polygon;
d) create an empirical distribution function;
e) plot the empirical distribution function;
f) calculate numerical characteristics: arithmetic mean, variance, standard deviation.

Based on the data given in Table 4 (Appendix 1) and corresponding to your option, do:

  1. Based on the structural grouping, construct variational frequency and cumulative distribution series using equal closed intervals, taking the number of groups equal to 6. Present the results in table form and display graphically.
  2. Analyze the variation series of the distribution by calculating:
    • arithmetic mean value of the characteristic;
    • mode, median, 1st quartile, 1st and 9th decile;
    • standard deviation;
    • the coefficient of variation.
  3. Draw conclusions.

Required: rank the series, construct an interval distribution series, calculate the average value, variability of the average value, mode and median for the ranked and interval series.

1) Based on the initial data, construct a discrete variation series; present it in the form of a statistical table and statistical graphs. 2) Based on the initial data, construct an interval variation series with equal intervals. Choose the number of intervals yourself and explain this choice. Present the resulting variation series in the form of a statistical table and statistical graphs. Indicate the types of tables and graphs used.

In order to determine the average duration of customer service in a pension fund, the number of clients of which is very large, a survey of 100 clients was conducted using a random non-repetitive sampling scheme. The survey results are presented in the table. Find:
a) the boundaries within which, with probability 0.9946, the average service time for all clients of the pension fund is contained;
b) the probability that the share of all fund clients with a service duration of less than 6 minutes differs from the share of such clients in the sample by no more than 10% (in absolute value);
c) the volume of repeated sampling, in which with a probability of 0.9907 it can be stated that the share of all fund clients with a service duration of less than 6 minutes differs from the share of such clients in the sample by no more than 10% (in absolute value).
2. Using the data of task 1 and Pearson's χ² test at the significance level α = 0.05, test the hypothesis that the random variable X (customer service time) is distributed according to the normal law. Construct a histogram of the empirical distribution and the corresponding normal curve in one drawing.

A sample of 100 elements is given. Required:

  1. Construct a ranked variation series;
  2. Find the maximum and minimum terms of the series;
  3. Find the range of variation and the number of optimal intervals for constructing an interval series. Find the length of the interval of the interval series;
  4. Construct an interval series. Find the frequencies of sample elements falling into the composed intervals. Find the midpoints of each interval;
  5. Construct a histogram and frequency polygon. Compare with the normal distribution (analytically and graphically);
  6. Plot the empirical distribution function;
  7. Calculate sample numerical characteristics: sample mean and central sample moment;
  8. Calculate approximate values of the standard deviation, skewness and kurtosis (using the MS Excel analysis package). Compare the approximate values with the exact ones (calculated using MS Excel formulas);
  9. Compare selected graphical characteristics with the corresponding theoretical ones.

The following sample data are available (10% sample, mechanical) on product output and the amount of profit, million rubles. According to the original data:
Task 13.1.
13.1.1. Construct a statistical series of distribution of enterprises by the amount of profit, forming five groups with equal intervals. Construct distribution series graphs.
13.1.2. Calculate the numerical characteristics of the distribution series of enterprises by the amount of profit: arithmetic mean, standard deviation, dispersion, coefficient of variation V. Draw conclusions.
Task 13.2.
13.2.1. Determine the boundaries within which, with probability 0.997, the amount of profit of one enterprise in the general population lies.
13.2.2. Using Pearson's χ² test, at the significance level α, test the hypothesis that the random variable X (the amount of profit) is distributed according to the normal law.
Task 13.3.
13.3.1. Determine the coefficients of the sample regression equation.
13.3.2. Establish the presence and nature of the correlation between the cost of manufactured products (X) and the amount of profit per enterprise (Y). Construct a scatterplot and regression line.
13.3.3. Calculate the linear correlation coefficient. Using Student's t-test, test the significance of the correlation coefficient. Draw a conclusion about the close relationship between factors X and Y using the Chaddock scale.
Guidelines. Task 13.3 is performed using this service.

Task. The following data represent the time spent by clients on concluding contracts. Construct an interval variation series of the data and a histogram; find an unbiased estimate of the mathematical expectation and the biased and unbiased estimates of the variance.
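The three estimates named in this task can be sketched in Python for ungrouped data. The function name `variance_estimates` is hypothetical, and the sketch assumes the usual definitions: the sample mean as the unbiased estimate of the expectation, division by n for the biased variance estimate, and division by n - 1 for the unbiased one.

```python
def variance_estimates(sample):
    """Return (mean, biased variance, unbiased variance) for ungrouped data."""
    n = len(sample)
    mean = sum(sample) / n                     # unbiased estimate of the expectation
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return mean, ss / n, ss / (n - 1)
```

For an interval series the same formulas would be applied to the interval midpoints weighted by their frequencies, which makes the results approximate.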

Example. According to Table 2:
1) Construct distribution series for 40 commercial banks of the Russian Federation:
A) in terms of profit;
B) by the amount of credit investments.
2) Using the obtained distribution series, determine:
A) average profit per commercial bank;
B) credit investments on average per commercial bank;
C) modal and median value of profit; quartiles, deciles;
D) modal and median value of credit investments.
3) Using the distribution series obtained in step 1, calculate:
a) range of variation;
b) average linear deviation;
c) standard deviation;
d) coefficient of variation.
Complete the necessary calculations in tabular form. Analyze the results. Draw conclusions.
Plot graphs of the resulting distribution series. Determine the mode and median graphically.

Solution:
To build a grouping with equal intervals, we will use the service Grouping statistical data.

Figure 1 – Entering parameters

Description of parameters
Number of rows: the number of input values. If the series is small, enter its size. If the sample is large, click the Insert from Excel button.
Number of groups: 0 means the number of groups will be determined by Sturges' formula.
If a specific number of groups is required, specify it (for example, 5).
Type of series: Discrete series.
Significance level: for example, 0.954. This parameter is set to determine the confidence interval of the mean.
Sample: For example, 10% mechanical sampling was carried out. We indicate the number 10. For our data we indicate 100.

A set of objects or phenomena united by some common feature or property of a qualitative or quantitative nature is called the object of observation.

Every object of statistical observation consists of individual elements, the observation units.

The results of statistical observation are numerical information, i.e. data. Statistical data are information about the values that the characteristic of interest to the researcher took in the statistical population.

If the values of a characteristic are expressed in numbers, the characteristic is called quantitative.

If a characteristic describes some property or state of the elements of a population, it is called qualitative.

If all elements of a population are subject to study (complete observation), the statistical population is called the general population.

If only part of the elements of the general population is studied, the statistical population is called a sample. A sample of n elements is drawn from the population at random, so that every element has an equal chance of being selected.

The values of a characteristic change (vary) from one element of the population to another, so in statistics the different values of a characteristic are also called variants. Variants are usually denoted by lowercase Latin letters x, y, z.

The serial number of a variant (value of the characteristic) is called its rank: x_1 is the 1st variant (1st value of the characteristic), x_2 is the 2nd variant, and x_i is the i-th variant.

A series of values of a characteristic (variants) arranged in ascending or descending order together with their corresponding weights is called a variation series (distribution series).

The weights are frequencies or relative frequencies.

The frequency (m_i) shows how many times a given variant (value of the characteristic) occurs in the statistical population.

The relative frequency (w_i) shows what share of the population units has a given variant. It is calculated as the ratio of the frequency of the variant to the sum of all frequencies of the series:

$$ w_i = \frac{m_i}{\sum_{j=1}^{l} m_j}. \quad (6.1) $$

The sum of all relative frequencies is 1:

$$ \sum_{i=1}^{l} w_i = 1. \quad (6.2) $$
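A discrete variation series with its frequencies and relative frequencies can be built from raw data as in the following sketch. The function name `discrete_series` is hypothetical; each relative frequency is computed as the frequency of a variant divided by the sum of all frequencies, as described above.

```python
from collections import Counter

def discrete_series(data):
    """Return (variant, frequency, relative frequency) tuples in ascending order of the variant."""
    counts = Counter(data)
    total = sum(counts.values())
    return [(x, m, m / total) for x, m in sorted(counts.items())]
```

For the data 2, 3, 3, 5, 3, 2 this yields the variants 2, 3 and 5 with frequencies 2, 3 and 1; the relative frequencies add up to 1.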

Variation series are discrete and interval.

Discrete variation series are usually constructed if the values of the characteristic being studied can differ from one another by no less than a certain finite amount.

In discrete variation series, point values of the characteristic are specified.

The general view of the discrete variation series is shown in Table 6.1.

Table 6.1

where i = 1, 2, … , l.

In interval variation series, in each interval the upper and lower boundaries of the interval are distinguished.

The difference between the upper and lower boundaries of an interval is called the interval difference, or the length (size) of the interval.

The size of the first interval $k_1$ is determined by the formula:

$k_1 = a_2 - a_1$;

second: $k_2 = a_3 - a_2$; ...

last: $k_{l-1} = a_l - a_{l-1}$.

In general, the interval difference $k_i$ is calculated by the formula:

$$ k_i = x_{i\,(\max)} - x_{i\,(\min)}. \quad (6.3) $$

If an interval has both boundaries, it is called closed.

The first and last intervals can be open, i.e. have only one boundary.

For example, the first interval can be set as “up to 100”, the second - “100-110”, ..., the second to last - “190-200”, the last - “200 and more”. Obviously, the first interval has no lower boundary, and the last one has no upper boundary; both of them are open.

Often open intervals have to be conditionally closed. To do this, usually the value of the first interval is taken equal to the value of the second, and the value of the last - to the value of the penultimate one. In our example, the value of the second interval is 110-100=10, therefore, the lower limit of the first interval will be conditionally 100-10=90; the value of the penultimate interval is 200-190=10, therefore, the upper limit of the last interval will be conditionally 200+10=210.
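The rule for conditionally closing open intervals can be sketched as follows. The function name `close_open_intervals` and the use of `None` for a missing boundary are assumptions made for this illustration; the sketch also assumes that the neighbours of the open intervals are themselves closed.

```python
def close_open_intervals(intervals):
    """Close an open first and/or last interval using the width of its neighbour.

    intervals: list of (lower, upper) pairs; None marks a missing boundary.
    """
    closed = list(intervals)
    first_lo, first_hi = closed[0]
    if first_lo is None:
        width = closed[1][1] - closed[1][0]   # size of the second interval
        closed[0] = (first_hi - width, first_hi)
    last_lo, last_hi = closed[-1]
    if last_hi is None:
        width = closed[-2][1] - closed[-2][0]  # size of the penultimate interval
        closed[-1] = (last_lo, last_lo + width)
    return closed
```

With the intervals "up to 100", "100-110", "190-200" and "200 and more" from the example, the first interval becomes 90-100 and the last becomes 200-210.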

In addition, an interval variation series may contain intervals of different lengths. If the intervals in a variation series have the same length (interval difference), they are called equal; otherwise they are called unequal.

When constructing an interval variation series, the problem of choosing the size of the intervals (interval difference) often arises.

To determine the optimal size of the intervals (when a series is constructed with equal intervals), Sturges' formula is used:

$$ k = \frac{x_{(\max)} - x_{(\min)}}{1 + 3.322 \lg n}, \quad (6.4) $$

where n is the number of units in the population,

$x_{(\max)}$ and $x_{(\min)}$ are the largest and smallest values of the variants of the series.

To characterize a variation series, accumulated frequencies and accumulated relative frequencies are used along with frequencies and relative frequencies.

Accumulated frequencies (relative frequencies) show how many units of the population (what share of them) do not exceed a given value (variant) x.

Accumulated frequencies ($v_i$) for a discrete series can be calculated using the following formula:

$$ v_i = m_1 + m_2 + \dots + m_i = \sum_{j=1}^{i} m_j. \quad (6.5) $$

For an interval variation series, the accumulated frequency is the sum of the frequencies (relative frequencies) of all intervals up to and including the given one.
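Accumulated frequencies are running totals, so a short sketch can rely on `itertools.accumulate` from the Python standard library. The function name `accumulated` is hypothetical; the same function works for frequencies and for relative frequencies.

```python
from itertools import accumulate

def accumulated(frequencies):
    """Running totals: the i-th result is the sum of the first i frequencies."""
    return list(accumulate(frequencies))
```

The last accumulated frequency equals the size of the population; for relative frequencies it equals 1.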

A discrete variation series can be represented graphically using a distribution polygon of frequencies or relative frequencies.

When constructing a distribution polygon, the values of the characteristic (variants) are plotted along the abscissa axis, and the frequencies or relative frequencies are plotted along the ordinate axis. Points are placed at the intersections of the values of the characteristic and the corresponding frequencies (relative frequencies) and are then connected by line segments. The resulting broken line is called the distribution polygon of frequencies (relative frequencies).

Fig. 6.1. Distribution polygon (axis labels: x_1, x_2, ..., x_k; x_i).

Interval variation series can be represented graphically using a histogram, i.e. a bar chart.

When constructing a histogram, the values of the characteristic being studied (the interval boundaries) are plotted along the abscissa axis.

If the intervals are of equal size, frequencies or relative frequencies can be plotted along the ordinate axis.

If the intervals are of different sizes, the values of the absolute or relative distribution density must be plotted along the ordinate axis.

Absolute density is the ratio of the frequency of an interval to the size of the interval:

$$ f(a)_i = \frac{m_i}{k_i}, \quad (6.6) $$

where $f(a)_i$ is the absolute density of the i-th interval;

$m_i$ is the frequency of the i-th interval;

$k_i$ is the size of the i-th interval (interval difference).

Absolute density shows how many population units there are per unit of interval length.

Relative density is the ratio of the relative frequency of an interval to the size of the interval:

$$ f(o)_i = \frac{w_i}{k_i}, \quad (6.7) $$

where $f(o)_i$ is the relative density of the i-th interval;

$w_i$ is the relative frequency of the i-th interval.

Relative density shows what share of the population units falls on a unit of interval length.
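Both densities can be computed together, as in the sketch below. The function name `densities` and the representation of intervals as (lower, upper) pairs are assumptions made for this illustration; the relative frequency is taken as the interval frequency divided by the sum of all frequencies.

```python
def densities(intervals, frequencies):
    """Return (absolute density, relative density) for each interval.

    intervals: list of (lower, upper) pairs; frequencies: matching interval frequencies.
    """
    total = sum(frequencies)
    result = []
    for (lo, hi), m in zip(intervals, frequencies):
        k = hi - lo                # size of the interval
        w = m / total              # relative frequency of the interval
        result.append((m / k, w / k))
    return result
```

For the intervals 0-10 and 10-30 with frequencies 5 and 15, the absolute densities are 0.5 and 0.75 units per unit of interval length, so the wider interval is denser even though its bar would look three times as tall if raw frequencies were plotted.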

[Figure: histogram; interval boundaries a_1, a_2, ..., a_l along the x_i axis]

Both discrete and interval variation series can be represented graphically in the form of cumulates and ogives.

When a cumulate is constructed from the data of a discrete series, the values of the characteristic (variants) are plotted along the abscissa axis, and the accumulated frequencies or relative frequencies are plotted along the ordinate axis. Points are constructed at the intersections of the values of the characteristic (variants) and the corresponding accumulated frequencies (relative frequencies) and are then connected by segments or a curve. The resulting broken line (curve) is called a cumulate (cumulative curve).

When a cumulate is constructed from the data of an interval series, the boundaries of the intervals are plotted along the abscissa axis. The abscissas of the points are the upper boundaries of the intervals, and the ordinates are the accumulated frequencies (relative frequencies) of the corresponding intervals. Often one more point is added, whose abscissa is the lower boundary of the first interval and whose ordinate is zero. Connecting the points with segments or a curve gives the cumulate.

An ogive is constructed in the same way as a cumulate, the only difference being that the accumulated frequencies (relative frequencies) are plotted on the abscissa axis and the values of the characteristic (variants) on the ordinate axis.
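The points of a cumulate and an ogive for an interval series can be generated as in the following sketch. The function names `cumulate_points` and `ogive_points` are hypothetical; the sketch includes the optional starting point at the lower boundary of the first interval with ordinate zero and leaves the actual plotting to any charting tool.

```python
from itertools import accumulate

def cumulate_points(intervals, frequencies):
    """(x, y) points of the cumulate: upper boundaries against accumulated frequencies."""
    points = [(intervals[0][0], 0)]  # extra point: lower boundary of the first interval
    for (_, upper), v in zip(intervals, accumulate(frequencies)):
        points.append((upper, v))
    return points

def ogive_points(intervals, frequencies):
    """The same points with the axes swapped."""
    return [(v, x) for x, v in cumulate_points(intervals, frequencies)]
```

For the intervals 0-10, 10-20 and 20-30 with frequencies 2, 5 and 3, the cumulate passes through (0, 0), (10, 2), (20, 7) and (30, 10).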


The concept of a variation series. The first step in systematizing statistical observation materials is to count the number of units that have a particular characteristic. By arranging the units in ascending or descending order of their quantitative characteristic and counting the number of units with a specific value of the characteristic, we obtain a variation series. A variation series characterizes the distribution of units of a certain statistical population according to some quantitative characteristic.

The variation series consists of two columns. The left column contains the values of the varying characteristic, called variants and denoted (x); the right column contains absolute numbers showing how many times each variant occurs. The indicators in this column are called frequencies and are denoted (f).

The variation series can be presented schematically in the form of Table 5.1:

Table 5.1. General form of a variation series

Columns: Variants (x); Frequencies (f).

The right column can also contain relative indicators that characterize the share of the frequency of each variant in the total sum of frequencies. These relative indicators are called relative frequencies; each equals the frequency of a variant divided by the sum of all frequencies, f / Σf. The sum of all relative frequencies is equal to one. Relative frequencies can also be expressed as percentages, in which case their sum is equal to 100%.

Varying characteristics may differ in nature. Variants of some characteristics are expressed in integers, for example, the number of rooms in an apartment, the number of books published, etc. Such characteristics are called discontinuous, or discrete. Variants of other characteristics can take any values within certain limits, for example, the fulfillment of planned targets, wages, etc. Such characteristics are called continuous.

Discrete variation series. If the variants of a variation series are expressed as discrete quantities, the variation series is called discrete; its form is shown in Table 5.2:

Table 5.2. Distribution of students according to exam grades

Columns: Grades (x); Number of students (f); In % of total.

The nature of the distribution in discrete series is depicted graphically in the form of a distribution polygon, Fig. 5.1.

Fig. 5.1. Distribution of students according to grades obtained in the exam.

Interval variation series. For continuous characteristics, variation series are constructed as interval series, i.e. the values of the characteristic are expressed as intervals "from ... to ...". The minimum value of the characteristic in such an interval is called the lower limit of the interval, and the maximum is called the upper limit.

Interval variation series are also constructed for discontinuous (discrete) characteristics that vary over a large range. Interval series can have equal or unequal intervals. In economic practice, unequal intervals, progressively increasing or decreasing, are used most often. This need arises especially when the characteristic fluctuates unevenly and within wide limits.

Let us consider the form of an interval series with equal intervals, Table 5.3:

Table 5.3. Distribution of workers by output

Columns: Output, thousand rub. (x); Number of workers (f); Cumulative frequency (f´).

The interval distribution series is depicted graphically as a histogram, Fig. 5.2.

Fig. 5.2. Distribution of workers by output

Accumulated (cumulative) frequency. In practice, there is a need to transform distribution series into cumulative series, built according to accumulated frequencies. With their help, you can determine structural averages that facilitate the analysis of distribution series data.

Cumulative frequencies are determined by successively adding to the frequencies (or relative frequencies) of the first group the corresponding indicators of the subsequent groups of the distribution series. Cumulates and ogives are used to illustrate distribution series. To construct them, the values of the discrete characteristic (or the ends of the intervals) are marked on the abscissa axis, and the cumulative totals of the frequencies (the cumulate) are marked on the ordinate axis, Fig. 5.3.

Fig. 5.3. Cumulate of the distribution of workers by output

If the axes of frequencies and variants are swapped, i.e. the abscissa axis shows the accumulated frequencies and the ordinate axis shows the values of the variants, then the curve characterizing the change in frequencies from group to group is called the distribution ogive, Fig. 5.4.

Fig. 5.4. Ogive of the distribution of workers by output

Variation series with equal intervals satisfy one of the most important requirements for statistical distribution series: they ensure comparability in time and space.

Distribution density. The frequencies of individual unequal intervals, however, are not directly comparable. In such cases, to ensure the necessary comparability, the distribution density is calculated, i.e. one determines how many units in each group fall on a unit of interval size.

When a graph of a variation series with unequal intervals is constructed, the heights of the rectangles are made proportional not to the frequencies but to the distribution densities of the values of the characteristic in the corresponding intervals.

Drawing up a variation series and its graphical representation is the first step in processing the initial data and the first stage in the analysis of the population being studied. The next step in the analysis of variation series is to determine the main generalizing indicators, called the characteristics of the series. These characteristics should give an idea of the average value of the characteristic among the population units.

Average value. The average value is a generalized characteristic of the characteristic being studied in the population under study; it reflects its typical level per unit of the population under specific conditions of place and time.

The average value is always a denominate quantity: it has the same dimension as the characteristic of the individual units of the population.

Before calculating average values, it is necessary to group the units of the population under study, identifying qualitatively homogeneous groups.

The average calculated for the population as a whole is called the overall average, and for each group - group averages.

There are two types of averages: power (arithmetic mean, harmonic mean, geometric mean, quadratic mean); structural (mode, median, quartiles, deciles).

The choice of average for calculation depends on the purpose.

Types of power averages and methods for their calculation. In the practice of statistical processing of collected material, various problems arise, the solution of which requires different averages.

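Mathematical statistics derives the various averages from the power mean formula:

$$ \bar{x} = \left( \frac{\sum x^{z}}{n} \right)^{1/z}, $$

where $\bar{x}$ is the average value; x are the individual variants (values of the characteristic); n is the number of variants; z is the exponent (z = 1 gives the arithmetic mean, z = 0, taken as a limit, the geometric mean, z = -1 the harmonic mean, z = 2 the quadratic mean).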
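The family of power means can be illustrated with a short sketch. It assumes the standard unweighted form (Σ x^z / n)^(1/z) with positive values of x; the function name `power_mean` is hypothetical, and the case z = 0 is handled separately as the geometric mean because the general expression is undefined there.

```python
import math

def power_mean(values, z):
    """Unweighted power mean of positive values with exponent z."""
    n = len(values)
    if z == 0:
        # Limit case: geometric mean, computed through logarithms.
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** z for x in values) / n) ** (1 / z)
```

For the values 1, 2 and 4 the harmonic, geometric, arithmetic and quadratic means come out at roughly 1.71, 2, 2.33 and 2.65, which illustrates that the power mean does not decrease as the exponent z grows.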

However, the question of which type of average should be applied in each individual case is resolved by a specific analysis of the population being studied.

The most common type of average in statistics is arithmetic mean. It is calculated in cases where the volume of the averaged characteristic is formed as the sum of its values ​​for individual units of the statistical population being studied.

Depending on the nature of the source data, the arithmetic mean is determined in various ways:

If the data are ungrouped, the calculation is carried out using the simple arithmetic mean formula $\bar{x} = \sum x / n$, where n is the number of values.

The arithmetic mean of a discrete series is calculated by formula 3.4.

Calculation of the arithmetic mean in an interval series. In an interval variation series, where the value of a characteristic in each group is conventionally taken to be the middle of the interval, the arithmetic mean may differ from the mean calculated from ungrouped data. Moreover, the larger the interval in the groups, the greater the possible deviations of the average calculated from grouped data from the average calculated from ungrouped data.

To calculate the average of an interval variation series, each interval is first replaced by its midpoint, and the average is then calculated using the weighted arithmetic mean formula.
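The move from intervals to midpoints can be sketched as follows. This is a minimal example under stated assumptions: the function name `interval_mean` and the three-group series are hypothetical and serve only to show the weighted formula at work.

```python
def interval_mean(intervals, frequencies):
    """Weighted arithmetic mean of an interval series, using midpoints as variants."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    total = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / total

# Hypothetical interval series: three groups with frequencies 2, 5 and 3.
intervals = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]
grouped_mean = interval_mean(intervals, frequencies)  # midpoints 5, 15, 25
```

Because every unit in a group is treated as if it sat at the midpoint, the result generally differs from the mean of the ungrouped data, as noted above.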

Properties of the arithmetic mean. The arithmetic mean has some properties that make it possible to simplify calculations; let’s consider them.

1. The arithmetic mean of a constant is equal to that constant.

If $x = a$ for every unit, then $\bar{x} = \frac{\sum a f}{\sum f} = a$.

2. If the weights of all variants are changed proportionally, i.e. increased or decreased by the same factor, the arithmetic mean of the new series does not change.

If all weights $f$ are reduced by a factor of $k$, then $\frac{\sum x \frac{f}{k}}{\sum \frac{f}{k}} = \frac{\frac{1}{k} \sum x f}{\frac{1}{k} \sum f} = \bar{x}$.

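3. The sum of the positive and negative deviations of the individual variants from the average, multiplied by the weights, is equal to zero, i.e. $\sum (x - \bar{x}) f = 0$.

If $\bar{x} = \frac{\sum x f}{\sum f}$, then $\bar{x} \sum f = \sum x f$. Hence $\sum x f - \bar{x} \sum f = \sum (x - \bar{x}) f = 0$.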

4. If all variants are reduced or increased by the same number, the arithmetic mean of the new series decreases or increases by that number.

Let us reduce all variants $x$ by $a$, i.e. $x' = x - a$.

Then

$$\bar{x}' = \frac{\sum (x - a) f}{\sum f} = \frac{\sum x f}{\sum f} - a \cdot \frac{\sum f}{\sum f} = \bar{x} - a.$$

The arithmetic mean of the original series is obtained by adding back to the reduced mean the number $a$ previously subtracted from the variants, i.e. $\bar{x} = \bar{x}' + a$.

5. If all variants are reduced or increased by a factor of $k$, the arithmetic mean of the new series decreases or increases by the same factor $k$.

Let $x' = \frac{x}{k}$; then $\bar{x}' = \frac{\sum \frac{x}{k} f}{\sum f} = \frac{1}{k} \cdot \frac{\sum x f}{\sum f} = \frac{\bar{x}}{k}$.

Hence $\bar{x} = k \bar{x}'$, i.e. to obtain the average of the original series, the arithmetic mean of the new series (with reduced variants) must be multiplied by $k$.
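The properties above are easy to check numerically. The sketch below uses a hypothetical series and hypothetical constants `a` and `k`; it is a spot check on one data set, not a proof.

```python
def weighted_mean(xs, fs):
    """Weighted arithmetic mean: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

xs = [1, 2, 3, 4]        # hypothetical variants
fs = [10, 20, 30, 40]    # hypothetical frequencies (weights)
a, k = 2, 10             # hypothetical shift and scale factor
mean = weighted_mean(xs, fs)

# Property 2: dividing every weight by k leaves the mean unchanged.
mean_scaled_weights = weighted_mean(xs, [f / k for f in fs])
# Property 3: weighted deviations from the mean sum to zero.
deviation_sum = sum((x - mean) * f for x, f in zip(xs, fs))
# Property 4: subtracting a from every variant lowers the mean by a.
mean_shifted = weighted_mean([x - a for x in xs], fs)
# Property 5: dividing every variant by k divides the mean by k.
mean_divided = weighted_mean([x / k for x in xs], fs)
```

These properties are what make the simplified calculation with conditional variants possible: shift and scale the variants first, average the small numbers, then undo the shift and scale.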

Harmonic mean. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocal values of the characteristic. It is used when the statistical information does not contain the frequencies $f$ of the individual variants $x$ but gives their products $M = xf$. The harmonic mean is calculated using formula 3.5:

$$\bar{x} = \frac{\sum M}{\sum \frac{M}{x}}.$$

In practice the harmonic mean is used in calculating certain indices, in particular price indices.
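A short sketch of the weighted harmonic mean follows. The function name and the price and sales figures are hypothetical; the point is that only the products M = xf (here, sales values) are known, while the frequencies (quantities sold) are not.

```python
def weighted_harmonic_mean(xs, ms):
    """Harmonic mean of variants xs weighted by the products ms, where m = x * f."""
    return sum(ms) / sum(m / x for x, m in zip(xs, ms))

# Hypothetical example: unit price at three outlets and total sales value at each.
prices = [20, 25, 40]
sales = [2000, 5000, 4000]
avg_price = weighted_harmonic_mean(prices, sales)
```

Here the implied quantities are 100, 200 and 100 units, so the average price is 11000 / 400 = 27.5, which is exactly what the weighted arithmetic mean would give if the quantities were known.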

Geometric mean. When the geometric mean is used, the individual values of the characteristic are, as a rule, relative values of dynamics constructed as chain values, i.e. as the ratio of each level in a time series to the previous level. The average thus characterizes the average growth rate.

The geometric mean is also used to determine the value that is equidistant, in ratio terms, from the maximum and minimum values of the characteristic. For example, an insurance company concludes contracts for the provision of auto insurance services. Depending on the specific insured event, the insurance payment can range from 10,000 to 100,000 dollars per year. The average amount of insurance payments will be $\sqrt{10{,}000 \cdot 100{,}000} \approx 31{,}623$ USD.

The geometric mean is a quantity used as the average of ratios or in distribution series presented in the form of a geometric progression when z = 0. This mean is convenient to use when attention is paid not to absolute differences, but to the ratios of two numbers.

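The calculation formulas are as follows:

$$\bar{x} = \sqrt[n]{\prod x} \quad \text{(simple)}, \qquad \bar{x} = \sqrt[\sum f]{\prod x^{f}} \quad \text{(weighted)},$$

where $x$ are the variants of the characteristic being averaged; $\prod$ is the product of the variants; $f$ is the frequency of the variants.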

The geometric mean is used in calculations of average annual growth rates.
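The two uses of the geometric mean described above can be sketched in Python. The yearly levels are hypothetical; the insurance bounds of 10,000 and 100,000 come from the example in the text. `math.prod` requires Python 3.8 or later.

```python
import math

def average_growth_rate(chain_rates):
    """Geometric mean of chain growth rates (each rate = level / previous level)."""
    return math.prod(chain_rates) ** (1 / len(chain_rates))

# Hypothetical yearly levels of some indicator.
levels = [100, 110, 132, 145.2]
chain_rates = [b / a for a, b in zip(levels, levels[1:])]  # 1.1, 1.2, 1.1
rate = average_growth_rate(chain_rates)

# Insurance example from the text: geometric mean of the two bounds.
midpoint = math.sqrt(10_000 * 100_000)
```

The average rate depends only on the first and last levels, since the product of the chain rates telescopes to `levels[-1] / levels[0]`.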

Quadratic mean (mean square). The quadratic mean formula is used to measure the degree of fluctuation of the individual values of a characteristic around the arithmetic mean in a distribution series. Thus, when calculating indicators of variation, the average is calculated from the squared deviations of the individual values of the characteristic from the arithmetic mean.

The quadratic mean (root mean square) is calculated using the formula

$$\bar{x} = \sqrt{\frac{\sum x^{2}}{n}} \quad \text{(simple)}, \qquad \bar{x} = \sqrt{\frac{\sum x^{2} f}{\sum f}} \quad \text{(weighted)}.$$

In economic research the quadratic mean in a modified form is widely used in calculating indicators of the variation of a characteristic, such as the variance and the standard deviation.

Majorance rule. The power averages are related as follows: the larger the exponent, the greater the value of the average (Table 5.4).

Table 5.4. Relationship between averages

z value                          -1           0            1             2
Relationship between averages    harmonic  ≤  geometric  ≤  arithmetic  ≤  quadratic

This relationship is called the majorance rule.
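The ordering of the four averages can be checked with Python's standard `statistics` module (`geometric_mean` is available from Python 3.8). The data are hypothetical and must be positive for the harmonic and geometric means to be defined.

```python
import math
import statistics

values = [10, 20, 40]  # hypothetical positive data

harmonic = statistics.harmonic_mean(values)                     # z = -1
geometric = statistics.geometric_mean(values)                   # z = 0
arithmetic = statistics.mean(values)                            # z = 1
quadratic = math.sqrt(statistics.mean([x * x for x in values]))  # z = 2
```

For these values the four averages come out at roughly 17.1, 20, 23.3 and 26.5; they coincide only when all values are equal.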

Structural averages. To characterize the structure of the population, special indicators are used, which can be called structural averages. These indicators include mode, median, quartiles and deciles.

Mode. The mode (Mo) is the most frequently occurring value of a characteristic among the units of the population. It is the value of the characteristic that corresponds to the maximum point of the theoretical distribution curve.

The mode is widely used in commercial practice for studying consumer demand (for example, to determine the clothing and shoe sizes in greatest demand) and for recording prices. A population may have several modes.

Calculation of mode in a discrete series. In a discrete series, mode is the variant with the highest frequency. Let's consider finding a mode in a discrete series.

Calculation of the mode in an interval series. In an interval variation series, the mode is approximately taken to be the central variant of the modal interval, i.e. the interval with the highest frequency (or relative frequency). Within that interval, the value of the characteristic that is the mode is found by the formula

$$Mo = x_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $x_{Mo}$ is the lower limit of the modal interval; $i_{Mo}$ is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the interval preceding the modal one; $f_{Mo+1}$ is the frequency of the interval following the modal one.
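A minimal sketch of the mode in an interval series, using the standard interpolation formula with the terms listed above. The function name and the data are hypothetical; the sketch assumes equal-width intervals and treats a missing neighbor of the modal interval as having frequency 0.

```python
def interval_mode(intervals, frequencies):
    """Mode of an interval series by interpolation inside the modal interval."""
    # Index of the modal interval (the first one with the highest frequency).
    m = max(range(len(frequencies)), key=frequencies.__getitem__)
    low, high = intervals[m]
    f_mo = frequencies[m]
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    # Fails with ZeroDivisionError if both neighbors equal the modal frequency.
    return low + (high - low) * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))

# Hypothetical interval series.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 10, 6, 2]
mode = interval_mode(intervals, frequencies)
```

The mode is pulled toward the neighbor with the higher frequency: here the preceding interval has frequency 4 and the following one 6, so the result, 16, lies above the midpoint of the modal interval.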

Median. The median (Me) is the value of the characteristic for the middle unit of the ranked series. A ranked series is one in which the values of the characteristic are written in ascending or descending order. Equivalently, the median is the value that divides an ordered variation series into two equal parts: in one part the values of the varying characteristic are less than the median, and in the other they are greater.

To find the median, first determine its ordinal number. If the number of units is odd, one is added to the sum of all frequencies and the result is divided by two. If the number of units is even, the median is found as the value of the characteristic for the unit whose ordinal number is the total sum of frequencies divided by two. Once the ordinal number of the median is known, its value is easily found from the accumulated frequencies.
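The lookup through accumulated frequencies can be sketched as follows. The function name and the data are hypothetical. One assumption differs from the rule above: for an even number of units the sketch averages the two middle values, which is the convention used in Task 1 of the worked examples, rather than taking the single unit numbered n/2.

```python
from itertools import accumulate

def discrete_median(variants, frequencies):
    """Median of a discrete variation series (variants sorted in ascending order)."""
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]

    def value_at(position):
        # First variant whose accumulated frequency reaches the 1-based position.
        return next(x for x, s in zip(variants, cumulative) if s >= position)

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # Assumption: half the sum of the two middle values when n is even.
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2

# Hypothetical series: number of children per family and number of families.
children = [0, 1, 2, 3]
families = [3, 5, 4, 1]
median = discrete_median(children, families)
```

With 13 families the median is unit number 7; the accumulated frequencies 3, 8, 12, 13 show that this unit falls in the group with 1 child.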

Calculation of the median in a discrete series. A sample survey produced data on the distribution of families by number of children (Table 5.5). To determine the median, we first determine its ordinal number:

$N_{Me} = 50$.

Then we construct the series of accumulated frequencies and use the ordinal number and the accumulated frequencies to find the median. The accumulated frequency of 33 shows that in 33 families the number of children does not exceed 1; since the ordinal number of the median is 50, the median lies among the families numbered 34 to 55.

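[Table 5.5. Distribution of families by number of children. Columns: number of children in the family; number of families.]

Calculation of the median in an interval series. In an interval series the median is found by the formula

$$Me = x_{Me} + i_{Me} \cdot \frac{\frac{1}{2} \sum f - S_{Me-1}}{f_{Me}},$$

where $x_{Me}$ is the lower limit of the median interval; $i_{Me}$ is the width of the median interval; $S_{Me-1}$ is the accumulated frequency of the interval preceding the median one; $f_{Me}$ is the frequency of the median interval.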

All the power averages considered share an important property (unlike the structural averages): the formula for the average includes all values of the series, i.e. the size of the average is influenced by the value of every variant.

On the one hand, this is a very positive property, because the effect of all causes acting on all units of the population under study is taken into account. On the other hand, even a single observation that entered the source data by chance can significantly distort the picture of the level of the characteristic being studied in the population under consideration (especially in short series).

Quartiles and deciles. By analogy with finding the median in variation series, you can find the value of a characteristic for any unit of the ranked series. So, in particular, you can find the value of the attribute for units dividing a series into 4 equal parts, into 10, etc.

Quartiles. The variants that divide the ranked series into four equal parts are called quartiles.

In this case, they distinguish: the lower (or first) quartile (Q1) - the value of the attribute for a unit of the ranked series, dividing the population in the ratio of ¼ to ¾ and the upper (or third) quartile (Q3) - the value of the attribute for the unit of the ranked series, dividing the population in the ratio ¾ to ¼.

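The second quartile is the median: $Q_2 = Me$. The lower and upper quartiles in an interval series are calculated by formulas analogous to that of the median:

$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4} \sum f - S_{Q_1 - 1}}{f_{Q_1}}, \qquad Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4} \sum f - S_{Q_3 - 1}}{f_{Q_3}},$$

where $x_{Q_1}$, $x_{Q_3}$ are the lower limits of the intervals containing the lower and upper quartiles, respectively; $i$ is the width of the interval;

$S_{Q_1 - 1}$, $S_{Q_3 - 1}$ are the accumulated frequencies of the intervals preceding those containing the lower and upper quartiles;

$f_{Q_1}$, $f_{Q_3}$ are the frequencies of the quartile intervals (lower and upper).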

The intervals containing Q1 and Q3 are determined from the accumulated frequencies (or accumulated relative frequencies).

Deciles. In addition to quartiles, deciles are calculated: the variants that divide the ranked series into 10 equal parts.

They are denoted by D: the first decile D1 divides the series in the ratio 1/10 to 9/10, the second decile D2 in the ratio 2/10 to 8/10, and so on. They are calculated by the same scheme as the median and quartiles.

The median, quartiles and deciles all belong to the so-called order statistics, i.e. variants that occupy a definite ordinal place in the ranked series.
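Since the median, quartiles and deciles follow one scheme, a single function can cover them all. The sketch below is an assumption-laden illustration: the name `interval_quantile`, the parameter `p` (the share of the population below the quantile) and the data are hypothetical, and linear interpolation inside the interval is assumed.

```python
from itertools import accumulate

def interval_quantile(intervals, frequencies, p):
    """Quantile of order p (0 < p < 1) of an interval series, by linear interpolation."""
    cumulative = list(accumulate(frequencies))
    target = p * cumulative[-1]  # ordinal position of the quantile
    for i, ((low, high), f, s) in enumerate(zip(intervals, frequencies, cumulative)):
        if s >= target:
            # Accumulated frequency of the preceding interval (0 for the first one).
            s_prev = cumulative[i - 1] if i > 0 else 0
            return low + (high - low) * (target - s_prev) / f
    raise ValueError("empty series")

# Hypothetical interval series with 40 units in total.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 16, 12, 8]
q1 = interval_quantile(intervals, frequencies, 0.25)   # lower quartile
me = interval_quantile(intervals, frequencies, 0.50)   # median
q3 = interval_quantile(intervals, frequencies, 0.75)   # upper quartile
d1 = interval_quantile(intervals, frequencies, 0.10)   # first decile
```

Passing p = 0.25, 0.5 and 0.75 gives the three quartiles; p = 0.1, 0.2, ..., 0.9 gives the deciles.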

RUSSIAN ACADEMY OF NATIONAL ECONOMY AND PUBLIC SERVICE under the PRESIDENT OF THE RUSSIAN FEDERATION

ORYOL BRANCH

Department of Mathematics and mathematical methods in management

Independent work

Mathematics

on the topic “Variation series and its characteristics”

for full-time students of the Faculty of Economics and Management,

field of study "Human Resources Management"


Purpose of the work: to master the concepts of mathematical statistics and the methods of primary data processing.

Examples of solving typical problems.

Task 1.

The following data were obtained through a survey (n = 80):

1 2 3 2 2 4 3 3 5 1 0 2 4 3 2 2 3 3 1 3 2 4 2 4 3 3 3 2 0 6

3 3 1 1 2 3 1 4 3 1 7 4 3 4 2 3 2 3 3 1 4 3 1 4 5 3 4 2 4 5

3 6 4 1 3 2 4 1 3 1 0 0 4 6 4 7 4 1 3 5

Required:

1) Compile a variation series (the statistical distribution of the sample), having first written out the ranked discrete series of variants.

2) Construct the frequency polygon and the cumulative curve (cumulate).

3) Compile the distribution series of relative frequencies.

4) Find the main numerical characteristics of the variation series (using the simplified formulas): a) the arithmetic mean, b) the median Me and the mode Mo, c) the variance $s^2$, d) the standard deviation $s$, e) the coefficient of variation $V$.

5) Explain the meaning of the results obtained.

Solution.

1) To compile the ranked discrete series of variants, we sort the survey data by size and arrange them in ascending order:

0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

5 5 5 5 6 6 6 7 7.

Let us compile the variation series by writing the observed values (variants) in the first row of the table and the corresponding frequencies in the second (Table 1).

Table 1.

x_i        0    1    2    3    4    5    6    7
n_i        4   13   14   24   16    4    3    2
n_i^nak    4   17   31   55   71   75   78   80
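The ranking and counting in step 1 can be reproduced with the standard library. The list `survey` holds the 80 survey values from Task 1; the variable names are chosen for this sketch only.

```python
from collections import Counter

# The 80 survey values from Task 1, in the order given.
survey = [
    1, 2, 3, 2, 2, 4, 3, 3, 5, 1, 0, 2, 4, 3, 2, 2, 3, 3, 1, 3, 2, 4, 2, 4, 3, 3, 3, 2, 0, 6,
    3, 3, 1, 1, 2, 3, 1, 4, 3, 1, 7, 4, 3, 4, 2, 3, 2, 3, 3, 1, 4, 3, 1, 4, 5, 3, 4, 2, 4, 5,
    3, 6, 4, 1, 3, 2, 4, 1, 3, 1, 0, 0, 4, 6, 4, 7, 4, 1, 3, 5,
]

ranked = sorted(survey)                       # ranked discrete series
counts = Counter(ranked)                      # frequency of each variant
variants = sorted(counts)                     # distinct values x_i
frequencies = [counts[x] for x in variants]   # frequencies n_i
```

The resulting variants run from 0 to 7, with the variant 3 occurring 24 times.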

2) A frequency polygon is a broken line connecting the points $(x_i; n_i)$, $i = 1, 2, \ldots, m$, where $m$ is the number of distinct values of the characteristic X.

Let us depict the polygon of frequencies of the variation series (Fig. 1).

Fig.1. Frequency polygon

The cumulative curve (cumulate) for a discrete variation series is a broken line connecting the points $(x_i; n_i^{nak})$, $i = 1, 2, \ldots, m$.

Let us find the accumulated frequencies $n_i^{nak}$ (the accumulated frequency shows how many observations had a value of the characteristic not exceeding $x_i$). We enter the values found in the third row of Table 1.

Let us construct the cumulate (Fig. 2).

Fig. 2. Cumulate

3) Let us find the relative frequencies $w_i = \frac{n_i}{n}$, $i = 1, 2, \ldots, m$, where $m$ is the number of distinct values of the characteristic X; we calculate them all to the same accuracy.

Let us write the distribution series of relative frequencies as Table 2.

Table 2.

x_i    0        1        2        3        4        5        6        7
w_i    0.0500   0.1625   0.1750   0.3000   0.2000   0.0500   0.0375   0.0250
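The accumulated and relative frequencies of steps 2 and 3 follow directly from the frequencies. In the sketch below the frequencies are those counted from the ranked series of Task 1; the variable names are illustrative.

```python
from itertools import accumulate

variants = [0, 1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 13, 14, 24, 16, 4, 3, 2]   # counted from the ranked series in Task 1

n = sum(frequencies)                          # sample size
accumulated = list(accumulate(frequencies))   # accumulated frequencies
relative = [f / n for f in frequencies]       # relative frequencies n_i / n
```

The last accumulated frequency equals the sample size, and the relative frequencies sum to 1, which gives a quick check on the arithmetic.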

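4) Let us find the main numerical characteristics of the variation series.

a) We find the arithmetic mean using the simplified formula

$$\bar{x} = \frac{\sum_{i=1}^{m} u_i n_i}{n} \cdot k + c,$$

where $u_i = \frac{x_i - c}{k}$ are the conditional variants.

Let us set $c = 3$ (one of the middle observed values) and $k = 1$ (the difference between two neighboring variants) and draw up a calculation table (Table 3).

Table 3.

x_i    n_i    u_i    u_i n_i    u_i^2 n_i
0       4     -3      -12        36
1      13     -2      -26        52
2      14     -1      -14        14
3      24      0        0         0
4      16      1       16        16
5       4      2        8        16
6       3      3        9        27
7       2      4        8        32
Sum    80              -11       193

Then the arithmetic mean is

$$\bar{x} = \frac{-11}{80} \cdot 1 + 3 \approx 2.86.$$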

b) The median Me of a variation series is the value of the characteristic that falls in the middle of the ranked series of observations. This discrete variation series contains an even number of terms (n = 80), so the median is equal to half the sum of the two middle variants, the 40th and the 41st: $Me = \frac{3 + 3}{2} = 3$.

The mode Mo of a variation series is the variant with the highest frequency. For this variation series the highest frequency $n_{max} = 24$ corresponds to the variant x = 3, so Mo = 3.
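As a cross-check, the median and mode of this series can be obtained with Python's `statistics` module. The ranked series is rebuilt from the variants and the frequencies counted in step 1; the variable names are illustrative.

```python
import statistics

variants = [0, 1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 13, 14, 24, 16, 4, 3, 2]   # counted from the ranked series in Task 1

# Expand the variation series back into the ranked series of 80 observations.
ranked = [x for x, f in zip(variants, frequencies) for _ in range(f)]

median = statistics.median(ranked)   # half the sum of the 40th and 41st values
mode = statistics.mode(ranked)       # most frequent value
```

Both middle observations fall in the group x = 3 (accumulated frequencies 31 and 55 bracket positions 40 and 41), so the median and the mode coincide.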

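c) The variance $s^2$, which is a measure of the dispersion of the possible values of the indicator X around its mean, is found using the simplified formula

$$s^2 = \frac{\sum_{i=1}^{m} u_i^2 n_i}{n} \cdot k^2 - (\bar{x} - c)^2,$$

where $u_i$ are the conditional variants.

The intermediate calculations are also included in Table 3.

Then, with $\sum u_i^2 n_i = 193$, the variance is

$$s^2 = \frac{193}{80} \cdot 1^2 - (2.8625 - 3)^2 \approx 2.39.$$

d) The standard deviation $s$ is found by the formula

$$s = \sqrt{s^2} \approx \sqrt{2.39} \approx 1.55.$$

e) The coefficient of variation $V$:

$$V = \frac{s}{\bar{x}} \cdot 100\% \quad (\bar{x} \neq 0).$$

The coefficient of variation is a dimensionless quantity, so it is suitable for comparing the dispersion of variation series whose variants have different dimensions.

The coefficient of variation is

$$V = \frac{1.55}{2.86} \cdot 100\% \approx 54\%.$$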
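The calculation with conditional variants in items a), c), d) and e) can be sketched as follows. The values c = 3 and k = 1 are those chosen in the solution; the frequencies are those counted from the ranked series, and the variable names are illustrative.

```python
import math

variants = [0, 1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 13, 14, 24, 16, 4, 3, 2]   # counted from the ranked series in Task 1
c, k = 3, 1                                   # false zero and step, as in the solution

n = sum(frequencies)
u = [(x - c) / k for x in variants]           # conditional variants
sum_un = sum(ui * f for ui, f in zip(u, frequencies))
sum_u2n = sum(ui ** 2 * f for ui, f in zip(u, frequencies))

mean = sum_un / n * k + c                               # simplified formula for the mean
variance = sum_u2n / n * k ** 2 - (mean - c) ** 2       # simplified formula for s^2
std = math.sqrt(variance)                               # standard deviation s
cv = std / mean * 100                                   # coefficient of variation, %
```

The unrounded values are a mean of 2.8625 and a variance of about 2.3936; rounding at the intermediate steps, as in the hand calculation, changes the coefficient of variation only in the first decimal place.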

5) The meaning of the results is as follows. The value $\bar{x}$ characterizes the average value of the characteristic X within the sample under consideration; here the average value is 2.86. The standard deviation $s$ describes the absolute spread of the values of the indicator X and in this case is $s \approx 1.55$. The coefficient of variation $V$ characterizes the relative variability of the indicator X, i.e. the relative spread around its mean, and in this case is $V \approx 54\%$.

Answer: $\bar{x} \approx 2.86$; $s^2 \approx 2.39$; $s \approx 1.55$; $V \approx 54\%$.

Task 2.

The following data is available on the equity capital of the 40 largest banks in Central Russia:

12.0 49.4 22.4 39.3 90.5 15.2 75.0 73.0 62.3 25.2
70.4 50.3 72.0 71.6 43.7 68.3 28.3 44.9 86.6 61.0
41.0 70.9 27.3 22.9 88.6 42.5 41.9 55.0 56.9 68.1
120.8 52.4 42.0 119.3 49.6 110.6 54.5 99.3 111.5 26.1

Required:

1) Construct an interval variation series.

2) Calculate the sample mean and the sample variance.

3) Find the standard deviation and the coefficient of variation.

4) Construct a histogram of the frequency distribution.

Solution.

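1) Let us choose the number of intervals arbitrarily, for example 8. Then the width of the interval is

$$k = \frac{x_{max} - x_{min}}{8} = \frac{120.8 - 12.0}{8} = 13.6,$$

which is rounded up to $k = 15$ for convenience, with the first interval starting at 10.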

Let's create a calculation table:

Interval, x_k – x_{k+1}    Frequency, n_i    Midpoint, x_i    Conditional variant, u_i    u_i n_i    u_i^2 n_i    (u_i + 1)^2 n_i
10 – 25       4     17.5    -3    -12     36     16
25 – 40       5     32.5    -2    -10     20      5
40 – 55      11     47.5    -1    -11     11      0
55 – 70       6     62.5     0      0      0      6
70 – 85       6     77.5     1      6      6     24
85 – 100      4     92.5     2      8     16     36
100 – 115     2    107.5     3      6     18     32
115 – 130     2    122.5     4      8     32     50
Sum          40                     -5    139    169
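The grouping into intervals can be sketched in Python. The list `capital` holds the 40 values from Task 2; the starting point 10 and the width 15 are taken from the intervals in the calculation table. One assumption is made explicit: each interval includes its lower bound and excludes its upper bound, so the value 55.0 is counted in the interval 55 – 70.

```python
# Equity capital of the 40 banks from Task 2.
capital = [
    12.0, 49.4, 22.4, 39.3, 90.5, 15.2, 75.0, 73.0, 62.3, 25.2,
    70.4, 50.3, 72.0, 71.6, 43.7, 68.3, 28.3, 44.9, 86.6, 61.0,
    41.0, 70.9, 27.3, 22.9, 88.6, 42.5, 41.9, 55.0, 56.9, 68.1,
    120.8, 52.4, 42.0, 119.3, 49.6, 110.6, 54.5, 99.3, 111.5, 26.1,
]

start, width, groups = 10, 15, 8   # first lower bound, interval width, number of intervals

frequencies = [0] * groups
for value in capital:
    # Assumption: intervals are closed on the left and open on the right.
    index = int((value - start) // width)
    frequencies[index] += 1

midpoints = [start + width * (i + 0.5) for i in range(groups)]
```

With this boundary rule the first three frequencies are 4, 5 and 11, which agrees with the products -12, -10 and -11 in the u_i n_i column of the table.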

The value chosen as the false zero is $c = 62.5$ (this variant lies approximately in the middle of the variation series).

The conditional variants are determined by the formula

$$u_i = \frac{x_i - c}{k} = \frac{x_i - 62.5}{15}.$$
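A sketch of how the conditional variants feed into the sample mean and variance asked for in Task 2, using the same simplified formulas as in Task 1. The frequencies are those obtained by counting the data with intervals closed on the left (an assumption); c = 62.5 is the false zero from the solution and k = 15 is the interval width. The last column of the calculation table serves as an arithmetic check.

```python
midpoints = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5, 107.5, 122.5]
frequencies = [4, 5, 11, 6, 6, 4, 2, 2]   # assumption: intervals closed on the left
c, k = 62.5, 15                            # false zero and interval width

n = sum(frequencies)
u = [(x - c) / k for x in midpoints]       # conditional variants: -3, -2, ..., 4
sum_un = sum(ui * f for ui, f in zip(u, frequencies))
sum_u2n = sum(ui ** 2 * f for ui, f in zip(u, frequencies))
check = sum((ui + 1) ** 2 * f for ui, f in zip(u, frequencies))

# Check column identity: sum((u + 1)^2 n) = sum(u^2 n) + 2 sum(u n) + n.
check_ok = check == sum_u2n + 2 * sum_un + n

mean = sum_un / n * k + c                          # sample mean
variance = sum_u2n / n * k ** 2 - (mean - c) ** 2  # sample variance
```

The sum of u_i n_i comes out at -5, matching the total in the calculation table.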
