Statistical study of the relationships between socio-economic phenomena. If the relationship between two characteristics is studied, this is a pairwise correlation; if the relationship among many characteristics is studied, it is a multiple correlation.

The study of objectively existing connections between phenomena is one of the most important tasks of the general theory of statistics. In the process of the statistical study of dependencies, cause-and-effect relationships between phenomena are revealed, which makes it possible to identify the factors (characteristics) that have a significant impact on the variation of the phenomena and processes being studied. A cause-and-effect relationship is a connection between phenomena and processes in which a change in one of them, the cause, leads to a change in the other, the effect.

A cause is a set of conditions and circumstances whose action leads to the appearance of an effect. Where cause-and-effect relationships really exist between phenomena, these conditions must be realized together with the action of the causes. Causal relationships are universal and diverse, and to detect cause-and-effect relationships it is necessary to single out individual phenomena and study them in isolation.

Of particular importance when studying cause-and-effect relationships is the identification of the time sequence: the cause must always precede the effect, but not every previous event should be considered a cause, and the subsequent one - a consequence.

In real socio-economic life, cause and effect must be considered as related phenomena whose appearance is due to a complex of accompanying, simpler causes and effects. Between complex groups of causes and effects, multivalued connections are possible, in which one cause is followed by one action or another, or one action has several different causes. To establish an unambiguous causal relationship between phenomena, or to predict the possible consequences of a specific cause, complete abstraction from all other phenomena in the temporal or spatial environment under study is required; such abstraction can be achieved only in theory. Abstraction techniques are often used in studying the relationship between two characteristics (pairwise correlation). But the more complex the phenomena being studied, the more difficult it is to identify cause-and-effect relationships between them. The interweaving of various internal and external factors inevitably leads to some errors in determining cause and effect.

A feature of cause-and-effect relationships in socio-economic phenomena is their transitivity: cause and effect are related not directly but through a chain of intermediate factors. However, these intermediate factors are usually omitted in the analysis.

So, for example, when indicators of the international calculation methodology are used, gross accumulation of fixed and working capital is treated as a factor of gross profit, while intermediate factors such as gross output, wages, etc. are omitted. Correctly uncovered cause-and-effect relationships make it possible to establish the strength of the influence of individual factors on the results of economic activity.

Socio-economic phenomena are the result of the simultaneous influence of a large number of causes. Consequently, when studying these phenomena, it is necessary, abstracting from secondary ones, to identify the main, fundamental causes.

At the first stage of the statistical study of a connection, a qualitative analysis of the phenomenon being studied is carried out using the methods of economic theory, sociology, and concrete economics.

At the second stage, a model of the connection is built on the basis of statistical methods: groupings, averages, tables, etc.

At the third and final stage, the results are interpreted; the analysis is again related to the qualitative features of the phenomenon being studied.

Statistics has developed many methods for studying relationships; the choice among them depends on the goals of the study and the tasks set. The connections between characteristics and phenomena, given their wide variety, are classified on a number of grounds. According to their significance for studying the relationship, characteristics are divided into two classes. Characteristics that cause changes in other, related characteristics are called factor characteristics, or simply factors. Characteristics that change under the influence of factor characteristics are called resultant (effective). Connections between phenomena and their characteristics are classified according to the degree of closeness of the connection, its direction and its analytical expression.

In statistics, a distinction is made between functional connection and stochastic dependence. A functional relationship is one in which a certain value of a factor characteristic corresponds to one and only one value of the resulting characteristic. The functional connection is manifested in all cases of observation and for each specific unit of the population under study.

If a causal dependence does not appear in each individual case, but in general, on average over a large number of observations, then such a dependence is called stochastic. A special case of stochastic is a correlation relationship, in which a change in the average value of the resulting characteristic is due to a change in factor characteristics.

According to the degree of closeness of the connection, quantitative criteria for assessing it are distinguished (Table 1).

Table 1 Quantitative criteria for assessing the closeness of connections

By direction, direct and inverse connections are distinguished. With a direct connection, an increase or decrease in the values of the factor characteristic is accompanied by an increase or decrease in the values of the resultant characteristic. For example, an increase in labor productivity helps to increase the profitability of production. With an inverse connection, the values of the resultant characteristic change under the influence of the factor characteristic, but in the direction opposite to the change in the factor characteristic. Thus, with an increase in capital productivity, the cost per unit of production decreases.

According to their analytical expression, connections are divided into rectilinear (or simply linear) and nonlinear. If a statistical relationship between phenomena can be approximately expressed by the equation of a straight line, it is called a linear relationship; if it is expressed by the equation of some curved line (parabola, hyperbola, power function, exponential, etc.), such a relationship is called nonlinear or curvilinear.

Statistics does not always require quantitative estimates of a relationship; often it is important to determine only its direction and nature, to identify the form of the influence of some factors on others. To identify the presence of a connection, its nature and direction, statistics uses the methods of comparing parallel series, analytical groupings, graphs, and correlation and regression analysis.

The method of parallel series is based on comparing two or more series of statistical values. Such a comparison makes it possible to establish the existence of a connection and to get an idea of its nature. Suppose we compare the changes in two quantities: as the values of the first (factor) series increase, the values of the second (resultant) series also increase. The connection between them is therefore direct, and it can be described either by the equation of a straight line or by the equation of a second-order parabola.
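The method of parallel series can be sketched in a few lines of code. The data below are invented for illustration; the point is only that after ordering the rows by the factor characteristic, the resultant characteristic visibly moves in the same direction.

```python
# Hypothetical example data (not from the text): factor x and resultant y for 6 units.
x = [3, 5, 2, 8, 6, 4]
y = [12, 20, 9, 33, 25, 16]

# Method of parallel series: order both rows by the factor characteristic x,
# then inspect whether the resultant characteristic y moves in the same direction.
pairs = sorted(zip(x, y))
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
print(xs)  # [2, 3, 4, 5, 6, 8]
print(ys)  # [9, 12, 16, 20, 25, 33] -- y grows with x: a direct connection
```

Here the second series rises monotonically with the first, which is exactly the visual evidence of a direct connection that the method relies on.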

The relationship between two features is depicted graphically by means of a correlation field. In a coordinate system, the values of the factor characteristic are plotted on the abscissa axis, and the values of the resultant characteristic on the ordinate axis. Each unit of observation is shown as a point with these coordinates. In the absence of a close connection, the points are arranged randomly on the graph. The stronger the connection between the features, the more closely the points are grouped around a certain line expressing the form of the connection.

It is characteristic of socio-economic phenomena that, along with the significant factors forming the level of the resulting characteristic, it is influenced by many other unaccounted and random factors. This indicates that the relationships between the phenomena studied by statistics are correlational in nature.

The correlation method has as its task the quantitative determination of the closeness of the connection between two characteristics (in a pairwise connection) and between the resultant and many factor characteristics (in a multifactorial connection).

Correlation is a statistical dependence between random variables that does not have a strictly functional character, in which a change in one of the random variables leads to a change in the mathematical expectation of the other.

In statistics, the following dependency options are distinguished:

  • pair correlation - a connection between two characteristics (resultant and factor, or two factor characteristics);
  • partial correlation - the dependence between the resultant characteristic and one factor characteristic with the values of the other factor characteristics fixed;
  • multiple correlation - the dependence of the resultant characteristic on two or more factor characteristics included in the study.

The closeness of the connection is quantitatively expressed by the magnitude of the correlation coefficients. Correlation coefficients, representing a quantitative characteristic of the close relationship between characteristics, make it possible to determine the “usefulness” of factor characteristics in constructing multiple regression equations. The value of the correlation coefficient also serves as an assessment of the consistency of the regression equation with the identified cause-and-effect relationships.

Initially, correlation studies were carried out in biology and later spread to other fields, including socio-economic research. Regression began to be used alongside correlation. Correlation and regression are closely related: correlation evaluates the strength (closeness) of a statistical relationship, while regression examines its form. Both serve to establish the relationship between phenomena and to determine the presence or absence of a connection.

Correlation and regression analysis, as a general concept, includes measuring the closeness and direction of the connection (correlation analysis) and establishing an analytical expression (form) of the connection (regression analysis).

The regression method consists in determining the analytical expression of a relationship in which a change in one value (called the dependent, or resultant, characteristic) is due to the influence of one or more independent values (factors), while the set of all other factors that also influence the dependent value is held constant at average values. Regression can be single-factor (paired) or multi-factor (multiple).

Depending on the form of dependence there are:

Linear regression, which is expressed by a straight line equation (linear function) of the form:

Yx = a0 + a1x;

Nonlinear regression, which is expressed by equations of the form:

Yx = a0 + a1x + a2x² - parabola;

Yx = a0 + a1/x - hyperbola.
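The three regression forms named above can be written out directly as functions. This is a minimal sketch: the coefficients a0, a1, a2 are invented, chosen only to show how each form is evaluated.

```python
# Sketch of the three regression forms from the text, with made-up coefficients.
def linear(x, a0, a1):
    return a0 + a1 * x              # Yx = a0 + a1*x

def parabola(x, a0, a1, a2):
    return a0 + a1 * x + a2 * x**2  # Yx = a0 + a1*x + a2*x^2

def hyperbola(x, a0, a1):
    return a0 + a1 / x              # Yx = a0 + a1/x

print(linear(2, 1.0, 0.5))           # 2.0
print(parabola(2, 1.0, 0.5, 0.25))   # 3.0
print(hyperbola(2, 1.0, 0.5))        # 1.25
```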

According to the direction of the connection, there are:

  • direct (positive) regression, which occurs if, with an increase or decrease in the independent value, the values of the dependent value also increase or decrease accordingly;
  • inverse (negative) regression, which appears when, with an increase or decrease in the independent value, the dependent value decreases or increases accordingly.

Positive and negative regressions can be more easily understood if they are represented graphically.

For simple (paired) regression these definitions acquire practical meaning only where cause-and-effect relationships are established sufficiently fully; where causal connections are multiple, it is impossible to separate some causal phenomena clearly from others.


9.1. Causality, regression, correlation

In the process of statistical study of dependencies, cause-and-effect relationships between phenomena are revealed, which makes it possible to identify factors (signs) that have a major influence on the variation of the phenomena and processes being studied. A cause-and-effect relationship is a connection between phenomena and processes, when a change in one of them, the cause, leads to a change in the other, the effect.

According to their significance for studying the relationship, characteristics are divided into two types: factor and resultant (effective).

Socio-economic phenomena are the result of the simultaneous influence of a large number of causes. Consequently, when studying these phenomena, it is necessary to identify the main, fundamental causes, abstracting from the secondary ones.

The first stage of the statistical study of a connection is based on a qualitative analysis of the phenomenon being studied, i.e. the study of its nature using the methods of economic theory, sociology, and concrete economics. The second stage is the construction of a model of the connection. The third and final stage, interpretation of the results, is again associated with the qualitative features of the phenomenon being studied.

In statistics, a distinction is made between functional and stochastic relationships. A functional relationship is one in which a certain value of a factor characteristic corresponds to one and only one value of the resulting characteristic. This connection is manifested in all cases of observation and for each specific unit of the population under study. If a causal dependence does not appear in each individual case, but in general, on average over a large number of observations, then such a dependence is called stochastic. A special case of a stochastic relationship is a correlation relationship, in which a change in the average value of an effective characteristic is due to a change in factor characteristics.

The connections between signs and phenomena, due to their wide variety, are classified on a number of grounds: according to the degree of closeness of the connection, direction and analytical expression.

The degree of closeness of the correlation connection can be quantitatively assessed using the correlation coefficient, the value of which determines the nature of the relationship (Table 1).

Table 1 - Quantitative criteria for tightness of connection

By direction, direct and inverse connections are distinguished.

With a direct connection, an increase or decrease in the values of the factor characteristic is accompanied by an increase or decrease in the values of the resultant characteristic. With an inverse connection, as the values of the factor characteristic increase, the values of the resultant characteristic decrease, and vice versa.

According to their analytical expression, connections are divided into rectilinear (or simply linear) and nonlinear. If the statistical relationship between phenomena can be approximately expressed by the equation of a straight line, it is called linear; if it is expressed by the equation of some curved line (parabola, hyperbola, power function, exponential, etc.), the relationship is called nonlinear or curvilinear.

To identify the presence of a connection, its nature and direction, statistics uses the following methods: comparison of parallel series; analytical groupings; statistical graphs; correlation analysis.

The method of parallel series is based on comparing two or more series of statistical values. Such a comparison makes it possible to establish the existence of a connection and to get an idea of its nature. For example, the change in two quantities may be represented by parallel series of data.

Graphically, the relationship between two characteristics is depicted using the correlation field. In the coordinate system, the values ​​of the factor characteristic are plotted on the abscissa axis, and the resultant characteristic is plotted on the ordinate axis. The stronger the connection between the characteristics, the more closely the points will be grouped around a certain line expressing the form of the connection (Fig.).

In the absence of close connections, there is a random arrangement of points on the graph.

It is typical for socio-economic phenomena that, along with the significant factors that form the level of the effective attribute, it is influenced by many other unaccounted and random factors. This indicates that the relationships between the phenomena studied by statistics are correlational in nature.

Correlation is a statistical relationship between random variables that do not have a strictly functional nature, in which a change in one of the random variables leads to a change in the mathematical expectation (average value) of the other.

In statistics it is customary to distinguish between the following types of dependencies.

1. Pair correlation – a connection between two characteristics (resultative and factor or two factor).

2. Partial correlation - the dependence between the resultant and one factor characteristics with a fixed value of other factor characteristics.

3. Multiple correlation - the dependence of the resultant and two or more factor characteristics included in the study.

The task of correlation analysis is a quantitative determination of the closeness of the connection between two characteristics (in a pairwise connection) and between the resulting and multiple factor characteristics (in a multifactorial connection).

The closeness of the connection is quantitatively expressed by the magnitude of the correlation coefficients, which make it possible to determine the “usefulness” of factor characteristics when constructing multiple regression equations. In addition, the value of the correlation coefficient serves as an assessment of the consistency of the regression equation with the identified cause-and-effect relationships.

9.2. Assessing the tightness of the connection

The closeness of the correlation between factor and resultant characteristics can be measured with the following coefficients: the sign correlation coefficient (Fechner coefficient); the association coefficient; the Pearson and Chuprov coefficients of mutual contingency; the contingency coefficient; the Spearman and Kendall rank correlation coefficients; the linear correlation coefficient; the correlation ratio, etc.

The linear correlation coefficient gives the most precise characterization of the closeness of a linear connection:

r = (avg(x·y) - avg(x)·avg(y)) / (σx·σy),

where avg(x·y) is the average of the products of the values of the features x and y; avg(x) and avg(y) are the average values of the features x and y; σx and σy are the standard deviations of the features x and y. It is used if the relationship between the characteristics is linear.

The linear correlation coefficient can be positive or negative. A positive value indicates a direct connection, a negative value an inverse one. The closer |r| is to 1, the closer the connection; with a functional connection between the characteristics, r = ±1. A value of r close to 0 means that the relationship between the features is weak.
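A minimal from-scratch computation of the linear correlation coefficient, on invented data, might look like this (population standard deviations are used, matching the textbook formula):

```python
import math

# r = (mean(x*y) - mean(x)*mean(y)) / (sd_x * sd_y), on invented data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
mxy = sum(a * b for a, b in zip(x, y)) / n
sdx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)  # population standard deviation
sdy = math.sqrt(sum((v - my) ** 2 for v in y) / n)

r = (mxy - mx * my) / (sdx * sdy)
print(round(r, 3))  # 0.775 -- a fairly close direct connection
```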

9.3. Regression Analysis Methods

Closely related to the concept of correlation is the concept of regression. The first serves to assess the closeness of the connection, the second examines its form. Correlation and regression analysis, as a general concept, includes measuring the closeness and direction of the connection (correlation analysis) and establishing an analytical expression (form) of the connection (regression analysis).

After the presence of statistical relationships between variables has been established using correlation analysis and the degree of their closeness assessed, one moves on to a mathematical description of the specific type of dependence using regression analysis. To do this, a class of functions relating the resultant indicator y and the arguments x1, x2, …, xk is selected, the most informative arguments are chosen, estimates of the unknown values of the connection parameters are calculated, and the properties of the resulting equation are analyzed.

A function describing the dependence of the average value of the resultant characteristic y on given values of the argument is called a regression function (equation). The regression line expresses the type of dependence of the average resultant characteristic on the factor characteristic.

The most fully developed part of the theory of statistics is the methodology of pair correlation, which considers the influence of the variation of a factor characteristic x on a resultant characteristic y.

The linear correlation equation has the form Yx = a0 + a1x.

The quantities a0 and a1 are called the parameters of the regression equation.

To determine the parameters of the regression equation, the method of least squares is used, which leads to a system of two normal equations:

n·a0 + a1·Σx = Σy;

a0·Σx + a1·Σx² = Σxy.

Solving this system in general form gives formulas for the parameters of the regression equation:

a1 = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²);

a0 = (Σy - a1·Σx) / n.
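The least-squares parameter formulas can be applied directly. The data below are invented; the sketch computes the sums once and substitutes them into the closed-form solutions for a1 and a0.

```python
# Least-squares fit of y = a0 + a1*x via the normal-equation solutions
# a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), a0 = (Sy - a1*Sx) / n.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # invented observations, roughly y = 2x

n = len(x)
sx = sum(x)
sy = sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a0 = (sy - a1 * sx) / n
print(round(a1, 3), round(a0, 3))  # 1.99 0.05
```

For these points the slope comes out near 2 and the intercept near 0, matching how the data were constructed.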

EXERCISES

Problem 9.1. 15 factories are ranked in order of increasing profitability of production.

For each enterprise the following are given: Enterprise No.; Production profitability, %; Output per worker, t/person; Cost per unit of production, rub.

Establish the presence and form of a correlation between production profitability and output, production profitability and unit cost of production using the methods of statistical graphs and regression analysis.


The study of modern production shows that its phenomena are closely interconnected and interact with one another.

When studying specific dependencies, some characteristics act as factors that determine changes in other characteristics. The characteristics of this group are called factor characteristics, and the characteristics that result from the influence of these factors are called resultant (effective). For example, since the volume of output is influenced by the technical equipment of production, the volume of output is a resultant characteristic and technical equipment is a factor characteristic. There are two types of dependencies between economic phenomena: functional and stochastic. With a functional connection, each defined system of values of the factor characteristics corresponds to one or several strictly defined values of the resulting characteristic. Examples of functional dependence can be taken from the field of physical phenomena (e.g., S = v·t).

A stochastic (probabilistic) connection manifests itself only in mass phenomena. With such a connection, each specific system of values of the factor characteristics corresponds not to one value but to a certain set of values of the resulting characteristic. A change in the factor characteristics does not lead to a strictly defined change in the resulting characteristic, but only to a change in the distribution of its values. This is because the dependent variable, in addition to the selected factors, is influenced by a number of uncontrolled or unaccounted factors, and because the measurement of the variables is inevitably accompanied by random errors. Since the values of the dependent variable are subject to random scatter, they cannot be predicted exactly but only indicated with a certain probability (the number of defective parts per shift, the amount of downtime per shift, etc.).

A stochastic connection is also called a correlation connection. Correlation in the broad sense of the word means a connection, a relationship between objectively existing phenomena and processes. Regression is a special case of correlation. While correlation analysis evaluates the strength of a stochastic relationship, regression analysis examines its form, i.e. the correlation equation (regression equation) is found.

Let us consider the different kinds of correlation and regression.

Regression is classified according to the number of variables:

1) paired – regression between two variables (profit and labor productivity);

2) multiple – regression between the dependent variable y and several variables (labor productivity, level of production mechanization, worker qualifications).

Regarding the form of dependence, there are:

linear regression; nonlinear regression.

Depending on the nature of the regression, there are:

1) direct regression. It occurs if, with an increase or decrease in the values ​​of factor variables, the values ​​of the resultant variable also increase or decrease;

2) reverse regression. In this case, with an increase or decrease in the values ​​of the factor characteristic, the resulting characteristic decreases or increases.

Regarding the type of connection of phenomena, they distinguish:

1) direct regression. In this case, the phenomena are directly connected to each other (for example, profit and costs);

2) indirect regression. It occurs if the factor and outcome variables are not directly in a cause-and-effect relationship and the factor variable acts on the outcome variable through some other variable (the number of fires and grain yield (meteorological conditions));

3) false or absurd regression. It arises with a formal approach to the phenomena under study. As a result, you can come to false and even meaningless dependencies (the number of imported fruits and the increase in fatal road accidents).

Correlation is classified in a similar way.

The study of interdependencies in economics is of great importance. Statistics not only answers the question of whether a connection between phenomena really exists, but also provides a quantitative description of this relationship. Knowing the nature of the dependence of one phenomenon on another, it is possible to explain the causes and extent of changes in the phenomenon, as well as to plan the measures needed to change it further. For the results of correlation analysis to find practical use and give the desired result, certain requirements must be met:

1) homogeneity of units subject to correlation analysis (enterprises produce the same type of products, the same nature of the technological process and type of equipment);

2) a sufficient number of observations;

3) the factors included in the study must be independent of each other.

To study functional relationships, balance and index methods are used. To study stochastic relationships, the method of parallel series, the method of analytical groupings, analysis of variance and analysis of regressions and correlations are used.

The simplest method for detecting connections is to compare two parallel series. The essence of the method is that first the indicators characterizing the factor characteristic are ranked, and then the corresponding indicators of the resultant characteristic are placed parallel to them. Comparison of series constructed in this way makes it possible not only to confirm the very presence of a connection, but also to identify its direction.

When the compared series consist of a large number of units, the direction of the connection may differ for different units. In this case it is more advisable to use correlation tables. In a correlation table, the factor characteristic (x) is placed in the rows, and the resultant characteristic (y) in the columns. The numbers at the intersections of the rows and columns of the table show the frequency of each combination of x and y. The construction of a correlation table begins with grouping the observation units by the values of the factor and resultant characteristics. If the frequencies in the correlation table lie along the diagonal from the upper left corner to the lower right corner, the presence of a direct correlation can be assumed. If the frequencies lie along the diagonal from right to left, an inverse connection between the characteristics is assumed.
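A correlation table can be built as a simple frequency count. The (x, y) group pairs below are invented; the sketch only shows the cross-tabulation step.

```python
# Correlation-table sketch: count how often each combination of grouped
# x (rows) and grouped y (columns) occurs. Pairs are invented.
pairs = [(1, 1), (1, 2), (2, 2), (2, 2), (2, 3), (3, 3), (3, 3), (3, 4)]

table = {}
for xv, yv in pairs:
    table[(xv, yv)] = table.get((xv, yv), 0) + 1

for (xv, yv), freq in sorted(table.items()):
    print(xv, yv, freq)
# The frequencies pile up along the main diagonal -> a direct correlation.
```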

Another method for detecting connections is to build a group table (analytical grouping method). The set of values ​​of factor x is divided into groups and for each group the average value of the resulting characteristic is calculated. It is assumed that with a sufficiently large number of observations in each group, the influence of other random factors when calculating the group average will cancel out and the dependence of the effective characteristic on the factor characteristic will become clearer and, therefore, differences in the value of the means will be associated only with differences in the value of this factor characteristic. If there were no connection between the factor and the resultant attribute, then all group means would be approximately the same in size.
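The analytical-grouping step described above can be sketched directly: split the units into groups by the factor characteristic and compare the group means of the resultant characteristic. The data are invented for illustration.

```python
# Analytical grouping: group units by the factor characteristic x and
# compute the mean of the resultant characteristic y per group (invented data).
data = [(1, 10), (2, 12), (2, 11), (3, 18), (3, 20), (4, 25)]

groups = {}
for xv, yv in data:
    groups.setdefault(xv, []).append(yv)

means = {xv: sum(ys) / len(ys) for xv, ys in sorted(groups.items())}
print(means)  # {1: 10.0, 2: 11.5, 3: 19.0, 4: 25.0} -- group means rise with x
```

Since the group means rise steadily with x, the grouping points to a direct connection; roughly equal means would have suggested no connection.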

The simplest indicator of the closeness of a connection is the sign correlation coefficient (Fechner coefficient):

Kf = (na - nb) / (na + nb),

where na is the number of coincidences of the signs of the deviations of individual values from the average;

nb is the number of non-coincidences of the signs of the deviations of individual values from the average.

This coefficient gives an idea of the direction of the connection and an approximate characterization of its closeness. To calculate it, the average values of the resultant and factor characteristics are computed, and then the signs of the deviations from the averages are determined for all values of the interrelated characteristics. Kf takes values in [-1; +1]. If the signs of all deviations coincide, then Kf = 1, indicating a direct connection; if the signs of all deviations differ, then Kf = -1, which indicates an inverse connection.
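The Fechner coefficient amounts to counting sign matches of deviations from the two means. The sketch below uses invented data in which most, but not all, deviation signs coincide.

```python
# Fechner sign coefficient Kf = (na - nb) / (na + nb), where na counts
# coinciding deviation signs and nb the non-coinciding ones (invented data).
x = [10, 12, 14, 16, 18, 20]
y = [30, 28, 37, 40, 34, 45]

mx = sum(x) / len(x)
my = sum(y) / len(y)

na = nb = 0
for xv, yv in zip(x, y):
    if (xv - mx) * (yv - my) > 0:   # deviation signs coincide
        na += 1
    else:
        nb += 1

kf = (na - nb) / (na + nb)
print(round(kf, 3))  # 0.333 -- a weak-to-moderate direct connection
```

A value of +1 would mean all signs coincide (direct connection), -1 that all differ (inverse connection); here 4 matches against 2 mismatches give a mildly positive coefficient.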

Table 28

Number of workers and balance-sheet profit

(columns: number of workers, people; balance-sheet profit, thousand rubles; signs of the deviations of the individual values from the averages; match (a) or mismatch (b))

The resulting value of the coefficient indicates a weak inverse connection between the characteristics.

To approximate the direction and strength of the relationship between characteristics represented by two series, the rank correlation coefficient can also be used. To determine it, the x values are ranked, and then the corresponding y values are ranked. As a result we obtain ranks, i.e. the places (ordinal numbers) of the units of the population in the ordered series. If there are identical values, each of them is assigned the arithmetic mean of their ranks.

Spearman's rank correlation coefficient:

ρ = 1 - 6·Σd² / (n·(n² - 1)),

where d is the difference between the ranks of the corresponding values of the two characteristics;

n is the number of units in the series.

The rank correlation coefficient takes values in [-1; 1]. A value close to +1 indicates a close direct connection, a value close to -1 a close inverse connection, and a value close to 0 the absence of a connection. The rank correlation coefficient has certain advantages over other characteristics of the direction and closeness of a connection: it can be determined for data that cannot be measured numerically but can be ranked (shades, quality grades).
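Spearman's coefficient can be computed from scratch as follows. The two series are invented and contain no tied values, so the simple rank assignment suffices.

```python
# Spearman rank correlation: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)),
# on invented data without tied ranks.
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank                 # rank = place in the ordered series
    return r

x = [35, 23, 47, 17, 10]
y = [30, 33, 45, 23, 8]

rx = ranks(x)                       # [4, 3, 5, 2, 1]
ry = ranks(y)                       # [3, 4, 5, 2, 1]
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(rho)  # 0.9 -- a close direct connection
```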

To characterize the closeness of the connection numerically, one can also use indicators of the variation of the resulting characteristic: its total dispersion σ² and the intergroup dispersion δ².

Kendall's rank correlation coefficient:

τ = 1 - 4q / (n(n - 1)),

where q is the number of ranks arranged in reverse order;

n is the number of units in the series.
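Interpreting q as the number of pairs of y-values standing in reverse order after the pairs are sorted by x (an assumption consistent with the definition above), the coefficient can be sketched as:

```python
# Kendall's coefficient via the inversion count q: tau = 1 - 4q / (n*(n-1)).
# Data are illustrative.

def kendall(xs, ys):
    pairs = sorted(zip(xs, ys))                 # order by the factor trait
    y_sorted = [y for _, y in pairs]
    n = len(y_sorted)
    q = sum(1 for i in range(n) for j in range(i + 1, n)
            if y_sorted[i] > y_sorted[j])       # pairs in reverse order
    return 1 - 4 * q / (n * (n - 1))

print(kendall([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0: no inversions
print(kendall([1, 2, 3, 4], [40, 30, 20, 10]))   # -1.0: all pairs inverted
```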

In the practice of statistical research it is often necessary to analyze alternative distributions, when the population is divided for each characteristic into two groups with opposite characteristics. The closeness of the connection in this case can be assessed using the contingency coefficient:

Kk = (ad - bc) / √((a + b)(c + d)(a + c)(b + d)).
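A small sketch of this calculation from a 2×2 frequency table (the frequencies are invented for illustration):

```python
import math

def contingency(a, b, c, d):
    """Contingency coefficient for a 2x2 table of alternative traits."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Illustrative frequencies: rows are the two groups, columns the two outcomes.
print(round(contingency(40, 10, 20, 30), 3))   # 0.408 -> moderate direct connection
```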

Table 29

Dependence of student performance on gender

Columns: student population; of whom passed the exams; did not pass the exams.

The coefficient obtained is close to zero; consequently, there is practically no connection between a student's gender and academic performance.

The association coefficient is calculated as follows: Ka = (ad - bc) / (ad + bc).

The statistical methods for studying relationships reviewed so far often prove insufficient, because they do not allow the existing connection to be expressed as a specific mathematical equation. The methods of parallel series and analytical groupings are effective only with a small number of factor characteristics, while socio-economic phenomena usually develop under the influence of many causes. These limitations are removed by the method of correlation and regression analysis.

The method of correlation and regression analysis consists in constructing and analyzing an economic-mathematical model in the form of a regression equation that expresses the dependence of a phenomenon on the factors determining it. For example, the dependence of production volume y (million rubles) on technical equipment x (%) is expressed by an equation of the form

ŷ = a0 + a1x, with a1 = 21.4.

From this it can be concluded that with an increase in technical equipment by 1%, production volume increases by an average of 21.4 million rubles.

The method of correlation and regression analysis consists of the following stages:

· preliminary analysis;

· collection of information and its primary processing;

· construction of the model (the regression equation);

· evaluation and analysis of the model.

At the first stage, it is necessary to formulate in general terms the research problem (studying the influence of various factors on the level of labor productivity). Next, you should determine the methodology for measuring the performance indicator (labor productivity can be determined by natural, labor or cost methods). It is also necessary to determine the number of factors that have the most significant impact on the formation of the effective characteristic.

At the stage of collecting and processing information, the researcher must remember that the population being studied must be large enough in volume. The source data must be qualitatively and quantitatively homogeneous.

When constructing a correlation model (regression equation), the question arises of the type of analytical function that characterizes the mechanism of the relationship between the characteristics. This relationship can be expressed by:

· a straight line: ŷ = a0 + a1x;

· a second-order parabola: ŷ = a0 + a1x + a2x²;

· a hyperbola: ŷ = a0 + a1/x;

· an exponential function: ŷ = a0·a1^x, etc.

That is, the question arises of choosing the form of the connection. The shape of the empirical regression line suggests what type of curve it can be described by. Next, the regression equation is solved. Then, using special criteria, the adequacy of the candidate equations is assessed and the form of connection is selected that provides the best approximation and sufficient statistical reliability. Having chosen the form of the connection and written the regression equation in general form, it is necessary to find the numerical values of its parameters. For this, the method of least squares is used. Its essence is that the parameters are chosen so that the sum of the squared deviations of the empirical values of the resultant characteristic from the values calculated by the regression equation, Σ(y - ŷ)², is minimal.
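For the straight-line case ŷ = a0 + a1x, the least-squares parameters follow from the normal equations; a minimal sketch with illustrative data:

```python
# Least squares for y = a0 + a1*x: the parameters minimize sum((y - y_hat)^2),
# which gives a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2) and a0 = (Sy - a1*Sx) / n.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = (sy - a1 * sx) / n
    return a0, a1

# Points lying exactly on y = 1 + 2x, so the fit recovers those parameters.
a0, a1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a0, a1)   # 1.0 2.0
```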

The study of dependencies is a daunting task, since socio-economic phenomena themselves are complex and diverse. In addition, the conclusions drawn are probabilistic in nature, since they are drawn from data that is a sample in time or space.

Statistical methods for studying dependence are built taking into account the characteristics of the patterns being studied. Statistics studies primarily stochastic relationships, in which one value of the factor characteristic corresponds to a group of values of the resultant characteristic. If, as the values of the factor characteristic change, the group average values of the resultant characteristic change, such connections are called correlational. Not every stochastic dependence is correlational. If each value of the factor characteristic corresponds to a strictly defined value of the resultant characteristic, the dependence is functional; it is also called complete correlation. Ambiguous (stochastic) dependences are called incomplete correlation.

According to the mechanism of interaction, they are distinguished:

· Direct connections - when the cause directly affects the effect;

· Indirect connections - when there are a number of intermediate signs between cause and effect (for example, the effect of age on earnings).

By direction, the following are distinguished:

· Direct connections - when the values of the factor and resultant characteristics change in the same direction;

· Inverse connections - when the values of the factor and resultant characteristics change in opposite directions.

By analytical expression, the following are distinguished:

· Straight-line (linear) connections - expressed by the equation of a straight line;

· Curvilinear connections - expressed by a parabola, hyperbola, etc.

Based on the number of interrelated characteristics, they are distinguished:

· Paired connections - when the relationship between two characteristics (factorial and resultant) is analyzed;

· Multiple connections - characterize the influence of several characteristics on one effective one.

Based on the strength of interaction, they are distinguished:

· Weak (barely noticeable) connections;

· Strong (close) connections.

The task of statistics is to determine the presence, direction, form and closeness of the relationship.

Various statistical methods are used to study the dependence. Since dependencies in statistics are manifested through the variation of characteristics, the methods mainly measure and compare the variation of factor and resultant characteristics.

If we plot the grouping results on a graph, we get an empirical regression line. Intervals of factor characteristic values ​​are replaced by average group indicators.

In addition to the empirical regression line, which directly shows the form and direction of the relationship, there is the correlation field, on which the paired values of the characteristics for each unit are plotted as points.

The correlation field can also be used to judge the nature of the relationship. If the points are concentrated near the diagonal running from left to right, from bottom to top, then the connection is direct. If near another diagonal - the opposite. If the points are scattered throughout the graph field, there is no connection.

When constructing an analytical grouping, it is important to correctly determine the size of the interval. If, as a result of the initial grouping, the connection does not clearly appear, you can enlarge the interval. However, by enlarging the intervals, it is sometimes possible to detect a connection even where there is none. Therefore, when constructing an analytical grouping, we are guided by the rule: the more groups we can identify without encountering a single exception, the more reliable our hypothesis about the presence and form of the connection.
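The grouping procedure described above, in which intervals of factor values are replaced by group averages of the resultant characteristic, can be sketched as follows (interval width and data are illustrative):

```python
# Analytical grouping: split the factor trait into equal intervals and take
# the group mean of the resultant trait in each; the (midpoint, mean) pairs
# form the empirical regression line.

def empirical_regression(xs, ys, width):
    groups = {}
    for x, y in zip(xs, ys):
        key = int(x // width)          # index of the interval containing x
        groups.setdefault(key, []).append(y)
    return [((k + 0.5) * width, sum(v) / len(v))
            for k, v in sorted(groups.items())]

xs = [1, 2, 3, 11, 12, 13, 21, 22]
ys = [4, 5, 6, 14, 15, 16, 24, 26]
print(empirical_regression(xs, ys, 10))
# [(5.0, 5.0), (15.0, 15.0), (25.0, 25.0)] -> group means rise with x: direct connection
```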

Non-mathematical methods provide an approximate estimate of the presence, form and direction of a connection. A deeper analysis is carried out using mathematical methods, which developed on the basis of the non-mathematical ones:

· Regression analysis, which allows you to express the form of a relationship using an equation.

· Correlation analysis, used to determine the closeness or strength of the relationship between characteristics. Correlation methods are divided into:

- Parametric methods that provide an assessment of the closeness of the relationship directly based on the values ​​of the factor and resultant characteristics;

- Nonparametric methods - provide an estimate based on conditional estimates of characteristics.

The closeness of curvilinear dependences is assessed after the parameters of the regression equation have been calculated; for this reason the method is called correlation-regression analysis.

If the dependence of one resultant characteristic on one factor characteristic is analyzed, this is pair correlation and regression. If several factor characteristics and a resultant characteristic are analyzed, this is multiple correlation and regression.

Regression is a line that characterizes the most general trend in the relationship between factor and resultant characteristics.

It is assumed that the analytical equation expresses the true form of the dependence and that all deviations from this function are due to the action of various random causes. Since correlation relationships are studied, a change in the factor characteristic corresponds to a change in the average level of the resultant characteristic. When constructing analytical groupings, we considered the empirical regression line; however, this line is not suitable for economic modeling, and its shape depends on the arbitrariness of the researcher. The theoretical regression line depends less on the researcher's subjectivity, although arbitrariness is also possible in choosing the form of the function of the relationship. It is believed that the choice of function should be based on deep knowledge of the specifics of the subject of research.

In practice, the following forms of regression models are most often used:

· Linear;

· Semi-logarithmic curve;

· Hyperbole;

· Second order parabola;

· Exponential function;

· Power function.


This property of the average, which states that the sum of the squared deviations of all variants of a series from the arithmetic mean is less than the sum of the squared deviations from any other number, is the basis of the least squares method, which allows you to calculate the parameters of the selected regression equation in such a way that the regression line is, on average, the least distant from empirical data.
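A quick numeric check of this property of the mean (the data are illustrative):

```python
# Among all constants c, sum((y_i - c)^2) is smallest when c equals the
# arithmetic mean - the property underlying the least squares method.

ys = [2, 4, 4, 6, 9]
mean = sum(ys) / len(ys)   # 5.0

def sse(c):
    return sum((y - c) ** 2 for y in ys)

# The mean gives a smaller sum of squared deviations than any other candidate.
assert all(sse(mean) <= sse(c) for c in [0, 1, 4.9, 5.1, 10])
print(mean, sse(mean))   # 5.0 28.0
```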

Nonparametric methods for measuring the closeness of relationships between quantitative characteristics were historically the first such methods. The French scientist A.-M. Guerry first tried to measure the closeness of a connection in the 1830s. He compared the average group values of the factor and resultant characteristics, replacing the absolute values by their ratios to certain constants; the results obtained were ranked in ascending order. Guerry judged the presence or absence of a connection by counting the numbers of matches and mismatches of the ranks across groups. If matches predominated, the connection was considered direct; if mismatches predominated, inverse; if matches and mismatches were equal in number, the connection was considered absent.

Guerry's method was used by Fechner when developing his coefficient, and by Spearman when developing the rank correlation coefficient.

The coefficient obtained indicates the presence of a very close inverse connection.

Along with the Fechner coefficient, rank correlation coefficients are used to measure the relationship of quantitative characteristics. The most common among them is the Spearman rank correlation coefficient.

Nonparametric methods are used to measure the closeness of the relationship between qualitative and alternative characteristics, as well as quantitative characteristics, the distribution of which differs from the normal distribution.

To measure the connection between alternative characteristics, the Yule coefficient of association and the Pearson coefficient of contingency are used. To calculate these indicators, the following matrix of the mutual frequency distribution is constructed:

a   b
c   d

where a, b, c, d are the frequencies of the mutual distribution of the characteristics.

With a direct connection, the frequencies are concentrated along the a-d diagonal; with an inverse connection, along the b-c diagonal; if there is no connection, the frequencies are distributed almost evenly over the entire field of the table.

The association coefficient: Ka = (ad - bc) / (ad + bc).

The association coefficient is not suitable for calculation if one of the frequencies on a diagonal is 0. In this case, the contingency coefficient is used, calculated by the formula: Kk = (ad - bc) / √((a + b)(c + d)(a + c)(b + d)).

The contingency coefficient likewise indicates the practical absence of a connection between the characteristics (its absolute value is always less than that of Ka).
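A sketch comparing the two coefficients on one illustrative 2×2 table (frequencies invented), showing that the contingency coefficient is smaller in absolute value:

```python
import math

def association(a, b, c, d):
    """Yule's association coefficient for a 2x2 table."""
    return (a * d - b * c) / (a * d + b * c)

def contingency(a, b, c, d):
    """Pearson's contingency coefficient for the same table."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

a, b, c, d = 40, 10, 20, 30                  # illustrative frequencies
print(round(association(a, b, c, d), 3))     # 0.714
print(round(contingency(a, b, c, d), 3))     # 0.408, always the smaller of the two
```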

To measure the closeness of a linear relationship, the correlation coefficient is used. Its basic form is:

r = Σ(x - x̄)(y - ȳ) / (n·σx·σy).

In fact, the correlation coefficient is the mean of the products of the standardized deviations tx = (x - x̄)/σx and ty = (y - ȳ)/σy, that is, r = Σ(tx·ty) / n.

If there is no connection between the characteristics, the resultant characteristic does not vary when the factor characteristic changes, and therefore r = 0. The same result is obtained when the sums of the negative and positive products balance each other.

Usually the correlation coefficient is calculated with formulas that rely on the indicators already computed when determining the parameters of the regression equation.
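The basic form of the coefficient can be sketched directly (data are illustrative):

```python
import math

def pearson(xs, ys):
    """r = sum((x - mx)*(y - my)) / (n * sigma_x * sigma_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

# Points lying exactly on a straight line give r = 1.
print(round(pearson([1, 2, 3, 4], [3, 5, 7, 9]), 6))   # 1.0
```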

Multiple correlation and regression are used to study the influence of two or more factors on an outcome characteristic. The research process includes several stages.

First, the form of the relationship equation is selected; most often a linear equation in n factors is chosen:

ŷ = a0 + a1x1 + a2x2 + … + anxn.

Because the calculations are complex and labor-intensive, the selection of the factors to include in the regression model is critical. The most significant factors must be chosen on the basis of qualitative analysis. At the factor-selection stage, a matrix of paired correlation coefficients between the factors selected for possible inclusion in the regression equation is also calculated.
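The paired-correlation matrix used at this factor-selection stage can be sketched like this (the factor data are invented; a pair of near-duplicate factors signals multicollinearity and a candidate for exclusion):

```python
import math

def pearson(xs, ys):
    """Paired correlation coefficient between two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

factors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],   # exact multiple of x1: perfectly correlated
    "x3": [5, 3, 6, 2, 7],
}

# Print the matrix of paired correlation coefficients row by row.
names = list(factors)
for i in names:
    row = [round(pearson(factors[i], factors[j]), 2) for j in names]
    print(i, row)
```

Here the x1-x2 coefficient equals 1, so only one of the two should enter the regression equation.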

The study of objectively existing connections between socio-economic phenomena and processes is the most important task of the theory of statistics. In the process of statistical research of dependences, cause-and-effect relationships between phenomena are revealed, which makes it possible to identify the factors (characteristics) that have a major influence on the variation of the phenomena and processes being studied. A cause-and-effect relationship is a connection between phenomena and processes in which a change in one of them (the cause) leads to a change in the other (the effect).

Financial and economic processes are the result of the simultaneous influence of a large number of causes. Consequently, when studying these processes, it is necessary to single out the main, determining causes, abstracting from the secondary ones.

The first stage of a statistical study of a relationship is qualitative analysis, connected with the analysis of the nature of the social or economic phenomenon by the methods of economic theory, sociology, and concrete economics. The second stage, building the model of the relationship, is based on statistical methods: groupings, average values, and so on. The third and final stage, interpretation of the results, is again connected with the qualitative features of the phenomenon being studied. Statistics has developed many methods for studying relationships; the choice of method depends on the cognitive purpose and the objectives of the research.

Characteristics, according to their essence and their significance for studying the relationship, are divided into two classes. Characteristics that cause changes in other, associated characteristics are called factor characteristics, or simply factors. Characteristics that change under the influence of factor characteristics are called resultant.

In statistics, a distinction is made between functional and stochastic dependencies. Functional is a relationship in which a certain value of a factor characteristic corresponds to one and only one value of the resultant characteristic.

If a causal dependence does not appear in each individual case but on the whole, on average, over a large number of observations, such a dependence is called stochastic. A special case of stochastic dependence is correlation, in which a change in the average value of the resultant characteristic is caused by a change in the factor characteristics.

Connections between phenomena and their characteristics are classified according to the degree of closeness, direction and analytical expression.

By direction, direct and inverse connections are distinguished. With a direct connection, an increase or decrease in the values of the factor characteristic is accompanied by an increase or decrease in the values of the resultant characteristic; thus, an increase in production volume contributes to an increase in the enterprise's profit. With an inverse connection, the values of the resultant characteristic change under the influence of the factor characteristic, but in the opposite direction: with an increase or decrease in the values of one characteristic, the values of the other decrease or increase. Thus, a reduction in the cost per unit of production entails an increase in profitability.

According to their analytical expression, connections are divided into rectilinear (or simply linear) and nonlinear. If the statistical relationship between phenomena can be approximately expressed by the equation of a straight line, it is called a linear connection.
