Statistical study of the relationships between socio-economic phenomena. If the relationship between two characteristics is studied, this is pairwise correlation; if the relationship between many characteristics is studied, it is multiple correlation.

The study of objectively existing connections between phenomena is the most important task of the general theory of statistics. In the process of statistical study of dependencies, cause-and-effect relationships between phenomena are revealed, which makes it possible to identify the factors (characteristics) that have a significant impact on the variation of the phenomena and processes being studied. A cause-and-effect relationship is a connection between phenomena and processes in which a change in one of them (the cause) leads to a change in the other (the effect).

A cause is a set of conditions and circumstances whose action leads to the appearance of an effect. If there really are cause-and-effect relationships between phenomena, then these conditions must necessarily be realized along with the action of the causes. Causal relationships are universal and diverse, and to detect them it is necessary to select individual phenomena and study them in isolation.

Of particular importance when studying cause-and-effect relationships is identifying the time sequence: the cause must always precede the effect, but not every preceding event should be considered a cause, nor every subsequent event a consequence.

In socio-economic reality, cause and effect must be considered as related phenomena whose appearance is due to a complex of accompanying simpler causes and effects. Between complex groups of causes and effects, multivalued connections are possible, in which one cause may be followed by one or another effect, or one effect may have several different causes. To establish an unambiguous causal relationship between phenomena, or to predict the possible consequences of a specific cause, complete abstraction from all other phenomena in the temporal or spatial environment under study is required. In theory, such an abstraction can be made. Abstraction techniques are often used when studying the relationship between two characteristics (pairwise correlation). But the more complex the phenomena being studied, the more difficult it is to identify cause-and-effect relationships between them. The interweaving of various internal and external factors inevitably leads to some errors in determining cause and effect.

A feature of cause-and-effect relationships in socio-economic phenomena is their transitivity, i.e. cause and effect are connected indirectly, through intermediate factors, rather than directly. However, intermediate factors are usually omitted in the analysis.

For example, when indicators of the international calculation methodology are used, the factor of gross profit is considered to be the gross accumulation of fixed and working capital, while such intermediate factors as gross output, wages, etc. are omitted. Correctly identified cause-and-effect relationships make it possible to establish the strength of the influence of individual factors on the results of economic activity.

Socio-economic phenomena are the result of the simultaneous influence of a large number of causes. Consequently, when studying these phenomena, it is necessary, abstracting from secondary ones, to identify the main, fundamental causes.

At the first stage of the statistical study of a relationship, a qualitative analysis of the phenomenon being studied is carried out using the methods of economic theory, sociology, and applied economics.

At the second stage, a model of the relationship is built using statistical methods: groupings, averages, tables, etc.

At the third and final stage, the results are interpreted; the analysis is again tied to the qualitative features of the phenomenon being studied.

Statistics has developed many methods for studying relationships; the choice among them depends on the goals of the study and the tasks set. The connections between characteristics and phenomena, due to their wide variety, are classified on a number of grounds. According to their role in the study of a relationship, characteristics are divided into two classes. Characteristics that cause changes in other, related characteristics are called factor characteristics, or simply factors. Characteristics that change under the influence of factor characteristics are called resultant characteristics. Connections between phenomena and their characteristics are classified according to the degree of closeness of the connection, its direction, and its analytical expression.

In statistics, a distinction is made between functional connection and stochastic dependence. A functional relationship is one in which a certain value of a factor characteristic corresponds to one and only one value of the resulting characteristic. The functional connection is manifested in all cases of observation and for each specific unit of the population under study.

If a causal dependence does not appear in each individual case, but only in general, on average over a large number of observations, then such a dependence is called stochastic. A special case of stochastic dependence is a correlation relationship, in which a change in the average value of the resultant characteristic is due to a change in the factor characteristics.
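The difference between a functional and a stochastic dependence can be sketched on a small data set. The example below is only an illustration: the function names `is_functional` and `group_means` and the data are hypothetical, not taken from the text. It checks whether each value of the factor corresponds to exactly one value of the resultant characteristic and, if not, shows how the dependence still appears in the group averages.

```python
from collections import defaultdict


def is_functional(x, y):
    """Return True if every factor value x corresponds to exactly one value of y."""
    seen = defaultdict(set)
    for xi, yi in zip(x, y):
        seen[xi].add(yi)
    return all(len(values) == 1 for values in seen.values())


def group_means(x, y):
    """Average value of the resultant y for each distinct value of the factor x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {xi: sum(ys) / len(ys) for xi, ys in sorted(groups.items())}


# Hypothetical observations, for illustration only.
x = [1, 1, 2, 2, 3, 3]
y = [10, 14, 15, 17, 19, 23]

print(is_functional(x, y))  # False: the same x occurs with different y
print(group_means(x, y))    # {1: 12.0, 2: 16.0, 3: 21.0}
```

Here individual units with the same factor value differ, yet the average of the resultant characteristic rises with the factor, which is the pattern the text calls a correlation relationship.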

Based on the degree of connection closeness, quantitative criteria for assessing the closeness of connection are distinguished (Table 1).

Table 1 Quantitative criteria for assessing the closeness of connections

By direction, direct and inverse connections are distinguished. In a direct connection, an increase or decrease in the values of the factor characteristic is accompanied by an increase or decrease, respectively, in the values of the resultant characteristic. For example, an increase in labor productivity helps to increase the level of profitability of production. In an inverse connection, the values of the resultant characteristic change under the influence of the factor characteristic, but in the opposite direction. Thus, with an increase in the level of capital productivity, the cost per unit of production decreases.

According to the analytical expression, connections are divided into rectilinear (or simply linear) and nonlinear. If a statistical relationship between phenomena can be approximately expressed by the equation of a straight line, it is called a linear relationship; if it is expressed by the equation of a curve (parabola, hyperbola, power function, exponential function, etc.), it is called nonlinear or curvilinear.

Statistics does not always require a quantitative assessment of a relationship; often it is important to determine only its direction and nature and to identify the form of influence of some factors on others. To identify the presence of a relationship, its nature, and its direction, statistics uses the following methods: comparison of parallel data, analytical grouping, the graphical method, correlation, and regression.

The method of comparing parallel data is based on comparing two or more series of statistical values. Such a comparison makes it possible to establish the existence of a connection and to get an idea of its nature. Suppose we compare the changes in two quantities, x and y, and find that as x increases, y also increases. The connection between them is then direct, and it can be described either by a straight line equation or by a second-order parabola equation.
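The comparison of parallel series can be sketched as follows. This is a minimal illustration under stated assumptions: the two series are called `x` (factor) and `y` (resultant) here, the function name `direction_of_relationship` is hypothetical, and the data are invented for the example. The units are ordered by the factor, and the code then checks whether the resultant values only rise, only fall, or do both.

```python
def direction_of_relationship(x, y):
    """Classify two parallel series as 'direct', 'inverse', or 'mixed'.

    Units are ordered by the factor x; the result depends on whether y
    only rises, only falls, or does neither consistently between
    consecutive units.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    pairs = sorted(zip(x, y))          # order the units by the factor characteristic
    ys = [yi for _, yi in pairs]
    steps = [b - a for a, b in zip(ys, ys[1:])]
    ups = sum(1 for d in steps if d > 0)
    downs = sum(1 for d in steps if d < 0)
    if ups > 0 and downs == 0:
        return "direct"
    if downs > 0 and ups == 0:
        return "inverse"
    return "mixed"                     # includes the case where y never changes


# Hypothetical parallel series, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [5, 9, 10, 12, 17, 20]
print(direction_of_relationship(x, y))  # direct
```

Such a check establishes only the direction of the connection; it does not measure its closeness or choose between a straight line and a parabola.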

The relationship between two characteristics is depicted graphically using a correlation field. In the coordinate system, the values of the factor characteristic are plotted on the abscissa axis and the values of the resultant characteristic on the ordinate axis. Each observation is marked by a point at the intersection of its two coordinates. In the absence of a close connection, a random arrangement of points on the graph is observed. The stronger the connection between the characteristics, the more closely the points are grouped around a certain line expressing the form of the connection.

It is characteristic of socio-economic phenomena that, along with the significant factors that form the level of the resultant characteristic, it is influenced by many other unaccounted and random factors. This indicates that the relationships between the phenomena studied by statistics are correlational in nature and are analytically expressed by a function of the form $Y_x = f(x)$.

The correlation method has as its task the quantitative determination of the closeness of the connection between two characteristics (in a pairwise connection) and between the resultant and many factor characteristics (in a multifactorial connection).

Correlation is a statistical dependence between random variables that is not strictly functional in nature, in which a change in one of the random variables leads to a change in the mathematical expectation of the other.
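The text does not state which measure of closeness is used for a pairwise connection. A common choice, assumed here, is Pearson's linear correlation coefficient. The sketch below computes it from deviations about the means; the function name `pearson_r` and the data are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(pearson_r(x, y), 3))  # 0.8
```

The coefficient lies between -1 and 1: its sign indicates a direct or inverse connection, and its absolute value indicates how close the connection is to a linear one.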

In statistics, the following types of correlation are distinguished:

  • pair correlation: a connection between two characteristics (a resultant and a factor characteristic, or two factor characteristics);
  • partial correlation: the dependence between the resultant characteristic and one factor characteristic, with the values of the other factor characteristics held fixed;
  • multiple correlation: the dependence between the resultant characteristic and two or more factor characteristics included in the study.
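For the simplest multifactor case, one resultant characteristic y and two factor characteristics x1 and x2, the partial and multiple coefficients can be obtained from the three pairwise coefficients using the standard textbook formulas. The sketch below assumes those pairwise coefficients are already known; the function and argument names are hypothetical, and the input values are invented for illustration.

```python
import math


def partial_corr(r_y1, r_y2, r_12):
    """Partial correlation of y with x1, holding x2 fixed."""
    denom = math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
    if denom == 0:
        raise ValueError("undefined when |r_y2| or |r_12| equals 1")
    return (r_y1 - r_y2 * r_12) / denom


def multiple_corr(r_y1, r_y2, r_12):
    """Multiple correlation of y with x1 and x2 taken together."""
    if abs(r_12) >= 1:
        raise ValueError("the two factors are perfectly collinear")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical pairwise coefficients, for illustration only:
# r_y1 = corr(y, x1), r_y2 = corr(y, x2), r_12 = corr(x1, x2)
print(partial_corr(0.5, 0.5, 0.5))
print(multiple_corr(0.5, 0.5, 0.5))
```

With these inputs the partial coefficient (1/3) is smaller than the pairwise coefficient (0.5), because part of the apparent connection between y and x1 is carried by x2.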

The closeness of the connection is quantitatively expressed by the magnitude of the correlation coefficients. Correlation coefficients, as a quantitative characteristic of the closeness of the relationship between characteristics, make it possible to determine the "usefulness" of factor characteristics when constructing multiple regression equations. The value of the correlation coefficient also serves as an assessment of how well the regression equation agrees with the identified cause-and-effect relationships.

Initially, correlation studies were carried out in biology, and later spread to other areas, including socio-economics. Simultaneously with correlation, regression began to be used. Correlation and regression are closely related: correlation evaluates the strength (closeness) of a statistical relationship, regression examines its form. Both serve to establish the relationship between phenomena, to determine the presence or absence of a connection.

Correlation-regression analysis, as a general concept, includes measuring the closeness and direction of the connection (correlation analysis) and establishing the analytical expression (form) of the connection (regression analysis).

The regression method consists in determining the analytical expression of a relationship in which a change in one value (called the dependent or resultant characteristic) is due to the influence of one or more independent values (factors), while the set of all other factors that also influence the dependent value is taken to be constant at its average values. Regression can be single-factor (paired) or multi-factor (multiple).

Depending on the form of dependence there are:

Linear regression, which is expressed by a straight line equation (linear function) of the form:

$Y_x = a_0 + a_1 x$

Nonlinear regression, which is expressed by equations of the form:

$Y_x = a_0 + a_1 x + a_2 x^2$ (parabola)

$Y_x = a_0 + a_1 / x$ (hyperbola)
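The text gives the forms of the equations but not how the parameters a0 and a1 are estimated. A standard approach, assumed here, is ordinary least squares. The sketch below fits the straight line directly and fits the hyperbola by substituting z = 1/x, which turns it into a straight line in z. The function names and data are illustrative only, and the parabola, which requires solving a system of three equations, is left out.

```python
def fit_line(x, y):
    """Least-squares estimates (a0, a1) for y = a0 + a1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the factor must take at least two distinct values")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def fit_hyperbola(x, y):
    """Least-squares estimates (a0, a1) for y = a0 + a1 / x, via z = 1 / x."""
    if any(v == 0 for v in x):
        raise ValueError("x must be non-zero for the hyperbolic form")
    z = [1.0 / v for v in x]
    return fit_line(z, y)


# Hypothetical data, for illustration only.
a0, a1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # points lie on y = 1 + 2x
print(a0, a1)
b0, b1 = fit_hyperbola([1, 2, 4], [5, 3, 2])     # points lie on y = 1 + 4/x
print(b0, b1)
```

A positive a1 in the linear equation corresponds to a direct relationship and a negative a1 to an inverse one.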

According to the direction of the relationship, there are:

  • direct (positive) regression, which occurs if, with an increase or decrease in the independent value, the values of the dependent value also increase or decrease accordingly;
  • inverse (negative) regression, which occurs if, with an increase or decrease in the independent value, the dependent value decreases or increases accordingly.

Positive and negative regressions can be more easily understood if they are represented graphically.

For simple (paired) regression, in conditions where cause-and-effect relationships are sufficiently fully established, only the last provision acquires practical meaning; with a multiplicity of causal connections, it is impossible to clearly distinguish some causal phenomena from others.


