Solve a system of equations using the least squares method in excel. Applying the least squares method in Excel. A few words about the correctness of the initial data used for prediction

Method least squares is a mathematical procedure for constructing a linear equation that would most closely match a set of two series of numbers. The purpose of this method is to minimize the total squared error. Excel has tools that can be used to apply this method in calculations. Let's see how it's done.

Using the Method in Excel

o Enabling the Solver add-on

o Task conditions

o Decision

Using a Method in Excel

The method of least squares (LSM) is a mathematical description of the dependence of one variable on another. It can be used for forecasting.

Enable the Solver add-in

In order to use OLS in Excel, you need to enable the add-in "Search for a Solution", which is disabled by default.

1. Go to the tab "File".

2. Click on the name of the section "Options".

3. In the window that opens, stop the selection on the subsection "Add-ons".

4. In the block "Control", which is located at the bottom of the window, set the switch to the position "Excel Add-Ins"(if it has a different value) and click on the button "Go...".

5. A small window opens. Put a checkmark next to the option "Search for a Solution". Click on the button OK.

Now the function Finding a Solution in Excel is activated, and its tools appear on the ribbon.

Lesson: Finding a Solution in Excel

Conditions of the problem

Let us describe the application of the least squares on specific example. We have two rows of numbers x And y, the sequence of which is shown in the image below.

This dependence can most accurately be described by the function:

At the same time, it is known that x=0 y also equal 0 . Therefore, this equation can be described by the dependence y=nx.

We have to find the minimum sum of squares of the difference.

Solution

Let us proceed to the description of the direct application of the method.

1. To the left of the first value x put a number 1 . This will be the approximate value of the first value of the coefficient n.

2. To the right of the column y add another column nx. In the first cell of this column we write the formula for multiplying the coefficient n to the cell of the first variable x. At the same time, we make the link to the field with the coefficient absolute, since this value will not change. We click on the button Enter.

3. Using the fill handle, copy this formula to the entire range of the table in the column below.

4. In a separate cell, we calculate the sum of the differences of the squares of the values y And nx. To do this, click on the button "Insert Function".



5. In the opened "Function Wizard" looking for an entry "SUMMKVRAZN". Select it and click on the button OK.

6. The arguments window opens. In field "Array_x" y. In field "Array_y" enter a range of column cells nx. In order to enter values, simply place the cursor in the field and select the appropriate range on the sheet. After entering, click on the button OK.

7. Go to the tab "Data". On the ribbon in the toolbox "Analysis" click on the button "Search for a Solution".

8. The tool's parameters window opens. In field "Optimize objective function" specify the address of the cell with the formula "SUMMKVRAZN". In parameter "Before" be sure to set the switch to the position "Minimum". In field "Changing Cells" specify the address with the value of the coefficient n. Click on the button "Find a solution".

9. The solution will be displayed in the coefficient cell n. It is this value that will be the least square of the function. If the result satisfies the user, then click on the button OK in an additional window.

As you can see, the application of the least squares method is a rather complicated mathematical procedure. We have shown it in action with the simplest example, but there are much more complex cases. However, the Microsoft Excel toolkit is designed to simplify the calculations as much as possible.

http://multitest.semico.ru/mnk.htm

General provisions

The smaller the number in absolute value, the better the straight line (2) is chosen. As a characteristic of the accuracy of the selection of a straight line (2), we can take the sum of squares

The minimum conditions for S will be

(6)
(7)

Equations (6) and (7) can be written in the following form:

(8)
(9)

From equations (8) and (9) it is easy to find a and b from the experimental values ​​x i and y i . The line (2) defined by equations (8) and (9) is called the line obtained by the least squares method (this name emphasizes that the sum of squares S has a minimum). Equations (8) and (9), from which the straight line (2) is determined, are called normal equations.

It is possible to indicate a simple and general way of compiling normal equations. Using experimental points (1) and equation (2), we can write down the system of equations for a and b

y 1 \u003d ax 1 +b,
y2=ax2+b, ... (10)
yn=axn+b,

We multiply the left and right parts of each of these equations by the coefficient at the first unknown a (i.e. x 1 , x 2 , ..., x n) and add the resulting equations, as a result we get the first normal equation (8).

We multiply the left and right sides of each of these equations by the coefficient of the second unknown b, i.e. by 1, and add the resulting equations, resulting in the second normal equation (9).

This method of obtaining normal equations is general: it is suitable, for example, for the function

is a constant value and it must be determined from experimental data (1).

The system of equations for k can be written:

Find the line (2) using the least squares method.

Solution. We find:

X i =21, y i =46.3, x i 2 =91, x i y i =179.1.

We write equations (8) and (9)91a+21b=179.1,

21a+6b=46.3, from here we find
a=0.98 b=4.3.

Which finds the widest application in various fields of science and practice. It can be physics, chemistry, biology, economics, sociology, psychology and so on and so forth. By the will of fate, I often have to deal with the economy, and therefore today I will arrange for you a ticket to an amazing country called Econometrics=) … How do you not want that?! It's very good there - you just have to decide! …But what you probably definitely want is to learn how to solve problems least squares. And especially diligent readers will learn to solve them not only accurately, but also VERY FAST ;-) But first general statement of the problem+ related example:

Let indicators be studied in some subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator depends on the indicator. This assumption can be scientific hypothesis and be based on elementary common sense. Let's leave science aside, however, and explore more appetizing areas - namely, grocery stores. Denote by:

– retail space of a grocery store, sq.m.,
- annual turnover of a grocery store, million rubles.

It is quite clear that the larger the area of ​​the store, the greater its turnover in most cases.

Suppose that after conducting observations / experiments / calculations / dancing with a tambourine, we have at our disposal numerical data:

With grocery stores, I think everything is clear: - this is the area of ​​the 1st store, - its annual turnover, - the area of ​​the 2nd store, - its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials - a fairly accurate assessment of the turnover can be obtained using mathematical statistics. However, do not be distracted, the course of commercial espionage is already paid =)

Tabular data can also be written in the form of points and depicted in the usual way for us. Cartesian system .

Let's answer an important question: how many points are needed for a qualitative study?

The bigger, the better. The minimum admissible set consists of 5-6 points. In addition, with a small amount of data, “abnormal” results should not be included in the sample. So, for example, a small elite store can help out orders of magnitude more than “their colleagues”, thereby distorting the general pattern that needs to be found!

If it’s quite simple, we need to choose a function , schedule which passes as close as possible to the points . Such a function is called approximating (approximation - approximation) or theoretical function . Generally speaking, here immediately appears the obvious "applicant" - the polynomial high degree, whose graph passes through ALL points. But this option is complicated, and often simply incorrect. (because the chart will “wind” all the time and poorly reflect the main trend).

Thus, the desired function must be sufficiently simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called least squares. First, let's analyze its essence in a general way. Let some function approximate the experimental data:


How to evaluate the accuracy of this approximation? Let us also calculate the differences (deviations) between the experimental and functional values (we study the drawing). The first thought that comes to mind is to estimate how big the sum is, but the problem is that the differences can be negative. (For example, ) and deviations as a result of such summation will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it suggests itself to take the sum modules deviations:

or in folded form: (suddenly, who doesn’t know: is the sum icon, and is an auxiliary variable-“counter”, which takes values ​​from 1 to ).

By approximating the experimental points with different functions, we will obtain different values ​​of , and it is obvious that where this sum is smaller, that function is more accurate.

Such a method exists and is called least modulus method. However, in practice it has become much more widespread. least square method, in which possible negative values ​​are eliminated not by the modulus, but by squaring the deviations:

, after which efforts are directed to the selection of such a function that the sum of the squared deviations was as small as possible. Actually, hence the name of the method.

And now we're back to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear , hyperbolic, exponential, logarithmic, quadratic etc. And, of course, here I would immediately like to "reduce the field of activity." What class of functions to choose for research? Primitive but effective technique:

- The easiest way to draw points on the drawing and analyze their location. If they tend to be in a straight line, then you should look for straight line equation with optimal values ​​and . In other words, the task is to find SUCH coefficients - so that the sum of the squared deviations is the smallest.

If the points are located, for example, along hyperbole, then it is clear that the linear function will give a poor approximation. In this case, we are looking for the most “favorable” coefficients for the hyperbola equation - those that give the minimum sum of squares .

Now notice that in both cases we are talking about functions of two variables, whose arguments are searched dependency options:

And in essence, we need to solve a standard problem - to find minimum of a function of two variables.

Recall our example: suppose that the "shop" points tend to be located in a straight line and there is every reason to believe the presence linear dependence turnover from the trading area. Let's find SUCH coefficients "a" and "be" so that the sum of squared deviations was the smallest. Everything as usual - first partial derivatives of the 1st order. According to linearity rule you can differentiate right under the sum icon:

If you want to use this information for an essay or a term paper - I will be very grateful for the link in the list of sources, you will find such detailed calculations in few places:

Let's make a standard system:

We reduce each equation by a “two” and, in addition, “break apart” the sums:

Note : independently analyze why "a" and "be" can be taken out of the sum icon. By the way, formally this can be done with the sum

Let's rewrite the system in an "applied" form:

after which the algorithm for solving our problem begins to be drawn:

Do we know the coordinates of the points? We know. Sums can we find? Easily. We compose the simplest system of two linear equations with two unknowns("a" and "beh"). We solve the system, for example, Cramer's method, resulting in a stationary point . Checking sufficient condition for an extremum, we can verify that at this point the function reaches precisely minimum. Verification is associated with additional calculations and therefore we will leave it behind the scenes. (if necessary, the missing frame can be viewed). We draw the final conclusion:

Function the best way (at least compared to any other linear function) brings experimental points closer . Roughly speaking, its graph passes as close as possible to these points. In tradition econometrics the resulting approximating function is also called paired linear regression equation .

The problem under consideration is of great practical importance. In the situation with our example, the equation allows you to predict what kind of turnover ("yig") will be at the store with one or another value of the selling area (one or another meaning of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it - all calculations are at the level school curriculum 7-8 grade. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations for the optimal hyperbola, exponent, and some other functions.

In fact, it remains to distribute the promised goodies - so that you learn how to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experienced) data. Make a drawing on which, in a Cartesian rectangular coordinate system, plot experimental points and a graph of the approximating function . Find the sum of squared deviations between empirical and theoretical values. Find out if the function is better (in terms of the least squares method) approximate experimental points.

Note that "x" values ​​are natural values, and this has a characteristic meaningful meaning, which I will talk about a little later; but they, of course, can be fractional. In addition, depending on the content of a particular task, both "X" and "G" values ​​can be fully or partially negative. Well, we have been given a “faceless” task, and we start it solution:

We find the coefficients of the optimal function as a solution to the system:

For the purposes of a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation is carried out from 1 to .

It is more convenient to calculate the required amounts in a tabular form:


Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck - in practice, systems are often not gifted, and in such cases it saves Cramer's method:
, so the system has a unique solution.

Let's do a check. I understand that I don’t want to, but why skip mistakes where you can absolutely not miss them? Substitute the found solution into the left side of each equation of the system:

The right parts of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function: – from all linear functions experimental data is best approximated by it.

Unlike straight dependence of the store's turnover on its area, the found dependence is reverse (principle "the more - the less"), and this fact is immediately revealed by the negative angular coefficient. Function informs us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases average by 0.65 units. As they say, the higher the price of buckwheat, the less sold.

To plot the approximating function, we find two of its values:

and execute the drawing:


The constructed line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think that this term does not need additional comments.

Calculate the sum of squared deviations between empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small you can't even see them).

Let's summarize the calculations in a table:


They can again be carried out manually, just in case I will give an example for the 1st point:

but it is much more efficient to do the already known way:

Let's repeat: what is the meaning of the result? From all linear functions function the exponent is the smallest, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function will it be better to approximate the experimental points?

Let's find the corresponding sum of squared deviations - to distinguish them, I will designate them with the letter "epsilon". The technique is exactly the same:


And again for every fire calculation for the 1st point:

In Excel, we use the standard function EXP (Syntax can be found in Excel Help).

Conclusion: , so the exponential function approximates the experimental points worse than the straight line .

But it should be noted here that "worse" is doesn't mean yet, what is wrong. Now I built a graph of this exponential function - and it also passes close to the points - so much so that without an analytical study it is difficult to say which function is more accurate.

This completes the solution, and I return to the question of the natural values ​​of the argument. In various studies, as a rule, economic or sociological, months, years or other equal time intervals are numbered with natural "X". Consider, for example, such a problem.

The method of least squares (LSM) belongs to the field of regression analysis. It has many uses as it allows for approximate representation given function others are simpler. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others containing random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS is of interest to us from the point of view of regression analysis (in Excel, its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the selling area of ​​a grocery store, measured in square meters, and Y is the annual turnover, defined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built with data for n stores.

According to mathematical statistics, the results will be more or less correct if the data on at least 5-6 objects are examined. Also, "anomalous" results cannot be used. In particular, an elite small boutique can have a turnover many times greater than the turnover of large outlets of the “masmarket” class.

The essence of the method

The table data can be displayed on the Cartesian plane as points M 1 (x 1, y 1), ... M n (x n, y n). Now the solution of the problem will be reduced to the selection of an approximating function y = f (x), which has a graph passing as close as possible to the points M 1, M 2, .. M n .

Of course, you can use a high degree polynomial, but this option is not only difficult to implement, but simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for a straight line y = ax + b, which best approximates the experimental data, and more precisely, the coefficients - a and b.

Accuracy score

For any approximation, the assessment of its accuracy is of particular importance. Denote by e i the difference (deviation) between the functional and experimental values ​​for the point x i , i.e. e i = y i - f (x i).

Obviously, to assess the accuracy of the approximation, you can use the sum of deviations, i.e., when choosing a straight line for an approximate representation of the dependence of X on Y, preference should be given to the one that has the smallest value of the sum e i at all points under consideration. However, not everything is so simple, since along with positive deviations, there will practically be negative ones.

You can solve the problem using the deviation modules or their squares. The latter method is the most widely used. It is used in many areas, including regression analysis (in Excel, its implementation is carried out using two built-in functions), and has long been proven to be effective.

Least square method

In Excel, as you know, there is a built-in autosum function that allows you to calculate the values ​​of all values ​​located in the selected range. Thus, nothing will prevent us from calculating the value of the expression (e 1 2 + e 2 2 + e 3 2 + ... e n 2).

In mathematical notation, this looks like:

Since the decision was initially made to approximate using a straight line, we have:

Thus, the task of finding a straight line that best describes a specific relationship between X and Y amounts to calculating the minimum of a function of two variables:

This requires equating to zero partial derivatives with respect to new variables a and b, and solving a primitive system consisting of two equations with 2 unknowns of the form:

After simple transformations, including dividing by 2 and manipulating the sums, we get:

Solving it, for example, by Cramer's method, we obtain a stationary point with certain coefficients a * and b * . This is the minimum, i.e. to predict what turnover the store will have with a certain area, the straight line y = a * x + b * is suitable, which is regression model for the example in question. Of course, it will not allow you to find the exact result, but it will help you get an idea of ​​\u200b\u200bwhether buying a store on credit for a particular area will pay off.

How to implement the least squares method in Excel

Excel has a function for calculating the value of the least squares. It has the following form: TREND (known Y values; known X values; new X values; constant). Let's apply the formula for calculating the OLS in Excel to our table.

To do this, in the cell in which the result of the calculation using the least squares method in Excel should be displayed, enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • range of known values ​​for Y (in this case data for trade turnover);
  • range x 1 , …x n , i.e. the size of retail space;
  • and known and unknown values ​​of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

In addition, there is a logical variable "Const" in the formula. If you enter 1 in the field corresponding to it, then this will mean that calculations should be carried out, assuming that b \u003d 0.

If you need to know the forecast for more than one x value, then after entering the formula, you should not press "Enter", but you need to type the combination "Shift" + "Control" + "Enter" ("Enter") on the keyboard.

Some Features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables - "TREND" - can be used even by those who have never heard of the least squares method. It is enough just to know some features of its work. In particular:

  • If you place the range of known values ​​of the variable y in one row or column, then each row (column) with known values ​​of x will be perceived by the program as a separate variable.
  • If the range with known x is not specified in the TREND window, then in case of using the function in Excel, the program will consider it as an array consisting of integers, the number of which corresponds to the range with the given values ​​of the variable y.
  • To output an array of "predicted" values, the trend expression must be entered as an array formula.
  • If no new x values ​​are specified, then the TREND function considers them equal to the known ones. If they are not specified, then array 1 is taken as an argument; 2; 3; 4;…, which is commensurate with the range with already given parameters y.
  • The range containing the new x values ​​must have the same or more rows or columns as the range with the given y values. In other words, it must be proportionate to the independent variables.
  • An array with known x values ​​can contain multiple variables. However, if we are talking about only one, then it is required that the ranges with the given values ​​of x and y be commensurate. In the case of several variables, it is necessary that the range with the given y values ​​fit in one column or one row.

FORECAST function

Regression analysis in Excel is implemented using several functions. One of them is called "PREDICTION". It is similar to TREND, i.e. it gives the result of calculations using the least squares method. However, only for one X, for which the value of Y is unknown.

Now you know the Excel formulas for dummies that allow you to predict the value of the future value of an indicator according to a linear trend.

The least squares method (LSM) is based on minimizing the sum of squared deviations of the selected function from the data under study. In this article, we approximate the available data using a linear functiony = a x + b .

Least square method(English) Ordinary Least Squares , OLS) is one of the basic methods of regression analysis in terms of estimating unknown parameters regression models according to sample data.

Consider approximation by functions depending on only one variable:

  • Linear: y=ax+b (this article)
  • : y=a*Ln(x)+b
  • : y=a*x m
  • : y=a*EXP(b*x)+c
  • : y=ax 2 +bx+c

Note: Cases of approximation by a polynomial from the 3rd to the 6th degree are considered in this article. Approximation by a trigonometric polynomial is considered here.

Linear dependency

We are interested in the relationship of 2 variables X And y. There is an assumption that y depends on X according to the linear law y = ax + b. To determine the parameters of this relationship, the researcher made observations: for each value of x i, a measurement of y i was made (see example file). Accordingly, let there be 20 pairs of values ​​(х i ; y i).

Note: If the change step by X is constant, then to build scatterplots can be used, if not, then you need to use the chart type dotted .

It is obvious from the diagram that the relationship between the variables is close to linear. To understand which of the many straight lines most "correctly" describes the relationship between variables, it is necessary to determine the criterion by which the lines will be compared.

As such a criterion, we use the expression:

Where ŷ i = a * x i + b ; n – number of pairs of values ​​(in our case n=20)

The above expression is the sum of the squared distances between the observed values ​​of y i and ŷ i and is often denoted as SSE ( sum of squared Errors (Residuals), sum of squared errors (residuals)) .

Least square method is to select such a line ŷ = ax + b, for which the above expression takes the minimum value.

Note: Any line in two-dimensional space is uniquely determined by the values ​​of 2 parameters: a (slope) and b (shift).

It is believed that the smaller the sum of the squared distances, the better the corresponding line approximates the available data and can be further used to predict the values ​​of y from the variable x. It is clear that even if in reality there is no relationship between the variables or the relationship is nonlinear, then the least squares will still select the “best” line. Thus, the LSM does not say anything about the presence of a real relationship of variables, the method simply allows you to choose such function parameters a And b , for which the above expression is minimal.

Having done not very complex mathematical operations (see for more details), you can calculate the parameters a And b :

As can be seen from the formula, the parameter a is the ratio of covariance and , so in MS EXCEL to calculate the parameter A You can use the following formulas (see example file sheet Linear):

= COVAR(B26:B45;C26:C45)/ VAR.G(B26:B45) or

= COVARIATION.B(B26:B45;C26:C45)/VAR.B(B26:B45)

Also to calculate the parameter A you can use the formula = SLOPE(C26:C45;B26:B45). For parameter b use formula = INTERCUT(C26:C45;B26:B45) .

And finally, the LINEST() function allows you to calculate both parameters at once. To enter a formula LINEST(C26:C45;B26:B45) select 2 cells in a row and press CTRL + SHIFT + ENTER(see article about). The left cell will return the value A , on the right b .

Note: To not mess with input array formulas you will need to additionally use the INDEX() function. Formula = INDEX(LINEST(C26:C45,B26:B45),1) or just = LINEST(C26:C45;B26:B45) will return the parameter responsible for the slope of the line, i.e. A . Formula = INDEX(LINEST(C26:C45,B26:B45),2) will return the parameter responsible for the intersection of the line with the Y axis, i.e. b .

After calculating the parameters, scatterplot line can be drawn.

Another way to draw a straight line using the least squares method is the chart tool trend line. To do this, select the diagram, select from the menu Layout tab, V group Analysis click trend line, then Linear approximation .

By checking the box "show equation in the diagram" in the dialog box, you can make sure that the parameters found above match the values ​​in the diagram.

Note: In order for the parameters to match, the chart type must be . The fact is that when constructing a diagram Schedule x-axis values ​​cannot be set by the user (the user can only specify labels that do not affect the location of the points). Instead of X values, the sequence 1 is used; 2; 3; … (for category numbering). Therefore, if building trend line on the type diagram Schedule, then the values ​​of this sequence will be used instead of the actual values ​​of X, which will lead to an incorrect result (unless, of course, the actual values ​​of X do not match the sequence 1; 2; 3; ...).

It has many applications, as it allows an approximate representation of a given function by other simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others containing random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS is of interest to us from the point of view of regression analysis (in Excel, its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the selling area of ​​a grocery store, measured in square meters, and Y be the annual turnover, defined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built with data for n stores.

According to mathematical statistics, the results will be more or less correct if the data on at least 5-6 objects are examined. Also, "anomalous" results cannot be used. In particular, an elite small boutique can have a turnover many times greater than the turnover of large outlets of the “masmarket” class.

The essence of the method

The table data can be displayed on the Cartesian plane as points M 1 (x 1, y 1), ... M n (x n, y n). Now the solution of the problem will be reduced to the selection of an approximating function y = f (x), which has a graph passing as close as possible to the points M 1, M 2, .. M n .

Of course, you can use a high degree polynomial, but this option is not only difficult to implement, but simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for a straight line y = ax + b, which best approximates the experimental data, and more precisely, the coefficients - a and b.

Accuracy score

For any approximation, the assessment of its accuracy is of particular importance. Denote by e i the difference (deviation) between the functional and experimental values ​​for the point x i , i.e. e i = y i - f (x i).

Obviously, to assess the accuracy of the approximation, you can use the sum of deviations, i.e., when choosing a straight line for an approximate representation of the dependence of X on Y, preference should be given to the one that has the smallest value of the sum e i at all points under consideration. However, not everything is so simple, since along with positive deviations, there will practically be negative ones.

You can solve the problem using the deviation modules or their squares. The latter method is the most widely used. It is used in many areas, including regression analysis (in Excel, its implementation is carried out using two built-in functions), and has long been proven to be effective.

Least square method

In Excel, as you know, there is a built-in autosum function that allows you to calculate the values ​​of all values ​​located in the selected range. Thus, nothing will prevent us from calculating the value of the expression (e 1 2 + e 2 2 + e 3 2 + ... e n 2).

In mathematical notation, this looks like:

Since the decision was initially made to approximate using a straight line, we have:

Thus, the task of finding a straight line that best describes a specific relationship between X and Y amounts to calculating the minimum of a function of two variables:

This requires equating to zero partial derivatives with respect to new variables a and b, and solving a primitive system consisting of two equations with 2 unknowns of the form:

After simple transformations, including dividing by 2 and manipulating the sums, we get:

Solving it, for example, by Cramer's method, we obtain a stationary point with certain coefficients a * and b * . This is the minimum, i.e. to predict what turnover the store will have for a certain area, the straight line y = a * x + b * is suitable, which is a regression model for the example in question. Of course, it will not allow you to find the exact result, but it will help you get an idea of ​​\u200b\u200bwhether buying a store on credit for a particular area will pay off.

How to implement the least squares method in Excel

Excel has a function for calculating the value of the least squares. It has the following form: TREND (known Y values; known X values; new X values; constant). Let's apply the formula for calculating the OLS in Excel to our table.

To do this, in the cell in which the result of the calculation using the least squares method in Excel should be displayed, enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • range of known values ​​for Y (in this case data for turnover);
  • range x 1 , …x n , i.e. the size of retail space;
  • and known and unknown values ​​of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

In addition, there is a logical variable "Const" in the formula. If you enter 1 in the field corresponding to it, then this will mean that calculations should be carried out, assuming that b \u003d 0.

If you need to know the forecast for more than one x value, then after entering the formula, you should not press "Enter", but you need to type the combination "Shift" + "Control" + "Enter" ("Enter") on the keyboard.

Some Features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables - "TREND" - can be used even by those who have never heard of the least squares method. It is enough just to know some features of its work. In particular:

  • If you place the range of known values ​​of the variable y in one row or column, then each row (column) with known values ​​of x will be perceived by the program as a separate variable.
  • If the range with known x is not specified in the TREND window, then in case of using the function in Excel, the program will consider it as an array consisting of integers, the number of which corresponds to the range with the given values ​​of the variable y.
  • To output an array of "predicted" values, the trend expression must be entered as an array formula.
  • If no new x values ​​are specified, then the TREND function considers them equal to the known ones. If they are not specified, then array 1 is taken as an argument; 2; 3; 4;…, which is commensurate with the range with already given parameters y.
  • The range containing the new x values ​​must have the same or more rows or columns as the range with the given y values. In other words, it must be proportionate to the independent variables.
  • An array with known x values ​​can contain multiple variables. However, if we are talking about only one, then it is required that the ranges with the given values ​​of x and y be commensurate. In the case of several variables, it is necessary that the range with the given y values ​​fit in one column or one row.

FORECAST function

It is implemented using several functions. One of them is called "PREDICTION". It is similar to TREND, i.e. it gives the result of calculations using the least squares method. However, only for one X, for which the value of Y is unknown.

Now you know the Excel formulas for dummies that allow you to predict the value of the future value of an indicator according to a linear trend.

Share with friends or save for yourself:

Loading...