
TYPES OF TESTS AND FORMS OF TEST TASKS

1. Main types of pedagogical tests.

2. Forms of test tasks.

3. Empirical verification and statistical processing of results.

4. Principles of content selection. Criteria for assessing test content.

5. The relationship between the form of the task and the type of knowledge, skills and abilities being tested.


1. Main types of pedagogical tests

There are two main types of tests: traditional and non-traditional.

The test has composition, integrity and structure. It consists of tasks, rules for their application, grades for completing each task and recommendations for interpreting test results. The integrity of the test means the interrelation of tasks, their belonging to a common measured factor. Each test task fulfills its assigned role and therefore none of them can be removed from the test without loss of measurement quality. The structure of the test is formed by the way the tasks are connected to each other. Basically, this is the so-called factor structure, in which each item is related to others through common content and common variation in test scores.

A traditional test is a unity of at least three systems:

A meaningful system of knowledge, described in the language of the academic discipline being tested;

A formal system of tasks of increasing difficulty;

Statistical characteristics of tasks and of test subjects’ results.

The traditional pedagogical test must be considered in two significant senses: as a method of pedagogical measurement and as the result of using the test. It is curious that texts in Russian gravitate towards the meaning of the method, while in most works by Western authors the concept of a test is more often considered in the sense of results. Meanwhile, both of these meanings characterize the test from different sides, because the test must be understood simultaneously as both a method and a result of pedagogical measurement. One complements the other. A test, as a method, cannot be imagined without results confirming its own quality and the quality of the measurement assessments of subjects of various levels of preparedness.

Several ideas are developed in the above definition of a traditional test.

The first idea is that the test is considered not as an ordinary set or collection of questions, tasks, etc., but through the concept of a “system of tasks.” Such a system is formed not by just any totality, but only by one that gives rise to a new integrative quality distinguishing the test from an elementary set of tasks and from other means of pedagogical control. Of the many possible systems, the best is formed by that integral set in which the quality of the test is manifested to a relatively greater extent. Hence the idea of identifying the first of the two main system-forming factors: the best composition of the test tasks that form the integrity. On this basis we can give one of the shortest definitions: a test is a system of tasks forming the best methodological integrity. The integrity of the test is the stable interaction of tasks that forms the test as a developing system.

The second idea is that this definition of a test departs from the deep-rooted tradition of viewing a test as a simple means of checking or examination. Every test includes an element of testing, but it is not reduced to it. For a test is also a concept, content, form, results and interpretation: everything that requires justification. This implies that the test is a qualitative means of pedagogical measurement. According to the theory, test scores are not exact assessments of the subjects; it is more correct to say that they represent the true values only with some accuracy.

The third idea developed in our definition of a traditional test is the inclusion of a new concept, test effectiveness, which has not previously been considered in the testing literature as a criterion for test analysis and creation. The leading idea of a traditional test is to compare the knowledge of as many students as possible using a minimum number of tasks, in a short time, efficiently, and at the lowest cost.

Essentially, this reflects the idea of the effectiveness of pedagogical activity in the field of knowledge control. One would like to think that no one objects, or needs to object, to this idea. Even if our teacher can explain the educational material no worse than his foreign colleague, he is unable to test the required knowledge, for all students, across all the material studied, because of the prevailing class-lesson system in our country and the lack of computer equipment, tests and programs for organizing automated self-control, the most humane form of knowledge control. He is also physically unable to do so. Owing to a social policy that was, to put it mildly, mistaken, the salaries of our teachers have long failed to compensate even for the physical energy required for good teaching, not to mention the increased expenditure of intellectual energy, which is possible only for thinking that is unconstrained and not preoccupied with the search for daily bread. As noted in the literature, a qualified worker receives three to four times less than the salary level beyond which normal life activity is disrupted and the destruction of labor potential begins.

Although there are hundreds of test definitions in the literature that are either difficult or impossible to agree with, this does not mean that this definition of a traditional test is the ultimate truth. Like all other concepts, it needs constant improvement. It merely seems to the author that, so far, it is better argued than some other well-known concepts of the pedagogical test. Still, the desire to improve concepts is a completely normal phenomenon, necessary for normally developing practice and science. Constructive attempts to give other definitions of the test, or to challenge existing ones, are always useful, and this is precisely what we lack.

Traditional tests include homogeneous and heterogeneous tests. A homogeneous test is a system of tasks of increasing difficulty, specific form and specific content, created as an objective, qualitative and effective method of assessing the structure and measuring the level of students’ preparedness in one academic discipline. It is easy to see that, at its core, the definition of a homogeneous test coincides with the definition of a traditional test.

Homogeneous tests are more common than others. In pedagogy they are created to control knowledge in one academic discipline or in one section of a voluminous discipline such as physics. A homogeneous pedagogical test does not allow the use of tasks that reveal other properties; their presence violates the requirement of the disciplinary purity of the pedagogical test. After all, every test measures something predetermined.

For example, a test in physics measures the test takers’ knowledge, skills and understanding in that science. One of the difficulties of such measurement is that physical knowledge is heavily intertwined with mathematical knowledge. Therefore, the permissible level of mathematical knowledge used in solving physics problems is fixed for the test by expert judgment. Exceeding the accepted level biases the results: the more it is exceeded, the more the results begin to depend not so much on knowledge of physics as on knowledge of another science, mathematics. Another important aspect is the desire of some authors to include in tests not so much a check of knowledge as the ability to solve physics problems, thereby involving the intellectual component in the measurement of preparedness in physics.

A heterogeneous test is a system of tasks of increasing difficulty, specific form and specific content, created as an objective, qualitative and effective method of assessing the structure and measuring the level of students’ preparedness in several academic disciplines. Often such tests also include psychological tasks for assessing the level of intellectual development.

Typically, heterogeneous tests are used for the comprehensive assessment of school graduates, for personality assessment when applying for a job, and for selecting the best-prepared applicants for admission to universities. Since each heterogeneous test consists of homogeneous tests, the interpretation of test results is based on the answers to the tasks of each constituent test (here called scales); in addition, various methods of aggregating scores are used in attempts to give an overall assessment of the test taker’s preparedness.

Let us recall that a traditional test is a method of diagnosing subjects in which they answer the same tasks, at the same time, under the same conditions and with the same scoring. With this orientation, the task of determining the exact volume and structure of the mastered educational material recedes, of necessity, into the background. The test selects the minimum sufficient number of tasks that allows one to determine relatively accurately, figuratively speaking, not “who knows what” but “who knows more.” Test results are interpreted mainly in the language of testology, based on the arithmetic mean, mode or median, and on the so-called percentile norms, which show what percentage of subjects have a test result worse than that of the subject under analysis, given his test score. Such an interpretation is called norm-referenced. Here the conclusion is completed by a rating chain: tasks → answers → conclusions about the subject’s knowledge → rating, understood as a conclusion about the place or rank of the subject.
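The percentile-norm interpretation described above is easy to sketch in code. The function name below is illustrative, not part of any standard library; it simply counts what share of the group scored strictly worse than a given subject:

```python
def percentile_rank(score, all_scores):
    """Percentage of test takers whose result is strictly worse than `score`."""
    worse = sum(1 for s in all_scores if s < score)
    return 100.0 * worse / len(all_scores)

# Norm-referenced reading: the rank of one subject within the group,
# not the absolute volume of material he has mastered.
group = [12, 15, 15, 18, 20, 23, 25, 27, 28, 30]
print(percentile_rank(23, group))  # 50.0 - half the group scored worse
```

Note that this answers only “who knows more,” exactly as the text says: the same raw score yields a different percentile in a stronger or weaker group.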

Integrative tests. An integrative test is a test consisting of a system of tasks that meet the requirements of integrative content, test form, and increasing difficulty, aimed at a generalized final diagnosis of the preparedness of a graduate of an educational institution. Diagnosis is carried out by presenting tasks whose correct answers require integrated (generalized, clearly interrelated) knowledge of two or more academic disciplines. Creating such tests is possible only for teachers who know a number of academic disciplines, understand the important role of interdisciplinary connections in learning, and are able to create tasks whose correct answers require students to have knowledge of various disciplines and the ability to apply it.

Integrative testing is preceded by the organization of integrative training. Unfortunately, the current class-lesson form of conducting classes, combined with the excessive fragmentation of academic disciplines and the tradition of teaching individual disciplines rather than generalized courses, will for a long time hinder the implementation of an integrative approach in learning and in monitoring preparedness. The advantage of integrative tests over heterogeneous ones lies in the greater informativeness of each task and in the smaller number of tasks. The need for integrative tests grows with the level of education and the number of academic disciplines studied, so attempts to create such tests are noted mainly in higher education. Integrative tests are especially useful for increasing the objectivity and efficiency of the final state certification of students.

The methodology for creating integrative tests is similar to that for traditional tests, except for the work of determining the content of the tasks. For selecting the content of integrative tests, the use of expert methods is mandatory, because only experts can determine the adequacy of the tasks’ content to the purposes of the test. But first the experts themselves will have to settle on the goals of education and of studying particular educational programs, and then agree among themselves on fundamental issues, leaving to the examination only variations in understanding the degree of importance of individual elements in the overall structure of preparedness. In foreign literature, a selected group of experts who agree on fundamental issues is often called a panel. Given the different meaning of that word in Russian, such a group can be called a representative expert group. The group is selected so as to adequately represent the approach used to create the test in question.

Adaptive tests. The feasibility of adaptive control arises from the need to rationalize traditional testing. Every teacher understands that there is no need to give a well-prepared student easy or very easy tasks: the probability of a correct answer is too high, and easy material has no noticeable developmental potential. Symmetrically, because of the high probability of a wrong answer, there is no point in giving difficult tasks to a weak student; it is known that difficult and very difficult tasks reduce the learning motivation of many students. What was needed was a measure of task difficulty and a measure of the level of knowledge comparable on a single scale. Such a measure was found in the theory of pedagogical measurement: the Danish mathematician G. Rasch called it the “logit.” After the advent of computers, this measure formed the basis of the methodology of adaptive knowledge control, which regulates the difficulty and number of tasks presented depending on the students’ answers. If the answer is correct, the computer selects a more difficult next task; if it is wrong, an easier one. Naturally, this algorithm requires preliminary trialing of all tasks, determination of their difficulty, the creation of a task bank, and a special program.
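In the Rasch model, from which the logit measure comes, both the student’s ability and the task’s difficulty are expressed on one logit scale, and the probability of a correct answer depends only on their difference. A minimal sketch of that one-parameter logistic form (the function name is illustrative):

```python
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct answer, given ability `theta`
    and task difficulty `b`, both measured in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability exactly matches difficulty, the probability is 0.5 -
# the optimal match that adaptive testing aims for.
print(p_correct(1.0, 1.0))  # 0.5
```

This is what makes the two measures “comparable on one scale”: a task 1 logit harder than the student’s ability yields the same success probability for everyone, regardless of absolute level.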

Using tasks that match the level of preparedness significantly increases the accuracy of measurement and reduces individual testing time to roughly 5-10 minutes. Adaptive testing allows the computer to issue tasks at the optimal level, around a 50% probability of a correct answer, for each student.

In Western literature, three variants of adaptive testing are distinguished. The first is called pyramid testing: in the absence of preliminary assessments, all subjects are given a task of average difficulty, and only then, depending on the answer, each subject receives an easier or harder task; at each step it is useful to apply the rule of dividing the difficulty scale in half. In the second variant, control begins at whatever difficulty level the test taker wishes, with a gradual approach to the real level of knowledge. In the third variant, testing is carried out through a bank of tasks divided by difficulty levels.
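The first (pyramid) variant can be sketched as a binary search over the difficulty scale: start at average difficulty, then move up after a correct answer and down after a wrong one, halving the step each time. The function, its range of difficulties, and the toy answer model below are all illustrative assumptions, not a prescribed algorithm:

```python
def pyramid_test(answer, low=-3.0, high=3.0, steps=5):
    """Pyramid-testing sketch: binary search on the difficulty scale.
    `answer(b)` returns True if the subject solves a task of difficulty b
    (logits). Returns the final estimate of the subject's level."""
    b = (low + high) / 2.0          # first task: average difficulty
    step = (high - low) / 4.0
    for _ in range(steps):
        if answer(b):
            b += step               # correct -> the next task is harder
        else:
            b -= step               # wrong   -> the next task is easier
        step /= 2.0                 # halve the difficulty step each time
    return b

# Toy subject who solves every task of difficulty up to 1.2 logits:
print(round(pyramid_test(lambda b: b <= 1.2), 3))  # 1.219
```

Five tasks locate the level to within about a tenth of the scale, which is why adaptive testing needs so few tasks per subject.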

Thus, an adaptive test is a variant of an automated testing system in which the difficulty and differentiating ability of each task are known in advance. The system is implemented as a computer bank of tasks, ordered according to the task characteristics of interest. The main characteristic of adaptive test tasks is their empirically obtained level of difficulty: before entering the bank, each task is tried out empirically on a sufficiently large number of typical students of the contingent of interest. The words “contingent of interest” stand in here for the more rigorous scientific concept of the “general population.”

The adaptive-school educational model of E.A. Yamburg, widespread in our country, proceeds essentially from the general ideas of adaptive learning and adaptive knowledge control. The origins of this approach can be traced to the pedagogical works of Comenius, Pestalozzi and Diesterweg, who were united by the ideas of conformity to nature and the humanity of teaching. The student stood at the center of their pedagogical systems. For example, in A. Diesterweg’s little-known work “Didactic Rules” one can read the following: “Teach in accordance with nature... Teach without gaps... Begin teaching where the student left off... Before starting to teach, one must establish the starting point... Without knowing where the student has stopped, it is impossible to teach him properly.” Ignorance of students’ real level of knowledge, and the natural differences in their ability to assimilate the knowledge offered, became the main reason for the emergence of adaptive systems based on the principle of individualized learning. This principle is difficult to implement in the traditional class-lesson form.

Before the advent of the first computers, the best-known system close to adaptive learning was the so-called “system of complete assimilation of knowledge” (mastery learning).

Criterion-referenced tests. With the criterion-referenced approach, tests are created to compare the educational achievements of each student with the planned volume of knowledge, skills or abilities to be acquired. In this case, a specific content area, rather than a particular sample of students, is used as the interpretative frame of reference. The emphasis is on what the student can do and what he knows, not on how he compares with others.

There are also difficulties with the criterion-referenced approach. As a rule, they are associated with the selection of test content. Within this approach, the test tries to reflect the entire content of the course being controlled, or at least what can be taken as that full volume. The percentage of tasks completed correctly is then treated as the level of preparedness, or as the degree of mastery of the total volume of course content. Within the criterion-referenced approach there is every reason for the latter interpretation, since the test includes everything that can conditionally be accepted as 100%.
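The criterion-referenced score is then simply the share of the full content domain mastered, compared with a preset cutoff. A minimal sketch; the function name and the 70% cutoff are illustrative assumptions, not a prescribed standard:

```python
def mastery(correct, total, cutoff=0.70):
    """Criterion-referenced interpretation: the percent of the course
    content mastered, and whether it reaches the planned level."""
    share = correct / total
    return round(100 * share, 1), share >= cutoff

percent, passed = mastery(18, 25)
print(percent, passed)  # 72.0 True
```

Unlike the percentile rank of a norm-referenced test, this number does not change when the composition of the group changes: it refers only to the student and the content domain.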

Criterion-referenced tests serve a fairly wide range of tasks. In particular, they help to collect complete and objective information about the educational achievements of each individual student and of groups of students; to compare a student’s knowledge, skills and abilities with the requirements laid down in state educational standards; to select students who have reached the planned level of preparedness; to evaluate the effectiveness of the professional activity of individual teachers and groups of teachers; and to evaluate the effectiveness of various training programs.

An emphasis on the content-based approach may have a beneficial effect on testing as a whole. This approach benefits, for example, the interpretation of test scores in ongoing monitoring: the student receives information not about how he looks compared to others, but about what he can do and what he knows relative to the given requirements for the level of training in the subject. Of course, such an interpretation does not exclude combining it with the referencing of results to norms, which, as a rule, occurs in the ongoing monitoring of students’ knowledge in the everyday educational process. In this case, testing is integrated with learning and helps the student identify possible difficulties and correct errors in mastering the educational material in good time.



If, during the student’s reasoning, the chain is interrupted (an inconsistency of concept or explanation), the number of significant operations completed before the break in the logical chain is determined. The peculiarity of compiling test tasks at this level of mastery is that it is almost impossible to create an unambiguous standard; the standard can instead be created in the form of a problem-solution diagram. Example: a logical chain. ...

Plan

    Main types of pedagogical tests.

    Forms of test tasks.

    Empirical verification and statistical processing of results.

    Principles of content selection. Criteria for assessing test content.

    The relationship between the form of the task and the type of knowledge, skills and abilities being tested.

  1. Main types of pedagogical tests

There are two main types of tests: traditional and non-traditional.

The test has composition, integrity and structure. It consists of tasks, rules for their application, grades for completing each task and recommendations for interpreting test results. The integrity of the test means the interrelation of tasks, their belonging to a common measured factor. Each test task fulfills its assigned role and therefore none of them can be removed from the test without loss of measurement quality. The structure of the test is formed by the way the tasks are connected to each other. Basically, this is the so-called factor structure, in which each item is related to others through common content and common variation in test scores.

A traditional test is a unity of at least three systems:

A formal system of tasks of increasing difficulty;

Statistical characteristics of tasks and test subjects’ results.

The traditional pedagogical test must be considered in two significant senses: - as a method of pedagogical measurement and as a result of using the test. It is surprising that texts in Russian gravitate towards the meaning of the method, while in most works of Western authors the concept of test is more often considered in the sense of results. Meanwhile, both of these meanings characterize the test from different sides, because the test must be understood simultaneously both as a method and as a result of a pedagogical measurement. One complements the other. A test, as a method, cannot be imagined without results confirming the quality of itself and the quality of measurement assessments of subjects of various levels of preparedness.

Several ideas are developed in the above definition of a traditional test.

The first idea is that the test is considered not as an ordinary set or set of questions, tasks, etc., but in the form of the concept of a “system of tasks.” Such a system is not formed by any totality, but only by that which determines the emergence of a new integrative quality that distinguishes the test from an elementary set of tasks and from other means of pedagogical control. Of the many possible systems, the best is formed by that integral set in which the quality of the test is manifested to a relatively greater extent. Hence the idea of ​​identifying the first of the two main system-forming factors - the best composition of test tasks that form the integrity. Based on this, we can give one of the shortest definitions: a test is a system of tasks that form the best methodological integrity. The integrity of the test is the stable interaction of tasks that form the test as a developing system.

The second idea is that in this definition of a test there is a departure from the deep-rooted tradition of viewing a test as a simple means of checking, testing, testing. Every test includes an element of testing; it is not all about it. For a test is also a concept, content, form, results and interpretation - everything that requires justification. This implies that the test is a qualitative means of pedagogical measurement. According to the theory, test scores are not accurate assessments of subjects. It is correct to say that they only represent these meanings with some accuracy.

The third idea developed in our definition of a traditional test is the inclusion of a new concept - test effectiveness, which has not previously been considered in the test literature as a criterion for analysis and test creation. The leading idea of ​​a traditional test is to compare the knowledge of as many students as possible with a minimum number of tasks, in a short time, quickly, efficiently and at the lowest cost.

Essentially, this reflects the idea of ​​the effectiveness of pedagogical activities in the field of knowledge control. I would like to think that there is no one and there is no need to object to this very idea. If our teacher can explain the educational material no worse than his foreign colleague, then it is good to check the required knowledge, for all students, for all the material studied, he is not able to due to the prevailing class-lesson system of classes in our country, the lack of computer equipment, tests and programs for organizing automated self-control - the most humane form of knowledge control. He is physically unable to do this either. Due to, to put it mildly, erroneous social policy, the salaries of our teachers have long been unable to compensate for the expenditure of even the physical energy necessary for good teaching, not to mention the increased expenditure of intellectual energy, which can only be accomplished by thinking that is uninhibited and not preoccupied with the search for bread. As noted in the literature, a qualified worker receives three to four times less than the salary level beyond which normal life activity is disrupted and the destruction of labor potential begins.

Although there are hundreds of examples of test definitions in the literature that are either difficult or impossible to agree with, this does not mean at all that this definition of a traditional test is the ultimate truth. Like all other concepts, it needs constant improvement. It just seems to the author that so far it is more reasoned than some other well-known concepts of the pedagogical test. However, the desire to improve concepts is a completely normal phenomenon and necessary for normally developing practice and science. Constructive attempts to give other definitions of the test or challenge existing ones are always useful, but this is precisely what we lack.

Traditional tests include homogeneous and heterogeneous tests. A homogeneous test is a system of tasks of increasing difficulty, specific form and specific content - a system created for the purpose of an objective, high-quality, and effective method for assessing the structure and measuring the level of preparedness of students in one academic discipline. It is easy to see that, at its core, the definition of a homogeneous test coincides with the definition of a traditional test.

Homogeneous tests are more common than others. In pedagogy, they are created to control knowledge in one academic discipline or in one section of such, for example, a voluminous academic discipline as physics. In a homogeneous pedagogical test, the use of tasks that reveal other properties is not allowed. The presence of the latter violates the requirement of disciplinary purity of the pedagogical test. After all, every test measures something predetermined.

For example, a test in physics measures the test takers' knowledge, skills, and perceptions in this science. One of the difficulties of such a measurement is that physical knowledge is heavily coupled with mathematical knowledge. Therefore, the physics test expertly establishes the level of mathematical knowledge used in solving physics problems. Exceeding the accepted level leads to a bias in the results; as they are exceeded, the latter increasingly begin to depend not so much on knowledge of physics, but on knowledge of another science, mathematics. Another important aspect is the desire of some authors to include in tests not so much a test of knowledge as the ability to solve physical problems, thereby involving the intellectual component in measuring preparedness in physics.

A heterogeneous test is a system of tasks of increasing difficulty, specific form and specific content - a system created for the purpose of an objective, high-quality, and effective method for assessing the structure and measuring the level of preparedness of students in several academic disciplines. Often such tests also include psychological tasks to assess the level of intellectual development.

Typically, heterogeneous tests are used for a comprehensive assessment of school graduates, personality assessment when applying for a job, and for selecting the most prepared applicants for admission to universities. Since each heterogeneous test consists of homogeneous tests, the interpretation of test results is carried out based on the answers to the tasks of each test (here they are called scales) and, in addition, through various methods of aggregating scores, attempts are made to give an overall assessment of the test taker's preparedness.

Let us recall that a traditional test is a method of diagnosing subjects in which they answer the same tasks, at the same time, under the same conditions and with the same score. With this orientation, the task of determining the exact volume and structure of the mastered educational material recedes, of necessity, into the background. The test selects a minimum sufficient number of tasks that allows one to relatively accurately determine, figuratively speaking, not “who knows what,” but “who knows more.” Interpretation of test results is carried out primarily in the language of testology, based on the arithmetic mean, mode or median and on the so-called percentile norms, which show what percentage of subjects have a test result worse than that of any subject taken for analysis with his test score. This interpretation is called normative-oriented. Here the conclusion is complemented by a rating: tasks answers conclusions about the knowledge of the subject rating, understood as a conclusion about the place or rank of the subject.

Integrative tests. An integrative test can be called a test consisting of a system of tasks that meet the requirements of integrative content, a test form, and increasing difficulty of tasks aimed at a generalized final diagnosis of the preparedness of a graduate of an educational institution. Diagnostics is carried out by presenting such tasks, the correct answers to which require integrated (generalized, clearly interrelated) knowledge of two or more academic disciplines. The creation of such tests is given only to those teachers who have knowledge of a number of academic disciplines, understand the important role of interdisciplinary connections in learning, and are able to create tasks, the correct answers to which require students to have knowledge of various disciplines and the ability to apply such knowledge.

Integrative testing is preceded by the organization of integrative training. Unfortunately, the current class-lesson form of conducting classes, combined with excessive fragmentation of academic disciplines, together with the tradition of teaching individual disciplines (rather than generalized courses), will for a long time hinder the implementation of an integrative approach in the processes of learning and monitoring preparedness. The advantage of integrative tests over heterogeneous ones lies in the greater informative content of each task and in the smaller number of tasks themselves. The need to create integrative tests increases as the level of education and the number of academic disciplines studied increases. Therefore, attempts to create such tests are noted mainly in higher education. Integrative tests are especially useful for increasing the objectivity and efficiency of the final state certification of students.

The methodology for creating integrative tests is similar to the methodology for creating traditional tests, with the exception of the work of determining the content of tasks. To select the content of integrative tests, the use of expert methods is mandatory. This is due to the fact that only experts can determine the adequacy of the content of the tasks for the purposes of the test. But, first of all, it will be important for the experts themselves to decide on the goals of education and study of certain educational programs, and then agree among themselves on fundamental issues, leaving for examination only variations in the understanding of the degree of importance of individual elements in the overall structure of preparedness. A selected composition of experts in foreign literature, agreed upon on fundamental issues, is often a panel. Or, given the differences in the meaning of the last word in the Russian language, such a composition can be called a representative expert group. The group is selected to adequately represent the approach used to create the test in question.

Adaptive tests. The feasibility of adaptive control arises from the need to rationalize traditional testing. Every teacher understands that there is no point in giving easy or very easy tasks to a well-prepared student: the probability of a correct answer is too high, and such easy material has no noticeable developmental potential. Symmetrically, because of the high probability of a wrong answer, there is no point in giving difficult tasks to a weak student; it is known that difficult and very difficult tasks reduce the learning motivation of many students. What was needed was a measure of task difficulty and a measure of the level of knowledge comparable on a single scale. Such a measure was found in pedagogical measurement theory; the Danish mathematician G. Rasch called it the "logit". After the advent of computers, this measure formed the basis of the adaptive knowledge control methodology, which regulates the difficulty and number of tasks presented depending on the students' responses. If the answer is correct, the computer selects a more difficult next task; if it is incorrect, the next task will be easier. Naturally, this algorithm requires preliminary trialing of all tasks, determining their degree of difficulty, and creating a bank of tasks and a special program.

The use of tasks that correspond to the level of preparedness significantly increases the accuracy of measurement and reduces individual testing time to approximately 5 - 10 minutes. Adaptive testing makes it possible for the computer to issue tasks at the optimal level of difficulty, with roughly a 50% probability of a correct answer for each student.

In Western literature, three variants of adaptive testing are distinguished. The first is called pyramid testing: in the absence of preliminary assessments, all test takers are given a task of average difficulty, and only then, depending on the answer, each is given an easier or a more difficult task; at each step it is useful to apply the rule of dividing the difficulty scale in half. In the second variant, control begins at any difficulty level the test taker wishes, with a gradual approach to the real level of knowledge. In the third variant, testing works through a bank of tasks divided into difficulty levels.
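The pyramid variant just described can be sketched as a simple loop. This is a minimal illustration, not an implementation from the lecture: the continuous 0..1 difficulty scale and the `answer` callback are assumptions made for the sketch.

```python
# Sketch of the "pyramid" variant: start at average difficulty and
# halve the difficulty step after every answer.  The continuous 0..1
# difficulty scale and the answer() callback are illustrative assumptions.

def pyramid_test(answer, steps=5):
    """Estimate preparedness on a 0..1 difficulty scale.

    answer(difficulty) administers a task of the given difficulty and
    returns True if the test taker solves it.
    """
    difficulty = 0.5   # everyone starts with a task of average difficulty
    step = 0.25        # half of the remaining half of the scale
    for _ in range(steps):
        if answer(difficulty):
            difficulty += step   # success: the next task is harder
        else:
            difficulty -= step   # failure: the next task is easier
        step /= 2
    return difficulty

# A test taker who reliably solves every task easier than 0.7:
level = pyramid_test(lambda d: d < 0.7)   # converges to about 0.70
```

After a handful of steps the procedure settles near the difficulty at which the probability of a correct answer is about 50%, which is exactly the optimum mentioned above.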

Thus, an adaptive test is a variant of an automated testing system in which the parameters of difficulty and the differentiating ability of each task are known in advance. The system is created in the form of a computer bank of tasks, ordered according to the task characteristics of interest. The most important characteristic of adaptive test tasks is their level of difficulty, obtained empirically: before entering the bank, each task undergoes empirical trialing on a sufficiently large number of typical students of the contingent of interest. The words "contingent of interest" are intended to convey here the meaning of the more rigorous scientific concept of the "general population".

The educational model of the adaptive school of E.A. Yamburg, widespread in our country, proceeds essentially from the general ideas of adaptive learning and adaptive knowledge control. The origins of this approach can be traced back to the pedagogical works of Comenius, Pestalozzi and Diesterweg, who were united by the ideas of conformity to nature and the humanity of teaching. The student was at the center of their pedagogical systems. For example, in A. Diesterweg's little-known work "Didactic Rules" one can read the following: "Teach in accordance with nature... Teach without gaps... Start teaching where the student left off... Before you start teaching, you must examine the starting point... Without knowing where the student stopped, it is impossible to teach him properly." Lack of awareness of students' real level of knowledge, together with natural differences in their ability to assimilate the knowledge offered, became the main reason for the emergence of adaptive systems based on the principle of individualization of learning. This principle is difficult to implement in the traditional class-lesson form.

Before the advent of the first computers, the most famous system close to adaptive learning was the so-called “Complete Knowledge Assimilation System.”

Criterion-referenced tests. With a criterion-referenced approach, tests are created to compare the educational achievements of each student with the amount of knowledge, skills or abilities planned to be acquired. In this case, a specific content area, rather than a particular sample of students, is used as the interpretative frame of reference. The emphasis is on what the student can do and what he knows, rather than on how he compares with others.

There are also difficulties with the criterion-referenced approach. As a rule, they are associated with the selection of test content. Within this approach, the test tries to reflect the entire content of the controlled course, or at least what can be taken as that full volume. The percentage of correctly completed tasks is then interpreted as the level of preparedness or as the degree of mastery of the total volume of course content. Within a criterion-referenced approach there is every reason for the latter interpretation, since the test includes everything that can conditionally be accepted as 100%.

Criterion-referenced tests cover a fairly wide range of tasks. In particular, they help to collect complete and objective information about the educational achievements of each student individually and of groups of students; compare a student's knowledge, skills and abilities with the requirements laid down in state educational standards; select students who have reached the planned level of preparedness; assess the effectiveness of the professional activity of individual teachers and groups of teachers; and evaluate the effectiveness of various training programs.

An emphasis on the content-based approach can have a beneficial effect on testing as a whole. This approach benefits, for example, the interpretation of test scores during ongoing monitoring. The student receives information not about how he looks in comparison with others, but about what he can do and knows in comparison with the given requirements for the level of training in the subject. Of course, such an interpretation does not exclude combination with the referencing of results to norms, which, as a rule, occurs during the ongoing monitoring of students' knowledge in the everyday educational process. In this case, testing is integrated with learning, helping the student to identify possible difficulties and to correct errors in mastering the educational material in time.

  2. Forms of test tasks

In modern testing (V.S. Avanesov, M.B. Chelyshkova, A.N. Mayorov, etc.) four forms of test tasks are distinguished: tasks with a choice of one or more correct answers, open-form (completion) tasks, tasks on establishing the correct sequence, and matching tasks. The most common is the first form.

Let us consider each form of task in detail, following V.S. Avanesov's classification.

Tasks for choosing one or more correct answers are most suitable for computer testing of knowledge. It is convenient to divide such tasks into the following types: tasks with two, three, four, five and more answers. The instruction for this form of tasks is the sentence: “Circle (check, indicate) the number of the correct answer.”

Example 1. Mark the number of the correct answer.

The place occupied by a digit in a number is called

    1) position;

    2) digit place;

    3) location;

    4) character cell.

The task should be formulated briefly and clearly, so that its meaning is clear upon first reading.

The content of the task is formulated as clearly and as briefly as possible. Brevity is ensured by careful selection of words, symbols and graphics, so that a minimum of means achieves maximum clarity of the task's meaning. Repetition of words, obscure or rarely used words, symbols unknown to students, and foreign words that hinder comprehension must be completely excluded. It is good when a task contains no more than one subordinate clause.

To achieve brevity, it is better to ask about one thing in each task. Overloading a task with demands to find something, then solve it, and then also explain the result has a negative impact on the quality of the task, although from a pedagogical point of view the reason for such formulations is easy to understand.

It is even better when both the task and the answers are short. An incorrect but plausible answer is called a distractor in the American testing literature (from the English verb "to distract"). In general, the better the distractors are selected, the better the task; the developer's talent manifests itself primarily in the development of effective distractors. It is usually believed that the more often a distractor is chosen, the better it is formulated. This is true only up to a point: in pursuit of attractive distractors, a sense of proportion is often lost. The attractiveness of each answer is tested empirically.
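The empirical check of answer attractiveness amounts to counting how often each option is chosen in a pilot group. A minimal sketch with invented response data:

```python
from collections import Counter

# Invented pilot-group responses to one four-choice task;
# option 2 is keyed as correct, options 1, 3 and 4 are distractors.
responses = [2, 2, 1, 3, 2, 2, 4, 1, 2, 3, 2, 2, 1, 2, 2]

counts = Counter(responses)
total = len(responses)

# Share of test takers choosing each option.  A distractor chosen by
# almost nobody does not work and is a candidate for replacement.
shares = {option: counts[option] / total for option in (1, 2, 3, 4)}
```

Here option 4 is chosen by only one test taker out of fifteen, so in a real trial it would be flagged as an ineffective distractor.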

Tasks with a choice of one or more answers are the most criticized form. Proponents of conventional approaches argue that knowledge can only be truly tested in the process of direct communication with the student, asking him clarifying questions, which helps to better clarify the true depth, strength and validity of knowledge. One must agree with such statements. However, there are also issues of saving the living labor of teachers and students, saving time costs and the problem of increasing the efficiency of the educational process.

It is often believed that finding the right answer is much easier than formulating it yourself. However, in well-done tasks, incorrect answers often seem more plausible to an unknowing student than correct ones. The test developer's talent is revealed in the process of creating precisely incorrect, but very plausible answers. Another objection is that a test task with a choice of one or more correct answers is only suitable for assessing knowledge at the so-called lower level.

A separate variant is tasks requiring the choice of the single most correct answer from among those proposed. The instruction for such tasks reads accordingly: "Circle the number of the most correct answer." Naturally, it is assumed that all the other answers are also correct, but to varying degrees.

There are three reasons for introducing such tasks into practice.

The first is the old idea of ​​excluding incorrect answers from tasks, which weak students can supposedly remember. If we follow this very controversial thesis, then incorrect answers cannot be given during testing at all.

The second reason for introducing such tasks into practice is more realistic. It concerns the need to develop in students not only the ability to distinguish correct answers from incorrect ones, but also the ability to differentiate the measure of correctness of answers. This is really important, both in general secondary and higher vocational education.

The third reason for using tasks with choosing the most correct answer is the desire to use them to check the completeness of knowledge.

No matter how convincing the reasons for introducing such tasks into practice are, the latter are unlikely to find wide application.

In open-form tasks, ready-made answers are not given: the test taker must come up with or receive them himself. Sometimes, instead of the term “open-form tasks,” the terms “tasks for addition” or “tasks with a constructed answer” are used. For an open form, it is customary to use instructions consisting of one word: “Add”.

Example 2. Add.

In the binary number system 10-1=_________.

Addition tasks come in two noticeably different types:

1) tasks with restrictions imposed on the answers, where the admissible answers are determined by the content and the form of presentation;

2) tasks with a freely constructed answer, in which it is necessary to compose a detailed answer in the form of a complete solution to the problem or give an answer in the form of a micro-essay.

In tasks with restrictions, it is determined in advance what is clearly considered the correct answer, and the degree of completeness of the answer is set. Usually it is quite short - one word, number, symbol, etc. Sometimes - longer, but not exceeding two or three words. Naturally, the regulated brevity of answers puts forward certain requirements for the scope of application, so tasks of the first type are mainly used to assess a fairly narrow range of skills.

A distinctive feature of tasks with restrictions on complementary answers is that they must generate only one correct answer, planned by the developer.

Tasks of the second type with a freely constructed answer do not have any restrictions on the content and form of presentation of answers. For a certain time, the student can write whatever and however he wants. However, the careful formulation of such tasks presupposes the presence of a standard, which is usually the most correct answer with characteristics and signs of quality that describe it.

In matching tasks, the teacher checks knowledge of the connections between the elements of two sets. The elements to be compared are written in two columns: on the left, usually, are the elements of the defining set containing the statement of the problem; on the right are the elements to be selected.

Such tasks are given the standard instruction: "Establish the correspondence."


Example 3. Establish the correspondence.

a) ___   b) ___   c) ___

It should be noted that it is desirable that there be more elements in the right column than in the left. In this situation, certain difficulties arise associated with the selection of plausible redundant elements. Sometimes for one element of the left set it is necessary to select several correct answers from the right column. In addition, the correspondences can be extended to three or more sets. The effectiveness of the task is significantly reduced if implausible options are easily distinguished even by ignorant students.

The effectiveness of the task is also reduced in cases where the number of elements in the left and right columns is the same and there is simply nothing to choose from when establishing a match for the last element on the left. The last correct or incorrect match is established automatically by sequentially eliminating elements for previous matches.

Test tasks to establish the correct sequence are designed to assess the level of proficiency in the sequence of actions, processes, etc. In tasks, actions, processes, and elements related to a specific task are given in an arbitrary, random order. The standard instructions for these tasks are as follows: “Establish the correct sequence of actions.”

Example 4. Establish the correct sequence.

The full branching command in the educational algorithmic language has the format:

    else <series 2>

    then <series 1>

    if <condition>

Tasks on establishing the correct sequence receive friendly support from many teachers, which is explained by the important role of ordered thinking and activity algorithms.

The purpose of introducing such tasks into the educational process is the formation of algorithmic thinking, algorithmic knowledge, skills and abilities.

Algorithmic thinking can be defined as an intellectual ability that manifests itself in determining the best sequence of actions when solving educational and practical problems. Typical examples of such thinking are the successful completion of various tasks in a short time, the development of the most effective computer program, etc.

The choice of task form is determined by many, often contradictory, factors, including the specifics of the content, the testing goals, and the specifics of the test population. Checking is easier with closed-form tasks, but such tasks are less informative. Open-form tasks are more informative, but their checking is harder to organize. Creating computer programs to check the correctness of answers to such tasks is harder still, owing to the richness of the test takers' vocabulary (synonyms may be used in answers), lapses of attention (typos, case mismatches), and so on.
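A minimal sketch of how such a checking program might cope with the difficulties just listed. The synonym table and the answers are invented for illustration; a real test would need a much richer dictionary and perhaps spelling-distance checks for typos.

```python
# Illustrative sketch of machine-checking a constructed answer,
# addressing case mismatch, stray whitespace and synonyms.
# The synonym table is invented for the example.

SYNONYMS = {"ram": "memory", "main memory": "memory"}

def normalize(text):
    """Unify case and whitespace, then map known synonyms to one key."""
    cleaned = " ".join(text.lower().split())
    return SYNONYMS.get(cleaned, cleaned)

def check_answer(given, expected):
    """True if the given answer matches the expected one after normalization."""
    return normalize(given) == normalize(expected)
```

With this sketch, "  RAM ", "Main  Memory" and "memory" are all accepted as the same answer, while an unrelated word is rejected.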

To successfully navigate the task forms, you can use a special table (see Table 1) for a comparative analysis of tasks, proposed by M.B. Chelyshkova.

According to the developer, this table is purely indicative; however, its use can facilitate the process of selecting test items of various forms to solve certain diagnostic problems.


Table 1

Comparative analysis of test task characteristics

Characteristic | Closed-form tasks | Completion tasks | Matching tasks | Sequencing tasks
Checking knowledge of facts | Suitable | Suitable | Suitable | Suitable
Application of knowledge according to a model | Suitable | Suitable | Suitable | Suitable
Application of knowledge in non-standard situations | Unsuitable | Suitable | Unsuitable | Suitable
Ease of design | Yes | Yes | No | No
Exclusion of guessing | Not excluded | Not excluded | Not excluded | Not excluded
Objectivity of assessment | Yes | No | Yes | Yes
Possibility of typos in the answer | No | Yes | No | No
Possibility of an original answer | No | Yes | Yes/No | No

Compliance of tasks in test form with the requirements of pedagogical correctness of content and form is a necessary but not sufficient condition for calling them tests.

The transformation of tasks in test form into test tasks begins from the moment of statistical verification of each task for the presence of test-forming properties.

  3. Empirical verification and statistical processing of results

The presence of a sufficient number of test tasks allows us to move on to developing the test as a system with integrity, composition and structure. At the third stage, tasks are selected and tests are created, the quality and effectiveness of the test is improved.

The integrity of the test is formed by the relationship between the test takers’ responses to the test tasks and the presence of a common measurable factor that influences the quality of knowledge.

The composition of the test is formed by the correct selection of tasks, allowing the minimum required number of tasks to reflect the essential elements of the test takers' competence.

The level and structure of knowledge are revealed by analyzing the answers of each test taker to all test items. The more correct answers, the higher the individual test score of the subjects. Typically, this test score is associated with the concept of “level of knowledge” and undergoes a clarification procedure based on one or another model of pedagogical measurement. The same level of knowledge can be obtained by answering different tasks. For example, in a test of thirty items, the subject received ten points. These points are most likely obtained through correct answers to the first ten, relatively easy tasks. The sequence of ones and then zeros inherent in such a case can be called the correct structure of the subject’s preparedness. If the opposite picture is revealed, when the subject answers correctly to difficult tasks and incorrectly to easy ones, then this contradicts the logic of the test and therefore such a knowledge profile can be called inverted. It occurs rarely, and most often, due to the error of the test, in which the tasks are arranged in violation of the requirements of increasing difficulty. Provided that the test is done correctly, each profile indicates the structure of knowledge. This structure can be called elementary (since there are also factor structures that are identified using factor analysis methods).
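The "correct" and "inverted" profiles described above can be checked mechanically. A sketch, assuming 0/1 scoring and items ordered by increasing difficulty:

```python
# Items are assumed to be ordered by increasing difficulty and scored 0/1.

def profile_errors(responses):
    """Count deviations from the ideal profile: all 1s before all 0s.

    0 means a perfectly "correct" structure of preparedness; a large
    count signals the inverted, illogical profile described above.
    """
    score = sum(responses)
    ideal = [1] * score + [0] * (len(responses) - score)
    return sum(1 for got, want in zip(responses, ideal) if got != want)

# Ten points on a thirty-item test, earned on the ten easiest items:
correct_profile = [1] * 10 + [0] * 20    # 0 errors: correct structure
inverted_profile = [0] * 20 + [1] * 10   # 20 errors: inverted profile
```

Both profiles give the same test score of ten, yet the error count distinguishes the logical structure of knowledge from the inverted one.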

To determine the level of structuredness of preparedness, one can use the L. Guttman coefficient, previously and inaccurately called a measure of "test reliability":

r_g = 1 - e / (n * k),

where r_g is the structuring coefficient, e is the number of deviations of the observed response profiles from the ideal (Guttman) pattern, n is the number of test takers, and k is the number of tasks.

The level of knowledge largely depends on personal efforts and abilities, while the structure of knowledge significantly depends on the correct organization of the educational process, on the individualization of training, on the skill of the teacher, on the objectivity of control - in general, on everything that is usually lacking. The path to achieving this ideal lies through the difficulties of creating quality tests.

The development of tests begins with an analysis of the content of the taught knowledge and mastery of the principles of formulating test tasks. Unfortunately, tests are still looked at as something that is easy to come up with, while the strength of tests is their effectiveness, which stems from theoretical and empirical validity.

At the third stage, the developers of the new generation of tests will need some mathematical and statistical training and knowledge of test theory. Test theory can be defined as a set of consistent concepts, forms, methods, axioms, formulas and statements that help improve the efficiency and quality of the test process. In addition, some experience in using multivariate statistical analysis methods and experience in correctly interpreting test results may be required.

The question often arises: “How will the deleted tasks behave in other groups of subjects?” The answer depends on the quality of the selection of groups, or more precisely on the statistical plan for forming sample populations. The correct answer to this question should be sought in the sense of the concept of “target group”; this is the set of subjects in the population for whom the test being developed is intended.

Accordingly, if the tasks of the designed test behave differently in different groups, then this is most likely an indication of errors in the formation of samples of subjects. The latter should be as homogeneous as the subjects in the target group. In statistical language, this means that subjects in the target and experimental groups must belong to the same general population.

Logarithmic estimates, called logits, made it possible to place such seemingly disparate quantities as a subject's level of knowledge and a task's level of difficulty on a single scale, so that the difficulty of a task can be compared directly with the preparedness of a subject.
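As a common first approximation in Rasch-style measurement, a subject's logit of knowledge is the natural logarithm of the ratio of his correct to incorrect answers, and a task's logit of difficulty is the logarithm of the ratio of incorrect to correct responses to it; the two then live on one scale:

```python
import math

def ability_logit(p_correct):
    """Subject's level of knowledge in logits: ln(p / (1 - p)),
    where p is the subject's share of correct answers."""
    return math.log(p_correct / (1 - p_correct))

def difficulty_logit(p_solved):
    """Task difficulty in logits: ln((1 - p) / p),
    where p is the share of subjects who solved the task."""
    return math.log((1 - p_solved) / p_solved)

# A subject solving 75% of tasks and a task solved by 75% of subjects
# sit symmetrically around zero on the common logit scale.
```

A subject who solves exactly half of the tasks sits at zero logits, which is why the adaptive procedure aims at the 50% success level.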

According to V.P. Bespalko and Yu.G. Tatur, testing should be a measurement of the quality of assimilation of knowledge, skills and abilities. Comparing the answers to the proposed tasks with the standard answers makes it possible to determine the coefficient of knowledge assimilation (K_us): K_us = A / P, where A is the number of correct answers and P is the number of tasks in the proposed tests.

Determining K_us is an operation of measuring the quality of knowledge assimilation. K_us is normalized (0 < K_us < 1), and the procedure for monitoring assimilation is easily automated. The coefficient is used to judge the completeness of the learning process: if K_us > 0.7, the learning process can be considered complete. When knowledge is assimilated with K_us ≤ 0.7, a student systematically makes mistakes in his professional activity and is unable to correct them because of his inability to find them. The lower acceptable limit for completing the training process is raised to whatever value is required from the point of view of operational safety.
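In code, the criterion above is a one-liner; the function names here are illustrative:

```python
def assimilation_coefficient(correct_answers, total_tasks):
    """K_us = A / P: the share of correctly completed tasks."""
    return correct_answers / total_tasks

def training_complete(k_us, threshold=0.7):
    """Bespalko's criterion: learning is complete when K_us exceeds the
    threshold (0.7 by default; safety-critical training raises it)."""
    return k_us > threshold
```

For example, 21 correct answers out of 30 give K_us = 0.7, which sits exactly on the boundary and does not yet count as completed learning.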

  4. Principles of content selection. Criteria for assessing test content

When creating a test, the developer’s attention is primarily drawn to the issues of content selection, which can be defined as the optimal reflection of the content of an academic discipline in the system of test tasks. The requirement of optimality presupposes the use of a certain selection methodology, including issues of goal setting, planning and assessment of the quality of the test content.

The goal-setting stage is the most difficult and at the same time the most important: the quality of the test content primarily depends on the results of its implementation. In the process of goal setting, the teacher needs to decide what student results he wants to evaluate using the test.

The reasons for errors in a teacher's conclusions are not always related to the technological shortcomings of traditional means of control. Sometimes they are caused by the teacher's shortcomings at the goal-setting stage, when the test's center of gravity shifts to secondary learning goals; sometimes the goal-setting stage is absent altogether, since some teachers are confident in the infallibility of their experience and intuition, especially after many years of working at school. However, not even the most advanced control methods, and no amount of experience, will provide grounds for reliable conclusions about the achievement of learning goals until there is confidence that the goals of control are set correctly and are displayed in the test content correctly and without bias.

When creating a test, the task is to reflect in its content the main thing that students should know as a result of learning, so it is impossible to limit oneself to a simple listing of learning goals. I would like to include everything in the test, but, unfortunately, this is impossible, so some of the goals have to be simply discarded and the degree to which students have achieved them is not checked. In order not to lose the most important thing, it is necessary to structure the goals and introduce a certain hierarchy in their relative arrangement. Without a doubt, there are not and cannot be ready-made general recipes, since each discipline has its own priorities. In addition, individual goals are noticeably interconnected, and therefore a simple idea of ​​a system of goals as an ordered set without considering the connections between elements is clearly not enough.

Once the test objectives have been determined and specified, a test plan and specification must be developed.

When developing a plan, an approximate breakdown of the percentage of content of the sections is made and the required number of tasks is determined for each section of the discipline based on the importance of the section and the number of hours allocated for its study in the program.

The layout begins by calculating the planned initial number of tasks in the test, which will then be repeatedly changed in the direction of increasing or decreasing during the process of working on the test. Typically, the maximum number does not exceed 60 - 80 tasks, since the testing time is chosen in the range of 1.5 - 2 hours, and on average no more than 2 minutes are allocated to complete one task.

After completing the first step of content planning, a test specification is developed, which fixes the structure, content of the test and the percentage of tasks in the test. Sometimes the specification is made in a detailed form, containing indications of the type of tasks that will be used to assess student achievements in accordance with the intended purposes of creating the test, test completion time, number of tasks, features of testing that may affect the characteristics of the test, etc.

The specification in expanded form includes:

    the purpose of creating the test, justification for the choice of approach to its creation, description of possible areas of application of the test;

    a list of normative documents used when planning the content of the test;

    description of the general structure of the test, including a list of subtests (if any) indicating approaches to their development;

    the number of tasks of various forms, indicating the number of answers to closed tasks, the total number of tasks in the test;

    the number of parallel test options or a link to a cluster containing the number and numbers of cluster tasks;

    the ratio of tasks in various sections and types of educational activities of schoolchildren;

    coverage of standards requirements (for certification tests);

    list of requirements not included in the test (for certification tests);

Knowledge and skills are divided as follows:

A – knowledge of concepts, definitions, terms;

B – knowledge of laws and formulas;

C – ability to apply laws and formulas to solve problems;

D – ability to interpret results on graphs and diagrams;

E – ability to make value judgments.

The following proportions are often established:


A – 10%, B – 20%, C – 30%, D – 30%, E – 10%.
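As an illustration, these proportions can be turned into a concrete task plan. The 40-item total is an invented example figure; for totals where the rounded counts do not sum exactly, one or two categories would need manual adjustment.

```python
# Turning the A-E proportions into task counts for a test blueprint.
# The 40-item total is an invented example figure.

PROPORTIONS = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.30, "E": 0.10}

def allocate(total_items):
    """Round each category's share of the total to a whole task count.

    For totals where the rounded counts do not add up exactly, the
    plan would need a manual adjustment of one or two categories.
    """
    return {category: round(total_items * share)
            for category, share in PROPORTIONS.items()}

plan = allocate(40)   # {'A': 4, 'B': 8, 'C': 12, 'D': 12, 'E': 4}
```

Such a plan makes the blueprint explicit: value judgments (E) and factual definitions (A) each get 4 of the 40 tasks, while application skills (C and D) get 12 each.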

In addition to the criteria, there are general principles that contribute to a certain extent to the correct selection of test content.

The principle of representativeness regulates not only the completeness of the display, but also the significance of the content elements of the test. The content of the tasks should be such that the answers to them can be used to draw a conclusion about knowledge or ignorance of the entire program of the section or course being tested.

The principle of consistency involves the selection of content elements that meet the requirements of consistency and are interconnected by the general structure of knowledge. If the principle of consistency is observed, the test can be used to identify not only the amount of knowledge, but also to assess the quality of the structure of students’ knowledge.

After selecting the test content, the most important stage of creating pre-test tasks begins. This work is usually entrusted to the most experienced teachers with extensive experience in the school. However, experience alone is not enough to create tasks. Special knowledge of the theory and methodology of developing pedagogical tests is also required, providing a professional approach to the creation of pre-test tasks.

V.S. Avanesov identified three criteria for selecting the content of test tasks:

1) certainty of the content of the test;

2) consistency of the content of tasks;

3) validity of the content of test tasks.

1. The certainty of the test content forms the subject of pedagogical measurement. In the case of a homogeneous test, the question is whether one can be confident that all test items test knowledge of the particular academic discipline and not of some other. Quite often, correct answers to some tasks require knowledge not only of the discipline of interest but also of a number of other, usually related and preceding, academic disciplines, whose proximity and interconnection make it difficult to determine precisely the subject matter of the knowledge being measured.

For example, physical calculations use a great deal of mathematical knowledge, and therefore the mathematics used in solving physics problems is usually included in the system of physical knowledge. A failure in the mathematical calculations results in a failure to answer a physics test item, and a negative score is given for ignorance of physics even though the subject's errors were mathematical. If such a test includes many tasks whose correct solution requires not so much physical knowledge as the ability to perform complex calculations, this is an example of inaccurately defined content of a physics test. The less the knowledge of one academic discipline overlaps with the knowledge of another, the more clearly the content of the discipline is expressed in the test. Specificity of content is required in all tests. In a heterogeneous test it is achieved by explicitly separating the tasks of one academic discipline into a separate scale. At the same time, there are often tasks that work well not just on one but on two, three or even more scales.

In any test task, it is determined in advance what is clearly considered an answer to the task, and with what degree of completeness the correct answer must be. It is not allowed to define a concept by listing elements that are not included in it.

2. The consistency of the content of tasks requires that no judgments arise that simultaneously affirm and deny the same thought. The existence of two mutually exclusive correct answers to the same test item is unacceptable. If test takers are instructed to "circle the number of the correct answer", and then one of the answers states that there is no correct answer, this is an example of inconsistency in the thinking of the test designer. Some tests contain answers that are not related to the content of the task at all. Such answers are quite easily recognized by test takers as erroneous, and the test therefore turns out to be ineffective. To increase efficiency, the test is first trialed on a typical sample of subjects, and answers that the subjects never choose are removed from the test, because they do not perform the function of so-called distractors, which are designed to divert the attention of unknowing subjects from the correct answer. Moreover, such distractors are harmful to the test because they reduce the accuracy of measurement (this will be discussed in the articles devoted to test reliability).

3. The validity of the content of test items means that they have a basis in truth. Validity is related to the arguments that can be given in favor of one or another formulation of a test item. If there are no evidentiary arguments in favor of the correctness of a formulated task, it is not included in the test under any pretext. The same happens if, during expert discussion, even one counterargument arises, or a condition is admitted under which the given statement may turn out to be ambiguous or false. The idea of the validity of test content is closely intertwined with the principle of the substantive correctness of test items, already discussed in the previous article. Recall that a test includes only that content of the academic discipline which is objectively true and lends itself to rational argumentation. Accordingly, controversial points of view, quite acceptable in science, are not recommended for inclusion in the content of test tasks.

The falsity of the content of test items differs from the incorrectness of their formulation. Untruth, as noted above, is determined by the corresponding answer, while an incorrectly formulated task can produce both correct and incorrect answers, and may even cause confusion. This also covers inaccurately or ambiguously formulated tasks that generate several correct or conditionally correct answers; hence the need to introduce additional truth conditions, which lengthens the task and complicates its semantics. Incorrectness of formulation is usually clarified in the process of discussing the content of tasks with experienced expert teachers. The success of such a discussion depends on creating an appropriate cultural environment in which only constructive and tactful judgments are acceptable. Alas, experience shows that this does not happen often. Meanwhile, only a joint and friendly discussion of the materials by developers and experts can create an atmosphere of searching for the best versions of test content. This search is almost endless, and there is no ultimate truth here.

5. The relationship between the form of the task and the type of knowledge, skills and abilities being tested

As mentioned in previous articles, for testing purposes, knowledge can be divided into three types: offered, acquired and tested. Now let's look at this issue in a little more detail.

The knowledge offered is given to students in the form of textbooks, materials, texts, lectures, stories, etc., reflecting the main part of the educational program. This knowledge is also formulated in a system of tasks, according to which students themselves can check the degree of their preparedness.

The knowledge acquired by students is usually only a part of the knowledge offered, more or less, depending on the learning activity of the students. With the development of computer training, conditions have emerged for the volume of acquired knowledge to exceed the volume of knowledge offered. This is a new situation associated with the possibilities of mass immersion of students in the global educational space, in which the leading role of tasks in the process of acquiring knowledge is already quite well understood. Solving educational tasks is the main incentive for intensifying learning and students’ own activities. This activity can take place in the form of work with a teacher, in a group or independently. Discussions about levels of assimilation common in the literature refer exclusively to acquired knowledge.

The knowledge being tested forms the main content of the document, which may be called an exam or testing program, depending on the chosen form of knowledge control. The main feature of the knowledge being tested is its relevance, which means the test subjects’ readiness for the practical application of knowledge to solve tasks used at the time of testing. In higher education, this same feature is sometimes called the efficiency of knowledge.

In the process of testing schoolchildren and applicants, usually only the knowledge held in working memory is tested, knowledge that does not require consulting reference books, dictionaries, maps, tables, etc. Among the knowledge being tested one can also single out normative knowledge, which is subject to mandatory assimilation by students and to subsequent control by the educational authorities through a system of assignments, tasks and other control materials expertly selected and approved by the governing body.

In addition, properties of knowledge can be highlighted. V.I. Ginetsinsky identifies the following properties of knowledge:

 reflexivity (I not only know something, but also know that I know it);

 transitivity (if I know that someone knows something, then it follows that I know this something);

 antisymmetry (if I know someone, this does not mean that he knows me).

Classification of types and levels of knowledge

The classifications of types and levels of knowledge formulated by Bloom are used to solve practical problems of pedagogical measurement.

    Knowledge of names. Socrates said: whoever comprehends names will also comprehend what these names belong to. As the famous foreign philosopher J. Austin notes, knowledge of an object or phenomenon is largely determined by whether we know its name, or more precisely, its correct name.

    Knowing the meaning of titles and names. It has long been known that as we understand, so we act. Understanding the meaning of names and titles helps them be remembered and used correctly. For example, on hearing the name "Baikal," some junior schoolchildren may think not of the famous lake, the pearl of Russia, but of the soft drink sold under the same name. Another example can be taken from the sphere of political consciousness. As Yu.N. Afanasyev, A.S. Stroganov and S.G. Shekhovtsev rightly note in their book, the consciousness of former Soviet people proved unable to distinguish the various meanings of such abstractions of language as “freedom”, “power”, “democracy”, “state”, “people”, “society”, treating them as if they were clear by default. This was one of the reasons that made it possible, with the active complicity of these very people, to destroy their own life-support system.

    Factual knowledge. Knowing the facts allows one to avoid repeating mistakes, one's own and others', and enriches the evidentiary base of knowledge. Facts are often recorded in the form of scientific texts, observational results, recommendations such as safety precautions, worldly wisdom, proverbs and sayings. For example, from Ancient China comes the saying of the thinker Zhu Xi: do not boil sand in the hope of getting porridge.

    Knowledge of definitions. This is the weakest point in school education, because definitions cannot simply be taught; they can be understood and assimilated only as a result of independent efforts to master the required concepts. Knowledge of a system of definitions is one of the best pieces of evidence of theoretical preparedness. In the educational process, all four types of knowledge considered so far can be combined into a group of reproductive knowledge. As I.Ya. Lerner noted, over the years of schooling students complete over 10 thousand tasks. The teacher is forced to organize reproductive activity, without which the content is not initially absorbed.

This is knowledge that does not require noticeable transformation during assimilation, and it is therefore reproduced in the same form in which it was perceived. It can, with some convention, be called first-level knowledge.

    Comparative knowledge. It is widespread in practice and in science and is characteristic mainly of intellectually developed individuals, especially specialists, who are able to analyze and choose the best options for achieving a particular goal. As Nicholas of Cusa noted, “all researchers judge the unknown by means of a commensurate comparison with something already familiar, so everything is studied in comparison.”

    Knowledge of opposites, contradictions, antonyms and similar objects. Such knowledge is valuable in training, especially at the very beginning. In some areas it is essential: for example, in a school life-safety course, students need to know exactly what they may do and what they must not do under any circumstances.

    Associative knowledge. It is characteristic of an intellectually developed and creative person. The richer the associations, the more conditions there are for creativity and the higher its likelihood. To a large extent it is on the wealth of associations that the linguistic culture of the individual, writing, and the work of artists, designers and members of other creative professions are built.

    Classification knowledge. It is used mainly in science; examples are Linnaeus's classifications, D.I. Mendeleev's periodic system of the elements, classifications of tests, etc. Classification knowledge is generalized, systemic knowledge. This type of knowledge is inherent only in persons with sufficient intellectual development, since it requires developed abstract thinking and a holistic, interconnected vision of a totality of phenomena and processes. A system of knowledge is, first of all, possession of effective definitions of the basic concepts of the sciences being studied.

Knowledge of types 5–8 can be classified as the second level. Such knowledge allows students to solve standard tasks by subsuming each specific task under the known classes of phenomena and methods being studied.

    Causal knowledge: knowledge of cause-and-effect relationships, knowledge of foundations. As W. Shakespeare wrote, the time of the inexplicable is over; a reason must be found for everything. In modern science, causal analysis is the main direction of research. As L. Wittgenstein noted, we say “I know” when we are ready to give undeniable reasons.

    Procedural, algorithmic and process knowledge. It is fundamental in practical activity. Mastery of this knowledge is an essential sign of professional preparedness and culture. This group also includes technological knowledge, which makes it possible to reliably obtain the planned result.

    Technological knowledge. This is a special type of knowledge that manifests itself at different levels of preparedness. It can be relatively simple knowledge about a single operation of a technological chain, or a body of knowledge that reliably allows one to achieve the set goals at the lowest possible cost.

Knowledge of types 9–11 can be classified as knowledge of a higher, third level. It is acquired mainly in the system of secondary and higher vocational education.

The fourth level includes the following types of knowledge:

    Probabilistic knowledge. Such knowledge is needed in cases of uncertainty, lack of available knowledge, inaccuracy of available information, and, if necessary, to minimize the risk of error when making decisions. This is knowledge about the patterns of data distribution, the reliability of differences, and the degree of validity of hypotheses.

    Abstract knowledge. This is a special kind of knowledge that operates with idealized concepts and objects that do not exist in reality. There are many such objects in geometry and natural science, and in those social sciences that in the West are called behavioral: psychology, sociology, pedagogy. Probabilistic, abstract and the special scientific knowledge of each individual discipline form the basis of theoretical knowledge. This is the level of theoretical knowledge.

    Methodological knowledge. This is knowledge about methods of transforming reality, scientific knowledge about building effective activities. This is knowledge of the highest, fifth level.

The listed types of knowledge do not yet form a complete classification system and therefore allow for the possibility of a noticeable expansion of the presented nomenclature, replacing some types of knowledge with others, and combining them into various groups.

Each of the listed types of knowledge is expressed by the corresponding form of test tasks.

To determine the degree of training in each academic discipline, the body of knowledge that must be mastered according to the curriculum is identified; this constitutes the basic body of knowledge. Basic knowledge represents the minimum of the state educational standard. However, among basic knowledge, the knowledge that must remain in memory in any discipline is distinguished; together it forms worldview knowledge. B.U. Rodionov and A.O. Tatur (MEPhI testing center) distinguish several parts of worldview knowledge: basic knowledge, program knowledge, and above-program knowledge. Pedagogical tests are the only tool that makes it possible to measure not only learning but also the ability to use knowledge. If we speak only of skills, then at all levels of knowledge acquisition four types of skills can be distinguished:

1) the ability to recognize objects, concepts, facts, laws, models;

2) the ability to act according to a model, according to a known algorithm, rule;

3) the ability to analyze a situation, isolate the main thing and build procedures from mastered operations that make it possible to obtain a solution to a test task;

4) the ability and ability to find original solutions.

The four types of skills named by B.U. Rodionov and A.O. Tatur do not contradict the theory of the stepwise formation of mental actions, on which the method of developing automated testing for assessing the assimilation of knowledge and the acquisition of skills and abilities is based. This makes it possible not only to create expert systems for assessing the degree of student learning, but also to build a flexible, dynamic rating system for monitoring knowledge.

According to the most common classification of pretest tasks in domestic and foreign literature, there are:

Multiple choice items in which students select the correct answer from a given set of answers;

Constructed response tasks that require the student to independently obtain answers;

Tasks to establish correspondence, the implementation of which is associated with identifying correspondence between elements of two sets;

Correct sequencing tasks in which the student is required to indicate the order of elements, actions or processes listed in the condition.

The proposed four forms of test tasks are the main and most common, but there is no reason to make them absolute. Often the specific content of the controlled subject requires the use of new forms that are more adequate to the purposes of test development. Typically, such innovations are built on the basis of a combination of individual elements of the listed basic forms.

Regardless of the form, the tasks in the test must comply with the general requirements:

Each task has its own serial number, which can change after a statistical assessment of the difficulty of the task and the choice of a strategy for presenting test tasks;

Each task has a standard for the correct answer (grading standard for tasks with a freely constructed answer);

All elements in the task are located in clearly defined places, fixed within the chosen form;

For tasks, standard instructions for completion are developed, which do not change within each form and precede the formulation of tasks in the test;

For each task, a rule for assigning a dichotomous or polytomous assessment is developed, common to all tasks of the same form and accompanied by verification instructions with standardized procedures for calculating raw (primary) test scores.

The test measurement process is maximally standardized if:

No student is given any advantage over others;

A pre-developed scoring system is applied to all student responses without exception;

The test includes tasks of the same form or different forms with regulated weighting coefficients, the values ​​of which are obtained statistically;

Testing of different groups of subjects is carried out at the same time under similar conditions;

The group of test takers is aligned according to motivation;

All subjects perform the same tasks.
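The weighting of item scores mentioned in the conditions above can be sketched in Python. This is a hedged illustration only: the weights below are placeholders, not the statistically derived coefficients the text refers to.

```python
# Hedged sketch: a weighted raw score, where each item's 1/0 score is
# multiplied by a weighting coefficient (illustrative values, not real
# statistics).
def weighted_raw_score(item_scores, weights):
    """Sum of item scores multiplied by their weighting coefficients."""
    if len(item_scores) != len(weights):
        raise ValueError("one weight is required per item")
    return sum(s * w for s, w in zip(item_scores, weights))

# Four dichotomously scored items; the third and fourth count for more.
print(weighted_raw_score([1, 0, 1, 1], [1.0, 1.0, 2.0, 1.5]))  # 4.5
```

With equal weights of 1.0 this reduces to the ordinary raw score, i.e. the number of correctly completed tasks.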

The last condition does not exclude the possibility of cheating, prompting and other violations, so one usually tries to create several versions of a single test that are parallel in content and difficulty. In general, the choice of task form and the number of test versions depend on the content of the course being tested, the goals of control, and the required level of test reliability. In particular, certification tests tend to include more multiple-choice tasks, since their high manufacturability and automated checking procedures make it possible to increase the amount of content covered by the test, the length of the test, and the reliability and content validity of the results of pedagogical measurement.

5.2. Tasks with the choice of one or more correct answers

In choice tasks (or closed tasks, a name used in some domestic methodological literature), one can distinguish a main part containing the statement of the problem and ready-made answers formulated by the teacher. Most often only one of the answers is correct, although variants with a choice of several correct answers, including answers correct to varying degrees, are not excluded.

Incorrect but plausible answers are called distractors. If a task has two answers, one of which is a distractor, the probability of randomly selecting the correct answer by guessing is 50%. The number of distractors is chosen so that the task does not become too cumbersome and hard to read, while at the same time keeping the probability of guessing the correct answer from becoming too high. Therefore tasks most often have four or five answers, although in some cases, when the need arises, their number can reach 6–7.
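The relationship between the number of answers and the chance of a lucky guess can be shown with a one-line calculation (a minimal illustration, not part of the original text):

```python
# Hedged sketch: the probability of hitting the correct answer by pure
# guessing is the reciprocal of the number of answer options.
def guessing_probability(num_options):
    if num_options < 2:
        raise ValueError("a choice task needs at least two answers")
    return 1.0 / num_options

print(guessing_probability(2))  # 0.5 -> two answers, one distractor
print(guessing_probability(5))  # 0.2 -> five answers, four distractors
```

This is why adding distractors lowers the guessing risk, at the price of a longer, harder-to-read task.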

Tasks with two answers are usually used for express diagnostics, for example in automated control and training programs at the entry to a training module, in adaptive testing, or for self-control, when the test taker needs to quickly identify gaps in his own knowledge. Using tasks with two or three answers in final control increases measurement error due to guessing, so they are never included in certification tests, where, for greater reliability, all tasks are given the same number of answers.

If distractors are formulated unsuccessfully and hold no appeal even for the weakest subjects in the group, they cease to fulfill their function, and the task in fact ends up with fewer answers than planned. In the worst case, when all the distractors in a task fail, most students will complete the task correctly by choosing the only plausible answer, the correct one. Ideally, each distractor should be equally attractive to subjects who choose an incorrect answer. The attractiveness of distractors is assessed after the first trial of the test on a representative sample of subjects, by calculating the proportion of students who chose each distractor as the correct answer. Of course, exact equality of these proportions is an idealization that is practically unattainable in empirical testing, but it is what one must strive for when creating tasks.

An in-depth analysis of how often each distractor is chosen by students of different levels of preparedness allows one to draw a conclusion about the validity of the incorrect answers. If a distractor more often attracts weak students, those who completed only a small number of test tasks correctly, it is considered valid. Otherwise, when a distractor proves attractive mainly to strong students, its validity is low and the task must be reworked. In general, a test task is considered to “work well” if knowledgeable students complete it correctly while ignorant students choose each of the distractors with equal probability.
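The first step of this post-pilot analysis, computing the share of test takers who chose each option and flagging distractors that nobody chose, might be sketched as follows; the function and option names are assumptions made for illustration.

```python
from collections import Counter

# Hedged sketch of the distractor analysis described above.
def distractor_shares(responses, correct, options):
    """Return the share of test takers choosing each option and the
    list of 'dead' distractors that nobody chose."""
    counts = Counter(responses)
    n = len(responses)
    shares = {opt: counts.get(opt, 0) / n for opt in options}
    dead = [opt for opt in options
            if opt != correct and shares[opt] == 0.0]
    return shares, dead

shares, dead = distractor_shares(
    ["A", "A", "B", "A", "C", "A"], correct="A",
    options=["A", "B", "C", "D"])
# Distractor "D" attracted nobody -> it fails its function and should
# be reworked or removed from the task.
print(dead)  # ['D']
```

A fuller analysis would also split the shares by strong and weak subgroups, as the paragraph above describes.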

If testing is carried out using forms, then tasks with the choice of one correct answer are accompanied by the instruction: “CIRCLE THE NUMBER (LETTER) OF THE CORRECT ANSWER.”

Tasks with several correct answers are usually used in ongoing control to test classification and factual knowledge, although there are cases when the specific content of the discipline forces them to be included in the final tests. They are accompanied by special instructions emphasizing the need to select all the correct answers and having the form: “CIRCLE THE NUMBERS OF ALL CORRECT ANSWERS.”

When there are too few distractors and there are many more correct answers, it is easy to guess them. As a way out of this situation, you can include only one incorrect answer in the number of answers, and ask students to choose one incorrect answer, if this does not contradict the didactic goals of control and is allowed by the content of the subject. In this case, the instructions look like: “CIRCLE THE NUMBER OF THE INCORRECT ANSWER.”

Sometimes, according to the author’s intention, when developing a task, several correct answers are included, among which there is a more correct one and a less preferable one. In this case, the task is accompanied by the instruction: “CIRCLE THE NUMBER OF THE MOST CORRECT ANSWER.”

When issuing tasks on a computer, the instructions may look like: “TO ANSWER, PRESS THE KEY WITH THE NUMBER (LETTER) OF THE CORRECT ANSWER.”

Typically, if all tasks are formulated in the same form, then the instructions are given at the beginning of the test. Otherwise, when the test includes items of different forms, the instructions change each time the form changes. It's easy to imagine how difficult it would be to alternate instructions to choose correct and incorrect answers. Inattentive students who cannot concentrate on changing instructions will inevitably get confused and complete some of the tasks incorrectly, even when they probably know the correct answer. Therefore, it is recommended to change the instructions in the test as rarely as possible - exactly as many times as required by the strategy for presenting test items.

Tasks with choice have a number of advantages related to the speed of their completion, the ease of calculating final test scores, the ability to automate procedures for checking student answers and the resulting minimization of the subjective factor when assessing test results. With their help, it is possible to more fully cover the content of the subject being tested and, consequently, increase the content validity of the test. The undoubted advantage of the choice task form is its versatility; it is suitable for almost any subject.

Among the disadvantages of choice tasks is the guessing effect, which is typical for poorly prepared test takers when answering the most difficult test items. Although the possibility of guessing does exist, testologists have learned to combat it using various methods. Sometimes special instructions are introduced that direct subjects to skip an unfamiliar task instead of answering by guessing. In other cases, special weighting coefficients close to zero are added in calculating the scores of weak students obtained on the most difficult test items. Sometimes a special formula is used to correct individual scores, adjusted for guesswork. The last method and the formula that explains it are given at the end of this chapter.

Certain difficulties arise when using choice tasks to test productive-level skills associated with the application of knowledge in an unfamiliar situation, creative aspects of preparation, and cases when it is necessary to transform the conditions of the task assigned to students. Then tasks with a choice of ready-made answers are most often impossible to use. In the case of mass certification testing, when it is necessary to use effective computerized technologies to calculate test scores and obtain high objectivity of the results of pedagogical measurement, the advantages of choice tasks clearly outweigh the disadvantages. Therefore, this form often dominates the development of final certification tests.

Multiple-choice tasks must satisfy a number of requirements, the fulfillment of which can improve the quality of the test:

Any ambiguity or unclear wording must be eliminated in the text of the assignment;

The main part of the task is formulated extremely briefly, preferably no more than one sentence of 7-8 words;

The syntactic design of the task is extremely simplified without compromising the correctness of the content and its unambiguous understanding by students;

The main part of the task includes most of the conditions of the problem, leaving no more than 2–3 of the most important keywords of the problem for the answer;

All answers to one task must be approximately the same length, or the correct answer may be shorter than others, but not in all test tasks;

All verbal associations that contribute to choosing the correct answer using a guess are excluded from the text of the task;

The frequency of choosing the place number for the correct answer in different test tasks should be approximately the same, or the place number for the correct answer is chosen randomly;

All distractors for each task should be equally likely to be attractive to subjects who do not know the correct answer.

When developing tasks, it is necessary to ensure their relative independence, excluding chain execution logic, when the answer from one task serves as a condition for another test task. Academic achievement tests cannot contain trap items found in psychological tests.

The easiest way to obtain a distractor in a task with two answers is to negate a true statement. However, it is not recommended to use the words “yes” and “no” in place of distractors, since it is quite difficult to formulate statements to which such an answer can be given unambiguously.

EXAMPLES OF TASKS

Task 1

IF THE SUBTRAHEND IS INCREASED BY 12 UNITS AND THE DIFFERENCE ALSO INCREASES BY 15 UNITS, THEN THE MINUEND

A. Increased

B. Decreased


Tasks with three answers, like tasks with two answers, are usually used in express diagnostics. Sometimes three answers appear as a result of removing non-functioning distractors. In general, such tasks can be considered unsuccessful, because they are not brief enough and at the same time carry a high probability of guessing the correct answer.

Task 2

HIGH RATES OF URBANIZATION IN LATIN AMERICA ARE ASSOCIATED WITH

A. Rapid economic growth

B. Strengthening the role of large cities

C. Mass migration of people from villages to cities


In most tests there are tasks with 4–5 answers, of which one is correct. When developed skillfully, they are quite short, and they have a low probability of guessing the correct answer (0.25 with four answers and 0.20 with five).

Task 3

THE ASSUMPTION THAT MONEY IS A SPECIFIC COMMODITY IS CONSISTENT WITH THE THEORY OF MONEY

A. Nominalistic

B. Metallic

C. Quantitative

D. Labor


Task 4

WHAT STRUCTURE DOES AN ORGANIZATION BUILT ON THE PRINCIPLE OF DUAL SUBORDINATION OF EXECUTORS HAVE?

A. Project

B. System

C. Matrix

D. Functional

E. Geographical


Task 5

THE FUNCTION OF THE MEASURE OF VALUE IS PERFORMED BY:

A. Metal money

B. Ideal money

C. Real money

D. Mentally imagined money

E. Credit money


Sometimes a choice task has a dual structure, offering a set of statements that are assessed by comparison with the proposed answers. For example, in task 6, statements characterizing the concept of “management” must be compared with various options for their truth.

Task 6

WHICH STATEMENTS CHARACTERIZE THE CONCEPT OF “MANAGEMENT”?

1. The process of distribution and movement of resources in an organization with a predetermined goal, according to a pre-developed plan and with continuous monitoring of performance results.

2. A set of methods, principles, means and forms of managing organizations with the aim of increasing the efficiency of activities.

A. Only the first

B. Only the second

C. Neither the first nor the second

D. Both the first and the second


Despite its apparent attractiveness, task 6 is poorly formulated in terms of content, since it can lead to ambiguous interpretation of students' answers. Choosing the two answers A and B together is equivalent to choosing answer D, although the answers to a task should always have the property of relative independence and, figuratively speaking, should negate one another.

Another example of modifying the form of a task with a choice of answers is given in task 7, where the selected answer is asked to be mentally substituted in place of the dash in the main part.

Task 7

MANAGEMENT IS THE COORDINATION OF _________ RESOURCES FOR THE PURPOSE OF SOLVING SET MANAGEMENT TASKS.

A. Information

B. Human

C. Time

D. Material


Even with a well-organized testing process, a single version of a test cannot be administered because of cheating, prompting and other similar undesirable effects. Therefore one always has to develop 5–8 parallel versions of the test, for which faceted tasks can be used. A facet is a form that provides for the presentation of several variants of the same element of test content. Each subject receives only one task variant from the facet. In this case, all test groups perform tasks of the same type, but with different facet elements and, accordingly, different answers. Thus, two problems are solved at once: the possibility of cheating is eliminated and the parallelism of test versions is ensured. For example, task 8 contains two test tasks, one for each of the cities given in curly brackets.
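The facet mechanism can be sketched in a few lines of Python: one template yields several parallel task variants, and each test taker is handed exactly one of them at random. The template wording and the facet elements here are illustrative assumptions.

```python
import random

# Hedged sketch of a facet: expand one template over its facet elements
# to obtain parallel task variants.
def expand_facet(template, facet_elements):
    return [template.format(element=e) for e in facet_elements]

template = ("TO THE PALACE COMPLEXES IN THE SURROUNDINGS OF "
            "{element} RELATE:")
variants = expand_facet(template, ["Moscow", "St. Petersburg"])
assigned = random.choice(variants)  # one variant per test taker
```

The answer sets would differ per variant as well, which is what makes the versions parallel in content rather than identical.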

Task 8

TO THE PALACE COMPLEXES IN THE SURROUNDINGS

{Moscow

St. Petersburg}

RELATE:

1) Pavlovsk, Oranienbaum

2) Arkhangelskoye, Tsaritsino

3) Peterhof, Gatchina

4) Tsarskoe Selo, Strelna


In task 9, the author suggests choosing an element that does not belong to the subject of economic theory, which is not entirely justified by the didactic purposes of control but in this case is permitted by the content of the subject.

Task 9

PROVISIONS NOT RELATED TO THE SUBJECT OF ECONOMIC THEORY

A. Economic good

B. Unlimited resources

C. Maximizing need satisfaction

D. Efficient use of resources

E. Legal relations


Tasks of this kind, like tasks with several correct answers such as task 10, are usually not included in certification tests, whose results are used to make administrative and management decisions in education.

Task 10

SPECIFY THREE INTEGRAL ENVIRONMENTAL PARAMETERS AFFECTING THE DECISION-MAKING FUNCTION

A. Uncertainty

B. Complexity

C. Dynamism

D. Certainty

E. Limitedness


When students give partially correct answers, which happens when not all of the planned correct answers are selected in such a test task, the objectivity and comparability of test scores decrease. If several correct answers cannot be avoided, then to increase the standardization of assessment procedures a certain decision rule is introduced. For example, if the subject chooses all the correct answers, he receives 1 point; in all other cases, 0 points.
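This all-or-nothing decision rule is trivial to state in code (a minimal sketch; names are illustrative):

```python
# Hedged sketch of the all-or-nothing decision rule: 1 point only when
# exactly the full set of correct answers is chosen, otherwise 0.
def score_all_or_nothing(chosen, correct):
    return 1 if set(chosen) == set(correct) else 0

print(score_all_or_nothing({"A", "C"}, {"A", "C"}))       # 1
print(score_all_or_nothing({"A"}, {"A", "C"}))            # 0
print(score_all_or_nothing({"A", "B", "C"}, {"A", "C"}))  # 0
```

Note that both omissions and extra wrong selections yield 0, which is exactly what makes the rule standardized.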

When scoring tasks with a choice of one correct answer, a dichotomous assessment is usually preferred: for correct completion of a task the subject receives 1 point, and for an incorrect answer or an omission, 0. Summing the ones yields the subject's individual (primary, or raw) score, which in the case of dichotomous assessment simply equals the number of correctly completed tasks in the test. If the correct answer is not the only one, a polytomous assessment is most often used, set in proportion to the number of correctly chosen answers.
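Both scoring schemes can be sketched as follows. The proportional rule here applies no penalty for extra wrong choices, which is an assumption for illustration, not a prescription from the text.

```python
# Hedged sketch of the two scoring schemes described above.
def raw_score(item_scores):
    """Dichotomous case: the raw score is the sum of 1/0 item scores."""
    return sum(item_scores)

def polytomous_item_score(chosen, correct):
    """Partial credit in proportion to correctly chosen answers
    (no penalty for extra wrong choices -- an assumption)."""
    if not correct:
        return 0.0
    return len(set(chosen) & set(correct)) / len(correct)

print(raw_score([1, 0, 1, 1, 0]))                          # 3
print(polytomous_item_score({"A", "B"}, {"A", "B", "C"}))  # 0.666...
```

In practice the exact partial-credit rule must be fixed in the verification instructions, as the general requirements earlier in the text demand.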

If a test consists of multiple-choice items, the individual scores of the test takers will be noticeably distorted by the effect of random guessing of answers. Therefore, the raw scores are usually corrected by introducing an adjustment for guessing. The formula for correcting the scores obtained on tasks in which only one answer is correct has the form

X′i = Xi – [Wi / (k – 1)],

where i is the number of a subject in the group; X′i is the adjusted score of the i-th subject; Xi is the test score before correction; Wi is the number of unfulfilled (incorrectly completed, missed, or unreached) test tasks; k is the number of answers per task; and Xi + Wi = n, where n is the number of tasks in the test.

When tasks have two answers, k – 1 = 1, so for each subject the difference between the number of correctly completed and failed test tasks is calculated. As the number of distractors in a task grows, the number of points deducted decreases, which is natural: the more distractors there are, the harder it is to guess the correct answer.
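The correction formula above can be sketched as follows (a minimal illustration; the function and variable names are my own):

```python
# Correction for guessing on single-choice items:
#   X'_i = X_i - W_i / (k - 1)
# where X_i is the raw score, W_i = n - X_i is the number of tasks not
# answered correctly, and k is the number of answers per task.

def corrected_score(raw: int, n_tasks: int, k: int) -> float:
    """Raw score adjusted for random guessing."""
    wrong = n_tasks - raw            # W_i: incorrect, missed, unreached
    return raw - wrong / (k - 1)

# With two answers per task (k = 2), one point is deducted per error:
print(corrected_score(raw=30, n_tasks=40, k=2))  # -> 20.0
# With five answers, the same raw score loses much less:
print(corrected_score(raw=30, n_tasks=40, k=5))  # -> 27.5
```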

The correction formula itself has certain disadvantages that reduce the accuracy of test measurements. This is due to the fact that its construction is based on a number of artificial assumptions, which are often inconsistent with the actual procedure for performing the test. In particular, the assumption that all incorrect answers are the result of random guessing is far from being fully satisfied. Equally conditional is another assumption about the equal probability of choosing each answer to a test task.

In the process of creating tasks, certain requirements of form are inadvertently violated. As a rule, this is because all the developer's attention is absorbed by the content rather than the form. Violating these requirements gives rise to a number of characteristic shortcomings that appear even among experienced authors. The most common shortcomings of pretest tasks include:

Lack of logical correctness in the wording of test items, leading to unplanned correct answers;

Violation of the correct proportions in the task form, when the answers are much longer than the main part of the task;

Violation of brevity, caused by the inclusion of unnecessary words or by presenting in test form content that is not suitable for testing with a test;

Selecting answers on different grounds, without a single basis;

Miscalculations of task developers that contribute to guessing the correct answers without completing test tasks.

For example, task 11, which contains a short stem and long answers, can be considered unsuccessful: the developer has clearly got the proportions the wrong way round. If the definition of the circulation of capital, given in second place under the letter "B", is moved to the beginning of the task, the answers can consist of only one or a few words.

Task 11

CIRCULATION OF CAPITAL IS

A. Continuous and consistent movement of money capital

B. Consistent transformation of capital from one functional form to another

C. Return of the advanced value

D. Functioning of commodity capital


The answers in task 12 are poorly chosen, even setting aside the author's substantive miscalculations. While the first three answers compare the volume of output under monopoly and under competition, the fourth aims at establishing a causal relationship between the objects.

Task 12

THE OUTPUT VOLUME OF A MONOPOLIST, COMPARED WITH PERFECT COMPETITION, WILL BE

B. Less

C. The same

D. Depends on market conditions


There is no substantive and logical correctness in task 13, where out of the eight parameters given, only five are used in the answers.

Task 13

WHAT TWO PARAMETERS ARE NOT USED TO ANALYZE THE DIFFERENCES OF COUNTRY CULTURES FROM EACH OTHER?

1. The relationship between man and the environment

2. Time estimation

3. The nature of people

4. Communication style

5. Assessing the degree of activity

6. Freedom of access to information

7. Relationships between people

8. Attitude towards owning space


A. Fourth and sixth

B. Third and sixth

C. Third and fifth

D. Fourth and seventh


Task 14

A. Responsive to changes in the external environment and changes in accordance with them

B. Perceives all new trends from the external environment and necessarily applies them in his activities

C. Open to any innovation required by the owner

D. Reacts sensitively to the behavior of competitors and perceives the most effective principles for solving management problems


Task 15, although rather cumbersome, compares favorably with most of the examples given above, since it poses a problem rather than, as most often happens, testing factual or conceptual material.

Task 15

WHAT WILL THE OBJECTIVE FUNCTION LOOK LIKE IN THE MATHEMATICAL MODEL OF THE CONTROL PROCESS IN APPLICATION TO THE FOLLOWING PROBLEM:

The company produces two drinks: “Tonic” and “Tarragon”. Production volume is limited by the number of auxiliary additives and production capacity. The production of 1 liter of “Tonic” takes 0.02 hours of equipment operation, the production of 1 liter of “Tarragon” takes 0.04 hours. The consumption of auxiliary additives is 0.01 kg/l for Tonic and 0.04 kg/l for Tarragon. The daily operating time of the equipment is 24 hours. The resource of auxiliary additives is 16 kg per day. The profit from the sale of 1 liter of “Tonic” is 0.1 rubles/l, and that of “Tarragon” is 0.3 rubles/l.

How many Tonic and Tarragon products should be produced daily to maximize daily profits?

A. 0.01 X1 + 0.04 X2 → max

B. 0.02 X1 + 0.03 X2 → max

C. 0.02 X1 + 0.04 X2 → max

D. 0.03 X1 + 0.01 X2 → max
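For reference, the production problem in task 15 is a linear program, and its optimum lies at a vertex of the feasible region. The sketch below (an illustration, not part of the original task; it assumes the objective is the stated profit 0.1·X1 + 0.3·X2) checks the corner points directly:

```python
# The drinks problem: x1 = liters of "Tonic", x2 = liters of "Tarragon".
# Constraints:
#   equipment time: 0.02*x1 + 0.04*x2 <= 24  (hours per day)
#   additives:      0.01*x1 + 0.04*x2 <= 16  (kg per day)
# Profit to maximize (assumed objective): 0.1*x1 + 0.3*x2 rubles.

def profit(x1, x2):
    return 0.1 * x1 + 0.3 * x2

def feasible(x1, x2, eps=1e-9):
    return (x1 >= 0 and x2 >= 0
            and 0.02 * x1 + 0.04 * x2 <= 24 + eps
            and 0.01 * x1 + 0.04 * x2 <= 16 + eps)

def solve_drinks():
    # Candidate vertices: origin, axis intercepts of each constraint,
    # and the intersection of the two constraint lines (x1=800, x2=200).
    candidates = [(0, 0), (1200, 0), (1600, 0), (0, 600), (0, 400), (800, 200)]
    best = max((p for p in candidates if feasible(*p)), key=lambda p: profit(*p))
    return best, profit(*best)

print(solve_drinks())  # optimal daily plan and its profit
```

Checking the vertices by hand gives the plan (800 l of "Tonic", 200 l of "Tarragon") with a daily profit of 140 rubles.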

5.3. Constructed answer questions

In constructed-response tasks (also called completion or open-ended tasks), ready-made answers are not given: the student must produce or derive them himself. There are two types of constructed-response tasks. The first requires correct answers that are strictly regulated in content and form of presentation. The second comprises tasks with freely constructed answers, in which subjects give detailed answers, arbitrary in content and form of presentation, including a complete solution of the problem with explanations, micro-essays, and so on.

Students find constructed answer questions more difficult because they eliminate guesswork. Indeed, it is easier to choose the correct answer from those proposed, sometimes based not so much on knowledge as on intuition, than to formulate it yourself or find it in the process of solving the problems posed. But it is precisely this property that is extremely attractive for teachers, especially for those who are accustomed to relying on traditional means of control in their work and do not trust tests.

In tasks of the first type, the answer is usually quite short: in the form of a word, number, formula, symbol, etc. To develop tasks with a constructed, regulated answer, you need to mentally formulate a question, then write down a clear and concise answer, in which a dash is placed in place of the keyword, symbol or number. Due to the unambiguity of the correct answer, checking the results of tasks with a constructed regulated answer is quite objective; it is carried out in a computer form with the subsequent re-checking of all incorrect answers of students by expert means. Answers to assignments are given in place of the dash or entered by students on a special form.

For example: Enter the correct answer.

Exercise 1

Determining the end results to be achieved and the corresponding means necessary to obtain certain end results includes the functions of ___________________.


Task 2

A form of influence that involves masking real intentions and goals – _______________.


When tasks with a constructed regulated answer are performed, partially correct answers and answers correct to varying degrees often appear. When filling in the blank, the test taker may offer synonyms for the word planned by the developer or change the order of the elements in the missing formula, which significantly complicates automated checking and evaluation of results. For these reasons, additional scoring conventions for partially correct answers are often developed during revision.

Tasks with constructed regulated answers must satisfy a number of requirements:

Each task must be aimed at only one complementary word, symbol, etc., the place for which is recommended to be marked with a dash or dots;

A dash is placed in the place of the key element, the knowledge of which is most essential for the material being controlled;

It is recommended that all dashes in tasks for one test be of equal length;

It is better to allocate a place for the answer at the end of the task or as close to the end as possible;

After the dash, if possible, the units of measurement are indicated;

The text of the task must have an extremely simple syntactic structure and contain the minimal amount of information necessary to complete the task correctly.

Tasks of the second type with a freely constructed answer have no restrictions on the content and form of presentation of answers. During the allotted time, the test taker can write anything and however he wants on special answer forms. Undoubtedly, such fulfillment conditions are in many ways close to traditional written work, and therefore tasks with freely constructed answers are perceived positively by the absolute majority of teachers. They are interesting and varied in content.

Developing tasks of the second type may seem unreasonably easy. In fact, it is difficult not to formulate the task, but to offer a standard of the optimal answer along with standardized rules for assessing the results of its implementation. For example, the wording of a history task with a detailed answer is quite brief.

Task 3

NAME THE MAIN TASKS THAT WERE SOLVED IN THE FOREIGN POLICY OF RUSSIA IN THE 17TH CENTURY (INDICATE AT LEAST TWO TASKS). GIVE EXAMPLES OF WARS, CAMPAIGNS AND EXPEDITIONS IN THE 17TH CENTURY UNDERTAKEN TO SOLVE THESE PROBLEMS (AT LEAST THREE EXAMPLES).


But in order for a task to be included in the test, its author needs to standardize the verification procedure, and this is a voluminous work that sometimes causes a lot of criticism due to the ambiguity of the results of its implementation.

In the natural sciences it is much easier to propose a performance standard together with evaluation criteria. For example, for task 4 one can offer the following criteria for assessing the results of completion.

Task 4

FOR WHAT VALUES OF x WILL THE CORRESPONDING VALUES OF THE FUNCTIONS f(x) = log2 x AND g(x) = log2 (3 – x) DIFFER BY LESS THAN 1?
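For reference, a worked solution of this task (an illustration added here; the original performance standard is not reproduced in the text):

```latex
% |log_2 x - log_2(3-x)| < 1, with domain 0 < x < 3 (so 3 - x > 0):
\[
\bigl|\log_2 x - \log_2 (3-x)\bigr| < 1
\;\Longleftrightarrow\;
\left|\log_2 \frac{x}{3-x}\right| < 1
\;\Longleftrightarrow\;
\frac{1}{2} < \frac{x}{3-x} < 2 .
\]
% The left inequality gives 2x > 3 - x, i.e. x > 1; the right gives
% x < 6 - 2x, i.e. x < 2. Hence the answer is x \in (1, 2).
```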


Testing of tasks with detailed answers is carried out by experts in accordance with standardized instructions containing the standard of the optimal answer with its characteristics and quality attributes, as in the example given. The standard must be accompanied by evaluation categories for issuing a polytomous assessment, which require testing and statistical substantiation of quality, since among them there may be some that do not work and those that reduce the differentiating effect of the test.

Tasks with detailed answers require significant labor from teachers during checking: experts have to analyze many answers that are correct to varying degrees and compare them with the standard, while disregarding completeness, the external format of the answers, spelling errors, and everything else not included in the criteria for assigning the polytomous score. Sometimes attempts are made to standardize checking by developing computer expert programs.

Outside of automated systems, checking the results of completing tasks with detailed answers is quite subjective, and coordinating the assessments of several experts is difficult, so usually such tasks take up no more than 10–15% of the total length of the certification test.

Free-response items are intended primarily for assessing cognitive skills; they should be developed only in cases where simpler forms cannot be used;

The length and complexity of the answer can vary widely (up to several pages of answer text, justification for the given solution to the problem, etc.). It is advisable to introduce restrictions on the maximum length of the answer for each task in the instructions;

The formulation of the task should include a statement of the problem, a standard of performance, and evaluation criteria. The problem statement should be clear enough to minimize possible deviations of students' correct answers from the standard of performance planned by the developer;

The choice of time frame for completing each task should allow the student to formulate a sufficiently detailed answer and have time to write it down.

The reliability of assessments of the results of tasks with freely constructed answers can be increased if:

Competently compose the task, guided by the above recommendations;

When checking, use only the developed standardized assessment scheme with no more than three assessment categories (0, 1, 2);

Train assignment reviewers to use standardized assessment criteria;

Involve at least two experts to check each task and invite a third expert if the scores of the first two differ by more than one point;

Use the principles of anonymity of the work being checked and independence of expert judgments;

Do not look at the grade given to a previous assignment when grading a subsequent one.
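The two-expert rule above can be sketched in Python (the specific way of combining the scores, averaging the closest pair, is my assumption for illustration; the text only requires inviting a third expert when the first two differ by more than one point):

```python
# Combining two experts' scores on a free-response task scored 0-2:
# if the first two scores differ by at most one point, average them;
# otherwise a third expert adjudicates and the two closest scores
# are averaged (assumed resolution rule).

def combine_scores(expert1: int, expert2: int, expert3=None) -> float:
    """Return the final score for one task."""
    if abs(expert1 - expert2) <= 1:
        return (expert1 + expert2) / 2
    if expert3 is None:
        raise ValueError("scores differ by more than 1 point: third expert needed")
    # Take the pair of scores that agree most closely.
    pairs = [(expert1, expert2), (expert1, expert3), (expert2, expert3)]
    a, b = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2

print(combine_scores(2, 1))     # close scores, averaged directly -> 1.5
print(combine_scores(0, 2, 2))  # disagreement resolved by third expert -> 2.0
```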

Essay type assignments can be graded according to the following:

Simple assessment schemes, when criteria are built with a focus on the content of students’ answers;

Complicated assessment schemes that take into account, during the examination, the content of the answers, the quality characteristics of the text, its completeness and style, or any other factors that seem important to the developer of the task.

Under any grading scheme, items with freely constructed answers require polytomous scoring, which sometimes unjustifiably inflates their overall weight in the test score. In order to avoid such overestimation and to reduce the influence of the subjective component, they usually try to make the number of evaluation criteria quite small, limiting themselves to polytomous assessments, for example, from 0 to 3 or from 0 to 2.

For tasks with a short, regulated answer, formulated in the form of unfinished statements and presented without special answer forms, an instruction consisting of one word is usually used: “ADD”. In cases where, for answers to tasks with a short regulated answer, it is necessary to give answers in special forms, and not next to the tasks, the instructions may look like: “Write ANSWERS TO TASKS IN THE ANSWERS FORM TO THE RIGHT OF THE CORRESPONDING TASK NUMBERS. WRITE EACH LETTER IN A SEPARATE BOX ACCORDING TO THE SAMPLES PROVIDED ON THE ANSWER FORM.”

Instructions for tasks with freely constructed answers usually have a free form. The main thing is to say as much as possible to facilitate and standardize the work of experts when checking test results to reduce the influence of subjective factors and increase the reliability of pedagogical measurements. In the most general form, for humanities subjects, the instructions may look like: “FOR ANSWERS TO TASKS, USE A SEPARATE ANSWER FORM. FIRST WRITE DOWN THE TASK NUMBER AND THEN A DETAILED ANSWER TO IT. WRITE YOUR ANSWERS CLEARLY.”

5.4. Matching tasks

Matching (correspondence) tasks have a specific form: under the instruction are placed the elements of two sets, and the subject is asked to establish the correspondence between them. On the left are usually the elements of the defining set, containing the statement of the problem; on the right, the elements to be selected.

The correspondence between the elements of the two columns can be one-to-one, when each element on the left corresponds to exactly one element on the right. If the number of elements in the two columns is the same, the last match is obtained not by choice but by elimination. There are cases, determined by the specific content of the subject, when the same element on the right is selected for several elements of the left column, so the right column may contain fewer elements than the left. Finally, the optimal task is one in which the right set contains more elements, each of which is selected only once. For example, task 1 below is successful and task 2 is not, because in task 2 the number of elements to select on the right equals the number of elements in the left column.

Exercise 1

FOR EACH OF THE THREE ELEMENTS (1, 2, 3), SELECT ONE CORRESPONDING ELEMENT FROM THE RIGHT-HAND COLUMN, LABELED WITH THE LETTERS (A, B, C, D, E, F, G, H, I, J).

Determine the correspondence of manager roles to three blocks according to G. Mintzberg’s model


The answers can be presented in the form of a table, in which case there is no need for detailed instructions such as those given for task 1.


Task 2

MATCH



Extra elements in the right column that cannot be selected if the answers are correct are called distractors. As in multiple-choice tasks, the greatest difficulties in development are associated with the selection of plausible redundant elements in the right set. The credibility measure of each distractor is established empirically.

When developing matching tasks, you should be guided by the following rules:

The task is formulated so that all content can be expressed in the form of two sets with appropriate names;

The elements of the specifying column are located on the left, and the elements for selection are located on the right;

It is desirable that each column have a specific name that summarizes all elements of the column;

It is necessary that the right column contains at least several distractors. It’s even better if the number of elements in the right set is approximately twice as large as the number of elements in the left column;

It is necessary that all distractors in one task be equally likely to be plausible;

Column items should be selected on a single basis to ensure that only homogeneous material is included in each test item.

In a certification test, matching tasks are ineffective due to their cumbersomeness, which does not allow covering a large amount of content.

Matching tasks come with a standard two-word instruction: “MATCH.” Sometimes the instructions are expanded, especially in cases where there is a separate answer form. For example, the instruction may look like: “FIRST WRITE THE LETTERS CORRESPONDING TO THE SPECIFIED ELEMENTS IN THE TABLE GIVEN IN THE TEXT OF THE TASK, AND THEN TRANSFER THEM TO THE FORM.”

Performance on matching tasks is assessed with either a dichotomous or a polytomous score. Under dichotomous scoring, 1 point is given when all matches in the item are identified correctly; if at least one match is wrong, the subject receives 0 points for the partially correct item. Another way is to award one point for each correct match; the item is then scored polytomously, and the total number of points equals the number of correctly identified matches.
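The two scoring methods for matching tasks can be sketched as follows (an illustration with hypothetical data; keys and answers map left-column elements to right-column letters):

```python
# Dichotomous (all-or-nothing) vs. polytomous (one point per correct
# match) scoring of a matching task.

def dichotomous_score(key: dict, answer: dict) -> int:
    """1 point only if every match is correct, otherwise 0."""
    return int(all(answer.get(k) == v for k, v in key.items()))

def polytomous_score(key: dict, answer: dict) -> int:
    """One point for each correctly identified match."""
    return sum(answer.get(k) == v for k, v in key.items())

key = {1: "C", 2: "A", 3: "D"}
student = {1: "C", 2: "A", 3: "B"}      # two of three matches correct

print(dichotomous_score(key, student))  # -> 0
print(polytomous_score(key, student))   # -> 2
```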

5.5. Tasks to establish the correct sequence

Test tasks of the fourth form are intended to assess the level of proficiency in a sequence of actions, processes, etc. In such tasks, elements related to a specific task are presented in a random order, and the subject must establish the correct order of the proposed elements and indicate it in a given way in a specially designated place.

The standard instructions for tasks of the fourth form are: “ESTABLISH THE CORRECT SEQUENCE.” Sometimes instructions are included in the text of the assignment.

Exercise 1

PLACE THE NAMES OF THE RUSSIAN COMMANDERS IN THE CHRONOLOGICAL SEQUENCE OF THEIR ACTIVITIES. WRITE THE LETTERS THAT INDICATE THE NAMES IN THE CORRECT SEQUENCE IN THE TABLE PROVIDED IN THE TEXT OF THE TASK, AND THEN TRANSFER THEM TO THE FORM.

A) Dmitry Pozharsky

B) Alexey Ermolov

C) Mikhail Skobelev

D) Alexey Orlov


Task 2

ESTABLISH THE CORRECT SEQUENCE OF THE EXHALATION MECHANISM BY WRITING THE NUMBERS IN THE DESIGNATED PLACES:

? – collapse of the lungs

? – inhibition of the respiratory muscle center in the spinal cord

? – stimulation of the expiratory center in the medulla oblongata

? – relaxation of the diaphragm and auxiliary muscles

? – reduction of the chest cavity


In many cases, tasks on establishing the correct sequence are technologically poor or are inapplicable because of the specific content of the subject. They are cumbersome and often admit an ambiguous sequence of answers, so they are not recommended for use in certification tests.

5.6. Comparative characteristics of test task forms

In the process of developing a test, the author always faces a question: stick to just one form of tasks, or combine different forms in one test? And if one form is chosen, which should be preferred? The choice is largely determined by the specific content of the academic discipline and by the goals of creating and using the test. Much also depends on the technology of testing, of collecting and processing empirical data, and on the technical and material support of the testing process. It is easy to organize computerized collection and analysis of results when all tasks are multiple choice.

The results of completing tasks with constructed answers require manual processing. As a rule, experts have to be involved to evaluate the results of their implementation, and this requires additional material costs and time for verification.


Advantages and disadvantages of various forms of assignments

1. Tasks with two answers

Advantages: due to their brevity, they allow you to cover a large amount of material, are easy to develop (only one distractor), and the results of execution are quickly processed with high objectivity.

Disadvantages: they stimulate rote memorization, encourage guessing, and require more tasks (and, accordingly, more testing time) to compensate for the effect of guessing.

2. Tasks with a choice of four to five answers

Advantages: suitable for a wide variety of academic subjects; thanks to the brevity of the wording, a large amount of content can be covered by the test; they provide the possibility of automated testing and high objectivity of student assessment; they allow detailed statistical analysis of their characteristics, their adjustment, and a significant increase in the reliability of pedagogical measurements.

Disadvantages: they require significant work by the authors in selecting distractors and in correcting students' scores; they are poorly suited for testing the productive level of activity and cognitive skills.

3. Tasks with constructed regulated answers

Advantages: easy to develop, eliminate guessing, partially suitable for automated checking.

Disadvantages: they mainly test knowledge of factual material or of the conceptual apparatus; in humanities subjects they are usually too easy; they sometimes lead to ambiguous correct and partially correct answers.

4. Tasks with freely constructed answers

Advantages: they allow the assessment of complex educational achievements, including communication skills and the creative level of activity; they are formulated as easily as traditional tasks; they eliminate guessing.

Disadvantages: they require a lengthy, expensive checking procedure and significant completion time; they do not allow covering a significant amount of subject content; they reduce the reliability of pedagogical measurements.

5. Matching tasks

Advantages: easy to develop, ideal for assessing associative knowledge and for ongoing monitoring, and they reduce the effect of guessing.

Disadvantages: they are most often used to check the reproductive level of activity and algorithmic skills, and they are cumbersome in presentation.


Comparative characteristics of the technological properties of the various test forms are presented in Table 5.1.


Table 5.1. Comparative technological characteristics of task forms





The choice of the form of pre-test tasks is determined by the specifics of the controlled content and the goals of creating the test. Each form of tasks has its own advantages and disadvantages, its own scope of application.

The development of pretest tasks is carried out in accordance with standardized requirements, whose content depends on the specifics of the task form. Multiple-choice tasks are most convenient for final control, thanks to a number of technological advantages that increase the efficiency of administering the final test and assessing student results. Completion tasks are preferable for ongoing control during learning.

Modern trends in the development of final tests involve a departure from mono-form tests and the wide use of completion tasks, since the variety of knowledge and skills being tested requires the inclusion of various forms in the test.

Practice exercises and discussion questions

1. Circle the number of the correct answer.

It is more effective to use test items with two answers in control:

1) current

2) thematic

3) final

4) input

2. Circle the number of the correct answer.

1) current

2) final

3) input

3. Circle the number of the correct answer.

The probability of guessing the location number of the correct answer in a task with five distractors will be:

4. Find two significant shortcomings in the task, reformulate the task to eliminate the shortcomings.

Which class of animals are characterized by the following characteristics: cold-blooded, living in water and on land, breeding in water?

1. Fish class

2. Class of reptiles

3. Class of amphibians

4. Class of mammals

5. Suggest a method for improving the wording of tasks.

Which of the following was eliminated by the reforms of the 60s of the 19th century in Russia?

1. Autocracy

2. Serfdom

3. Estate

4. Landlord ownership of land

5. National oppression

Expert in Information Technology and Educational Video

Before we get into the principles of test design, there are a few points that need to be made.

Differences between the test and tasks in the test form

In everyday thinking, the concepts of a test and of a system of test tasks (or pretest tasks) are constantly confused.

As a rule, a test is developed by a team of researchers and is tried out over a certain period of time; after the trial, adjustments are made to it. The test consists of test tasks. In the English-language literature, the term "quiz" (but not "test"!) is used to refer to an informal check.

Thus, an individual teacher cannot create tests. Instead, he develops tasks in test form that outwardly resemble a test but undergo no statistical or other verification. Such tasks can be used in the educational process to solve particular pedagogical problems.

It follows that a number of test characteristics fundamentally cannot be used here. For example, the difficulty of a test task is determined experimentally, from the results of a large sample of students. In practice, the teacher has neither the time to conduct such an experiment nor the required sample size, so difficulty is often estimated "by eye".

In general, tasks in test form (as well as test tasks) meet the following requirements:

  • brevity;
  • manufacturability;
  • certainty of purpose;
  • logical form of statement;
  • certainty of place for answers;
  • the same rules for evaluating answers;
  • correct location of task elements;
  • identical instructions for all subjects;
  • adequacy of the instructions to the form and content of the task.

Thus, the brevity of a task in test form is ensured by careful selection of words, symbols, and graphics, allowing maximum clarity of the task's meaning to be achieved with a minimum of means. Manufacturability is the property that allows testing to be carried out by technical means: accurately, quickly, economically, and objectively. The logical form of the statement is a means of ordering and effectively organizing the content of a task.

Forms of test tasks

In addition, the principles for developing test tasks (tasks in test form) are related to their forms. Different authors classify the forms of test tasks differently. To make matters worse, each automated testing system names the same forms in its own way. Let us summarize the variety of forms with the following classification.

  1. True/False (from the English True or False) – contains a statement with which the student must either agree or disagree.

For example:

The first US President was George Washington

  1. Right
  2. Wrong

In the Unified State Exam, similar tasks appear in the KIMs for foreign languages, in the listening section: students listen to a text and then complete True/False tasks.

This form of test task is the simplest both for the teacher to compose and for students to answer. Such tasks, however, are characterized by a high probability of guessing the correct answer.

2. Multiple choice (tasks with the choice of one or more correct answers). This is the most common form of test tasks. It contains a statement (question) and alternative answers.

For tasks with a choice of one correct answer, it is recommended to use at least 4 alternatives (with fewer, the probability of guessing the correct answer increases) and no more than 6 (it is difficult to invent more plausible alternatives).

For tasks with multiple correct answers, at least 6 alternatives are recommended.

3. Tasks to establish correspondence (matching). A set of elements in two columns; the student needs to establish the correspondence between the elements of the left and right columns. A heading for each column is a must: it saves the student from wasting time generalizing the column items and lets him get straight to the activity.

Compare:

  a) Label          1. Battery wall structure
  b) Ulus           2. Brick
  c) Volosten       3. Khan's charter
  d) Vice           4. Governor of the volost
  e) Plintha        5. Possession

And here is the same task, reformatted to take these requirements into account:

As we can see, in the second case the task is more readable and its meaning is easily grasped. Note that, for example, the OnlineTestPad service and some others allow such headers to be added, while others (like Moodle) lack this functionality. In that case a complete instruction must be written, for example: "Match the .... with the ....".

In paper tests, for tasks of this form the student is asked to enter the correct answers in a special table. The variant with connecting arrows is considered less technological, so it should be avoided.


It is also desirable to have an unequal number of elements in the left and right columns, so that the last element cannot be selected by elimination.

Look at this assignment:

Arrange the names of Russian commanders in chronological order (in ascending order) of their activities

Dmitry Pozharsky

Alexey Ermolov

Mikhail Skobelev

Alexey Orlov

This is also a matching task, or, more precisely, a variety of it: a task on establishing the correct sequence. A number of foreign researchers lean toward such a merging of the forms; that is why, for example, we will not find a separate sequence form in Moodle. But it can easily be constructed from a matching task. For clarity, let us slightly reformat the previous task:

We see a classic matching task, just the left column represents the numerical order. The student should also enter the correct answers in a special table.


Sometimes the task types described above (true/false, multiple choice, and matching) are combined into a group of closed tasks, which share the following features:

  • The correct answer is clearly present, you just need to choose it one way or another;
  • Answers to questions can be guessed (the probability of guessing increases as the number of alternatives decreases);
  • Answers can be recalled;
  • Answers can be selected logically, discarding obviously incorrect alternatives.
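The second feature, the dependence of guessing on the number of alternatives, is easy to quantify for single-answer items (a small illustration):

```python
# For a closed task with one correct answer out of k alternatives, a
# blind guess succeeds with probability 1/k, which grows as the number
# of alternatives decreases.

from fractions import Fraction

def guess_probability(k: int) -> Fraction:
    """Chance that a blind guess hits the single correct alternative."""
    return Fraction(1, k)

for k in (2, 4, 5):
    print(k, guess_probability(k))  # 2 -> 1/2, 4 -> 1/4, 5 -> 1/5
```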

5. Addition (short answer). In these tasks, the student must supply the correct answer himself. Sometimes this type is called an open-type task. In contrast to the forms discussed above, strategies such as guessing or recognizing the correct answer do not work here, so this type is considered more difficult for students.

6. Essay: a short free-form answer from the student on the substance of the question. Strictly speaking, an essay is not a test form, because it does not meet the necessary criteria of brevity, technological simplicity, etc. In our opinion, the essay was introduced to overcome well-known difficulties in composing typical test tasks, chief among them the impossibility of presenting all educational material in test form and the reproductive nature of typical test tasks.

However, an essay assignment must be accompanied by a standard (model) optimal answer together with standardized rules for assessing its completion. For example:

For what values of x do the corresponding values of the functions f(x) = log2 x and g(x) = log2 (3 − x) differ by less than 1?

Criteria for assessing the correct answer

Points: criteria for assessing the completion of task 9

2 points: the correct sequence of solution steps is given: 1) an inequality containing a modulus is set up; 2) the inequality is solved. All transformations and calculations are carried out correctly and the correct answer is obtained.

1 point: the correct sequence of solution steps is given, but in step 2 a typo and/or a minor computational error was made that does not affect the subsequent course of the solution. This slip may lead to an incorrect answer.

0 points: all solutions that do not meet the above criteria for a score of 1 or 2 points.
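For reference, the intended solution of the sample problem can be sketched as follows (step 1 sets up the inequality with a modulus, step 2 solves it; this reconstruction is ours, not part of the original criteria):

```latex
\[
\left|\log_2 x - \log_2 (3 - x)\right| < 1, \qquad 0 < x < 3;
\]
\[
\left|\log_2 \frac{x}{3-x}\right| < 1
\;\Longleftrightarrow\;
\frac{1}{2} < \frac{x}{3-x} < 2
\;\Longleftrightarrow\;
1 < x < 2.
\]
```

So the answer is the interval (1; 2).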

Principles for developing tasks in test form

The next thing we should focus on is the principles of developing tasks in test form.

For a long time it was believed that the test itself was an objective means of control. Later came the understanding that a test provides, first of all, procedural objectivity. The quality of a test is assessed along several related dimensions: reliability (a guarantee that the test is free of random errors), validity (a guarantee that the test measures exactly what it is supposed to measure), difficulty, and so on. As we noted above, all these parameters are derived from various mathematical models in the course of the author team's experimental work and are, as a rule, not available to ordinary teachers and professors. Therefore we will dwell only on a number of theoretical requirements for developing tasks in test form.
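Two of the parameters just mentioned, difficulty and reliability, are easy to illustrate on a 0/1-scored response matrix. A minimal sketch (the data are hypothetical, and KR-20 is only one classical reliability formula, not necessarily the model any particular author team uses):

```python
def item_difficulty(responses):
    """Share of correct answers per item (rows = students, 0/1 scored)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

def kr20(responses):
    """Kuder-Richardson formula 20: a classical reliability estimate
    for dichotomously scored tests."""
    k = len(responses[0])                      # number of items
    p = item_difficulty(responses)
    pq = sum(pi * (1 - pi) for pi in p)        # sum of item variances
    totals = [sum(row) for row in responses]   # total score per student
    mean = sum(totals) / len(totals)
    var = sum((t - mean) ** 2 for t in totals) / len(totals)
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical 0/1 response matrix: 4 students x 3 items
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0]]
```

Here difficulty is simply the share of correct answers per item, and KR-20 grows as item scores vary together with the total score.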

  1. Start constructing a task with the correct answer. It often happens that a task formally contains more correct answers than planned. There are also opposite cases - the task does not contain the correct answer at all.
  2. The content of the assignment is based on the requirements of the program and reflects the subject (meta-subject) content. Sometimes they try to include questions in the test for which there is simply no correct answer.

For example:

We will study Latin language because…

  1. It is spoken in many countries around the world
  2. We want to better understand our native language, since it contains many words borrowed from Latin
  3. We want to better understand the history and culture of the ancient world

This is a good task. But it should be used in a sociological survey, not in tasks testing educational achievement.

  3. The question should be aimed at identifying one element of knowledge, one complete thought. Otherwise it is difficult to diagnose the reason for failing the task. For example:

Confucius…

  1. lived in Africa
  2. lived in China
  3. was a doctor
  4. was a ruler
  5. was a philosopher

This task is aimed at identifying two elements at once - where Confucius lived and who he was. It is necessary to separate these two issues.

  4. When writing questions, avoid words such as "sometimes", "often", "always", "a little", "more", etc. Such words have a subjective meaning and may lead to erroneous answers. Test tasks (tasks in test form) must have a clear, unambiguous answer.
  5. Avoid introductory phrases or sentences that have little connection with the main idea, and do not resort to lengthy statements.

For example:

“The Anadyr depression. It is very flat, and the Anadyr winds along it like a huge boa constrictor... ‘The Anadyr is a yellow river’: that is what the essay could later be called. Tundra and lakes across the whole depression. It is hard to tell which there is more of: lakes or land” (O. Kuvaev). Which sea does this river flow into?

  6. Answer options must be plausible and skillfully selected; there should be no obviously wrong answers (an obviously wrong option is a kind of hint: that answer is certainly not the right one). Plausible incorrect answers are called distractors. For example:

Birthplace of Karl Marx:

  1. Trier
  2. Karl-Marx-Stadt
  3. Sturgard
  4. Munich

Here we can assume that the city of Karl-Marx-Stadt got its name because it was where Karl Marx was born. However, the correct answer is Trier.

  7. Don't ask trick questions: the most capable or knowledgeable students, who know enough to fall into the trap, are the ones likely to be misled, and such questions defeat the purpose of determining the level of knowledge and understanding.
  8. Use longer questions and shorter answers that are grammatically consistent with the stem of the task.

For example:

Which statement is correct?

  1. Incomplete sentences are sentences in which one of the main members is missing
  2. Incomplete sentences are sentences in which one of the minor members is missing
  3. Incomplete sentences are sentences in which any member of the sentence is missing - main or secondary

It is easy to see that there is a repeated phrase here, which should be included in the wording of the task:

Incomplete sentences are sentences that are missing

  1. one of the main members
  2. one of the minor members
  3. any member of the sentence - main or secondary
  9. Don't use negations in the stem of the question. First, this leads to misunderstanding of the essence of the task. Second, the object of control should be elements of knowledge, not elements of ignorance.

For example:

Is it true that these people did not actually live in Ancient Greece?

  1. Homer
  2. Achilles
  3. Zeus
  4. Pericles
  5. Phidias
  6. Aristotle
  7. Socrates

In this case it is not clear how to answer: does "yes" mean that they lived, or that they did not? The question therefore needs to be formulated more precisely, for example: "Name the mythological characters of Ancient Greece."

  10. There should be no obvious pattern in the placement of correct answers across questions: for example, option 1 always correct, or the correct option cycling through first, second, third, fourth. In computer testing this problem usually does not arise, because the computer shuffles the alternatives automatically.
  11. If the question is quantitative in nature, indicate the direction (from least to greatest or vice versa) in which the answers are to be ordered.

For example:

Distance from the Sun

a) Saturn

b) Mercury

c) Earth

d) Uranus

e) Venus

f) Mars

In this example there are, in effect, two sets of correct answers: one sequence starting from the planet closest to the Sun, the other from the most distant.
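As noted above, computer testing systems remove answer-position patterns by shuffling the alternatives. A minimal sketch of such shuffling (the helper function and the convention that the first option passed in is the correct one are our illustrative assumptions):

```python
import random

def present_item(question, options, rng=random):
    """Shuffle answer options for display and remember where the
    correct option (by convention, options[0]) ended up."""
    shown = list(options)
    rng.shuffle(shown)               # random order on every presentation
    key = shown.index(options[0])    # position of the correct answer
    return question, shown, key

q, shown, key = present_item(
    "Birthplace of Karl Marx:",
    ["Trier", "Karl-Marx-Stadt", "Sturgard", "Munich"])
```

Each test taker then sees the same alternatives in a different order, so no fixed position of the correct answer can be memorized or shared.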

  12. The question and the answers should differ in font and layout. For example, the question (task) is set in bold and the answers in regular type, with additional indentation for recording responses. This rule applies only to paper tests: in automated computer systems the design is set by the software, and changing it is not advisable.

And remember: not every task can be presented in the form of test control.

Examples from http://koi.tspu.ru/koi_books/samolyuk/ were used in writing this article.

As is known, the unit of the test, its structural element, is the test task. It can be defined as "the simplest and at the same time integral structural element of the test." The tasks included in a test can vary both in form of presentation and in content. There are different approaches to classifying test tasks by the form of their presentation. The most common types of test tasks are shown in Figure 3.1.

The main factor influencing the form of a test task is the method of obtaining the answer (choosing from the options offered, or formulating the answer independently). With this in mind, the classification can be represented by the following scheme.


It should be noted that test tasks have a number of characteristics. Each test task has its own serial number. As a rule, tasks in a test are arranged in increasing order of difficulty, although the difficulty may also fluctuate in both directions as the test progresses.

Each test task has a standard correct answer. As a rule, tasks that do not have a correct answer are not included in the test.

Test items of one form are usually accompanied by standard instructions, which precede the formulation of the items in the test.

For each test task, a rule for grading (awarding points) is developed.

In terms of presentation and execution time, a test is usually quite short. When formulating tasks, care is taken that all statements are understandable to all students without exception: they are formulated in simple expressions with commonly used vocabulary, without foreign or rarely used terms. Where possible, phrases with the negation "not" are avoided, since it is considered preferable to assert something (whether positive or negative).

Open type tasks. In open-form tasks (tasks for addition), ready-made answers are not given; they must be obtained. There are two types of open tasks:

  • 1) with restrictions imposed on the answer;
  • 2) without restrictions imposed on the answer, in which test takers must compose a detailed answer in the form of a solution to the problem.

Tasks of the second type differ little from traditional tests, require greater testing costs and are more difficult to standardize.

When answering an open-ended task with a limited answer, the student fills in the missing word, formula or number in place of the dash or in a specially designated space on the answer form.

Instructions for open-type tasks are usually accompanied by the words: “Write the missing word in place of the dash” or “Get and write the answer on the answer form,” etc.

Closed type tasks. Multiple choice tasks. A closed task with a choice of answer, as a rule, includes a question and several possible answers to it (indicated by the letters A, B, C, D, ... or the numbers 1, 2, 3, 4, ...). The student must choose the correct one(s) from among the answers. In most tests only one is correct, but sometimes developers include several correct answers. Plausible incorrect responses are called distractors. Their number in a task is usually no more than five. Distractors are selected taking into account the typical mistakes of schoolchildren.

A closed test task with a choice of answers is considered to be “working well” if students who know the educational material complete it correctly, and those who do not know choose any of the answers with equal probability.
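This "works well" criterion is commonly quantified with a discrimination index: the difference between the item's success rate among the strongest and the weakest test takers. A minimal sketch (the data and the conventional 27% group share are illustrative assumptions):

```python
def discrimination(item_scores, total_scores, share=0.27):
    """Upper-lower discrimination index: the item's success rate in the
    top group of students minus its success rate in the bottom group."""
    ranked = sorted(zip(total_scores, item_scores))   # weakest first
    g = max(1, int(len(ranked) * share))              # group size
    lower = [item for _, item in ranked[:g]]
    upper = [item for _, item in ranked[-g:]]
    return sum(upper) / g - sum(lower) / g

# Hypothetical data: an item solved only by the high scorers
totals = [10, 9, 3, 1, 8, 2, 7, 4, 6, 5]   # total test scores
item   = [1,  1, 0, 0, 1, 0, 1, 0, 1, 0]   # 0/1 result on this item
```

An index near 1 means the item separates strong from weak students well; an index near 0 (everyone, or no one, answers correctly) means the item measures nothing.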

Tasks with multiple choice answers are usually preceded by the following instructions: Indicate the number (letter) of the correct answer (for blank testing) or: Press the key with the number (letter) of the correct answer (for computer testing).

Test items with a choice of one correct answer, as a rule, satisfy the following requirements:

  • ambiguity and vagueness are avoided in the text of the task;
  • the task has a simple syntactic structure;
  • the stem carries as many of the words as possible, leaving no more than 2-3 key words for the answer; all words repeated across the answers are moved into the stem;
  • the answers to a task are usually of the same length;
  • verbal associations that help guess the correct answer are excluded;
  • across the tasks of a test, the correct answer falls on each position with roughly the same frequency, or its position is chosen at random;
  • test items that call for value judgments or the test taker's opinion on an issue are usually excluded;
  • the number of answer options is the same in each task and usually no more than five (rarely seven);
  • when formulating distractors (plausible wrong answers), expressions such as "none of the above" and "all of the above", which encourage guessing, are avoided; likewise words such as "all", "none", "never" and "always";
  • distractors are chosen so that they are equally attractive to test takers who do not know the correct answer;
  • no distractor is a partially correct answer that becomes correct under certain conditions;
  • answers that follow from one another are excluded from the incorrect options;
  • answers are selected so that the key of one task does not give away the correct answer of another; that is, distractors of one task are not used as correct answers of another;
  • all answers are, as a rule, parallel in design and grammatically consistent with the stem;
  • if a task contains alternative answers, they are not placed next to the correct one, since this immediately draws attention to them.

Comparative characteristics of test task types. The choice of test task types is determined by many parameters: the specific content of the academic subject, testing goals, the level of complexity of the tasks, the professionalism of the developer, etc.

Each type of test task has its own advantages and disadvantages. For example, closed-form multiple-choice tasks are characterized by the advantages that all tests have, namely:

  • - objectivity in assessing the results of work;
  • - speed of checking completed tasks;
  • - systematic testing of a sufficiently large volume of educational material.

At the same time, they have positive characteristics inherent only to this type of task. For example, they are the easiest to process and make it possible to organize computer-based collection and analysis of results at little expense. But such tests also have their drawbacks:

checking only the final results of the work;

inability to trace the logic of a student’s reasoning when completing tasks;

some probability of choosing an answer at random;

the impossibility of testing certain types of educational activities (for example, independently finding directions for solutions).

a fairly large number of tasks (usually more than 20) and a large number of answer options (more than 4) are required.

Some of these disadvantages (for example, guessing the answer) can be avoided with open-ended tasks. At the same time, the results of such tasks are more difficult to process statistically, and evaluating tasks with an extended answer requires experts, which in turn reduces the objectivity of control, complicates standardization of the test, and increases the time and cost of processing the results.

In test theory, the view is increasingly expressed that a single test should use as few different forms of test items as possible; professional tests are often distinguished by the uniformity of their tasks. However, this requirement is not always feasible given the specifics of a particular subject, so developers often combine different kinds of test tasks within one test (for example, closed and open).

For example, centralized testing papers contain two parts (part A and part B). Part A contains closed-type test tasks, and part B open-type ones.

Tables 1.2 and 1.3 show the comparative characteristics of test tasks of various types.

Table 1.2. Comparative analysis of test tasks in accordance with the levels of mastery of educational material

Based on some of these characteristics, test creators can choose a form of test items that is suitable for certain purposes. It should also be noted that only a reasonable combination of tests with traditional forms and methods of control will allow obtaining a comprehensive picture of the level of knowledge.

Table 1.3. Comparative analysis of test tasks in accordance with test design indicators

The table compares the task types (closed: multiple choice, matching, establishing the correct sequence; open: with a limited answer, with a free answer) on the following design indicators: ease of design (rated "not always" for several types), the guessing effect, objectivity in assessing the result (for choice-based forms it depends on the quality of the task, while for free-answer tasks the rating is subjective), and the possibility of student errors when writing the answer.
