| To test, or not to test, is that the question? |
|
By Dr. Anita Craig

‘Testing’ new intakes of scholars, job applicants, and also those in need of particular care or help, has become popular, if not standard practice, across the board. Moreover, just about every schoolchild knows about IQ, and its false copy, EQ, vocational tests, and tests of interest and personality (type ‘psychometrics’, ‘the popularity test’, ‘self-esteem test’, and ‘EQ’ into a search engine for specific examples).

Generally, we test because we want to know how an individual’s test performance differs from other individuals like him/herself (e.g., asking who the cleverest is); why an individual behaves as s/he does (e.g., asking why someone seems unable to do certain things); and how an individual will behave and perform in future – given his/her test performance. We therefore test because we want to describe, explain, and predict an individual’s behaviour on the basis of her/his performance on a specific test. Phrased differently, the individual’s standardized score on a test tells us where an individual is placed against a ‘norm group’, i.e., others like him/herself in terms of age, sex, and other relevant factors which will have an influence on that which the test measures. Further, we assume that an individual’s performance on a test tells us not only about his/her present intelligence or ability, interests, or personality (depending on the specific test in question), but also provides the basis on which to predict her/his future performance on similar tasks or in similar situations.

It is worth noting that psychological tests, generally, are not like tape-measures that measure height, or scales that measure body mass, or our ‘weight’, as we commonly say.
They are different from tape-measures and scales in that one centimetre, like one gram, remains one centimetre or one gram throughout the process of measurement and regardless of when a test was conducted (see the Flynn Effect), who was measured, and whether the comparison was between a 100-year-old French speaker who lived in Timbuktu and a one-year-old Xhosa speaker who lived in New York. Scores on psychological tests, especially IQ measures, are converted from ‘raw scores’ (the number of correct answers to the questions and problems set on the test) into a psychological measure of the individual’s performance on the test; once thus standardized, the score can of course be used for comparative purposes (e.g., across age, sex and language). The standardized score is obtained through a calculation which takes into account the performance of a norm group on the same questions and problems. It is in this way that we may understand someone being described as ‘below’ or ‘above’ average; the individual’s psychometric measure falls below or above what is typical of the performance of others like him/herself.

Another introductory comment to make, albeit slightly trivial, is that well-constructed psychometric assessments are not the same as ‘star gazing’, ‘fortune telling’ or other ‘psychic or mystic readings’. Psychometrics aims to be a scientific study of individual differences, that is to say, to make claims about the differences between us that are based on good, strong supporting evidence and derived through logical thought processes, so that the claims can be checked and re-checked – publicly – for their reliability and validity. In what follows I make a few notes on different kinds of tests and specific abuses in the use of tests.

Different kinds of tests

Before Binet and Spearman’s contributions, Herbert Spencer (1820-1903) and Francis Galton (1822-1911) laid the tracks for the scientific study of individual differences (or the foundation of psychometrics).
From these beginnings grew the field now so central to all aspects of our lives.

Generally speaking, a professionally administered and interpreted test of intelligence can tell us a great deal. It is worth noting in this regard that experts agree that human intelligence differences can indeed be measured with high reliability, and that these measures are stable across most of the lifespan – other things being equal.

When an individual’s circumstances (e.g. learning opportunities, access to resources, and exposure to stimuli) before and up to the test are likely to change after testing, a once-off test might not give a reliable indication of the individual’s potential to learn. For example, someone who has never encountered various gadgets or corporate environments cannot express a facility with these in a test. If such a person is, however, exposed during testing or in-between testing occasions to such opportunities, s/he may well develop the relevant ability and/or interest. This is the issue behind dynamic testing (in contrast to static, or once-off tests), or measures of learning potential. The issue of an individual’s learning history before testing has a specific bite when it comes to disadvantaged populations and groups of individuals who have been denied the relevant opportunities and resources to adapt to specific problem-solving situations and contexts of communication.

The assessment of an individual’s interests is another focus of psychological tests: interests in, for example, working with things and gadgets; scientific pursuits; aesthetic pursuits and opportunities for self-expression; people contact and the helping professions; corporate environments (buying, marketing, selling); and office practices and well-structured tasks (type RIASEC into a search engine for more). These kinds of assessments are used for advising and counselling new intakes regarding educational choice, career paths and so on.

Unlike tests of intelligence, the assessment of interests often encounters large sex differences in individuals’ test performance, females typically veering towards ‘people’ (rather than ‘things’) in their interests. Moreover, it is obvious that one cannot be interested in something one has not encountered before, so that the learning history of an individual or her/his past experiences has a distinct influence on the outcome of tests of interests.

In addition to the assessment of cognitive abilities and interests, the study of individual differences focuses on enduring features of someone’s psychological make-up, character, or personality traits. The so-called ‘Big Five’ personality traits (extraversion, neuroticism, agreeableness, conscientiousness, and openness) are well known and often included in tests of personality. (Type ‘Big Five Personality Tests’ into your favourite search engine – there is a lot to read on this.) Having some grasp on how these traits are configured in the personality of a particular individual is thought to assist in fitting an applicant to the right organisation, position, or role.

The idea of ‘EQ’ has had a fashionable run in the popular press or lay media. It is, however, not certain that the idea of emotional intelligence is more than a new-fangled label for an agreeable person, someone who is also probably high on extraversion and openness. In addition, we may note that motivation (or will) and emotion (or affect) clearly play their respective roles in problem-solving, but whether one of these ‘faculties’, emotion, is best thought of as a kind of intelligence is not certain (see Howard Gardner’s ideas about multiple intelligences). What is certain is that EQ measures, if these can be called measures, have nothing like the scientific basis on which modern-day IQ measures rely to make reliable and valid statements about people’s intellectual abilities.
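
To make the idea of a standardized score concrete, here is a minimal sketch in Python of how a raw score might be placed against a norm group, as described earlier. It assumes a scale with a mean of 100 and a standard deviation of 15, which is a common convention for IQ-type scores but not the only system in use. The function name and the norm-group raw scores are hypothetical, invented purely for illustration; published tests rely on their own norm tables rather than a calculation this simple.

```python
from statistics import mean, stdev


def standardized_score(raw_score, norm_raw_scores, scale_mean=100.0, scale_sd=15.0):
    """Place a raw score against a norm group (hypothetical helper).

    scale_mean and scale_sd are assumptions: 100 and 15 are a common
    convention for IQ-type scales, not a property of any particular test.
    """
    norm_mean = mean(norm_raw_scores)
    norm_sd = stdev(norm_raw_scores)  # sample standard deviation of the norm group
    if norm_sd == 0:
        raise ValueError("norm group scores must vary")
    # z: how many norm-group standard deviations the raw score lies from the norm mean
    z = (raw_score - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Invented raw scores (number of correct answers) for a small norm group
norm_group = [30, 40, 50, 60, 70]

average = standardized_score(50, norm_group)  # raw score equal to the norm mean
below = standardized_score(42, norm_group)    # raw score under the norm mean
print(round(average, 1), round(below, 1))
```

A raw score equal to the norm group’s mean lands on the scale mean, and one under it lands below, which is the sense in which someone is described as ‘below’ or ‘above’ average. The same raw score would yield a different standardized score against a different norm group, which is why the choice of norm group matters.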

Abuses of tests

Phrased differently, all professionally developed tests are hemmed in by consideration of factors such as the following:
Put plainly, a score on a test of interests is not indicative of intelligence, and a test standardised on 20-year-old, English-speaking males in the USA is not suitable for use with 50-year-old women in Africa. Moreover, if whatever is measured is defined one way (e.g. intelligence = speed of task execution), the measure does not cover another way of defining intelligence (e.g. intelligence = abstract reasoning skills). Moreover, if one or the other of these definitions depends on a discarded theory (of intelligence, in our example), then the measure, in the terms set by the test used, is also questionable. Lastly, we expect from our tests that they will measure whatever they purport to measure, that they will do so consistently on repeated occasions, and that they will do so accurately (see ‘What makes a good test?’).

Even the best tool can be turned into a useless instrument if a clumsy or stupid agent should misapply it. In this regard it is not usually tests, as such, that fail us, but more often a practitioner’s over-hasty and careless jump from a test score to a set of recommendations. Take for example a score of 80 on an IQ test: if one wanted to be 100% sure of having the correct measure of the person’s intelligence (scientifically justifiable conclusions regarding the measurement of an individual’s performance on various test items, in terms of a specified norm group, etc.), one has to say that a person who scored thus has an IQ in the range of 65 – 95, give or take 5 points at both ends to allow for different systems. If we think of a range of possible scores, of which the given score (80 in our example) is the mid-point, the individual’s abilities begin to look both more flexible (their expression perhaps depending on context and learning opportunities) and not as clearly pinned to one, and only one, point on a scale.

Further, specific abuses of tests have to do with the practitioners’ commitments (or not) to the rules of scientific practice.
These rules include, for example, upholding the standardised procedure for administering a test, being objective in scoring, remaining within the limits of the test’s explicit framework for the interpretation of results, basing recommendations solely on the scope outlined in the test manual, and taking note of the critical literature surrounding the test used.

Psychometrics is not tape-measuring and is not star gazing. It is, however, better than common sense when used wisely and well. The question is therefore not about whether to test or not, but about the right test to use for the task at hand, and about doing so within the limits set by the test design and the rules for scientific practice.

For more information, visit Dr A.P. Craig online.
|





