The trouble with exams | Centre for the Use of Research & Evidence in Education (CUREE)

Submitted by paul crisp on Tue, 17/04/2012 - 19:12

I am intrigued by the recent attack on A Levels by Michael Gove. For me (though not necessarily for Mr Gove), it links back to the big fuss around the 'scandal' of exam boards briefing teachers on the content of future papers. Those getting aerated about exam papers were, I contend, demonstrating a lack of contact with the real world. My fundamental thesis - which I am alarmed to see is apparently shared by the Secretary of State - is that we try to achieve too many mutually exclusive outcomes with the exam system, and that a spot of rationalisation is needed if the system is to hit any of them. My basic stance, and the reason anyone from CUREE might have a professional interest in this, is that the exam system is essentially a very flawed research method (the research 'subject' being the student's command of the knowledge set), and thinking of it in this way helps reveal both its problems and some of the remedies.

Our current system was devised, remember, to provide a route to university and to relieve universities of the burden of establishing (or continuing) their own admissions grading systems. It was originally run by universities, individually or collectively (Cambridge still owns its exam board), and was designed to process a relatively small number of candidates from independent and grammar schools. Because the system's purpose was a simple and competitive one - to select, from a larger pool, a smaller number of students equal to the number of university places - the grading system was normative (e.g. give the top nn% an A) and longitudinally inconsistent. This is one of the reasons it is impossible to compare results year on year.

Because the number of candidates was small, the questions could be qualitative (and challenging). This required a small number of expert examiners who would take a relatively long time applying their connoisseurial knowledge to the answers. Their marking was pretty inconsistent (i.e. wrong) at least as often as it is now but, in a more deferential age, they were less often challenged. This did mean, though, that the questions could have high relevance; that is, the result was meaningfully connected to the purpose, which was to uncover the candidate's knowledge of a subject. However, even in those halcyon days, you couldn't test the candidate's knowledge of the entire syllabus - the exam would have lasted days - so the questions were a sample of the full range of knowledge encompassed in the syllabus. For that sample (the questions actually answered) to be a reliable indicator of the population (i.e. the full syllabus), the questions must be random or (because there are degrees of randomness) as random as possible. They weren't random then, of course, any more than they are now, but the whole thing mattered less. The competition for university places was less intense, you could get on to popular courses in 'good' universities (then, there wasn't any other kind) with Bs and Cs or worse, and there were still many good jobs for which a degree was optional.

It all went wrong in the 1970s:

parallel exam systems were merged to produce a single continuous system - it was no longer principally for university admissions;
the number of university places increased dramatically and a degree became increasingly the 'normal' route into desirable occupations;
more schools aspired to get students (in ever larger numbers) to university;
governments and others used exam outcomes increasingly as the standard way of judging school performance;
those agencies also tried to use the exam results year on year to support or refute arguments about system improvement.

The damaging outcomes of all these changes were:

a massive increase in the number of students taking exams;
it was harder to find sufficient experts to mark the papers;
the cost of the whole system rocketed;
if exam scores represented an absolute level of knowledge (rather than a competition for top-slicing the best of a cohort), then there was no reason not to modularise exams and allow multiple attempts (as most vocational exams do, including those in accountancy and law);
it all mattered more to everyone so the system had to be more resistant to challenge (i.e. more 'objective'). Relevant (but 'subjective') questions had to be replaced by reliable (and 'objective') marking schemes;
exams became a game (i.e. an artificial construct with its own rules for determining the winner) with less and less relevance to their original purpose.

If this were a research design, one would conclude that the high cost (from a large sample), plus the need for technically valid (replicable, standardised over time) research instruments, results in poor relevance - the answers bear a weak relationship to the question.

This was all exacerbated by the difficulty of working out what you were testing for anyway - which brings us back to Mr Gove's idea of making A Levels more the creature of (Russell Group) universities. A test that is good for judging the suitability of a candidate for a place on a geography undergraduate course is unlikely to be very good for judging his/her suitability for a job in banking, or marketing, or nursing or, frankly, any job at all. Over 30 years ago Ron Dore, in his excellent book "The Diploma Disease", put his finger on it. In a (slightly) more recent revisiting of the issue, Professor Dore sums up the problem thus:

"As [high stakes exams] come to dominate the curriculum, there is a ‘backwash’ effect: the preparatory functions of primary and junior secondary schools – preparing the ‘successful’ minority for selection exam success and further education – tends to dominate their terminal educational functions – preparing the ‘unsuccessful’ majority for life and entry into work. This increases the problems of ‘relevance’ of the basic educational cycle."

All very well, you might say, but what's the solution? Simplistically put (and I make no claim to practicality), it is to design your tests to provide data relevant to your questions. This would result, for instance, in:

a distinctive system for assessing suitability for higher education;
establishing a large bank of questions from which a small set is selected quasi-randomly (completely random selection would throw up other anomalies), without anyone's foreknowledge and just before the test period starts;
allowing much longer for marking (to allow you to use your scarce expert markers more efficiently);
recognising that relevance is as important as validity - by shielding exam setters from legal and other challenges about individual scores (not wholly comfortable with this one myself);
prohibiting employers from using A Levels for job selection unless the job included sponsorship on a degree course (I did warn you about practicality!).

Mr Gove's plan to have A Levels designed by Russell Group academics (though I note that RG has already declined the honour) may be a step along this path.

Oh - I should perhaps mention that I was talking to a professor of educational assessment at a recent meeting at All Souls, and she pointed out that I had completely misunderstood the whole exams thing. You, however, can make up your own minds.

Paul Crisp