Monday, 3 March 2014

Language Testing: The Road to Perfection



Language testing is an unavoidable part of the language teaching experience. 

Some of us have to create classroom tests to check progress through the material; others join full-time testing teams charged with creating an assessment syllabus and a complex exam system for an institution. At one point or another in our careers, we all find ourselves in situations where the responsibility of measuring students’ performance is thrust upon us – often without much support or advance preparation. Which is precisely why we must seek opportunities to exchange experiences with like-minded fellow professionals – more often than not to find that an issue we have long been struggling with has already been solved in someone else’s institution. And vice versa.

TEAM 2014, Kayseri


I recently gave a series of talks and workshops at Meliksah University in Kayseri, Turkey, which hosted a three-day training event, in co-operation with SELT Academy and IDP Education, called TEAM 2014: Testing, Evaluation and Assessment Masterclass. Professionals from 13 universities around the country came together to share their insights. My fellow trainer Dr. Simon Phipps and I were in equal measure pleasantly surprised and thoroughly vindicated when, on day 2, one of the trainees stood up before a plenary to suggest they all get together informally to continue discussing their testing experiences. In their free time! It is precisely these grassroots initiatives that will provide us with the support that we all need when we design tests.

A three-way tug-of-war


No matter what the context is, some shared principles apply to us all. Creating a language test is always a balancing act between three major forces: validity (how closely the test scores correlate with the students’ real-life abilities as well as with the teaching curriculum), reliability (to put it simply, how consistently it measures those abilities) and practicality (how user-friendly our test is).



It is impossible to create a test where all three forces operate at 100%. To give you an example: you can increase reliability by reducing the margin of error. The more test items you devote to checking a particular structure (or skill), the less likely it is that the obtained score is the result of a random factor – like blind luck. However, as you increase the item count, your test will gradually become less and less valid: the more items you devote to a single structure, the more you shift the bias towards that structure – to the detriment of all other aspects of language. And the higher the item count, the more time candidates will need to complete the test – which will also negatively affect practicality. It will simply not be worth using a test if it takes too long to obtain a reliable score.
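To put some illustrative numbers on this trade-off: the chance of passing by blind guessing falls as the item count rises, while the classical Spearman–Brown prophecy formula predicts diminishing reliability gains from lengthening a test. (A minimal sketch – the pass mark, number of options and reliability figures below are my own assumptions, not taken from any particular exam.)

```python
from math import ceil, comb

def p_pass_by_guessing(n_items, n_options=4, pass_mark=0.6):
    """P(score >= pass mark) if every answer is a pure guess (binomial model)."""
    p = 1 / n_options
    needed = ceil(n_items * pass_mark)
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(needed, n_items + 1))

def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# More items -> luck matters less...
print(f"{p_pass_by_guessing(10):.4f}")   # roughly a 2% chance with 10 items
print(f"{p_pass_by_guessing(40):.8f}")   # vanishingly small with 40 items
# ...but reliability gains taper off while testing time keeps growing.
print(f"{spearman_brown(0.70, 2):.3f}")  # doubling a 0.70-reliable test -> ~0.82
print(f"{spearman_brown(0.70, 4):.3f}")  # quadrupling it -> only ~0.90
```

In other words, the first extra items buy a lot of protection against blind luck; each further batch buys less, while the cost in testing time (practicality) grows linearly.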

All we can do is accept the best available compromise. Provided, of course, that the compromise doesn’t involve sacrificing any of the measurement qualities.

What are we testing?


Before we begin developing a test, we must first and foremost pin down what it is that we want to find out. I often define testing as "the systematic gathering of information for the purpose of making decisions" (Weiss 1972, quoted in Bachman 1990) – so the questions are:

1. What decisions do we want to make?


For example: a) which candidates have the required skills and knowledge to enter tertiary education; b) which candidate is most suitable for a particular job, and so on.

2. What information do we need?


For example: a) how much do they know about the foundations of their chosen subject, how developed are their research and academic writing/speaking skills; b) what would they do in a typical workplace situation, how well do they co-operate during teamwork, and so on.

3. What’s the best way of collecting that information?


Now, the point here is that the answer won’t always be: "through a test"! Multiple-choice tests, for instance, may look impressive and professional, but they will never give us a complete picture of someone’s complex set of skills. The trick is to select those particular sources of information and evaluation methods that give us the information we need. And not something else entirely. And not a jumble of factors that we cannot untangle. And not a pile of irrelevant information just because it was easy to measure.

What we do with our test must always be appropriate to our purpose, and do just what we need it to do rather than become a burden on candidates. We shouldn't test for testing's sake - we should test because there's information without which we can't make that important decision.

It cuts both ways


But let’s not forget that testing and teaching have an impact on each other. Tests should reflect prevailing teaching practices (otherwise they are not measuring what candidates have learnt and how they have learnt it), and they should also have a positive effect on the teaching process – a positive "washback". (I will blog more on washback later – being a hobby-horse of mine, it would soon become a considerable diversion here.)

The importance of using yardsticks


Obviously, our responsibility doesn’t only extend to what measurement tools we use, but also to how well we use them. Which is why following accepted standards (like the far-from-perfect but nonetheless essential Common European Framework of Reference, CEFR) and, within their internationally transparent framework, establishing our own standards is so vital. To quote Alderson et al. (1995), we need "an agreed set of guidelines which should be consulted, and as far as possible, heeded in the construction and evaluation of a test".

(In case you want to find out what I meant in my comment above, see my presentation from TEAM 2014 on Slideshare at: http://www.slideshare.net/SeltAcademy/21-applying-standards-to-testing-plenary-ctsacademic.)

The long road to perfection


Test development is a long, time-consuming process. I’m not suggesting here that, for a ten-minute flash test, you must go through all the stages of test planning and writing – simply that you shouldn’t forget what goes into the development of a good test, and that you take some time to consider all the key factors. The principles are the same, no matter how big or small your test is.

Test development doesn’t finish when the first complete test is finally written. It’s a cyclical, on-going process, which – I’m sorry to say – never actually ends.



The key to developing good, reliable, valid and practical tests is never to rest on your laurels. Plan them well, write them with due care and attention, use them wisely – but never forget to look back on the testing experience, to analyse your data and use them to make improvements to your test before it is used again. To find out what needs improving, we must learn the art of feedback – which will, again, be the subject of a future post on this blog. (For those who attended our training, a gentle reminder: remember "sandwich".)

Test development is an ever-rising spiral, with our tests becoming better and better as we go on. Perfection may be far off yet, but that doesn’t mean we shouldn’t be travelling towards it all the time.

Friday, 22 November 2013

Speaking and writing in exam training: blended solutions


I presented this paper on behalf of OUP at the IATEFL conference in Liverpool in April 2013. For those of you who couldn't be there, here's a summary of what we discussed in the session.

Success factors in exam training

When teachers are faced with the task of preparing a group of students for a given exam, there is one known constant: the students, without exception, must attain the level required by the exam. Although this objective – exam success – is clear, there are several factors that demand teachers’ attention in the classroom.

The first of these is the current or starting level of the individuals comprising the group. Teachers need valid and reliable measurement tools to establish each student’s level of English efficiently, which in turn provides the starting point for the exam training syllabus. Online and computer-assisted as well as printed placement tests cater for this need.

Another key factor in exam success is familiarity. As one delegate aptly put it in the session: "there ought to be no surprises in the exam". Learners need to be familiar both with the test itself (its structure, the types of tasks involved, the amount of time available for each paper and part, and so on) and with its assessment criteria (to put it simply, how candidates can gain or lose marks, as well as the relative weighting of the various papers and parts). The simplest solution is to select course materials designed around the particular exam that learners are preparing for – or to supplement a core course material with dedicated exam training materials.

Finally, teachers must also provide a balanced coverage of all the language (grammar, vocabulary, functions) and all the skills tested in the exam, both receptive and productive skills. To ensure exam success, the focus should not be on what is easier to quantify and to teach, or what is practical to fit into a lesson timeframe, but on what each learner will be expected to do in the exam.

Potential problem areas for training for productive skills


Writing and speaking both present a number of practical difficulties that teachers must find solutions for. By their nature, productive skills are less predictable (there is often no such thing as "the" correct answer) and more challenging to break down into classroom activities.

Writing issues


The writing process proper takes an inordinate amount of time – often more than is feasible in the contact hours available. Writing activities therefore often take place outside the classroom – that is, outside the environment controlled by the teacher. Usually only the lead-in work, and occasionally some form of follow-up, is done in class. Task-setting may be done in or outside class, but subsequent monitoring is difficult, impractical, and often even impossible. Teachers only find out that some learners have gone off track after the assignments have been completed. This then leads to further complications, such as unplanned remedial work or a repeat of the writing task – preferably with modified parameters, to avoid duplicating the task for those learners who got it right first time around.

Furthermore, if marking is to be thorough, by necessity it will be extremely time-consuming. Conversely, if it is to be done quickly and promptly, it will be superficial.

Speaking issues


All speaking work, on the other hand, is normally done in class – precisely so that teachers can control it. This means learners either have to perform simultaneously, where again the issue of proper monitoring arises, or they perform individually (or in open pairs or groups), which reduces all other learners’ chances to speak. Teachers face the dilemma of providing either maximum opportunity–minimum control/feedback or maximum control/feedback–minimum opportunity.

Blended solutions


Blended learning – the amalgamation of face-to-face, course-material-driven approaches with online ones – can provide a solution to the issues above.

The following diagram shows a possible model for covering writing training.



The difficulties caused by the practical necessity of completing writing assignments out of the classroom can be remedied through the use of a Learning Management System (LMS) which allows teachers to set up and monitor tasks remotely. Many online services also offer an automatic (or at least a guided semi-automatic) marking facility for more closed types of writing.

Speaking can also be aided through the use of online learning services (online workbooks, practice tests, etc.), which often offer a speak-and-record facility. Teachers can find similar free-to-use web services online. Embracing social media channels, such as online video chat, can also enhance exam preparation and extend contact time.

Blended learning may not provide a solution for every exam training issue, but it is well worth considering.

Go digital - life is peaceful there... or is it?

Digital was a big thing at this year's IATEFL in Liverpool - some publishers didn't even bring books to their stands! I fondly recall a confused teacher walking up to said publisher's stand, asking: "You don't have any coursebooks any more?" - to which the marketing person replied: "Of course, we do. Have a look in our catalogue." The teacher, relieved, "So, may I have a catalogue then?" Marketing person, slightly embarrassed, "Actually, we haven't got any printed copies with us, but use one of the iPads at our stand to browse our online catalogue." The teacher ambled up to the row of iPads, then stood around helplessly until the same marketing person walked up to her to find out what, if anything, was the matter this time. "Can you show me how to use this thing, please?", the lady said... No comment. 

BEBC didn't refrain from commenting on the subject, though: http://bebcblog.wordpress.com/2013/11/13/so-is-the-elt-textbook-dead-or-not/

So is digital a bandwagon we should all be in a hurry to jump on? I read this thought-provoking article on eltjam.com where Laurie Harrison made a compelling case for iterative publishing - that is, continually updated online teaching materials replacing coursebooks: http://www.eltjam.com/iterative-publishing-in-elt-10-reasons-why-it-will-and-wont-work/

To me, the cons still outweigh the pros. This is where my reservations stem from: digital is simply a different medium, but the pedagogical values and teaching/learning objectives should neither be restricted by the medium's limitations nor dictated by its vast technological potential. We shouldn't be doing only what the format allows us to do; we should mould the format to allow us to do what we want to do. We pay a lot of lip service to "blended learning", but what I'm missing here is old-fashioned (yes, coursebook) values blended with a mixture of media to provide a more stimulating as well as more familiar environment for learners. 

For more food for thought, have a look at Hugh Dellar's insightful blog post on the wider ramifications: http://hughdellar.wordpress.com/2012/10/26/technology-and-principles-in-language-teaching/

I'm not against digital in principle, obviously. Digital media open up opportunities for real communication as well as communication practice, they provide access to vast amounts of information as well as stimulating materials, and they come equipped with the facilities for differentiated, personalised learning experiences. None of which was around when I was a student, then a teacher of English. I had to make do with what limited amount of support I could find around me in a non-English-speaking environment. Before the advent of digital, that wasn't a lot.

Anyway, I just thought I'd throw in this topic for all of us to think more about. I believe there is a way in which digital can work in our favour - but we, teachers, should be mapping the route and leading the way, not technology. What do you think?