Assessment of Item Parameters of 2017 National Examinations Council’s English Language Multiple Choice Test Using Item Response Theory

Temitope G. Bolaji
A. A. Adediwura
J. F. Omidiora

Abstract

The study evaluated how adequately the items of the 2017 NECO English Language examination measured their intended objectives among secondary school students in Osun State. It assessed the dimensionality, local independence, item difficulty, and discrimination indices of the examination. A non-experimental, descriptive research design was employed, involving 26,127 senior secondary students who sat for the exam in June/July 2017. The students were sampled using a stratified random sampling technique, with strata based on characteristics such as gender, school type, and academic performance. Data obtained from the NECO headquarters, including responses and scores, were analyzed using the two-parameter logistic (2PL) IRT model. Three research questions were addressed. The results showed that the test was unidimensional, with a maximum DETECT value of -0.1331, ASSI of -0.211, and RATIO of 0.142. However, 30 item pairs were locally dependent. Additionally, 19 of the 100 items were of poor quality because their difficulty parameters fell outside the acceptable range (-3 to 3), and 29 items had poor discrimination indices (less than zero). The study concluded that the exam's multiple-choice items were largely unidimensional and independent, suggesting reliable scores. It recommended identifying the specific skills that influence student performance and developing targeted interventions to improve overall test performance.
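The item-quality criteria in the abstract can be illustrated with a minimal sketch. It assumes the standard 2PL item response function, P(theta) = 1 / (1 + exp(-a(theta - b))), and applies the two cut-offs stated above (difficulty b within -3 to 3, discrimination a not below zero). The function names and the three example items are hypothetical and are not taken from the study's data or code.

```python
import math


def two_pl_probability(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def flag_item(a, b, b_range=(-3.0, 3.0)):
    """Return the reasons an item fails the abstract's two criteria.

    An empty list means the item passes both checks.
    """
    flags = []
    if not (b_range[0] <= b <= b_range[1]):
        flags.append("difficulty")      # b outside the acceptable range
    if a < 0:
        flags.append("discrimination")  # negative discrimination
    return flags


# Hypothetical (a, b) pairs, for illustration only.
example_items = {
    "item_1": (1.2, 0.5),
    "item_2": (0.8, 3.6),
    "item_3": (-0.4, -1.0),
}

for name, (a, b) in example_items.items():
    p_average = two_pl_probability(0.0, a, b)  # examinee of average ability
    print(name, round(p_average, 3), flag_item(a, b))
```

A negative discrimination value means the probability of a correct response falls as ability rises, which is why such items are treated as poor regardless of their difficulty.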

