- Appel, R., & Wood, D. (2016). Recurrent word combinations in EAP test-taker writing: Differences between high- and low-proficiency levels. Language Assessment Quarterly, 13(1), 55–71.
- Aryadoust, V. (2016). Gender and academic major bias in peer assessment of oral presentations. Language Assessment Quarterly, 13(1), 1–24.
- Aryadoust, V., & Zhang, L. (2016). Fitting the mixed Rasch model to a reading comprehension test: Exploring individual difference profiles in L2 reading. Language Testing, 33(4), 529–553.
- Attali, Y., Lewis, W., & Steier, M. (2013). Scoring with the computer: Alternative procedures for improving the reliability of holistic essay scoring. Language Testing, 30(1), 125–141.
- Babaii, E., Taghaddomi, S., & Pashmforoosh, R. (2016). Speaking self-assessment: Mismatches between learners’ and teachers’ criteria. Language Testing, 33(3), 411–437.
- Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1–42.
- Baker, B. A. (2012). Individual differences in rater decision-making style: An exploratory mixed-methods study. Language Assessment Quarterly, 9(3), 225–248.
- Barkaoui, K. (2014). Examining the impact of L2 proficiency and keyboarding skills on scores on TOEFL-iBT writing tasks. Language Testing, 31(2), 241–259.
- Bax, S. (2013). The cognitive processing of candidates during reading tests: Evidence from eye tracking. Language Testing, 30(4), 441–465.
- Bochner, J. H., Samar, V. J., Hauser, P. C., Garrison, W. M., Searls, J. M., & Sanders, C. A. (2016). Validity of the American Sign Language Discrimination Test. Language Testing, 33(4), 473–495.
- Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
- Bridgeman, B., Cho, Y., & DiPietro, S. (2016). Predicting grades from an English language assessment: The importance of peeling the onion. Language Testing, 33(3), 307–318.
- Brooks, L., & Swain, M. (2014). Contextualizing performances: Comparing performances during TOEFL iBT™ and real-life academic speaking activities. Language Assessment Quarterly, 11(4), 353–373.
- Butler, Y. G., & Zeng, W. (2014). Young foreign language learners’ interactions during task-based paired assessments. Language Assessment Quarterly, 11(1), 45–75.
- Cai, H. (2013). Partial dictation as a measure of EFL listening proficiency: Evidence from confirmatory factor analysis. Language Testing, 30(2), 177–199.
- Cai, H. (2015). Weight-based classification of raters and rater cognition in an EFL speaking test. Language Assessment Quarterly, 12(3), 262–282.
- Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1–47.
- Chalhoub-Deville, M., & Deville, C. (1999). Computer adaptive testing in second language contexts. Annual Review of Applied Linguistics, 19, 273–299.
- Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405.
- Chapman, E. (2003). Alternative approaches to assessing student engagement rates. Practical Assessment, Research & Evaluation, 8(13), 1–7.
- Choi, I. (2017). Empirical profiles of academic oral English proficiency from an international teaching assistant screening test. Language Testing, 34(1), 49–82.
- Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117–135.
- Denies, K., & Janssen, R. (2016). Country and gender differences in the functioning of CEFR-based can-do statements as a tool for self-assessing English proficiency. Language Assessment Quarterly, 13(3), 251–276.
- Dulock, H. L. (1993). Research design: Descriptive research. Journal of Pediatric Oncology Nursing, 10(4), 154–157.
- Eckes, T. (2012). Operational rater types in writing assessment: Linking rater cognition to rater behavior. Language Assessment Quarterly, 9(3), 270–292.
- Eckes, T. (2014). Examining testlet effects in the TestDaF listening section: A testlet response theory modeling approach. Language Testing, 31(1), 39–61.
- Eckes, T. (2017). Setting cut scores on an EFL placement test using the prototype group method: A receiver operating characteristic (ROC) analysis. Language Testing, 34(3), 383–411.
- Farnsworth, T. L. (2013). An investigation into the validity of the TOEFL iBT speaking test for international teaching assistant certification. Language Assessment Quarterly, 10(3), 274–291.
- Fidalgo, A. M., Alavi, S. M., & Amirian, S. M. R. (2014). Strategies for testing statistical and practical significance in detecting DIF with logistic regression models. Language Testing, 31(4), 433–451.
- Goodwin, A. P., Huggins, A. C., Carlo, M., Malabonga, V., Kenyon, D., Louguit, M., & August, D. (2012). Development and validation of Extract the Base: An English derivational morphology test for third through fifth grade monolingual students and Spanish-speaking English language learners. Language Testing, 29(2), 265–289.
- Granfeldt, J., & Ågren, M. (2014). SLA developmental stages and teachers’ assessment of written French: Exploring Direkt Profil as a diagnostic assessment tool. Language Testing, 31(3), 285–305.
- Green, A., & Hawkey, R. (2012). Re-fitting for a different purpose: A case study of item writer practices in adapting source texts for a test of academic reading. Language Testing, 29(1), 109–129.
- Han, C. (2016). Investigating score dependability in English/Chinese interpreter certification performance testing: A generalizability theory approach. Language Assessment Quarterly, 13(3), 186–201.
- Harding, L. (2014). Communicative language testing: Current issues and future research. Language Assessment Quarterly, 11(2), 186–197.
- Harding, L., Alderson, J. C., & Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: Elaborating on diagnostic principles. Language Testing, 32(3), 317–336.
- Haug, T. (2012). Methodological and theoretical issues in the adaptation of sign language tests: An example from the adaptation of a test to German Sign Language. Language Testing, 29(2), 181–201.
- Hirai, A., & Koizumi, R. (2013). Validation of empirically derived rating scales for a story retelling speaking test. Language Assessment Quarterly, 10(4), 398–422.
- Hoang, G. T. L., & Kunnan, A. J. (2016). Automated essay evaluation for English language learners: A case study of MY Access. Language Assessment Quarterly, 13(4), 359–376.
- Hsieh, M. (2013a). An application of Multifaceted Rasch measurement in the Yes/No Angoff standard setting procedure. Language Testing, 30(4), 491–512.
- Hsieh, M. (2013b). Comparing Yes/No Angoff and Bookmark standard setting methods in the context of English assessment. Language Assessment Quarterly, 10(3), 331–350.
- Hsu, T. H. L. (2016). Removing bias towards World Englishes: The development of a Rater Attitude Instrument using Indian English as a stimulus. Language Testing, 33(3), 367–389.
- Huang, B., Alegre, A., & Eisenberg, A. (2016). A cross-linguistic investigation of the effect of raters’ accent familiarity on speaking assessment. Language Assessment Quarterly, 13(1), 25–41.
- Huang, F. L., & Konold, T. R. (2014). A latent variable investigation of the Phonological Awareness Literacy Screening-Kindergarten assessment: Construct identification and multigroup comparisons between Spanish-speaking English-language learners (ELLs) and non-ELL students. Language Testing, 31(2), 205–221.
- Ilc, G., & Stopar, A. (2015). Validating the Slovenian national alignment to CEFR: The case of the B2 reading comprehension examination in English. Language Testing, 32(4), 443–462.
- Jin, T., & Mak, B. (2013). Distinguishing features in scoring L2 Chinese speaking performance: How do they work? Language Testing, 30(1), 23–47.
- Jin, T., Mak, B., & Zhou, P. (2012). Confidence scoring of speaking performance: How does fuzziness become exact? Language Testing, 29(1), 43–65.
- Kang, O. (2012). Impact of rater characteristics and prosodic features of speaker accentedness on ratings of international teaching assistants’ oral performance. Language Assessment Quarterly, 9(3), 249–269.
- Katzenberger, I., & Meilijson, S. (2014). Hebrew language assessment measure for preschool children: A comparison between typically developing children and children with specific language impairment. Language Testing, 31(1), 19–38.
- Kim, H. J. (2015). A qualitative analysis of rater behavior on an L2 speaking assessment. Language Assessment Quarterly, 12(3), 239–261.
- Knoch, U., & Chapelle, C. A. (2017). Validation of rating processes within an argument-based framework. Language Testing, 34, 1–23.
- Kokhan, K. (2013). An argument against using standardized test scores for placement of international undergraduate students in English as a Second Language (ESL) courses. Language Testing, 30(4), 467–489.
- Koo, J., Becker, B. J., & Kim, Y. S. (2014). Examining differential item functioning trends for English language learners in a reading test: A meta-analytical approach. Language Testing, 31(1), 89–109.
- Kuiken, F., & Vedder, I. (2017). Functional adequacy in L2 writing: Towards a new rating scale. Language Testing, 34(3), 321–336.
- Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33(3), 319–340.
- Lado, R. (1961). Language testing. New York: McGraw-Hill.
- Lam, R. (2015). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197.
- Lee, H., & Winke, P. (2013). The differences among three-, four-, and five-option-item formats in the context of a high-stakes English-language listening test. Language Testing, 30(1), 99–123.
- Lee, S., & Winke, P. (2018). Young learners’ response processes when taking computerized tasks for speaking assessment. Language Testing, 35(2), 239–269.
- Li, H., & Suen, H. K. (2013). Detecting native language group differences at the subskills level of reading: A differential skill functioning approach. Language Testing, 30(2), 273–298.
- Li, H., Hunter, C. V., & Lei, P. W. (2016). The selection of cognitive diagnostic models for a reading comprehension test. Language Testing, 33(3), 391–409.
- Lin, C. K., & Zhang, J. (2014). Investigating correspondence between language proficiency standards and academic content standards: A generalizability theory study. Language Testing, 31(4), 413–431.
- Ling, G. (2017). Is writing performance related to keyboard type? An investigation from examinees’ perspectives on the TOEFL iBT. Language Assessment Quarterly, 14(1), 36–53.
- Mann, W., Roy, P., & Morgan, G. (2016). Adaptation of a vocabulary test from British Sign Language to American Sign Language. Language Testing, 33(1), 3–22.
- Murray, J. C., Riazi, A. M., & Cross, J. L. (2012). Test candidates’ attitudes and their relationship to demographic and experiential variables: The case of overseas trained teachers in NSW, Australia. Language Testing, 29(4), 577–595.
- Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2017). Exploring the use of video-conferencing technology in the assessment of spoken language: A mixed-methods study. Language Assessment Quarterly, 14(1), 1–18.
- Pan, M., & Qian, D. D. (2017). Embedding corpora into the content validation of the grammar test of the National Matriculation English Test (NMET) in China. Language Assessment Quarterly, 14(2), 120–139.
- Papageorgiou, S., & Cho, Y. (2014). An investigation of the use of TOEFL® Junior™ Standard scores for ESL placement decisions in secondary education. Language Testing, 31(2), 223–239.
- Pill, J., & McNamara, T. (2016). How much is enough? Involving occupational experts in setting standards on a specific-purpose language test for health professionals. Language Testing, 33(2), 217–234.
- Saida, C. (2017). Creating a common scale by post-hoc IRT equating to investigate the effects of the new national educational policy in Japan. Language Assessment Quarterly, 14(3), 257–273.
- Sanchez, S. V., Rodriguez, B. J., Soto-Huerta, M. E., Villarreal, F. C., Guerra, N. S., & Flores, B. B. (2013). A case for multidimensional bilingual assessment. Language Assessment Quarterly, 10(2), 160–177.
- Sato, T. (2012). The contribution of test-takers’ speech content to scores on an English oral proficiency test. Language Testing, 29(2), 223–241.
- Savignon, S. J. (1972). Communicative competence: An experiment in foreign language teaching. Philadelphia: Center for Curriculum Development.
- Shaw, S., & Imam, H. (2013). Assessment of international students through the medium of English: Ensuring validity and fairness in content-based examinations. Language Assessment Quarterly, 10(4), 452–475.
- Shohamy, E., Gordon, C. M., & Kraemer, R. (1992). The effect of raters’ background and training on the reliability of direct writing tests. The Modern Language Journal, 76(1), 27–33.
- Spolsky, B. (2008). Language assessment in historical and future perspective. In E. Shohamy & N. Hornberger (Eds.), Encyclopedia of language and education (2nd ed., Vol. 7: Language testing and assessment, pp. 445–454). New York: Springer Science.
- Stansfield, C. W. (2008). Lecture: “Where we have been and where we should go”. Language Testing, 25(3), 311–326.
- Suvorov, R. (2015). The use of eye tracking in research on video-based second language (L2) listening assessment: A comparison of context videos and content videos. Language Testing, 32(4), 463–483.
- Suzuki, Y. (2015). Self-assessment of Japanese as a second language: The role of experiences in the naturalistic acquisition. Language Testing, 32(1), 63–81.
- Tengberg, M. (2017). National reading tests in Denmark, Norway, and Sweden: A comparison of construct definitions, cognitive targets, and response formats. Language Testing, 34(1), 83–100.
- Timpe-Laughlin, V., & Choi, I. (2017). Exploring the validity of a second language intercultural pragmatics assessment tool. Language Assessment Quarterly, 14(1), 19–35.
- Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
- Wagner, E. (2013). An investigation of how the channel of input and access to test questions affect L2 listening test performance. Language Assessment Quarterly, 10(2), 178–195.
- Wei, J., & Llosa, L. (2015). Investigating differences between American and Indian raters in assessing TOEFL iBT speaking tasks. Language Assessment Quarterly, 12(3), 283–304.
- Widdowson, H. G. (1983). Learning purpose and language use. Oxford: Oxford University Press.
- Winke, P., Gass, S., & Myford, C. (2013). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30(2), 231–252.
- Xi, X., Higgins, D., Zechner, K., & Williamson, D. (2012). A comparison of two scoring methods for an automated speech scoring system. Language Testing, 29(3), 371–394.
- Zhang, L., Goh, C. C., & Kunnan, A. J. (2014). Analysis of test takers’ metacognitive and cognitive strategy use and EFL reading test performance: A multi-sample SEM approach. Language Assessment Quarterly, 11(1), 76–102.