Evaluating the Efficacy of Simulated User Models in Interactive Information Retrieval: A User-Based Approach
DOI: https://doi.org/10.26692/surjss.v57i02.7672

Keywords: Interactive Information Retrieval, simulated users, user modeling, search behavior, evaluation, user satisfaction
Abstract
Simulated user models are increasingly employed to evaluate interactive information retrieval (IIR) systems because of their scalability and consistency. However, the extent to which these models realistically replicate human behavior across diverse search tasks remains underexplored. This study investigates the behavioral fidelity of simulated users, specifically rule-based and LLM-driven agents, by comparing them to real users across factual, exploratory, and comparative search tasks. Using a controlled experimental framework and the Search dataset, we analyze 32 real-user sessions and 32 matched simulations on retrieval performance (MAP, nDCG), behavioral patterns such as query reformulations and session time, and satisfaction measures. The results show that simulated users closely approximate real-user performance in factual tasks but significantly underperform in exploratory and comparative contexts, particularly in query reformulation frequency and satisfaction alignment. Simulated satisfaction scores, estimated through relevance proxies, diverged from real-user ratings (3.4 vs. 4.1 on average), highlighting gaps in cognitive and affective realism. These findings suggest that current simulation models lack real users' adaptability and strategic diversity, especially in open-ended tasks. The study contributes empirical evidence of simulation limitations and guidance for improving user model fidelity, emphasizing the need for hybrid evaluation frameworks that combine real-user insight with scalable simulation.
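The retrieval metrics the abstract names, MAP and nDCG, follow standard definitions. As an illustrative sketch only (not the study's actual analysis code), per-session scores of this kind can be computed as follows, where `relevances` is the graded relevance of each result at its rank:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: graded relevance discounted by log2 of rank.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    # Normalize DCG by the DCG of the ideal (descending) ordering.
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

def average_precision(binary_rels):
    # Precision at each relevant rank, averaged; MAP is the mean over queries.
    hits, total = 0, 0.0
    for rank, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

For example, `ndcg([3, 2, 3, 0, 1, 2])` is roughly 0.96, since the ranking is close to, but not exactly, the ideal descending order.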
Copyright (c) 2025 Sindh University Research Journal - SURJ (Science Series)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.