Increasing Rigor in Online Health Surveys Through the Reduction of Fraudulent Data.
Journal:
Journal of Medical Internet Research
Published Date:
Aug 21, 2025
Abstract
Online surveys have become a key tool of modern health research, offering a fast, cost-effective, and convenient means of data collection. They enable researchers to access diverse populations, such as those underrepresented in traditional studies, and facilitate the collection of data on stigmatized or sensitive behaviors through greater anonymity. However, the ease of participation also introduces significant challenges, particularly around data integrity and rigor. As fraudulent responses, whether from bots, repeat responders, or individuals misrepresenting themselves, become more sophisticated and pervasive, ensuring the rigor of online surveys has never been more crucial. This article provides a comprehensive synthesis of practical strategies that help to increase the rigor of online surveys through the detection and removal of fraudulent data. Drawing on recent literature and case studies, we outline several options that address the full research cycle, from strategies applied before data collection to validation after data collection. We emphasize the integration of automated screening techniques (eg, CAPTCHAs and honeypot questions) and attention checks (eg, trap questions) for purposeful survey design. Robust recruitment procedures (eg, concealed eligibility criteria and 2-stage screening) and a proper incentive or compensation structure can also help to deter fraudulent participation. We examine the merits and limitations of different sampling methodologies, including river sampling, online panels, and crowdsourcing platforms, offering guidance on how to select samples based on specific research objectives. For the period after data collection, we discuss metadata-based techniques to detect fraudulent data (eg, duplicate email or IP addresses and response time analysis), alongside methods to better screen for low-quality responses (eg, inconsistent response patterns and improbable qualitative responses).
The escalating sophistication of fraud tactics, particularly with the growth of artificial intelligence (AI), demands that researchers continuously adapt and stay vigilant. We propose the use of dynamic protocols that combine multiple strategies into a multipronged approach, one that can better filter for fraudulent data and evolve depending on the type of responses received across the data collection process. However, there is still significant room for these strategies to develop, and their refinement should be a key focus of future research. As online surveys become increasingly integral to health research, investing in robust strategies to screen for fraudulent data and to increase the rigor of studies is key to upholding scientific integrity.
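As an illustration of the metadata-based checks and the multipronged approach described above, the following is a minimal Python sketch, not the authors' protocol. The field names (`email`, `ip`, `seconds`, `honeypot`, `attention_ok`), the 120-second minimum completion time, and the 2-flag review threshold are all assumptions chosen for the example. A real study would set them from its own pilot data.

```python
from collections import Counter

def flag_responses(responses, min_seconds=120):
    """Return {response_id: list of flag names} for each response.

    All field names and the min_seconds cutoff are hypothetical.
    """
    # Normalize emails so case and whitespace variants count as duplicates.
    emails = Counter(r["email"].strip().lower() for r in responses)
    ips = Counter(r["ip"] for r in responses)

    flags = {}
    for r in responses:
        found = []
        if emails[r["email"].strip().lower()] > 1:
            found.append("duplicate_email")
        if ips[r["ip"]] > 1:
            found.append("duplicate_ip")
        if r["seconds"] < min_seconds:
            found.append("too_fast")
        # A honeypot field is hidden from human respondents, so any
        # content in it suggests automated completion.
        if r["honeypot"]:
            found.append("honeypot_filled")
        if not r["attention_ok"]:
            found.append("failed_attention_check")
        flags[r["id"]] = found
    return flags

def needs_review(flags, threshold=2):
    """Combine the individual checks: list IDs with at least `threshold` flags."""
    return sorted(rid for rid, found in flags.items() if len(found) >= threshold)

# Illustrative records only; not data from the article.
responses = [
    {"id": "r1", "email": "a@example.org", "ip": "203.0.113.5",
     "seconds": 610, "honeypot": "", "attention_ok": True},
    {"id": "r2", "email": "A@example.org ", "ip": "203.0.113.5",
     "seconds": 45, "honeypot": "", "attention_ok": True},
    {"id": "r3", "email": "c@example.org", "ip": "198.51.100.7",
     "seconds": 480, "honeypot": "", "attention_ok": True},
    {"id": "r4", "email": "d@example.org", "ip": "198.51.100.9",
     "seconds": 30, "honeypot": "x", "attention_ok": False},
]

flags = flag_responses(responses)
print(needs_review(flags))
```

In this sketch, both members of a duplicate pair are flagged, because metadata alone cannot show which submission is the original. Flagged responses are best treated as candidates for manual review rather than automatic exclusion, since a shared IP address can also reflect a household or institutional network.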