|Doctoral Theses of the Universidad de Alcalá
|SIMULATION AND COMPARISON OF SYNDROMIC SURVEILLANCE FORECASTING WITH ARIMA MODELS AND DATA FROM SEARCH ENGINES AND SOCIAL MEDIA|
|Author||Samaras, Loukas|
|Department||Ciencias de la Computación|
|Supervisor||García Barriocanal, María Elena|
|Co-supervisor||Sicilia Urbán, Miguel Ángel|
|Defense date||16/07/2021|
|Grade||Sobresaliente Cum Laude|
|Program||Comunicación, Información y Tecnología en la Sociedad en Red (RD 99/2011)|
|Abstract||Since 1989, when Tim Berners-Lee invented the World Wide Web, the use of the Internet has grown at unprecedented rates, bringing with it new problems but also new opportunities. Web Science is concerned with the study of large-scale socio-technical systems, such as the World Wide Web, while syndromic surveillance is concerned with public health. In recent years, several researchers have proposed Web data as a valuable data source for monitoring epidemics, based on two ideas or hypotheses. The first is that what is happening in the world is closely connected with activity on the Web. People use the Internet, make web searches and interact with each other on social media about various matters, among which is health. It has been found that, if this web activity is collected, it can serve as a potentially useful tool for monitoring and assessing health status in populations. The second is that, if the first hypothesis is true, we can construct rules, patterns and estimation models for the spread and outbreak of an infectious disease in a way that could lead to accurate and early prediction.
This PhD research shows how an Internet surveillance system can be developed, and demonstrates its usefulness, based on the Autoregressive Integrated Moving Average (ARIMA) procedure and data from the Web, particularly from Google Trends and Twitter. It presents a novel approach to simulating the spread of epidemics by applying specific techniques for real-time detection and forecasting. The aim of this PhD has been to apply these techniques systematically to construct dynamic forecasting models for different data sources, diseases and geographical contexts. The experiments were carried out for three major infectious diseases: influenza, scarlet fever and measles. The comparisons across diseases, countries and data sources investigate the feasibility and effectiveness of the approach for detecting and predicting the spread of epidemics.
Results show that there is a strong and significant correlation between data from the Web and the actual spread of an infectious disease, which is consistent with previous studies. More interestingly, the detection of epidemics using the ARIMA procedure can be achieved for different diseases and different geographic locations. In addition, both Google and Twitter have the capacity to help detect epidemics, providing remarkable results.|