Uncertainty in archaeological predictive modelling

Conceptual modelling for archaeological databases: incorporating uncertainty in predictive modelling

My interdisciplinary PhD research has so far centred on the development of archaeological predictive models for several Swiss regions. These models take environmental data (such as soil quality or terrain features) plus georeferenced known sites as input and use a machine learning algorithm (random forest) to produce a quantitative estimate of the likelihood that relevant archaeological sites exist at a given location. One major problem with the input data is that it often comes from different sources and in very different formats, so its interoperability is extremely low. In addition, it sometimes contains vagueness indicators that are difficult to interpret and process, which makes it hard to reuse. For these reasons, I decided to invest some time and effort in improving data interoperability, so that a common structure could be used for datasets from different sources. Similarly, I decided to investigate how vagueness indicators could be standardised or improved so that the data becomes easier to reuse.

I was fortunate enough to obtain a SEADDA (Saving European Archaeology from the Digital Dark Age, www.seadda.eu) scholarship for a stay at Incipit CSIC between January and April 2020. This allowed me to work on two areas of the research. Firstly, I studied Incipit’s approach to handling uncertainty in archaeological data through conceptual modelling with ConML (www.conml.org). ConML offers a qualitative description of uncertainty, so in collaboration with researchers from Incipit and the Department of Philosophy of the University of Santiago de Compostela, I have developed a quantification protocol that will allow me (or anybody in the future) to assign quantitative uncertainty values to qualitative linguistic labels (such as “unsure” or “unknown”) in a systematic manner. Once obtained, these quantitative values can be processed algorithmically as part of a predictive model. Secondly, I have explored different statistical procedures that are better suited to continuous data ranges such as those of uncertainty (which may fall anywhere in the [0,1] interval). To do this, I have liaised with data processing specialists at the Research Centre for Intelligent Technologies (CITIUS) of the same university, and we are currently working together towards better solutions.
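To make the idea of quantification more concrete, here is a minimal sketch in Python of what mapping qualitative labels to values in the [0,1] interval could look like. It is not the actual protocol: only the labels “unsure” and “unknown” come from the text above, while the other labels, all the numeric values, the function name `quantify`, the record field `dating_label` and the choice to treat a missing label as maximal uncertainty are assumptions made purely for illustration.

```python
from typing import Optional

# Hypothetical lookup table. Only "unsure" and "unknown" appear in the text;
# the remaining labels and every number here are illustrative assumptions.
UNCERTAINTY_BY_LABEL = {
    "certain": 0.0,
    "probable": 0.25,
    "unsure": 0.5,
    "unknown": 1.0,
}


def quantify(label: Optional[str], missing: float = 1.0) -> float:
    """Return an uncertainty value in [0, 1] for a qualitative label.

    A missing label is treated as maximal uncertainty (an assumption).
    Labels outside the table raise an error instead of being guessed.
    """
    if label is None:
        return missing
    key = label.strip().lower()
    if key not in UNCERTAINTY_BY_LABEL:
        raise ValueError(f"No quantitative value defined for label {label!r}")
    return UNCERTAINTY_BY_LABEL[key]


# Hypothetical site records carrying a qualitative label on their dating.
sites = [
    {"id": "site-1", "dating_label": "certain"},
    {"id": "site-2", "dating_label": "Unsure"},
    {"id": "site-3", "dating_label": None},
]

# Attach the quantitative value so later processing can use it as a number.
for site in sites:
    site["dating_uncertainty"] = quantify(site["dating_label"])

# One possible use: turn uncertainty into a weight (1 - uncertainty).
weights = [1.0 - site["dating_uncertainty"] for site in sites]
print(weights)
```

Weights of this kind could, for example, be passed to a learner that accepts per-sample weights, so that less certain records count for less; whether that suits a given predictive model is a separate design decision.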

Thanks to this stay, we have developed a clear approach to quantifying uncertainty in archaeological data, and we have explored additional algorithms to further validate the resulting predictive models. By incorporating quantitative uncertainty values, the archaeological datasets will become more reusable by third parties. In addition, applying predictive models can reveal patterns in existing data and suggest likely locations of sites so far unknown, thus generating new knowledge and establishing the basis for new research questions and project opportunities. Finally, this stay has allowed me to establish contacts with researchers in other areas (philosophy, computer science), and has encouraged me to write project proposals for new venues.
