An Atlas of Economic Activities in the UK: tapping into web archives for social science research
Project description
This is the website of our Smart Data Research UK project entitled An Atlas of Economic Activities in the UK: tapping into web archives for social science research.
This project uses the Web, one of the largest sources of smart data, to map economic activities in the UK at an unprecedented level of detail. Our tools and data products will allow for the continuous monitoring and mapping of economic activities. They can help policymakers understand how economic activity evolves over time and across places. Our project showcases the value of the Web as an untapped source of smart data and creates tools for the broader social science community to utilise these data.
For this project, we are developing the computational tools needed to utilise web data at scale from web archives and, specifically, the Common Crawl. By analysing self-descriptions of economic activities on business websites, we will produce typologies of economic activities that are rich in content and reach beyond small case studies. We will map and model the spatial footprints and the dynamics of economic activities in the UK. By geolocating and observing commercial websites over time, we will expose these dynamics: from stable industrial clusters to emerging economic activities and their geographies. The project will also assess potential biases associated with archived web data. Just like non-digital archives, web archives do not archive everything, be it all public websites (archival extent) or all webpages within a website (archival depth).
Websites are archetypal smart data: they are born-digital data positioned at the core of what we understand as the internet; they are geospatial, as 70% of all websites contain some place reference; they are commercial and transactional, since they capture information – often self-reported – about various entities, from individuals to firms and third sector organisations; and they are unstructured, containing textual and visual information, among other things. Despite the utility of web data for social science research, the use of such rich and large-scale textual data is hindered by a lack of easy-to-access data and relevant tools.
A Bristol example
This is an extract of the Common Crawl data from 2023 for the city of Bristol. It maps commercial websites (.co.uk) that contain exactly one unique postcode from the Bristol area.
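The selection rule behind this extract can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the function names, the simplified postcode pattern, and the use of the BS postcode area as a stand-in for "the Bristol area" are all assumptions made for illustration.

```python
import re

# Simplified UK postcode pattern (an assumption: it does not cover every
# special case in the official postcode specification).
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b", re.IGNORECASE
)


def unique_postcodes(text):
    """Return the set of normalised postcodes (e.g. 'BS8 1SS') found in text."""
    return {
        f"{m.group(1).upper()} {m.group(2).upper()}"
        for m in POSTCODE_RE.finditer(text)
    }


def bristol_postcode(domain, text):
    """Return the site's postcode if it is a .co.uk domain whose text holds
    exactly one unique postcode, and that postcode is in the BS area.
    Otherwise return None. (Hypothetical helper, for illustration only.)"""
    if not domain.lower().endswith(".co.uk"):
        return None
    found = unique_postcodes(text)
    if len(found) != 1:
        return None  # no postcode, or several distinct ones
    (postcode,) = found
    outward = postcode.split()[0]
    # Assumption: 'Bristol area' is approximated by the BS postcode area.
    return postcode if re.fullmatch(r"BS[0-9]{1,2}", outward) else None
```

Restricting the extract to websites with a single unique postcode avoids having to decide which of several addresses represents a business's location, at the cost of dropping multi-site firms.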
Team members
We are all based at the School of Geographical Sciences at the University of Bristol and are members of the Quantitative Spatial Science research group.
Prof Emmanouil Tranos (Principal Investigator)