PALP will design and implement its database using the principles of Linked Open Data (LOD). The primary inspiration PALP takes from LOD will be the creation of publicly available and stable URI-based identifiers for all entities at Pompeii that PALP will describe. These identifiers will provide access to both human-readable and machine-actionable versions of PALP data.
Design and information architecture
The Pompeii Artistic Landscape Project is a Linked Open Data initiative from its original conception through to its implementation and long-term sustainability. This means that it focuses on creating unique identifiers for entities within Pompeii, uses standardized vocabularies adapted to Pompeii to describe those entities and capture their spatial extent, and then uses standardized vocabularies to express the spatial and hierarchical relations between entities. In particular, PALP intends to define templates for entity types that make use of existing LOD vocabularies and project-specific vocabularies.
Whenever possible, however, PALP will use standard LOD vocabularies. Every entity in PALP will have an RDFS type using classes – such as “Pompeian House” or “Pompeian Region” – that we define. The essential hierarchical vocabulary item that we will use is Dublin Core “Is Part Of”. Every wall painting will be said to be part of a wall, which will be said to be part of a room, which is part of a building, which is part of an insula, which is part of a region. To capture that rooms are next to each other or that buildings face onto streets, PALP will use the properties defined in the Open Geospatial Consortium’s GeoSPARQL standard, which defines, for example, relationships such as “touches”, “crosses” and “intersects”. The descriptive vocabularies that PALP uses will depend on the entity being addressed. When possible we will use VRA Core to describe artworks. That vocabulary can easily indicate that a wall painting depicts Hercules, for whom there will also be an entity in PALP. PALP will create its own property to indicate that a painting is, for example, of the “Fourth Style”. We will do this by creating both the vocabulary item “Has Pompeian Style” and an entity for each style within PALP itself. These PALP vocabulary items and entities will be defined in a PALP namespace; when possible, PALP-specific properties will be formally specified as sub-properties of widely adopted terms. PALP will also link its identifiers to existing LOD resources such as those published by the Getty (e.g., “Fourth Style”). While there does not happen to be an individual Wikidata identifier for each Pompeian style or a Pleiades identifier for each building, we will link PALP entities to Wikidata and the Pleiades Gazetteer whenever feasible. For other aspects – such as room area, wall height, etc. – we will look for existing vocabularies and only define properties in the PALP namespace when that will support more robust searching and a better user experience.
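The containment chain described above can be illustrated with a minimal sketch. It assumes a placeholder PALP namespace (http://example.org/palp/) and hypothetical entity and property names such as painting1, hasPompeianStyle and fourth-style; none of these are the project's actual identifiers. Only the Dublin Core “Is Part Of” and RDF type URIs are standard. The sketch holds statements as plain subject-predicate-object triples, follows the “Is Part Of” links upward from a painting, and serializes the statements as N-Triples.

```python
# Hypothetical namespace: the PALP base URI and all local names are placeholders.
PALP = "http://example.org/palp/"
# Standard vocabulary URIs.
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def palp(local_name):
    """Build a full URI in the (assumed) PALP namespace."""
    return PALP + local_name


# Each statement is a (subject, predicate, object) triple of URIs.
triples = [
    (palp("painting1"), RDF_TYPE, palp("WallPainting")),
    (palp("painting1"), palp("hasPompeianStyle"), palp("fourth-style")),
    (palp("painting1"), IS_PART_OF, palp("wall1")),
    (palp("wall1"), IS_PART_OF, palp("room1")),
    (palp("room1"), IS_PART_OF, palp("house1")),
    (palp("house1"), RDF_TYPE, palp("PompeianHouse")),
    (palp("house1"), IS_PART_OF, palp("insula1")),
    (palp("insula1"), IS_PART_OF, palp("region1")),
    (palp("region1"), RDF_TYPE, palp("PompeianRegion")),
]


def containment_chain(entity, statements):
    """Follow isPartOf links upward, returning the enclosing entities in order."""
    parent_of = {s: o for s, p, o in statements if p == IS_PART_OF}
    chain = []
    seen = {entity}
    while entity in parent_of:
        entity = parent_of[entity]
        if entity in seen:  # stop if the data accidentally contains a cycle
            break
        seen.add(entity)
        chain.append(entity)
    return chain


def to_ntriples(statements):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


if __name__ == "__main__":
    for uri in containment_chain(palp("painting1"), triples):
        print(uri)
    print(to_ntriples(triples))
```

In practice an RDF library and a triple store would replace the list and dictionary used here; the sketch only shows how a single hierarchical property is enough to recover the full context of a painting.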
Every spatial entity in PALP will be associated with a GeoJSON string that allows it to be displayed in our interface. As noted, PALP will use the existing data of the Pompeii Bibliography and Mapping Project as the source for these spatial representations. The timeline of the project includes work on converting the monolithic shapefiles into GeoJSON that provides a spatial representation for individual entities. This process will lead to representations for all Pompeian regions, insulae, buildings, rooms, and walls at a minimum. As necessary, we also expect that PALP will extend to include a spatial representation of any relevant feature at the site. Our interface will also be the public front-end for browsing and searching PALP. For many searches, our linked data will allow effective discovery and use of PALP resources. This is particularly the case for keyword searching and navigation. For navigation, PALP will also develop a hierarchical browser that will display the context of any entity as well as show what entities it contains.
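As a rough illustration of moving from one monolithic layer to per-entity GeoJSON, the sketch below assumes the shapefile has already been converted to a single GeoJSON FeatureCollection and that each feature carries an identifier in a property called palp_id. That property name, the entity identifiers, and the coordinates are all invented for the example and do not come from the PBMP data.

```python
import json


def split_feature_collection(collection, id_property="palp_id"):
    """Map each entity identifier to a GeoJSON string for its single Feature.

    `id_property` is a hypothetical attribute name; the real source data
    may label its identifiers differently.
    """
    per_entity = {}
    for feature in collection["features"]:
        entity_id = feature["properties"][id_property]
        per_entity[entity_id] = json.dumps(feature, sort_keys=True)
    return per_entity


# Made-up sample layer with two entities and placeholder coordinates.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"palp_id": "room1"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"palp_id": "wall1"},
            "geometry": {
                "type": "LineString",
                "coordinates": [[0, 0], [4, 0]],
            },
        },
    ],
}

if __name__ == "__main__":
    for entity_id, geojson in split_feature_collection(sample).items():
        print(entity_id, geojson)
```

Each resulting string can then be stored alongside the entity's other statements, so that the interface can draw any single room or wall without loading the whole layer.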
Access and dissemination
The ultimate digital product of PALP is an online resource that makes publicly available a collection of atomized yet interdependent descriptions of Pompeii’s architectural and artistic landscapes. As such, sharing these data lies at the core of our efforts. Our primary digital products will be an online database and multimodal user interface, as well as an open-licensed RDF-formatted dataset (and other derived formats) that will be available through our open and institutional repositories. To be clear, disseminating our data is not an additional task, but a primary outcome of our proposed activities. We will disseminate these data, as well as any software developed to process and to serve them, under open access licenses. Throughout the grant period, we will regularly upload the latest version of the PALP data to GitHub. These data will remain available after completion. All PALP data will be placed in the institutional repositories of both collaborators’ institutions. We will also publish all the data produced by the end of the funded period on Zenodo (http://zenodo.org) and in any other repositories that have a wide audience among archaeologists and art historians. Finally, to ensure scholars know about PALP and to get their feedback on its design, we will apply to present our preliminary results at the annual meetings of the Archaeological Institute of America (Jan. 2023) and the College Art Association (Feb. 2023), and we will host a workshop at UMass (Oct. 2021).
Source data management
Although PALP will create large volumes of data, they will be generated in very simple data formats. Such simplicity will facilitate the management of those data over the duration of the project and will form the basis of its sustainability (see above). We envision producing the following varieties of data in linked open data formats (1-5) as well as additional forms of data and code used in the architectural and art historical analyses (6-7): 1. text-based geographical data; 2. hierarchical data among those geographical data; 3. descriptive data of art and architectural elements; 4. hierarchical data among art and architectural elements; 5. controlled vocabularies and authority lists; 6. workflow templates; 7. design and code of the user interface and search portals. Generating and managing these data will require the effort of a number of trained undergraduate students (at least 5 and up to 10) and the expertise of the project directors and consultants to supervise the project and to develop its components. Keeping this large, if straightforward, project running smoothly will require constant attention to procedures and to benchmarks. As we are using academic labor, the academic semester provides a convenient unit for maintaining regular processes (e.g., hiring and training of students), measuring overall progress (e.g., describing at least three artworks per student hour), and disseminating general progress (e.g., writing one or two process blog posts). Each of the project’s six semesters will have its own goals, which are described in the project timeline, below.
Project sustainability
The sustainability of PALP is ensured by three factors: 1) very strong institutional support for and investment in the project; 2) the use of standards-based data formats and vocabularies that will make our information attractive to other projects; and 3) the dissemination of our data and platforms under a Creative Commons license. On the first point, in addition to the promise of collaboration and support, the institutional support that the University of Massachusetts Amherst has given PALP’s sister project, the PBMP, over the last six years has been exceptionally strong. Beginning with a seed grant in 2011, UMass has shared personnel and expertise as well as its GIS infrastructure and technical support during both the start-up phase of the PBMP and its current maturation. Second, PALP’s sustainability is aided by the simplicity of our data formats and the openness of their dissemination. By implementing LOD practices, we avoid locking our spatial data up in commercial formats or our observational data in database software that will be increasingly difficult to access, even in the near future.