HighTech and Innovation

Within the field of Digital Humanities, a great effort has been made to digitize documents and collections in order to build catalogs and exhibitions on the Web. In this paper, we present WeME, a Web application for building a knowledge base that can be used to describe digital documents. WeME is designed for different categories of users: archivists/librarians and scholars. WeME extracts information from well-known Linked Data nodes, namely DBpedia and GeoNames, as well as from traditional Web sources, namely VIAF. As a use case of WeME, we describe the knowledge base related to the correspondence of Christopher Clavius, a mathematician and astronomer of the 16th century. Clavius wrote more than 300 letters, most of which are owned by the Historical Archives of the Pontifical Gregorian University (APUG) in Rome. The resulting knowledge base contains 139 links to DBpedia, 83 links to GeoNames and 129 links to VIAF. To evaluate the usability of WeME, we invited 26 users to test the application.


Introduction
In recent years, a great effort has been made in the field of Digital Humanities to digitize documents and collections in different formats, such as PDF, XML, plain text and images. These documents are often stored in digital libraries or large digital repositories in the form of books and catalogs (e.g. the Oxford Digital Library, the Library of Congress and the Perseus Digital Library). Some projects annotate a subset of texts and images, such as the Clavius on the Web project [1,2], where the idea behind the work presented in this paper originated. Other projects include the Digital Vercelli Book and Burckhardtsource.
The process of cataloging also requires the creation of a knowledge base, which contains contextual resources associated with the documents of the catalog, such as the authors of the documents and the places where they were written. Information contained in the knowledge base can be used to enrich document details, i.e. the metadata associated with documents. Most existing cataloging tools require users to build the knowledge base manually, inserting each piece of information (metadata) one by one. This process is often tedious, because it consists of re-entering well-known information about a document, such as the author's name and date of birth. It is also repetitive: many documents are written by the same author or in the same place, so the same information must be entered more than once. In general, this manual effort has three main disadvantages: a) the probability of introducing errors increases; b) the whole process is slow, because it is not automated; c) the inserted information is isolated, i.e. not connected to the rest of the Web. Involving users as co-creators of metadata could be a possible solution to these problems [3].
In this paper we present the Web Metadata Editor (WeME), a Web application that provides users with a user-friendly interface to build a knowledge base associated with a collection. WeME helps archivists to enrich their catalogs with resources extracted from two kinds of Web sources: Linked Data [4] and traditional Web sources. The use of Linked Data also permits the creation of semantic resources, which seems to be the best solution for information preservation [5]. WeME mitigates the three disadvantages of manual effort described above by extracting well-known metadata from Linked Data nodes (e.g. DBpedia [6] and GeoNames) and from traditional Web sources (VIAF). WeME queries both the Semantic Web and the traditional Web, by constructing SPARQL [7] queries and RESTful API requests, in a way that is completely transparent to the user: in the Web interface, the user only has to specify the name of the resource to be searched. WeME then retrieves the information from the Web and shows it to the user, who can decide whether to accept, edit or discard it. This automatic search speeds up metadata insertion and reduces the probability of introducing errors. WeME offers two main advantages: first, it eases the task of building a knowledge base; second, it establishes new relations both among documents within the same catalog and with documents belonging to Web sources.
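As an illustration of how a name typed by the user can be turned into a SPARQL request behind the scenes, the sketch below builds a query for the public DBpedia endpoint (https://dbpedia.org/sparql) from a (name, surname) pair. It is a minimal sketch, not WeME's actual query: matching on an exact English `rdfs:label`, the selected `dbo:` properties and the helper names `person_query` and `request_url` are our assumptions.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes so text is safe inside a SPARQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def person_query(name: str, surname: str, lang: str = "en") -> str:
    """SPARQL query for a person identified only by (name, surname)."""
    label = escape_literal(f"{name} {surname}")
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?birthDate ?birthPlace ?deathDate ?deathPlace WHERE {{
  ?person a dbo:Person ;
          rdfs:label "{label}"@{lang} .
  OPTIONAL {{ ?person dbo:birthDate ?birthDate }}
  OPTIONAL {{ ?person dbo:birthPlace ?birthPlace }}
  OPTIONAL {{ ?person dbo:deathDate ?deathDate }}
  OPTIONAL {{ ?person dbo:deathPlace ?deathPlace }}
}}
LIMIT 10"""


def request_url(query: str) -> str:
    """GET URL asking the endpoint for results in SPARQL JSON format."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


url = request_url(person_query("Christopher", "Clavius"))
```

Sending this URL with any HTTP client (for example `urllib.request.urlopen`) would return the matching bindings; the network call is left out so the example runs offline. Exact label matching is deliberately simple and misses spelling variants, which is one reason the user still reviews what comes back.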
WeME was used to build the knowledge base related to the correspondence of Christopher Clavius. Clavius wrote and received more than 300 letters to and from other scientists of the same period, among them Galileo Galilei and Tycho Brahe. Most of these letters are held by APUG. The Clavius on the Web (CoW) project was built around this correspondence; it started in 2013 and lasted four years.
To test the usability of WeME, we prepared a questionnaire aimed at understanding the level of interest in the project and the degree of appreciation of the application. Of the 26 respondents, 5 rated it excellent (5/5), 13 judged it very useful (4/5), 6 defined it as a good tool (3/5) and only 1 found it useless (1/5). The respondents were mostly IT experts, researchers in the field of Digital Humanities or users with archival skills.
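For a single summary figure, the snippet below turns the rating counts reported above into the share of positive ratings (4 or 5) and a mean score. The four groups listed add up to 25, one fewer than the 26 respondents, so the computation uses only the reported ratings; the variable names are ours.

```python
# Rating counts as reported in the text (rating on a 1-5 scale -> number of respondents).
reported = {5: 5, 4: 13, 3: 6, 1: 1}

answered = sum(reported.values())                 # respondents whose rating is reported
positive = reported[5] + reported[4]              # ratings of 4 or 5
share_positive = 100 * positive / answered        # percentage of positive ratings
mean_rating = sum(r * n for r, n in reported.items()) / answered

print(f"{answered} ratings, {share_positive:.0f}% positive, mean {mean_rating:.2f}/5")
# prints: 25 ratings, 72% positive, mean 3.84/5
```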
The remainder of the paper is organized as follows: Section 2 reviews related work. Section 3 describes the approach employed in this paper, while Section 4 illustrates the Web application. Section 5 describes the use case and Section 6 the usability test. Finally, Section 7 draws conclusions and outlines future work.

Related Works
In this section we first review the current literature on tools and projects that exploit Linked Data to build knowledge bases, and then briefly illustrate some cataloging tools.
DaCura [8] is a framework that provides tools to collect and curate high-quality linked datasets. DaCura is not designed for digital libraries or digital repositories; however, it covers several aspects that are important in the context of Digital Humanities, such as data provenance and data quality. Another important initiative is the CULTURA project [9], which develops a metadata-driven personalization environment to navigate collections and supports different categories of users, such as professional researchers and non-expert users. A more recent initiative is the FREME project, developed by the group behind DBpedia. FREME provides an interactive editor to identify and annotate entities in texts, and users can also manage the discovered entities. The FREME tool suite furthermore discovers people, places and events. Gonzalez-Toral et al. (2019) [10] proposed a strategy to enrich a digital repository by combining the OAI-PMH protocol with Linked Data.
Compared with existing tools, frameworks and projects, WeME provides a simple Web application that does not require any specific skill. In fact, WeME can be used by any kind of user, e.g. scholars and archivists/librarians, as well as students. In addition, WeME can be easily installed and run within a Web server, without any specific configuration. Finally, its source code is available as open source on GitHub, as described later in the paper.
Project URLs: DBpedia (http://dbpedia.org), GeoNames (http://www.geonames.org), VIAF (http://viaf.org), FREME (http://www.freme-project.eu/).

Tools for Cataloging
Many software tools have emerged recently to catalog and manage digital collections. Among proprietary tools, the best known is CONTENTdm, created by OCLC. CONTENTdm is a digital collection management tool that allows users to upload, describe, manage and access digital collections. It is a very powerful tool with an easy-to-use interface. However, its cost is prohibitive for many non-profit organizations: entry-level license options start at $4,300 per year.
Open-source software tools include: Omeka, which provides a unified application for the Web interface and the back-end cataloging system; Collective Access, whose main focus is on cataloging and multiple metadata schemas; CollectionSpace, which does not allow users to create digital collections but enables them to connect with other existing open-source applications; Open Exhibits, a multitouch, multi-user tool whose main aim is to develop online and interactive exhibits of collections; DSpace and Fedora, which are the most widely used tools to build and manage digital repositories [11].
Existing tools provide very powerful interfaces to add, edit or delete metadata associated with digital documents, but all this information must be entered by the user manually. As Alessandra Moi highlighted, cataloguing tools are going through a transition period in which there is not yet full awareness of the importance of Linked Data to enrich collections [12]. Following Moi's suggestions, the tool described in this paper exploits Web sources (Linked Data and RESTful APIs) to retrieve contextual resources automatically.

Approach
The core idea of this work is to build a knowledge base that contains contextual resources connected to the documents of a collection, such as the authors of the documents and the places where they were written. Allowed resources are a subset of the Europeana Data Model (EDM) [13] ontology: person, place and cultural heritage object (CHO). We chose EDM to represent our data because it defines relations among resources in a straightforward way: a CHO is related to a person if the person is its author, and a place is related to a CHO if the CHO was created in that place.
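A minimal sketch of this subset of EDM as plain Python data classes follows. The class names echo the EDM classes they roughly correspond to (`edm:Agent`, `edm:Place`, `edm:ProvidedCHO`), but the field names and helper functions are hypothetical and only illustrate the two relations described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:          # roughly edm:Agent
    name: str
    surname: str


@dataclass
class Place:           # roughly edm:Place
    name: str


@dataclass
class CHO:             # roughly edm:ProvidedCHO, e.g. a letter
    title: str
    author: Optional[Person] = None   # CHO -> Person: the person is its author
    place: Optional[Place] = None     # CHO -> Place: the CHO was created there


def works_by(chos: List[CHO], person: Person) -> List[CHO]:
    """Follow the author relation backwards: all CHOs written by a person."""
    return [c for c in chos if c.author == person]


def works_from(chos: List[CHO], place: Place) -> List[CHO]:
    """Follow the place relation backwards: all CHOs created in a place."""
    return [c for c in chos if c.place == place]
```

Because each CHO points to its author and its place, a question such as "which letters did this person write?" reduces to following a link.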
Every resource can be built through a simple Web interface, which allows users to edit resources manually or by invoking Linked Data and Web RESTful APIs. The user formulates a simple query, based on the pair (name, surname) for people and on the name for places. The application then calls remote Web services (e.g. DBpedia, VIAF and GeoNames) to retrieve information associated with the resource, such as the birth place and a description. The user is free to accept, edit or discard the retrieved information and to save it to the knowledge base. Finally, the user can view, edit and organize her resources.
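The accept/edit/discard step can be pictured as a merge between the form the user is filling in and the values returned by the remote services. The sketch below assumes plain dictionaries of field values and a per-field decision; the `Decision` enum and the `review` function are illustrative names, not part of WeME.

```python
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    ACCEPT = "accept"
    EDIT = "edit"
    DISCARD = "discard"


def review(form: Dict[str, str], retrieved: Dict[str, str],
           decisions: Dict[str, Decision],
           edits: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge retrieved values into the form according to the user's decisions.

    Fields without an explicit decision are treated as discarded.
    """
    edits = edits or {}
    result = dict(form)  # never mutate the caller's form
    for key, value in retrieved.items():
        decision = decisions.get(key, Decision.DISCARD)
        if decision is Decision.ACCEPT:
            result[key] = value
        elif decision is Decision.EDIT:
            result[key] = edits.get(key, value)  # the user's corrected value
        # DISCARD: keep whatever the form already contains
    return result
```

Treating missing decisions as discards is a conservative choice: nothing retrieved from the Web reaches the knowledge base unless the user explicitly keeps it.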
One of the main issues when dealing with different sources is resource disambiguation: two or more sources may conflict on a given field (e.g. the birth date). Currently, WeME leaves disambiguation to the user. As future work, however, we could organize the sources into a hierarchy of importance, i.e. associate a score with each source. If a field is found in more than one source, the system could then suggest to the user the value provided by the source with the highest priority.
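The priority-based suggestion proposed as future work could look like the sketch below. The scores in `PRIORITY` are arbitrary placeholders (no order among sources is fixed here), the values in the usage are made up, and the function also returns the sources that disagree, so the user can still see that a conflict exists.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical priorities: a higher score means a more trusted source.
PRIORITY: Dict[str, int] = {"VIAF": 3, "DBpedia": 2, "GeoNames": 1}


def suggest(field_values: Dict[str, Optional[str]],
            priority: Dict[str, int] = PRIORITY) -> Tuple[Optional[str], List[str]]:
    """Return the value from the highest-priority source and the sources that disagree.

    field_values maps a source name to the value it returned for one field,
    or None when the source had nothing. Unknown sources get priority 0.
    """
    found = {src: v for src, v in field_values.items() if v is not None}
    if not found:
        return None, []
    best = max(found, key=lambda src: priority.get(src, 0))
    conflicts = sorted(src for src, v in found.items() if v != found[best])
    return found[best], conflicts
```

For example, `suggest({"DBpedia": "1538", "VIAF": "1537", "GeoNames": None})` would propose the VIAF value and flag DBpedia as conflicting.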
Another characteristic of WeME is that the resulting knowledge base is completely self-contained, while still maintaining links to external sources. An alternative approach would be to update existing sources, such as DBpedia and GeoNames. However, we preferred the self-contained strategy for four main reasons: a) users can claim authorship of their work (e.g. towards academia or funding agencies); b) users keep control over external updates that could break their work; c) users avoid delays and blockages in updating data caused by validation processes; d) the needed resources are often too specific to the dataset, and not of sufficient general interest, to be accepted into an encyclopedic knowledge base.

WeME
The Web Metadata Editor (WeME) provides a Web editor to build a knowledge base containing contextual resources related to digital documents. The application is intended for archivists/librarians, but in general it can be used by scholars, students and anyone who wants to build a knowledge base and connect it to the Web. Figure 1 shows the flowchart of WeME, which is composed of three modules: the WeME editor, the search engine and the knowledge base. Users insert new resources and their metadata in the WeME editor. When inserting a new resource, users can exploit the WeME search engine to search for additional information about a person or a place. Finally, users can save the created resources into the WeME knowledge base for later visualisation.
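To make the division of labour between the modules more concrete, the sketch below wires a toy search engine to an in-memory knowledge base. Everything here is hypothetical: real sources would call DBpedia, VIAF or GeoNames, whereas the usage registers a stub, and the duplicate check only mimics the alert that the editor shows when a person is already in the knowledge base.

```python
from typing import Callable, Dict, Tuple

# A source maps the user's query string to a dictionary of field values.
Source = Callable[[str], Dict[str, str]]


class SearchEngine:
    """Queries every source registered for a resource type ("person" or "place")."""

    def __init__(self) -> None:
        self.sources: Dict[str, Dict[str, Source]] = {"person": {}, "place": {}}

    def register(self, resource_type: str, name: str, source: Source) -> None:
        self.sources[resource_type][name] = source

    def search(self, resource_type: str, query: str) -> Dict[str, Dict[str, str]]:
        # One result dictionary per source, keyed by the source name.
        return {name: fn(query) for name, fn in self.sources[resource_type].items()}


class KnowledgeBase:
    """In-memory stand-in for the knowledge base, keyed by type and normalized name."""

    def __init__(self) -> None:
        self.resources: Dict[Tuple[str, str], Dict[str, str]] = {}

    def save(self, resource_type: str, resource: Dict[str, str]) -> bool:
        label = " ".join(resource.get(f, "") for f in ("name", "surname"))
        key = (resource_type, label.strip().lower())
        if key in self.resources:
            return False  # duplicate: the editor would show an alert here
        self.resources[key] = resource
        return True
```

In this arrangement the editor only talks to the two objects: it asks the search engine for candidate values, lets the user review them, and then calls `save`.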

Users
Although WeME can be used by various stakeholders, a distinction should be made between librarians, archivists and scholars [14]. From the point of view of WeME, librarians and archivists can be grouped in the same category. They have very specific skills for creating a knowledge base for a collection of documents, and their main interest is capturing all reusable and relevant metadata to facilitate the discovery, classification and exploration of catalogs. Scholars, instead, are concerned with compiling a knowledge base to answer their research questions. On the one hand, archivists/librarians may have expertise in a specific field, for instance history, that facilitates their task. Scholars, on the other hand, do not necessarily have this specific background. WeME tries to satisfy the needs of both groups. For archivists/librarians, WeME exploits Linked Data to capture common metadata shared by different resources, thus allowing resource reuse and common metadata classification. For scholars, WeME provides a mechanism to link resources both to external sources, such as GeoNames and DBpedia, and to internal sources, such as places and people within the same knowledge base. Given these relations, a scholar could run reasoning tools to extract new information. At the moment, WeME does not implement reasoning mechanisms, but it would be interesting to extend it with this feature. WeME differs from the strategy adopted in the study by Debruyne et al. (2016) [14], where two different knowledge bases are built, one for archivists/librarians and the other for scholars. In WeME, instead, only one knowledge base is built to satisfy both needs. In this way, the application is kept simple and information is not replicated. Figure 2 shows a snapshot of the interface.

Layout
We defined a layout composed of three views:
- Person box: the editor allows users to add/edit a person by specifying the following fields: name, surname, birth date, birth place, death date, death place, image link, Wikipedia link and VIAF link. A "still alive" checkbox specifies whether the person is still alive. The user can fill in all the fields manually, or click the "check with DBpedia"/"check with VIAF" buttons to populate the fields from DBpedia/VIAF, where available. When the information is ready, the user clicks the "send" button to store the person in the knowledge base. If the person is already present in the knowledge base, the editor shows an alert.
- Place box: the editor provides a form to add/edit a place by specifying the following fields: original name, English name, country, region, population, latitude, longitude, description, image link, Wikipedia link and GeoNames link. The user can fill in all the fields manually or use the "check with DBpedia"/"check with GeoNames" buttons, as in the Person box.
- CHO box: the editor allows the user to add a new cultural heritage object, such as a letter or a painting, by specifying the following fields: original title, English title, author, creation date, issue date, type (text, video, sound, image, 3D), language, description, image link and Wikipedia link. All these fields, which follow the ontology defined by the Europeana Data Model, must be entered by the user manually.

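For the Place box, the "check with GeoNames" step could be built on the GeoNames `searchJSON` web service. The sketch below only builds the request URL and maps one entry of a response onto Place box fields; it assumes the service's response layout (a `geonames` list whose entries carry `name`, `countryName`, `adminName1`, `population`, `lat`, `lng` and `geonameId`), and `your_username` stands for the free GeoNames account the service requires. The function and field names are hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_url(place_name: str, username: str, max_rows: int = 5) -> str:
    """URL of a GeoNames full-text search for a place name (JSON response)."""
    params = {"q": place_name, "maxRows": max_rows, "username": username}
    return f"{GEONAMES_SEARCH}?{urlencode(params)}"


def to_place_box(entry: Dict[str, Any]) -> Dict[str, str]:
    """Map one element of the response's "geonames" list onto Place box fields.

    Missing keys become empty strings, leaving the field for the user to fill in.
    """
    geoname_id = entry.get("geonameId")
    return {
        "name": str(entry.get("name", "")),
        "country": str(entry.get("countryName", "")),
        "region": str(entry.get("adminName1", "")),
        "population": str(entry.get("population", "")),
        "latitude": str(entry.get("lat", "")),
        "longitude": str(entry.get("lng", "")),
        "geonames_link": f"https://www.geonames.org/{geoname_id}" if geoname_id else "",
    }
```

Whether the returned `name` corresponds to the original or the English name depends on the request, so the user would still decide which Place box field it belongs in.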
Use Case
WeME was used within the Clavius on the Web project to help build the knowledge base associated with Christopher Clavius's correspondence. Christopher Clavius (1538-1612) was a Jesuit mathematician and astronomer, and one of the most important figures in the scientific scene of the late 16th century. The Clavius manuscripts consist of two volumes of correspondence (about 330 letters) and seven volumes of works, some of which were printed at the time while others are still unpublished. The importance of the correspondence becomes clear just by looking at the authors of the letters: Galileo Galilei, Tycho Brahe, Joseph Scaliger, Guido Ubaldo Dal Monte and many others. The Clavius on the Web (CoW) project aimed at digitizing, annotating and enriching all this heritage, exporting it to the Web and linking it to similar Web resources. One part of the CoW project was the creation of a knowledge base of all the people and places associated with the context of the letters, such as the people who wrote the letters and the places where they were written. The idea was to link the historical heritage of APUG to existing Web resources, such as DBpedia and Wikipedia.
The Clavius knowledge base is composed of three main classes: person, place and cultural heritage object (CHO). A person is a historical figure who wrote a letter to Christopher Clavius; a place is a location where a letter was written; a CHO corresponds to a physical letter sent to Christopher Clavius by one of these people. Some people had a corresponding page in DBpedia or VIAF, so WeME retrieved their information from there. Others, such as Ilario Altobelli, were not present in DBpedia, so they were added to the knowledge base manually.
The same was done for places. Table 1 summarizes how many people and places were added to the knowledge base and how many links were found.
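Counts like those in Table 1 can be derived directly from the stored records. The sketch below assumes each record is a dictionary with optional `dbpedia_link`, `geonames_link` and `viaf_link` fields (hypothetical names) and also counts records with no external link at all, such as people added manually; the records in the test are made up and are not the Clavius data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SOURCES = ("DBpedia", "GeoNames", "VIAF")


def count_links(records: Iterable[Dict[str, str]]) -> Tuple[Counter, int]:
    """Return the number of links per external source and the number of unlinked records."""
    per_source: Counter = Counter({source: 0 for source in SOURCES})
    unlinked = 0
    for record in records:
        # A link counts only if the field is present and non-empty.
        linked = [s for s in SOURCES if record.get(f"{s.lower()}_link")]
        per_source.update(linked)
        if not linked:
            unlinked += 1  # e.g. a person entered by hand
    return per_source, unlinked
```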

Usability Test
To test the usability of WeME, a questionnaire was prepared to guide users in the use of the various functions of the application: various usage scenarios were set, in order to verify the efficiency of the various functions. The proposed test had a twofold objective: to understand the degree of appreciation of the application by users and to obtain suggestions for its improvement.
The questionnaire was forwarded to various mailing lists concerned with cultural heritage. In total, 26 users participated, of whom 61.5% were men and the rest women. Figures 3 and 4 show the age distribution of the users and their skills, respectively. As can be seen from Figure 4, 38.5% of the users are IT experts, another 38.5% are researchers in the field of Digital Humanities, while only 3% have archival skills. Of all users, only 69.2% showed a clear interest in the cultural heritage sector (see Figure 5). The test consisted of the following activities:

T1
Manual account creation within the application
a) How difficult was it to create the account? (scale from "very difficult" to "very easy")

T2
Creating a new collection
a) How difficult was it to create a collection? (scale from "very difficult" to "very easy")
Adding items to the collection (a person, a place and a CHO)
a) How difficult was it to add items to a collection? (scale from "very difficult" to "very easy")
Exporting the collection in CSV
a) How difficult was it to export a collection? (scale from "very difficult" to "very easy")

T3
Adding resources to the Database by searching with DBpedia, VIAF and GeoNames (one person, one place and one CHO). For each category:
a) What resource did you add? (optional; short answer)
b) Did you find the data using DBpedia? (multiple choice: "yes", "no", "partly")
c) Did you find the data using VIAF? (multiple choice: "yes", "no", "partly")
d) Did you find the data using GeoNames? (multiple choice: "yes", "no", "partly")
e) How difficult was it to add items to the Database? (scale from "very difficult" to "very easy")
Viewing the resources added previously
a) How difficult was it to search for the resources? (scale from "very difficult" to "very easy")
b) Do you think the knowledge base is well organized? (scale from "very confusing" to "very clear")

T4
General considerations on WeME
a) How difficult was it to navigate in WeME? (scale from "very difficult" to "very easy")
b) Suggestions to improve navigation (optional; open answer)
c) What do you think of the WeME graphics? (scale from "poor" to "excellent")
d) Suggestions for improving the graphics (optional; open answer)
e) Do you think the collections are well organized? (multiple choice: "yes", "no", "maybe")
f) Overall judgment on the application (scale from 1 to 5)

Account Creation
Users were asked to create their own account in the application using the appropriate menu and then to log in. In general, the procedure caused no difficulties: about 75% of users found the process easy or very easy, another group considered it of medium difficulty, and only one user encountered complications (see Figure 6).
Figure 6. Difficulty of the account creation procedure

Management of a Collection
Users were asked to test the various functions for managing a collection, i.e. creation, insertion and export. As shown in Figure 7, the creation procedure caused no evident difficulties: more than 80% of users rated it easy or very easy.

Figure 7. Difficulty in creating a collection
The insertion of resources into the collection and their export (see Figures 8 and 9), on the other hand, caused more problems: more than half of the users still found these procedures easy or very easy, but a larger share of users than for collection creation found them difficult or very difficult.

Management of New Resources
Search and insertion. Users were asked to insert resources in the Database, trying to retrieve data from DBpedia, VIAF and GeoNames. In particular, they were asked to add a person, a place and a CHO related to each other (for example "Dante Alighieri", "Florence", "The Divine Comedy"), in order to view the links between the records. At the end, they were asked to evaluate the complexity of the whole procedure: as shown in Figure 10, almost 70% of users found the operation easy or very easy, while the others encountered technical problems, which they described in the tips section. As shown in Figure 11, more than 50% of users were able to obtain information through DBpedia, and a good part of the other users obtained at least partial information. Searching with VIAF, on the other hand, proved less fruitful, with a more even split between users who obtained information and users who did not (see Figure 12). As shown in Figure 13, half of the users failed to retrieve information. However, the search using GeoNames worked very well, allowing data to be retrieved in 70% of cases (see Figure 14).
Searching and inserting CHOs. This phase does not involve automatic searches, so users were simply asked to enter a CHO manually, with the person added at the beginning as its author. Added resources included "Don Quijote de la Mancha", "Ossi di Seppia" and "IT".
Visualization. Once the insertion procedure was completed, users were asked to search the knowledge base for the newly added resources, then to evaluate the difficulty of the process and give an opinion on the organization of the knowledge base. The search generated mixed opinions and was the topic that received the most suggestions: about 50% of users found the process easy or very easy, while the rest encountered problems or suggested improvements (see Figure 15). Despite this, as can be seen from Figure 16, most users found the organization of information clear and did not report significant complications. In the last phase of the test, users were asked to express general judgments on the application, in particular regarding navigation, graphics and the organization of documents. Optional fields allowed users to enter any type of suggestion, from reporting problems to proposing new features. This last section revealed some bugs in the code, which sometimes prevented users from completing the required procedure.
As for navigation, more than 50% of users were satisfied (see Figure 17). However, a substantial number of users reported complications of various kinds and suggested how to resolve them. The graphical design of the application was much appreciated (see Figure 18): only 7 users rated it mediocre or poor, while the rest expressed a positive opinion. In general, users asked for the application to be more responsive and adaptable to devices of different sizes. The organization of the collections was very satisfactory, since no user expressed a negative judgment; some users made suggestions to improve its effectiveness, but did not point out any problems (see Figure 19).

Figure 19. Opinion on the organization of documents

Table 3 summarizes the suggestions provided by users for the improvement of graphics, navigation and organization. In general, users asked to correct some errors and to make the platform more intuitive for each type of user. In addition, more specific suggestions were made to improve and extend the tested functions.

Table 3. Suggestions provided by users
- Improve the clarity of the "Home" button
- Improve navigation via the browser's "Back" button
- Allow searching for a resource without specifying its class
- Start the search for a resource in the Database by pressing the "Enter" key
- Search the Database even with incorrect strings, for example "montale" instead of "Eugenio Montale"
- Show a popup warning when the search returns no results
Management of documents
- Allow a collection to be modified by managing several resources at the same time

The last question asked users to express an overall opinion on WeME on a scale from 1 to 5 (see Figure 20). Many users gave a positive opinion (4 or 5), expressing interest in the potential of the project. Others chose an intermediate judgment (3), pointing out some problems which, however, did not prevent them from considering WeME a tool with great potential for improvement.

Conclusion and Future Work
In this paper we have illustrated WeME, a user-friendly Web metadata editor based on Semantic Web technologies, whose main design goal is to help archivists and scholars enter metadata of cultural heritage objects while building a catalog. In addition, we have described the Clavius knowledge base, which was built around Clavius's correspondence of about 300 letters. The usability test, in which 26 users took part, confirmed the usefulness and potential of the platform: the project stimulated the interest of users involved in research, archiving and cataloging, who underlined the need for a tool capable of performing the functions implemented in the application, and therefore welcomed it.
As future work, we plan to extend WeME with the following features: a) exporting the knowledge base in different formats, such as RDF, XML and CSV; b) managing different ontologies, such as bibo; c) supporting other classes, such as events. In addition, we plan to start a campaign among different categories of users to test the accessibility and usability of the interface, as well as the quality of the produced information. Finally, we are going to make WeME more configurable, so that it can easily be customized for different scenarios, datasets and criteria for matching named entities with Linked Data objects.
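As a hint of what the planned RDF export might involve, the sketch below serializes one resource as N-Triples, typing it with the corresponding EDM class and keeping its external links as `owl:sameAs` statements. The base namespace `http://example.org/weme/`, the record layout and the use of `owl:sameAs` for external links are assumptions for illustration, not WeME's export format.

```python
from typing import Any, Dict, List

EDM = "http://www.europeana.eu/schemas/edm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/weme/"  # hypothetical namespace for local resources

CLASS_FOR = {"person": EDM + "Agent", "place": EDM + "Place", "cho": EDM + "ProvidedCHO"}


def literal(text: str) -> str:
    """Quote text as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(resource: Dict[str, Any]) -> List[str]:
    """One N-Triples line per statement: type, label and each external link."""
    subject = f"<{BASE}{resource['type']}/{resource['id']}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{CLASS_FOR[resource['type']]}> .",
        f"{subject} <{RDFS_LABEL}> {literal(resource['label'])} .",
    ]
    for uri in resource.get("same_as", []):
        lines.append(f"{subject} <{OWL_SAME_AS}> <{uri}> .")
    return lines
```

Keeping external links as separate statements preserves the self-contained design: the local resource keeps its own identifier and only points to DBpedia, GeoNames or VIAF.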