Adam Matthew partners with libraries and archives around the world to digitize historical documents and primary sources for the humanities and social sciences. These digital documents are made available through a series of online databases, which together contain millions of pages of original content. UCLA have access to all of these databases.
While developing these collections, the editorial team at Adam Matthew create metadata for each document. This metadata records details such as when the document was produced, who the author was, and which topics it mentions. All printed items are also run through an OCR (Optical Character Recognition) program, which produces a searchable text-file version of the writing.
The data can be made available free of charge for text and data mining (TDM) projects. If you are interested in initiating such a project, please contact a librarian in Scholarly Communication and Licensing to begin the process.
There are two main options for accessing the data: via an API or via FTP.
Both require a form to be filled in, which a librarian in Scholarly Communication and Licensing will complete. This is for data protection purposes, as Adam Matthew have a responsibility to the source archives to let them know how the data is being stored. Once you have a research project in mind, and know which collection’s data you wish to use, you can request a form by emailing a librarian in Scholarly Communication and Licensing. Simply list the collections you wish to use, and your preferred access method, in the email and the appropriate form will be sent to you.
Once the form has been completed, the timeline for receiving approval and the data varies depending on the amount of data requested and the method of delivery required. A request for metadata is quicker to turn around than a request for full-text data. Access via API is quicker to set up than FTP.
As a general rule, Adam Matthew will try to provide all data requested via an API within one week of receiving the completed data mining form. Data requested via FTP has a similar turnaround time, but it will depend on the amount of data, especially if the request includes full-text data.
Many projects and research tools have been created using Adam Matthew content and TDM. Some of these were created by the developers at Adam Matthew, while others were part of academic research projects. To give an idea of the potential for this sort of work, let's take a look at the example below.
This shows that George Germain appears in 3,868 documents (number underlined in dotted blue). The person who most often appears alongside him is Henry Clinton, who appears in 1,011 of those documents (circled in red). Clinton himself appears in 1,586 documents in total (number in the green box), meaning 63.7% of his appearances are alongside George Germain. The darker the purple shading (e.g. number in the green box), the higher the percentage of documents in which the people co-occur.
This is the type of association analysis that can be produced from full-text data. Projects like this help to raise and answer questions about how and why different people were connected to each other. Of course, this is just one example of how TDM can work.
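For readers who want to see how the co-occurrence percentages described above could be calculated, here is a minimal Python sketch. It is not Adam Matthew's implementation. It assumes that each document has already been reduced to a list of the people named in it, for example by running name matching over the OCR text. The function names, the document identifiers, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(doc_people):
    """Count appearances per person and per pair of people.

    doc_people is a hypothetical mapping of document id -> list of
    names found in that document (an assumption for this sketch).
    """
    appearances = Counter()
    pair_counts = Counter()
    for people in doc_people.values():
        # Count each person once per document, in a stable order.
        unique = sorted(set(people))
        appearances.update(unique)
        pair_counts.update(combinations(unique, 2))
    return appearances, pair_counts


def share_alongside(person, other, appearances, pair_counts):
    """Percentage of `person`'s documents that also mention `other`."""
    pair = tuple(sorted((person, other)))
    total = appearances[person]
    return 100.0 * pair_counts[pair] / total if total else 0.0


# Toy data, invented for illustration only.
docs = {
    "doc1": ["George Germain", "Henry Clinton"],
    "doc2": ["George Germain", "Henry Clinton", "William Howe"],
    "doc3": ["George Germain"],
    "doc4": ["Henry Clinton", "William Howe"],
}

appearances, pair_counts = cooccurrence_stats(docs)
pct = share_alongside("Henry Clinton", "George Germain", appearances, pair_counts)
print(f"Clinton alongside Germain: {pct:.1f}%")

# The same arithmetic applied to the figures quoted in the example above:
print(f"From the example: {100 * 1011 / 1586:.1f}%")
```

The percentage is taken relative to the total number of documents for the person being examined, which is why 1,011 shared documents out of Clinton's 1,586 gives 63.7%. Dividing by Germain's 3,868 documents instead would give a different and much lower figure.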