Words are valuable, powerful and useful. Great speeches have changed nations; printed newspapers have bankrupted moguls. It seems apt that words have become the cornerstone of 21st-century technology, with search engines like Google building vast algorithms that relate terms to one another or find information based on keywords.
Slándáil has been working on text analysis with a specific goal in mind. Terms that relate to natural disasters are central to text analysis in a disaster management system, and the long-term goal of Slándáil is to use specific terms found on social media to highlight potentially affected areas during a natural disaster.
Text Analytics: The difference between words and terms
A term is something that has meaning in a specific context. In English it can be made up of one word or several, but it is always context-specific. For example, the word “flood” might signify a disaster event in the sentence “The roads have flooded near the river”, but has a different meaning in the sentence “The people flooded into the supermarket”. While it is simple for most English-speaking people to tell the difference, the key with text analytics for Slándáil is to teach a machine to see the same type of difference, and to focus only on the more important information. Linguists at the University of Padua are working with technologists in Trinity College Dublin (Ireland) and Institut für Angewandte Informatik (Germany) to compile a large structured body of text (a corpus) that will assist the emergency management system that Slándáil is building.
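The “flood” example can be sketched in a few lines of Python. This is only a minimal illustration of context-based filtering, not the method used by the project: the cue-word list and the function names are assumptions made up for this example, and a real system would learn such cues from a tagged corpus rather than from a hand-written set.

```python
import re

# Hypothetical cue words, chosen only for this illustration.
DISASTER_CUES = {"road", "roads", "river", "rain", "water", "homes", "evacuated"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_disaster_use(sentence, target_stem="flood", cues=DISASTER_CUES):
    """Return True if a word starting with target_stem occurs alongside a cue word."""
    tokens = tokenize(sentence)
    has_target = any(tok.startswith(target_stem) for tok in tokens)
    has_cue = any(tok in cues for tok in tokens)
    return has_target and has_cue


print(is_disaster_use("The roads have flooded near the river"))
print(is_disaster_use("The people flooded into the supermarket"))
```

With this cue list, the first sentence is treated as a disaster reference and the second is not, because “people” and “supermarket” are not among the cues.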
The finished system will harvest and analyse text from digital and social media and produce messages that tell emergency managers where problems may be occurring, based on the text it has collected. As a result, it is important that the text analysis tools are trained to recognise the difference between a reference to a disaster and a regular post on news or social media.
Emergency managers have specific terms that they use for natural disasters. “Early warning system” or “natural hazards” may not come up in regular speech, but they are just as important for emergency scanning because they are used in emergency systems. Other special grammars are currently being studied at Trinity College Dublin, including emoticons and social media-specific uses of language, which will also form part of the overall corpus of terms for Slándáil.
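Because a term can span several words, matching has to work on whole phrases rather than single tokens. The sketch below shows one simple way to do this, assuming a tiny hand-written lexicon; the two multi-word terms come from the text above, while the function name and the rest of the setup are illustrative only.

```python
import re

# A toy lexicon; a real one would be far larger and maintained by terminologists.
TERMS = ["early warning system", "natural hazards", "flood"]


def find_terms(text, terms=TERMS):
    """Return the lexicon terms that occur in text, longest terms first."""
    lowered = text.lower()
    found = []
    for term in sorted(terms, key=len, reverse=True):
        # Word boundaries stop "flood" from matching inside "floodlight".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, lowered):
            found.append(term)
    return found


print(find_terms("The early warning system was triggered by natural hazards."))
```

Checking the longest terms first is a common design choice: it lets a later step prefer “early warning system” over any shorter term contained within it.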
One of the most difficult aspects of text analysis is handling multiple languages. While machine learning can handle a great deal through term analysis, working across languages is far more difficult. One of the benefits of the Slándáil project is that it works in three languages: German, Italian and English. To facilitate this, researchers at the University of Padua have been working on a terminology wiki that defines disaster terms in all three languages.
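One way to picture a multilingual terminology entry is as a single concept with one label per language. The following sketch assumes that structure; the class, the field names and the two sample entries are hypothetical and are not taken from the project's wiki.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One concept with a definition and a label per language (hypothetical schema)."""
    concept_id: str
    definition: str
    labels: dict = field(default_factory=dict)  # language code -> term


# Illustrative entries only; not copied from the Slándáil wiki.
ENTRIES = [
    TermEntry("flood", "An overflow of water onto normally dry land.",
              {"en": "flood", "de": "Hochwasser", "it": "alluvione"}),
    TermEntry("earthquake", "A sudden shaking of the ground.",
              {"en": "earthquake", "de": "Erdbeben", "it": "terremoto"}),
]


def build_index(entries):
    """Map (language, lowercased label) to its entry for fast lookup."""
    index = {}
    for entry in entries:
        for lang, label in entry.labels.items():
            index[(lang, label.lower())] = entry
    return index


def translate(term, source_lang, target_lang, index):
    """Return the target-language label for term, or None if it is unknown."""
    entry = index.get((source_lang, term.lower()))
    if entry is None:
        return None
    return entry.labels.get(target_lang)


INDEX = build_index(ENTRIES)
print(translate("Erdbeben", "de", "it", INDEX))
```

Keying the index on the concept rather than on any one language means a term found in a German post can be linked to the same disaster concept as its English or Italian equivalent.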
Text analysis is being undertaken at Trinity College Dublin and Institut für Angewandte Informatik, Leipzig. At both institutions, text has been harvested from news and online sources, including social media, to gain a better understanding of language and term use during a natural disaster. Technology partner CID (Germany) already has a software product, Topic Analyst, that analyses trends in word use online, and this technology will be adapted to incorporate the text analysis tools that the universities are building, turning it into disaster-specific software.
Putting it All Together
Even once a dictionary of terms has been collected, it is not yet useful to a digital system. The system needs to learn about associations between terms and words in order to be effective. Similar to how Google can suggest that the terms “pen” and “ink” may be associated, the Slándáil system needs to be able to relate terms like “earthquake” and “collapse”. For this, terms need to be laboriously tagged, and the system then needs to be trained to recognise the connections. The initial lexicon was completed in March 2015; however, the term databases need to be regularly updated, and this is what the Terminology Wiki is being used for.
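A very simple way to surface associations such as “earthquake” and “collapse” is to count how often two terms appear in the same post. The sketch below uses plain co-occurrence counting as a stand-in; it is not the training procedure used by the project, and the term set and sample posts are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented examples; real input would be harvested, tagged posts.
TERMS = {"earthquake", "collapse", "flood", "river"}
POSTS = [
    "earthquake caused a building collapse downtown",
    "bridge collapse after the earthquake",
    "river flood warning issued",
    "earthquake felt across the city",
]


def cooccurrence_counts(posts, terms):
    """Count how many posts contain each pair of lexicon terms."""
    counts = Counter()
    for post in posts:
        # Sort so each pair is always stored in the same order.
        present = sorted(set(post.lower().split()) & terms)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(POSTS, TERMS)
print(counts.most_common())
```

In this toy data, “collapse” and “earthquake” co-occur in two posts and “flood” and “river” in one, so the first pair would be ranked as the stronger association.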
The importance of text as a tool for social media analysis is highlighted in many projects, but few have focussed specifically on emergency management. It is hoped that the developments at partner institutions will provide useful tools for future text analysis in natural disasters.
Results from the terminology studies have been published at the Disontology Workshop, Vienna, in July 2015, and will be presented at a terminology workshop in Dublin on 14 September 2015 and at IDEAL 2015, Warsaw, on 15-16 October.
If you are interested in viewing the Wiki, please contact the project. Further information on how these studies are being employed can be found on the Disaster Newsletter page.