Feature: Using Words to Save Lives – Terminology and Technology

Words are valuable, powerful and useful. Great speeches have changed nations; printed newspapers have bankrupted moguls. It seems apt that words have become the cornerstone of 21st-century technology, with search engines like Google building vast algorithms that relate terms to one another or find information based on keywords.

Slándáil has been working on text analysis with a specific goal in mind. Terms that relate to natural disasters are key to managing text analysis in a disaster management system, and the long-term goal of Slándáil is to use specific terms found on social media to highlight potentially affected areas during a natural disaster.

Text Analytics: The difference between words and terms

A term is something that has meaning in a specific context. In English it can be made up of one word or several, but it is always context-specific. For example, the word “flood” signals a disaster event in the sentence “The roads have flooded near the river”, but means something quite different in the sentence “The people flooded into the supermarket”. While most English speakers can easily tell the difference, the challenge in text analytics for Slándáil is to teach a machine to see the same distinction and to focus only on the relevant information. Linguists at the University of Padua (Italy) are working with technologists at Trinity College Dublin (Ireland) and the Institut für Angewandte Informatik (Germany) to compile large structured collections of text (corpora) that will support the emergency management system that Slándáil is building.
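The distinction between the two “flood” sentences can be illustrated with a minimal sketch. It assumes a small hand-picked set of cue words (`DISASTER_CUES`, hypothetical) and a simple rule: a word beginning with “flood” counts as a disaster reference only if a cue word appears in the same sentence. This is not the Slándáil method, which would rely on a tagged corpus and trained tools; it only shows the kind of decision the machine has to make.

```python
import re

# Hypothetical cue words chosen for illustration; a real system would
# learn such cues from a tagged corpus rather than a fixed list.
DISASTER_CUES = {"river", "road", "roads", "water", "rain", "storm", "damage"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_disaster_use(sentence, target_stem="flood", cues=DISASTER_CUES):
    """Return True if a word starting with target_stem occurs with a cue word."""
    tokens = tokenize(sentence)
    has_target = any(tok.startswith(target_stem) for tok in tokens)
    has_cue = any(tok in cues for tok in tokens)
    return has_target and has_cue


print(is_disaster_use("The roads have flooded near the river"))
print(is_disaster_use("The people flooded into the supermarket"))
```

Under these assumptions the first sentence is flagged and the second is not, because only the first contains cue words such as “roads” and “river”.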

An example of linked words from the Slándáil Terminology Wiki in visual graph form

The finished system will harvest and analyse text from digital and social media and produce messages for emergency managers that tell them where problems may be occurring, based on the information that they receive. As a result, it is important that the text analysis tools are trained to recognise the difference between a reference to a disaster and a regular post on news or social media.

Emergency managers use specific terms for natural disasters. “Early warning system” or “natural hazards” may not come up in everyday speech, but because they are used in emergency systems they are just as important to scan for. Other special grammars are currently being studied at Trinity College Dublin, including emoticons and social media-specific uses of language, and these will also form part of the overall corpus of terms for Slándáil.

Language Barriers

One of the most difficult aspects of text analysis is handling different languages. While machine learning can handle a large amount of text through term analysis, doing so across different languages is far more difficult. One of the benefits of the Slándáil project is that it works in three languages: German, Italian and English. To facilitate this, researchers at the University of Padua have been working on a terminology wiki that defines disaster terms in all three languages.

Multilingual Terminology Example
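As a rough illustration of what a trilingual term entry could look like in code, the sketch below maps a concept to its English, German and Italian terms and builds a reverse index for lookups. The structure, the function names and the sample entries are assumptions made for this example; the translations are common dictionary equivalents, not entries taken from the Slándáil Terminology Wiki.

```python
# Illustrative entries only: common dictionary equivalents, not content
# copied from the Slándáil Terminology Wiki.
TERMS = {
    "flood": {"en": "flood", "de": "Hochwasser", "it": "alluvione"},
    "earthquake": {"en": "earthquake", "de": "Erdbeben", "it": "terremoto"},
}


def build_index(terms):
    """Map each lower-cased term to its (concept, language) pair."""
    index = {}
    for concept, translations in terms.items():
        for lang, term in translations.items():
            index[term.lower()] = (concept, lang)
    return index


def lookup(term, index):
    """Return (concept, language) for a term, or None if it is unknown."""
    return index.get(term.lower())


index = build_index(TERMS)
print(lookup("Terremoto", index))
print(lookup("Hochwasser", index))
```

With an index like this, a post in any of the three languages can be linked back to the same underlying concept.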

Text analysis is being undertaken at Trinity College Dublin and the Institut für Angewandte Informatik, Leipzig. Both institutions have harvested text from news and online sources, including social media, to gain a better understanding of language and term use during a natural disaster. Technology partner CID (Germany) already has a software product, Topic Analyst, that analyses trends in word use online; this will be adapted to incorporate the text analysis tools that the research partners are building, producing a disaster-specific application.

Putting it All Together

A dictionary of terms, once collected, is still not useful to a digital system on its own. The system needs to learn the associations between terms and words. Just as Google can suggest that the terms “pen” and “ink” may be associated, the Slándáil system needs to be able to relate terms like “earthquake” and “collapse”. For this, terms need to be laboriously tagged, and the system then needs to be trained to recognise the connections. The initial lexicon was completed in March 2015; however, the term databases need to be updated regularly, and this is what the Terminology Wiki is being used for.
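One simple way to picture how associations such as “earthquake” and “collapse” might be found is to count how often two terms appear in the same document. The sketch below does only that; the sample documents, the vocabulary and the `min_count` threshold are invented for illustration, and the actual Slándáil training process is not described here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count how often each pair of vocabulary terms shares a document."""
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        present = sorted(words & vocabulary)  # sorted so pairs have a stable order
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def associated(term, counts, min_count=2):
    """Return terms that co-occur with `term` at least `min_count` times."""
    related = []
    for (a, b), n in counts.items():
        if n >= min_count and term in (a, b):
            related.append(b if a == term else a)
    return sorted(related)


# Invented sample posts, for illustration only.
docs = [
    "earthquake caused building collapse",
    "collapse reported after earthquake",
    "pen and ink drawing",
    "earthquake drill today",
]
vocab = {"earthquake", "collapse", "pen", "ink"}
counts = cooccurrence_counts(docs, vocab)
print(associated("earthquake", counts))
```

Raw counts like these are only a starting point; hand-tagged terms and a trained model would be needed to separate meaningful associations from coincidence.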

The importance of text as a tool for social media analysis is highlighted in many projects, but few have specialised in emergency management. It is hoped that the developments at partner institutions will provide useful tools for future text analysis in natural disasters.

Results from the terminology studies were published at the Disontology Workshop (Vienna, July 2015) and will be presented at a terminology workshop in Dublin on 14 September 2015 and at IDEAL 2015 in Warsaw on 15-16 October.

If you are interested in viewing the Wiki, please contact the project. Further information on how these studies are being employed can be found on the Disaster Newsletter page.

Feature: Building Relationships on Trust in Disasters

Trust in Evacuation Warnings

Trust plays an important role in disaster management and social media. Joint research over the past twelve months between the University of Padua (Italy), Trinity College Dublin and Stillwater Communications (Ireland) has uncovered some important details about trust and its role in disaster communications.

When people are asked to evacuate from an area they are put in a position where they have to trust the authority of emergency managers who tell them that it is safer to leave their homes than to stay. While this might seem like an easy decision, the events of Hurricane Katrina highlighted a major issue when many people mistrusted the warnings that were given and chose to stay in their homes where they felt safer. This was, in part, due to exaggerated stories of looting and mistrust in government, where people felt it would be safer to stay in a familiar place than to relocate temporarily.

Trust in Who Delivers the Message

Graph of levels of trust from Ipsos Mori

Ipsos Mori’s survey on trust shows a low level of political trust compared to an increasing level of trust in experts.

One recent poll (above) by Ipsos Mori (a UK research company that specialises in media and advertising) highlights the low level of trust in politicians and journalists when compared to experts. In evacuation situations, politicians often deliver messages from emergency managers, including warning messages and evacuation orders.

Feature: The Intrusion Index for Digital Privacy

Researchers at Trinity College Dublin have been working on a system to improve technology that harvests data from social media by analysing how it may intrude upon individual privacy. The system, called an Intrusion Index, detects potentially private information in digital data so that this information can be deleted if necessary.

During a natural disaster a large volume of information is shared on social media sites like Facebook and Twitter. Some of this information contains private data that could be used to identify individuals, and it is difficult to process all of this data. Slándáil researchers have been looking at ways to better protect sensitive information, including encryption and anonymisation methods, and part of this work is a novel system based on Named Entity Recognition.

A diagram showing how the intrusion index highlights named entities and then removes them when necessary.

By recognising named entities in the text, the system can then automatically remove these and log where they appear to avoid privacy issues.

Background

Work on the Intrusion Index began in 2014 for Slándáil, and testing and development are ongoing. The index searches online text for named entities, including place-names and people’s names, and creates a log whenever such data is detected. The system is now being tested on social media data.
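The detect, log and remove cycle described here can be sketched in a few lines. This example is an assumption-laden stand-in: it uses a small hypothetical lookup table (`KNOWN_ENTITIES`) in place of a trained Named Entity Recognition model, and the function name and log format are invented for illustration rather than taken from the Intrusion Index itself.

```python
import re

# Hypothetical lookup table standing in for a trained NER model.
KNOWN_ENTITIES = {
    "Dublin": "PLACE",
    "Leipzig": "PLACE",
    "Mary Murphy": "PERSON",  # invented name, for illustration only
}


def redact(text, entities=KNOWN_ENTITIES):
    """Replace known entities with a type tag and log where each appeared."""
    if not entities:
        return text, []
    # Longest names first so multi-word names win over shorter overlaps.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    log = []

    def replace(match):
        name = match.group(1)
        # `start` is the character offset in the original, unredacted text.
        log.append({"entity": name, "type": entities[name], "start": match.start()})
        return "[" + entities[name] + "]"

    return pattern.sub(replace, text), log


clean, log = redact("Mary Murphy is trapped near Dublin")
print(clean)
print(log)
```

Keeping the log separate from the cleaned text means the record of what was found can be stored or deleted independently of the text that is passed on.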

Feature: Novel Way to Study Images by Ulster University

University of Ulster: A Novel Spiral Addressing Scheme for Rectangular Images

As part of the research for Slándáil, Ulster University have been working on a novel method for analysing images based on a system that they have developed. The goal of this is to make image analysis from social media more efficient, taking into account the large volume of images that are shared during a natural disaster (for example, Instagram reported 1.3 million pictures posted during Hurricane Sandy).

Background

Communicating through accurate, complete and real-time information sharing is key to preparing for, responding to and recovering from disasters. Sharing visual content not only increases the credibility of the information, but also encourages social media user engagement.

For most existing web search platforms, such as Bing, Google and Yahoo, search is based on context information, i.e., tags, time or location. Text-based search is fast and convenient, but the results can be mismatched, less relevant or duplicated because of web noise. Incorporating content-based analysis, such as image analytics, can therefore improve search quality.
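To show how looking at image content, rather than tags, can catch duplicated results, the sketch below uses a generic average-hash comparison. This is a well-known stand-in technique chosen for brevity, not the spiral addressing scheme developed at Ulster University. It assumes each image has already been reduced to a small grayscale grid of the same size, and the `max_distance` threshold is an arbitrary illustrative value.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=1):
    """Treat two same-sized grayscale grids as duplicates if their hashes are close."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Tiny 2x2 grayscale grids, invented for illustration.
original = [[10, 200], [220, 15]]
brightened = [[12, 198], [225, 20]]  # same picture, slightly altered
different = [[200, 10], [15, 220]]

print(is_near_duplicate(original, brightened))
print(is_near_duplicate(original, different))
```

Two copies of the same picture posted under different tags would look unrelated to a text-based search, but a content-based comparison of this kind can still group them.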