The Big Picture – Multimedia Ontologies and MPEG-7 (part 1 of 2)

Multimedia content is complex, with sound, subtitles, and moving, ever-changing images. © James Nash

As more and more multimedia content is digitised and added to the Web or to digital libraries, the need for ontologies that connect the meaning and relationships of the pictured objects increases. But how are we to connect the dots when the dots are difficult to see, let alone interpret? One proposed answer is ontology matching, a “bridging of the semantic gap by matching a multimedia ontology against a common-sense knowledge resource” (James, Todorov & Hudelot, 2005).

Multimedia content is complex, with sound, subtitles, and moving, ever-changing images. Which language are the actors speaking? Are they all speaking the same language, and are the subtitles in yet another tongue? Is what is being said relevant, or just background chatter?

MPEG-7 is the de facto container for ontologies, so mapping and conversion between ontologies and their various syntaxes is important. Creating ontologies for multimedia poses many challenges, and the largest hurdles are the wealth of competing metadata formats and standards and the heterogeneity of the ontologies themselves. As the amount of multimedia grows, so does the need for a fix-all solution.

Creating yet another standard to bridge the semantic gap will not solve the problems of annotating multimedia content, since the content quickly becomes too complex and the tools used to extract and reason over the ontologies vary in quality. The interoperability issues will continue to exist, and so will the question of which levels of granularity and abstraction are best for describing multimedia content. Low-level features of a media file have become easier for machines to extract and annotate automatically, but high-level descriptions remain a challenge. How do we define instances of relevance? Read on for an overview of multimedia ontologies that use the MPEG-7 standard.

What are Multimedia Ontologies?

Ontologies and multimedia are like Russian dolls: seemingly identical containers holding various levels of metadata, mirroring each other. © James Lee

Ontologies and multimedia are like Russian dolls stuck in a kaleidoscope: seemingly identical containers holding various levels of metadata, mirroring each other. When working with ontologies and multimedia, it is not enough to reference the object itself; its content needs describing too. The “aboutness” of a file is embedded in its metadata. Problems are nevertheless likely to arise on several levels, because computers don’t see, they read, which makes the automatic creation of ontologies difficult.

When it comes to semantic interpretation, the ontology is in the eye of the beholder. And the bridge across the semantic gap of multimedia has a lot of eyes staring at it, trying to fix the problems. Werner Bailer (Bailer, 2011) points out three main problems:

  • Integrating Different Standards
  • Lack of Formal Semantics
  • Deployment of Multimedia Metadata

Why MPEG-7 looks like a winner

MPEG-7 (also known as the Multimedia Content Description Interface) is an ISO/IEC standard. It was developed in 2002 by the Moving Picture Experts Group (MPEG) as a tool for handling not only metadata but also the description of structural and semantic content. MPEG-7 defines multimedia Descriptors (Ds), Description Schemes (DSs) and their relationships. The Description Schemes group the Descriptors (visual, texture, camera motion, audio, actors, places, semantics). Low-level descriptors are often annotated automatically, while high-level descriptors are annotated manually. The Description Definition Language (DDL) ties the knot and forms the core part of the standard. DDL is written using XML Schema (Troncy, Celma, Little, García & Tsinaraki, 2007).
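
To make the idea of an XML-based description a little more concrete, here is a minimal sketch in Python that reads a hand-written, heavily simplified MPEG-7-style fragment. The fragment has not been validated against the MPEG-7 schema, and the media URI, the annotation text and the helper name `summarize` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by MPEG-7 descriptions. The fragment below is a
# simplified illustration and has NOT been validated against the schema.
MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

SAMPLE = f"""<Mpeg7 xmlns="{MPEG7_NS}" xmlns:xsi="{XSI_NS}">
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video>
        <MediaLocator>
          <MediaUri>http://example.org/clip.mpg</MediaUri>
        </MediaLocator>
        <TextAnnotation>
          <FreeTextAnnotation>Two actors talking in a cafe</FreeTextAnnotation>
        </TextAnnotation>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""


def summarize(xml_text):
    """Pull a few fields out of an MPEG-7-style description (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    ns = {"m": MPEG7_NS}
    content = root.find(".//m:MultimediaContent", ns)
    # xsi:type tells us which kind of content the description is about.
    kind = content.get(f"{{{XSI_NS}}}type") if content is not None else None
    return {
        "uri": root.findtext(".//m:MediaUri", namespaces=ns),
        "kind": kind,
        "annotation": root.findtext(".//m:FreeTextAnnotation", namespaces=ns),
    }


print(summarize(SAMPLE))
```

The free-text annotation is the sort of high-level description that is usually entered by hand, whereas low-level descriptors would typically be filled in by an extraction tool.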

What MPEG-7 solves

The MPEG-7 standard describes low-level features such as texture, camera motion or audio/melody, while the Description Schemes are metadata structures for capturing and annotating audio-visual content in a more abstract way, with descriptors defined using the Description Definition Language (DDL). MPEG-7 is defined as an XML schema which defines 1182 elements, 417 attributes and 377 complex types. Without any formal semantics, this can cause interoperability issues when extracting or entering data.
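
Figures like these come from counting declarations in the schema files. The sketch below shows one plausible way to do such a count in Python on a tiny toy schema. It is not the procedure used to obtain the numbers above (for instance, whether local declarations or references were counted is unknown), and `TOY_SCHEMA`, `ClipType` and `count_components` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XS = "http://www.w3.org/2001/XMLSchema"

# A toy schema standing in for the (much larger) MPEG-7 schema files.
TOY_SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="ClipType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Duration" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>
  <xs:element name="Clip" type="ClipType"/>
</xs:schema>"""


def count_components(xsd_text):
    """Count element, attribute and complexType declarations at any depth."""
    root = ET.fromstring(xsd_text)
    wanted = {f"{{{XS}}}{name}": name
              for name in ("element", "attribute", "complexType")}
    counts = Counter({name: 0 for name in wanted.values()})
    for node in root.iter():
        name = wanted.get(node.tag)
        if name:
            counts[name] += 1
    return dict(counts)


print(count_components(TOY_SCHEMA))
```

Running the same function over the real schema files would give a rough sense of scale, although the totals depend on exactly what is counted.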

What MPEG-7 doesn’t solve

Unlike domain ontology objects, multimedia objects often feature juxtaposed items that are present at the same time and whose complex relationships need to be mapped. Missing semantic descriptions of the concepts appearing in multimedia objects can therefore result in ambiguous and inconsistent descriptions, one of the main hurdles for MPEG-7.

Interoperability issues

Complexity Science
In general usage, complexity tends to be used to characterize something with many parts in intricate arrangement. The study of these complex linkages is the main goal of complex systems theory.

The reason MPEG-7 is often cited as the best standard for multimedia is its granularity and its levels of abstraction. As Hiranmay Ghosh (Ghosh, 2010) points out: “The goal of a multimedia ontology is to Semantically integrate distributed heterogeneous media collections (bridging the gap) and integrate multiple media types.” But there is a downside too. The MPEG-7 schema defines more than a thousand elements and several hundred attributes and complex types, with no formal semantics attached to them. According to Troncy, Celma, Little, García & Tsinaraki, the resulting interoperability issues and complexity can also be experienced as a burden.

Why the need for interoperability

There exist many types of metadata and metadata standards, data types, and applications that process the file formats. As mentioned above, Werner Bailer points out three main problems, which we look at a little more closely below.

Integrating different standards

A multimedia file's life cycle can be very complex, with many people handling the file at various stages, from production to finished product and use. No single standard covers every work scenario. The structural descriptions and low-level audiovisual features of MPEG-7 suit some scenarios but may not fit others, and concepts of objects are written in RDF/OWL, for which some standards offer limited reasoning tools or none at all.

Lack of Formal Semantics

Semantic elements are far from always properly defined, and there are many alternative ways to model the same description. This makes descriptions difficult to validate and difficult for all software to understand. One way to resolve ambiguities could be to use a limited set of description tools.
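
As a rough illustration of what a limited set of description tools could mean in practice, the Python sketch below checks which element names in a description fall outside an agreed list. The allowed list, the two sample fragments (namespaces omitted for brevity) and the function name `tools_outside_profile` are all assumptions for the sake of the example, not part of MPEG-7 or of Bailer's proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed subset of description tools.
ALLOWED_TOOLS = {
    "Video", "MediaLocator", "MediaUri",
    "TextAnnotation", "FreeTextAnnotation",
}

# Two simplified ways of saying roughly the same thing about a clip.
VARIANT_A = """<Video>
  <TextAnnotation>
    <FreeTextAnnotation>cafe, dialogue</FreeTextAnnotation>
  </TextAnnotation>
</Video>"""

VARIANT_B = """<Video>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>cafe</Keyword>
      <Keyword>dialogue</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
</Video>"""


def tools_outside_profile(xml_text, allowed=ALLOWED_TOOLS):
    """Return the element names used in a description that are not in the allowed set."""
    root = ET.fromstring(xml_text)
    # Strip any "{namespace}" prefix so only the local element name is compared.
    used = {node.tag.split("}")[-1] for node in root.iter()}
    return sorted(used - allowed)


print(tools_outside_profile(VARIANT_A))
print(tools_outside_profile(VARIANT_B))
```

Software that agrees on the same restricted list only has to understand one of the two variants, which is the kind of ambiguity reduction the paragraph above points to.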

Deployment of Multimedia Metadata

Many metadata formats exist, and the metadata pertaining to a file is often published alongside the multimedia piece itself. This makes the metadata difficult to process automatically, and the results can therefore be unreliable. Bailer concludes that: “Semantic technologies are not optimal for all types of data and there are limitations w.r.t. Scalability”.

Requirements for a Multimedia Ontology

May the best ontology win. © Yutaka Seki

When Arndt et al. set out to create their multimedia ontology (COMM) (Arndt et al., 2007), the authors defined six requirements for designing a multimedia ontology:

  • MPEG-7 Compliance: as this is the standard used worldwide by the broadcasting community.
  • Semantic Interoperability: Descriptions must be sufficiently explicit to ensure that the intended meaning can be shared amongst different systems.
  • Syntactic Interoperability: An agreed-upon syntax e.g. OWL, RDF/XML or RDFa.
  • Separation of Concerns: Clear separation of administrative and descriptive labeling.
  • Modularity: Minimise the execution overhead.
  • Extensibility: The underlying model and assumptions should always be stable and ensure that new concepts can be added to the ontology without clashing with older models.

Suárez-Figueroa, Ghislain & Corcho review the best-known and most widely used ontologies in the multimedia domain from 2001 to 2013, based on freely available RDF(S) or OWL, and present a framework: FRAMECOMMON (Suárez-Figueroa, Ghislain & Corcho, 2013). The authors highlight three criteria to look out for when developing or deciding on a multimedia ontology, namely:

  • Which multimedia dimensions (audio-visual, image, video etc.) are covered by the ontology.
  • Documentation and code quality, and how easy the ontology is to pick up and use.
  • Whether the ontology is trustworthy and free of irregularities.

Dasiopoulou et al. also highlight conceptual clarity and well-defined semantic models, and point out that much of the metadata and semantics of multimedia products remains tucked away from the Semantic Web, because of scalability problems in representing and capturing contextual information (Dasiopoulou et al., 2010).

A list of Multimedia Ontologies

The list of Multimedia Ontologies continues here

