1. Definition    2. Functional Requirements    3. Syntax
4. Semantics    5. Conformance Testing    6. Performance Assessment

1. Definition

The digital representation of discourse segments and their associated content structures (semantic and pragmatic entities and their interrelations), designed to enable structured interpretation of discourse and explicit mapping between the presentation of discourse and its abstract meaning across modalities (according to ISO/TS 24617-5:2014).

2. Functional Requirements

MUST Requirements

  • Segment Core — MUST represent discourse segments with identifiers, type (e.g., word, phrase, clause, sentence, paragraph), and optional attributes such as governing status, text spans, and time spans for multimodal contexts.
  • Content Core — MUST represent content nodes (semantic/pragmatic entities such as events, states, processes, relations, propositions, objects, circumstances) with identifiers, class, and optional features.
  • Mapping — MUST provide explicit mapping between segments and content nodes, supporting one-to-one, one-to-many, and many-to-one links, with optional text/time spans for alignment.
  • Relation Inventory — MUST support representation of discourse relations between content nodes, with type and attributes, and allow profile-based vocabularies.
  • Identity and Referencing — MUST ensure stable identifiers for segments, content nodes, and mapping links; support cross-reference and uniqueness within an instance.
  • Discontinuous Segments — MUST allow representation of discontiguous spans for segments (e.g., subSpans).
  • Multimodal Anchoring — MUST support anchoring segments to text, audio, video, or other media via offsets, timestamps, or frame indices.
  • Normalisation — MUST normalise identifiers to stable namespaces (URIs/IRIs) with language-independent base IDs.
  • Trace — MUST record provenance metadata (origin, timestamp, tool/user, version) for segments, content nodes, relations, and mappings.
  • Extensibility — MUST allow controlled extension points (namespaces, custom attributes, domain vocabularies) without breaking conformance.
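
The Mapping and Identity and Referencing requirements above can be illustrated with a minimal sketch. It assumes that an instance is loaded as a Python dictionary whose keys follow the labels in the Semantics section (SegmentGraph, ContentGraph, Mapping); the exact nesting has not been checked against the JSON Schema. The function name `check_references`, the identifiers `s1`, `c1`, and the relation type `cause` are hypothetical.

```python
from collections import Counter

def check_references(dsr):
    """Return a list of identity/reference problems (empty if none are found)."""
    problems = []
    seg_ids = [s["segmentId"] for s in dsr["SegmentGraph"]["segments"]]
    node_ids = [n["contentId"] for n in dsr["ContentGraph"]["nodes"]]

    # Identifiers must be unique within an instance.
    for kind, ids in (("segmentId", seg_ids), ("contentId", node_ids)):
        for ident, count in Counter(ids).items():
            if count > 1:
                problems.append(f"duplicate {kind}: {ident}")

    known_segs, known_nodes = set(seg_ids), set(node_ids)

    # Relations must connect content nodes that exist.
    for rel in dsr["ContentGraph"].get("relations", []):
        for end in ("source", "target"):
            if rel[end] not in known_nodes:
                problems.append(f"relation {end} not found: {rel[end]}")

    # Each link ties one content node to one or more segments.
    for link in dsr["Mapping"]["links"]:
        if link["contentRef"] not in known_nodes:
            problems.append(f"link contentRef not found: {link['contentRef']}")
        for ref in link["segmentRefs"]:
            if ref not in known_segs:
                problems.append(f"link segmentRef not found: {ref}")
    return problems

# Hypothetical instance: c1 is presented by two segments (one-to-many),
# and s2 presents two content nodes (many-to-one).
example = {
    "SegmentGraph": {"segments": [
        {"segmentId": "s1", "type": "clause"},
        {"segmentId": "s2", "type": "clause"},
    ]},
    "ContentGraph": {
        "nodes": [
            {"contentId": "c1", "class": "event"},
            {"contentId": "c2", "class": "state"},
        ],
        "relations": [{"source": "c1", "target": "c2", "type": "cause"}],
    },
    "Mapping": {"links": [
        {"contentRef": "c1", "segmentRefs": ["s1", "s2"]},
        {"contentRef": "c2", "segmentRefs": ["s2"]},
    ]},
}

print(check_references(example))
```

A check of this kind complements schema validation, since a JSON Schema alone does not usually verify that a referenced identifier exists elsewhere in the same instance.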

SHOULD Requirements

  • Profiles — SHOULD support domain-specific profiles (e.g., dialogue, multimodal interaction, clinical discourse) with additional constraints and enumerations.
  • Quality Measures — SHOULD provide optional confidence scores, quality flags, and validation reports for mappings and relations.
  • Queryability — SHOULD facilitate efficient querying by segment type, content class, relation type, modality, or provenance.
  • Hierarchy Support — SHOULD support hierarchical segment structures (e.g., sentence → clause → phrase) and hierarchical content structures for reasoning.
  • Defaulting & Implicit Links — SHOULD represent conventional defaults and implicit mappings with explicit markers.
  • Cross-Ontology Alignment — SHOULD provide mapping hooks to external semantic resources (e.g., ISO SemAF parts, DR-core) without requiring any single ontology.
  • Multilingual Support — SHOULD support multilingual labels and cross-lingual alignment for segment and content representations.
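
As one possible reading of the Queryability item above, the sketch below builds simple lookup tables over an instance held as a Python dictionary. The key names follow the labels in the Semantics section; the function names `build_indexes` and `segments_for_class` and the sample identifiers are hypothetical, and a production system would more likely rely on a database or graph store.

```python
from collections import defaultdict

def build_indexes(dsr):
    """Build lookup tables so common queries avoid rescanning the instance."""
    segments_by_type = defaultdict(list)
    for seg in dsr["SegmentGraph"]["segments"]:
        segments_by_type[seg["type"]].append(seg["segmentId"])

    nodes_by_class = defaultdict(list)
    for node in dsr["ContentGraph"]["nodes"]:
        nodes_by_class[node["class"]].append(node["contentId"])

    segments_by_content = defaultdict(list)
    for link in dsr["Mapping"]["links"]:
        segments_by_content[link["contentRef"]].extend(link["segmentRefs"])

    return dict(segments_by_type), dict(nodes_by_class), dict(segments_by_content)

def segments_for_class(dsr, content_class):
    """Return IDs of segments that present any content node of the given class."""
    _, nodes_by_class, segments_by_content = build_indexes(dsr)
    found = []
    for content_id in nodes_by_class.get(content_class, []):
        for seg_id in segments_by_content.get(content_id, []):
            if seg_id not in found:  # keep first-seen order, drop duplicates
                found.append(seg_id)
    return found

# Hypothetical instance with one sentence and two clauses.
doc = {
    "SegmentGraph": {"segments": [
        {"segmentId": "s1", "type": "sentence"},
        {"segmentId": "s2", "type": "clause"},
        {"segmentId": "s3", "type": "clause"},
    ]},
    "ContentGraph": {"nodes": [
        {"contentId": "c1", "class": "event"},
        {"contentId": "c2", "class": "event"},
        {"contentId": "c3", "class": "state"},
    ]},
    "Mapping": {"links": [
        {"contentRef": "c1", "segmentRefs": ["s2"]},
        {"contentRef": "c2", "segmentRefs": ["s2", "s3"]},
        {"contentRef": "c3", "segmentRefs": ["s1"]},
    ]},
}

print(segments_for_class(doc, "event"))
```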

3. Syntax

https://schemas.mpai.community/MMC/V2.5/data/DiscourseStructureRepresentation.json

4. Semantics

Label               Description
Header              Discourse Structure Representation Schema Header (schema title and header context)
├─ Standard‑DSR     Literal prefix identifying DSR schema (e.g., “MMC‑DSR‑V” embedded in Header pattern)
├─ Version          Major version – 1 or 2 digits (part of Header pattern)
├─ Dot‑separator    The character “.” separating version components (part of Header pattern)
├─ Subversion       Minor version – 1 or 2 digits (part of Header pattern)
DSRID               Identifier of this DSR Instance
SegmentGraph        Container for discourse segment structure
├─ segments[]       Array of Segment objects (minItems=1)
Segment             Represents a discourse segment (presentation unit)
├─ segmentId        Stable identifier of the segment (string)
├─ type             Segment type (enum: word, phrase, clause, sentence, paragraph, section, chapter)
├─ governing        Boolean flag indicating governing segment status
├─ textSpan         [start, end) character offsets (array of two integers)
├─ subSpans         Optional list of discontiguous spans (arrays of [start, end) integers)
├─ timeSpan         [start, end) seconds in media (array of two numbers)
├─ attributes       Optional object for additional segment metadata
ContentGraph        Container for abstract content structure
├─ nodes[]          Array of ContentNode objects (minItems=1)
├─ relations[]      Array of ContentRelation objects (optional)
ContentNode         Represents a semantic/pragmatic entity
├─ contentId        Stable identifier of the content node (string)
├─ class            Semantic class (enum: event, state, process, relation, proposition, object, circumstance)
├─ label            Optional human-readable label (string)
├─ features         Optional object for additional semantic features
ContentRelation     Represents a relation between content nodes
├─ source           ID of source content node (string)
├─ target           ID of target content node (string)
├─ type             Relation type (string; constrained by profile vocabularies)
├─ attributes       Optional object for relation metadata
Mapping             Container for links between segments and content nodes
├─ links[]          Array of MapLink objects (minItems=1)
MapLink             Represents a mapping between content and presentation
├─ contentRef       ID of the content node being mapped (string)
├─ segmentRefs[]    Array of segment IDs linked to this content node (minItems=1)
├─ textSpan         Optional [start, end) character offsets for alignment (array of two integers)
├─ timeSpan         Optional [start, end) seconds in media for alignment (array of two numbers)
DescrMetadata       Optional metadata for the DSR instance
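
Two details of the table above lend themselves to a short sketch: the Header pattern and the half-open [start, end) spans. The regular expression is an assumption assembled from the Header rows (literal prefix written here with ASCII hyphens, 1 or 2 digit version, a dot, 1 or 2 digit subversion); the authoritative pattern is the one in the JSON Schema. The function names and the sample sentence are hypothetical.

```python
import re

# Assumed header layout: prefix, 1-2 digit version, ".", 1-2 digit subversion.
HEADER_RE = re.compile(r"MMC-DSR-V([0-9]{1,2})\.([0-9]{1,2})")

def parse_header(header):
    """Return (version, subversion) as integers, or raise ValueError."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a DSR header: {header!r}")
    return int(match.group(1)), int(match.group(2))

def segment_text(segment, text):
    """Return the text covered by a segment, honouring discontiguous subSpans."""
    spans = segment.get("subSpans") or [segment["textSpan"]]
    # Half-open [start, end) offsets map directly onto Python slices.
    return " ".join(text[start:end] for start, end in spans)

text = "She picked the book up."
# A discontiguous phrase: "picked" ... "up".
phrasal_verb = {
    "segmentId": "s1",
    "type": "phrase",
    "textSpan": [4, 22],
    "subSpans": [[4, 10], [20, 22]],
}

print(parse_header("MMC-DSR-V2.5"))      # (2, 5)
print(segment_text(phrasal_verb, text))  # picked up
```

Because the end offset is exclusive, adjacent segments can share a boundary value without overlapping, and the length of a span is simply end minus start.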

5. Conformance Testing

A Data instance Conforms with MPAI-MMC Discourse Structure Representation (MMC-DSR) if:

  1. Its JSON Object validates against its JSON Schema.
  2. Any included JSON Object validates against its JSON Schema.
  3. All Data in the JSON Object:
    1. Have the specified Data Types.
    2. Conform with the Qualifiers signaled in their JSON Schemas.
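
The Data Type condition in item 3 can be sketched with a hand-written check over a few fields from the Semantics section. This is only an illustration using the Python standard library: it covers a small subset of the fields, assumes that `segmentId`, `type`, `contentId`, and `class` are required, and does not replace validation against the JSON Schema referenced in the Syntax section. The function names and sample values are hypothetical.

```python
import json

SEGMENT_TYPES = {"word", "phrase", "clause", "sentence",
                 "paragraph", "section", "chapter"}
CONTENT_CLASSES = {"event", "state", "process", "relation",
                   "proposition", "object", "circumstance"}

def is_span(value, allow_float):
    """True for a two-item list of integers (or numbers if allow_float)."""
    if not isinstance(value, list) or len(value) != 2:
        return False
    kinds = (int, float) if allow_float else (int,)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, kinds) and not isinstance(v, bool) for v in value)

def type_errors(dsr):
    """Collect data type problems for a subset of Segment and ContentNode fields."""
    errors = []
    for seg in dsr.get("SegmentGraph", {}).get("segments", []):
        sid = seg.get("segmentId")
        if not isinstance(sid, str):
            errors.append("segmentId must be a string")
        if seg.get("type") not in SEGMENT_TYPES:
            errors.append(f"{sid}: type is not in the segment enum")
        if "governing" in seg and not isinstance(seg["governing"], bool):
            errors.append(f"{sid}: governing must be a boolean")
        if "textSpan" in seg and not is_span(seg["textSpan"], allow_float=False):
            errors.append(f"{sid}: textSpan must be two integers")
        if "timeSpan" in seg and not is_span(seg["timeSpan"], allow_float=True):
            errors.append(f"{sid}: timeSpan must be two numbers")
    for node in dsr.get("ContentGraph", {}).get("nodes", []):
        cid = node.get("contentId")
        if not isinstance(cid, str):
            errors.append("contentId must be a string")
        if node.get("class") not in CONTENT_CLASSES:
            errors.append(f"{cid}: class is not in the content enum")
    return errors

# Hypothetical instance fragment; segment s2 is deliberately malformed.
instance = json.loads("""
{
  "SegmentGraph": {"segments": [
    {"segmentId": "s1", "type": "sentence", "textSpan": [0, 23], "timeSpan": [0.0, 1.8]},
    {"segmentId": "s2", "type": "utterance", "textSpan": [0, 4.5]}
  ]},
  "ContentGraph": {"nodes": [{"contentId": "c1", "class": "event"}]}
}
""")

for problem in type_errors(instance):
    print(problem)
```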

6. Performance Assessment