MPAI-MMC V2.5 Data Types - Discourse Structure Representation

1 Definition	2 Functional Requirements	3 Syntax
4 Semantics	5 Conformance Testing	6 Performance Assessment

1. Definition

The digital representation of discourse segments and their associated content structures (semantic and pragmatic entities and their interrelations), designed to enable structured interpretation of discourse and an explicit mapping between the presentation of discourse and its abstract meaning across modalities (according to ISO/TS 24617-5:2014).

2. Functional Requirements

MUST Requirements

Segment Core — MUST represent discourse segments with identifiers, type (e.g., word, phrase, clause, sentence, paragraph), and optional attributes such as governing status, text spans, and time spans for multimodal contexts.
Content Core — MUST represent content nodes (semantic/pragmatic entities such as events, states, processes, relations, propositions, objects, circumstances) with identifiers, class, and optional features.
Mapping — MUST provide explicit mapping between segments and content nodes, supporting one-to-one, one-to-many, and many-to-one links, with optional text/time spans for alignment.
Relation Inventory — MUST support representation of discourse relations between content nodes, with type and attributes, and allow profile-based vocabularies.
Identity and Referencing — MUST ensure stable identifiers for segments, content nodes, and mapping links; support cross-reference and uniqueness within an instance.
Discontinuous Segments — MUST allow representation of discontiguous spans for segments (e.g., subSpans).
Multimodal Anchoring — MUST support anchoring segments to text, audio, video, or other media via offsets, timestamps, or frame indices.
Normalisation — MUST normalise identifiers to stable namespaces (URIs/IRIs) with language-independent base IDs.
Trace — MUST record provenance metadata (origin, timestamp, tool/user, version) for segments, content nodes, relations, and mappings.
Extensibility — MUST allow controlled extension points (namespaces, custom attributes, domain vocabularies) without breaking conformance.
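
The Discontinuous Segments requirement can be illustrated with a short Python sketch. It uses the `textSpan` and `subSpans` field names and the half-open [start, end) offset convention given in the Semantics section. The helper names `segment_spans` and `segment_text`, the sample sentence, and the rule that `subSpans` takes precedence over `textSpan` are assumptions made for illustration; the specification does not state how the two fields interact.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def segment_spans(segment: Dict) -> List[Span]:
    """Return the character spans covered by a segment, in text order.

    Assumption: when subSpans is present it lists the discontiguous pieces
    and is preferred over textSpan. The specification does not say this.
    """
    if segment.get("subSpans"):
        spans = [tuple(s) for s in segment["subSpans"]]
    elif "textSpan" in segment:
        spans = [tuple(segment["textSpan"])]
    else:
        spans = []
    for start, end in spans:
        if not (0 <= start <= end):
            raise ValueError(f"invalid span [{start}, {end})")
    return sorted(spans)


def segment_text(text: str, segment: Dict, gap: str = " ... ") -> str:
    """Join the pieces of a (possibly discontiguous) segment into one string."""
    return gap.join(text[s:e] for s, e in segment_spans(segment))


# Hypothetical example: the phrasal verb "picked ... up" is split by its object.
text = "She picked the book up yesterday."
phrasal_verb = {
    "segmentId": "seg-1",
    "type": "phrase",
    "textSpan": [4, 22],              # overall extent: "picked the book up"
    "subSpans": [[4, 10], [20, 22]],  # the two discontiguous pieces
}
print(segment_text(text, phrasal_verb))  # prints: picked ... up
```

Half-open offsets map directly onto Python slicing, so `text[start:end]` needs no adjustment at either boundary.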

SHOULD Requirements

Profiles — SHOULD support domain-specific profiles (e.g., dialogue, multimodal interaction, clinical discourse) with additional constraints and enumerations.
Quality Measures — SHOULD provide optional confidence scores, quality flags, and validation reports for mappings and relations.
Queryability — SHOULD facilitate efficient querying by segment type, content class, relation type, modality, or provenance.
Hierarchy Support — SHOULD support hierarchical segment structures (e.g., sentence → clause → phrase) and hierarchical content structures for reasoning.
Defaulting & Implicit Links — SHOULD represent conventional defaults and implicit mappings with explicit markers.
Cross-Ontology Alignment — SHOULD provide mapping hooks to external semantic resources (e.g., ISO SemAF parts, DR-core) without requiring any single ontology.
Multilingual Support — SHOULD support multilingual labels and cross-lingual alignment for segment and content representations.

3. Syntax

https://schemas.mpai.community/MMC/V2.5/data/DiscourseStructureRepresentation.json

4. Semantics

Label	Description
Header	Discourse Structure Representation Schema Header (schema title and header context)
├─ Standard-DSR	Literal prefix identifying DSR schema (e.g., “MMC-DSR-V” embedded in Header pattern)
├─ Version	Major version – 1 or 2 digits (part of Header pattern)
├─ Dot-separator	The character “.” separating version components (part of Header pattern)
├─ Subversion	Minor version – 1 or 2 digits (part of Header pattern)
DSRID	Identifier of this DSR Instance
SegmentGraph	Container for discourse segment structure
├─ segments[]	Array of Segment objects (minItems=1)
Segment	Represents a discourse segment (presentation unit)
├─ segmentId	Stable identifier of the segment (string)
├─ type	Segment type (enum: word, phrase, clause, sentence, paragraph, section, chapter)
├─ governing	Boolean flag indicating governing segment status
├─ textSpan	[start, end) character offsets (array of two integers)
├─ subSpans	Optional list of discontiguous spans (arrays of [start, end) integers)
├─ timeSpan	[start, end) seconds in media (array of two numbers)
├─ attributes	Optional object for additional segment metadata
ContentGraph	Container for abstract content structure
├─ nodes[]	Array of ContentNode objects (minItems=1)
├─ relations[]	Array of ContentRelation objects (optional)
ContentNode	Represents a semantic/pragmatic entity
├─ contentId	Stable identifier of the content node (string)
├─ class	Semantic class (enum: event, state, process, relation, proposition, object, circumstance)
├─ label	Optional human-readable label (string)
├─ features	Optional object for additional semantic features
ContentRelation	Represents a relation between content nodes
├─ source	ID of source content node (string)
├─ target	ID of target content node (string)
├─ type	Relation type (string; constrained by profile vocabularies)
├─ attributes	Optional object for relation metadata
Mapping	Container for links between segments and content nodes
├─ links[]	Array of MapLink objects (minItems=1)
MapLink	Represents a mapping between content and presentation
├─ contentRef	ID of the content node being mapped (string)
├─ segmentRefs[]	Array of segment IDs linked to this content node (minItems=1)
├─ textSpan	Optional [start, end) character offsets for alignment (array of two integers)
├─ timeSpan	Optional [start, end) seconds in media for alignment (array of two numbers)
DescrMetadata	Optional metadata for the DSR instance
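
As an illustration of how the labels above fit together, the following Python sketch builds a small DSR instance as a dictionary and checks the Identity and Referencing requirement: identifiers are unique within the instance, and every reference resolves. The key names come from the table above, but the exact nesting, the identifier values, the relation type `cause`, and the helper `check_references` are assumptions for illustration; the JSON Schema referenced in the Syntax section remains the authority.

```python
from typing import Dict, List

# Hypothetical instance for the text "I stayed in because it was raining".
dsr = {
    "Header": "MMC-DSR-V2.5",
    "DSRID": "dsr-example-001",  # hypothetical identifier
    "SegmentGraph": {
        "segments": [
            {"segmentId": "seg-1", "type": "clause", "governing": True,
             "textSpan": [0, 11]},    # "I stayed in"
            {"segmentId": "seg-2", "type": "clause", "governing": False,
             "textSpan": [12, 34]},   # "because it was raining"
        ]
    },
    "ContentGraph": {
        "nodes": [
            {"contentId": "c-1", "class": "event", "label": "staying in"},
            {"contentId": "c-2", "class": "state", "label": "raining"},
        ],
        "relations": [
            # "cause" is a placeholder; real types come from a profile vocabulary.
            {"source": "c-2", "target": "c-1", "type": "cause"},
        ],
    },
    "Mapping": {
        "links": [
            {"contentRef": "c-1", "segmentRefs": ["seg-1"]},
            {"contentRef": "c-2", "segmentRefs": ["seg-2"]},
        ]
    },
}


def check_references(instance: Dict) -> List[str]:
    """Return a list of identifier problems; an empty list means none found."""
    problems: List[str] = []
    seg_ids = [s["segmentId"] for s in instance["SegmentGraph"]["segments"]]
    node_ids = [n["contentId"] for n in instance["ContentGraph"]["nodes"]]

    # Uniqueness within the instance.
    for name, ids in (("segmentId", seg_ids), ("contentId", node_ids)):
        dupes = sorted({i for i in ids if ids.count(i) > 1})
        if dupes:
            problems.append(f"duplicate {name}: {dupes}")

    # Relations must connect known content nodes.
    for rel in instance["ContentGraph"].get("relations", []):
        for end in ("source", "target"):
            if rel[end] not in node_ids:
                problems.append(f"relation {end} {rel[end]!r} is not a known contentId")

    # Map links must point at a known content node and known segments.
    for link in instance["Mapping"]["links"]:
        if link["contentRef"] not in node_ids:
            problems.append(f"contentRef {link['contentRef']!r} is not a known contentId")
        for ref in link["segmentRefs"]:
            if ref not in seg_ids:
                problems.append(f"segmentRef {ref!r} is not a known segmentId")
    return problems


print(check_references(dsr))  # prints: []
```

Because `segmentRefs` is an array, one content node can be linked to several segments; several links sharing the same segment cover the opposite case.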

5. Conformance Testing

A Data instance Conforms with MPAI-MMC Discourse Structure Representation (MMC-DSR) if:

Its JSON Object validates against its JSON Schema.
Any included JSON Object validates against its JSON Schema.
All Data in the JSON Object:
1. Have the specified Data Types.
2. Conform with the Qualifiers signaled in their JSON Schemas.
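
A few of these checks can be sketched in plain Python, using only constraints stated in the Semantics section: the Header pattern, the two enumerations, the `minItems` limits, and the shape of spans. This is a partial pre-check under stated assumptions, not a substitute for validating against the published JSON Schema. The regular expression is inferred from the Header description rather than copied from the schema, the `start <= end` test is an assumption drawn from the [start, end) notation, and the names `structural_problems` and `is_span` are hypothetical.

```python
import re
from typing import Dict, List

# Inferred from the Semantics table: prefix, 1-2 digits, ".", 1-2 digits.
HEADER_RE = re.compile(r"MMC-DSR-V[0-9]{1,2}\.[0-9]{1,2}")
SEGMENT_TYPES = {"word", "phrase", "clause", "sentence",
                 "paragraph", "section", "chapter"}
CONTENT_CLASSES = {"event", "state", "process", "relation",
                   "proposition", "object", "circumstance"}


def is_span(value, numeric_types) -> bool:
    """True if value is a two-element [start, end) list of the given types."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(v, numeric_types) and not isinstance(v, bool)
                for v in value)
        and value[0] <= value[1]  # assumption: start may not exceed end
    )


def structural_problems(instance: Dict) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems: List[str] = []
    if not HEADER_RE.fullmatch(str(instance.get("Header", ""))):
        problems.append("Header does not match MMC-DSR-V<version>.<subversion>")

    segments = instance.get("SegmentGraph", {}).get("segments", [])
    nodes = instance.get("ContentGraph", {}).get("nodes", [])
    links = instance.get("Mapping", {}).get("links", [])
    for name, items in (("segments", segments), ("nodes", nodes), ("links", links)):
        if len(items) < 1:
            problems.append(f"{name} must contain at least one item")

    for seg in segments:
        if seg.get("type") not in SEGMENT_TYPES:
            problems.append(f"unknown segment type: {seg.get('type')!r}")
        if "textSpan" in seg and not is_span(seg["textSpan"], int):
            problems.append(f"bad textSpan in {seg.get('segmentId')!r}")
        if "timeSpan" in seg and not is_span(seg["timeSpan"], (int, float)):
            problems.append(f"bad timeSpan in {seg.get('segmentId')!r}")
    for node in nodes:
        if node.get("class") not in CONTENT_CLASSES:
            problems.append(f"unknown content class: {node.get('class')!r}")
    for link in links:
        if len(link.get("segmentRefs", [])) < 1:
            problems.append("segmentRefs must contain at least one item")
    return problems


# Hypothetical candidate instance.
candidate = {
    "Header": "MMC-DSR-V2.5",
    "SegmentGraph": {"segments": [
        {"segmentId": "seg-1", "type": "sentence",
         "textSpan": [0, 34], "timeSpan": [0.0, 2.4]},
    ]},
    "ContentGraph": {"nodes": [{"contentId": "c-1", "class": "proposition"}]},
    "Mapping": {"links": [{"contentRef": "c-1", "segmentRefs": ["seg-1"]}]},
}
print(structural_problems(candidate))  # prints: []
```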
