| 1 Definition | 2 Functional Requirements | 3 Syntax |
| 4 Semantics | 5 Conformance Testing | 6 Performance Assessment |
1. Definition
The digital representation of discourse segments and their associated content structures (semantic and pragmatic entities and their interrelations), designed to enable structured interpretation of discourse and an explicit mapping between the presentation of discourse and its abstract meaning across modalities (following ISO/TS 24617-5:2014).
2. Functional Requirements
MUST Requirements
- Segment Core — MUST represent discourse segments with identifiers, type (e.g., word, phrase, clause, sentence, paragraph), and optional attributes such as governing status, text spans, and time spans for multimodal contexts.
- Content Core — MUST represent content nodes (semantic/pragmatic entities such as events, states, processes, relations, propositions, objects, circumstances) with identifiers, class, and optional features.
- Mapping — MUST provide explicit mapping between segments and content nodes, supporting one-to-one, one-to-many, and many-to-one links, with optional text/time spans for alignment.
- Relation Inventory — MUST support representation of discourse relations between content nodes, with type and attributes, and allow profile-based vocabularies.
- Identity and Referencing — MUST ensure stable identifiers for segments, content nodes, and mapping links; support cross-reference and uniqueness within an instance.
- Discontinuous Segments — MUST allow representation of discontiguous spans for segments (e.g., subSpans).
- Multimodal Anchoring — MUST support anchoring segments to text, audio, video, or other media via offsets, timestamps, or frame indices.
- Normalisation — MUST normalise identifiers to stable namespaces (URIs/IRIs) with language-independent base IDs.
- Trace — MUST record provenance metadata (origin, timestamp, tool/user, version) for segments, content nodes, relations, and mappings.
- Extensibility — MUST allow controlled extension points (namespaces, custom attributes, domain vocabularies) without breaking conformance.
SHOULD Requirements
- Profiles — SHOULD support domain-specific profiles (e.g., dialogue, multimodal interaction, clinical discourse) with additional constraints and enumerations.
- Quality Measures — SHOULD provide optional confidence scores, quality flags, and validation reports for mappings and relations.
- Queryability — SHOULD facilitate efficient querying by segment type, content class, relation type, modality, or provenance.
- Hierarchy Support — SHOULD support hierarchical segment structures (e.g., sentence → clause → phrase) and hierarchical content structures for reasoning.
- Defaulting & Implicit Links — SHOULD represent conventional defaults and implicit mappings with explicit markers.
- Cross-Ontology Alignment — SHOULD provide mapping hooks to external semantic resources (e.g., ISO SemAF parts, DR-core) without requiring any single ontology.
- Multilingual Support — SHOULD support multilingual labels and cross-lingual alignment for segment and content representations.
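The Mapping and Identity and Referencing requirements above can be illustrated with a minimal Python sketch. It assumes an instance is a dictionary whose top-level keys `SegmentGraph`, `ContentGraph`, and `Mapping` hold `segments`, `nodes`/`relations`, and `links`, using the field names from the Semantics table; that nesting, the helper name `check_references`, and the `cause` relation type are illustrative assumptions, not taken from the published schema.

```python
from collections import Counter


def check_references(dsr):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    segment_ids = [s["segmentId"] for s in dsr["SegmentGraph"]["segments"]]
    content_ids = [n["contentId"] for n in dsr["ContentGraph"]["nodes"]]

    # Identifiers must be unique within the instance.
    for kind, ids in (("segmentId", segment_ids), ("contentId", content_ids)):
        for ident, count in Counter(ids).items():
            if count > 1:
                problems.append(f"duplicate {kind}: {ident}")

    known_segments = set(segment_ids)
    known_content = set(content_ids)

    # Every relation endpoint must refer to an existing content node.
    for rel in dsr["ContentGraph"].get("relations", []):
        for end in ("source", "target"):
            if rel[end] not in known_content:
                problems.append(f"relation {end} not found: {rel[end]}")

    # Every link must refer to an existing content node and existing segments.
    for link in dsr["Mapping"]["links"]:
        if link["contentRef"] not in known_content:
            problems.append(f"link contentRef not found: {link['contentRef']}")
        for ref in link["segmentRefs"]:
            if ref not in known_segments:
                problems.append(f"link segmentRef not found: {ref}")
    return problems


# Hypothetical instance for "John left because he was tired."
example = {
    "SegmentGraph": {"segments": [
        {"segmentId": "s1", "type": "clause", "textSpan": [0, 9]},
        {"segmentId": "s2", "type": "clause", "textSpan": [10, 30]},
    ]},
    "ContentGraph": {
        "nodes": [
            {"contentId": "c1", "class": "event", "label": "leave"},
            {"contentId": "c2", "class": "state", "label": "tired"},
        ],
        # "cause" is a made-up relation type; real values come from a profile.
        "relations": [{"source": "c2", "target": "c1", "type": "cause"}],
    },
    "Mapping": {"links": [
        {"contentRef": "c1", "segmentRefs": ["s1"]},
        {"contentRef": "c2", "segmentRefs": ["s2"]},
    ]},
}

print(check_references(example))  # prints []
```

Because `segmentRefs` is a list, one content node can map to several segments, and several links can point at the same segment, which covers the one-to-many and many-to-one cases without extra structure.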
3. Syntax
https://schemas.mpai.community/MMC/V2.5/data/DiscourseStructureRepresentation.json
4. Semantics
| Label | Description |
| --- | --- |
| Header | Discourse Structure Representation Schema Header (schema title and header context) |
| ├─ Standard-DSR | Literal prefix identifying DSR schema (e.g., “MMC-DSR-V” embedded in Header pattern) |
| ├─ Version | Major version – 1 or 2 digits (part of Header pattern) |
| ├─ Dot-separator | The character “.” separating version components (part of Header pattern) |
| ├─ Subversion | Minor version – 1 or 2 digits (part of Header pattern) |
| DSRID | Identifier of this DSR Instance |
| SegmentGraph | Container for discourse segment structure |
| ├─ segments[] | Array of Segment objects (minItems=1) |
| Segment | Represents a discourse segment (presentation unit) |
| ├─ segmentId | Stable identifier of the segment (string) |
| ├─ type | Segment type (enum: word, phrase, clause, sentence, paragraph, section, chapter) |
| ├─ governing | Boolean flag indicating governing segment status |
| ├─ textSpan | [start, end) character offsets (array of two integers) |
| ├─ subSpans | Optional list of discontiguous spans (arrays of [start, end) integers) |
| ├─ timeSpan | [start, end) seconds in media (array of two numbers) |
| ├─ attributes | Optional object for additional segment metadata |
| ContentGraph | Container for abstract content structure |
| ├─ nodes[] | Array of ContentNode objects (minItems=1) |
| ├─ relations[] | Array of ContentRelation objects (optional) |
| ContentNode | Represents a semantic/pragmatic entity |
| ├─ contentId | Stable identifier of the content node (string) |
| ├─ class | Semantic class (enum: event, state, process, relation, proposition, object, circumstance) |
| ├─ label | Optional human-readable label (string) |
| ├─ features | Optional object for additional semantic features |
| ContentRelation | Represents a relation between content nodes |
| ├─ source | ID of source content node (string) |
| ├─ target | ID of target content node (string) |
| ├─ type | Relation type (string; constrained by profile vocabularies) |
| ├─ attributes | Optional object for relation metadata |
| Mapping | Container for links between segments and content nodes |
| ├─ links[] | Array of MapLink objects (minItems=1) |
| MapLink | Represents a mapping between content and presentation |
| ├─ contentRef | ID of the content node being mapped (string) |
| ├─ segmentRefs[] | Array of segment IDs linked to this content node (minItems=1) |
| ├─ textSpan | Optional [start, end) character offsets for alignment (array of two integers) |
| ├─ timeSpan | Optional [start, end) seconds in media for alignment (array of two numbers) |
| DescrMetadata | Optional metadata for the DSR instance |
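The Header pattern and the half-open [start, end) span convention in the table can be illustrated with a short Python sketch. The anchored regular expression is an assumption reconstructed from the table rows (literal prefix, 1 or 2 digit version, dot, 1 or 2 digit subversion); the published schema remains authoritative. The helper names `is_valid_header` and `segment_text`, the choice to prefer `subSpans` over `textSpan` when both are present, and joining sub-spans with a single space are hypothetical.

```python
import re

# Assumed reading of the Header rows: "MMC-DSR-V" + 1-2 digits + "." + 1-2 digits.
HEADER_PATTERN = re.compile(r"MMC-DSR-V[0-9]{1,2}\.[0-9]{1,2}")


def is_valid_header(header):
    """True if the whole string matches the assumed Header pattern."""
    return HEADER_PATTERN.fullmatch(header) is not None


def segment_text(text, segment):
    """Recover the text covered by a segment.

    Spans are half-open [start, end) character offsets, so they map directly
    onto Python slices. Discontiguous segments list their pieces in subSpans.
    """
    spans = segment.get("subSpans") or [segment["textSpan"]]
    return " ".join(text[start:end] for start, end in spans)


text = "She picked the book up."

sentence = {"segmentId": "s1", "type": "sentence", "textSpan": [0, 23]}
# "picked ... up" is one phrase split by the object "the book".
particle_verb = {
    "segmentId": "s2",
    "type": "phrase",
    "subSpans": [[4, 10], [20, 22]],
}

print(is_valid_header("MMC-DSR-V2.5"))    # prints True
print(segment_text(text, sentence))       # prints She picked the book up.
print(segment_text(text, particle_verb))  # prints picked up
```

The half-open convention means adjacent spans such as [0, 4) and [4, 10) share a boundary without overlapping, and the length of a span is simply end minus start.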
5. Conformance Testing
A Data instance Conforms with MPAI-MMC Discourse Structure Representation (MMC-DSR) if:
- Its JSON Object validates against its JSON Schema.
- Any included JSON Object validates against its JSON Schema.
- All Data in the JSON Object:
  - Have the specified Data Types.
  - Conform with the Qualifiers signaled in their JSON Schemas.
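As a rough illustration of the "specified Data Types" condition, the Python sketch below checks a single Segment object against the types listed in the Semantics table. It is a hand-written stand-in under stated assumptions (that `segmentId` and `type` are required and the remaining fields optional, as the Segment Core requirement suggests); it does not replace validation against the published JSON Schema, and the function names are hypothetical.

```python
SEGMENT_TYPES = {
    "word", "phrase", "clause", "sentence", "paragraph", "section", "chapter",
}


def is_pair_of(value, allowed_types):
    """True if value is a two-element list whose items have an allowed type."""
    return (
        isinstance(value, list)
        and len(value) == 2
        # bool is a subclass of int in Python, so exclude it explicitly.
        and all(
            isinstance(v, allowed_types) and not isinstance(v, bool)
            for v in value
        )
    )


def segment_type_errors(segment):
    """Return a list of data type problems for one Segment object."""
    errors = []
    if not isinstance(segment.get("segmentId"), str):
        errors.append("segmentId must be a string")
    seg_type = segment.get("type")
    if not isinstance(seg_type, str) or seg_type not in SEGMENT_TYPES:
        errors.append("type must be one of the enumerated segment types")
    if "governing" in segment and not isinstance(segment["governing"], bool):
        errors.append("governing must be a boolean")
    if "textSpan" in segment and not is_pair_of(segment["textSpan"], int):
        errors.append("textSpan must be an array of two integers")
    if "timeSpan" in segment and not is_pair_of(segment["timeSpan"], (int, float)):
        errors.append("timeSpan must be an array of two numbers")
    for sub in segment.get("subSpans", []):
        if not is_pair_of(sub, int):
            errors.append("each subSpans entry must be an array of two integers")
    if "attributes" in segment and not isinstance(segment["attributes"], dict):
        errors.append("attributes must be an object")
    return errors


good = {"segmentId": "s1", "type": "clause", "governing": True,
        "textSpan": [0, 9], "timeSpan": [0.0, 1.25]}
bad = {"segmentId": 7, "type": "line", "textSpan": [0.5, 9]}

print(segment_type_errors(good))       # prints []
print(len(segment_type_errors(bad)))   # prints 3
```

In practice a JSON Schema validator would perform these checks from the schema itself; the sketch only makes the table's type column concrete.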