Compression and Understanding of Industrial Data
Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international association with the mission to develop AI-enabled data coding standards. Research has shown that data coding with AI-based technologies is more efficient than with existing technologies.
This document is a Call for Technologies for the MPAI Compression and Understanding of Industrial Data (MPAI-CUI) project. The MPAI-CUI standard makes substantial use of AI to reduce the amount of data with a controlled loss of information and to extract the most relevant information from industrial data, with the aim of assessing company performance and predicting the risk of bankruptcy long before it may occur.
The MPAI approach to developing AI data coding standards is based on the definition of standard interfaces of AI Modules (AIM). AIMs operate on input data having a standard format to provide output data having a standard format. AIMs can be combined and executed in an MPAI-specified AI-Framework called MPAI-AIF. A Call for MPAI-AIF Technologies is currently open.
While AIMs must expose standard interfaces to be able to operate in an MPAI AI Framework, their performance may differ depending on the technologies used by implementors. MPAI believes that competing developers striving to provide better-performing proprietary and interoperable AIMs will promote horizontal markets of AI solutions that tap into and further promote AI innovation.
This document calls for technologies that can specifically be used to develop specifications of the input and output interfaces of the AIMs whose assembly provides a solution to the identified MPAI-CUI Use Case:
- Data Compression and Understanding
It should be noted that the Use Case that makes up MPAI-CUI will be non-normative. The internals of the AIMs will also be non-normative. The input and output interfaces of the AIMs, whose requirements have been derived to support the Use Case, will be normative.
Therefore, the scope of this Call for Technologies is restricted to the input and output interfaces of the AIMs. However, MPAI invites comments on any element of this Call for Technologies.
The content of this document is as follows:
|Chapter 2||briefly introduces the AI Framework Reference Model and its six Components|
|Chapter 3||briefly introduces the Use Case.|
|Chapter 4||presents the MPAI-CUI Use Case with the following structure: 1. Reference architecture; 2. Description of AI Modules and their I/O data; 3. Technologies and Functional Requirements; 4. Interfaces of AIM I/O Data|
|Chapter 5||identifies the technologies likely to be common across MPAI-CUI and other MPAI use cases|
|Chapter 6||gives suggested references. Respondents are advised to become familiar with the references|
|Chapter 7||gives a basic list of relevant terms and their definition|
2 The MPAI AI Framework (MPAI-AIF)
Most MPAI applications considered so far can be implemented as a set of AIMs – AI/ML and even traditional data processing-based units with standard interfaces assembled in suitable topologies to achieve the specific goal of an application and executed in an MPAI-defined AI Framework. MPAI is making all efforts to identify processing modules that are re-usable and upgradable without necessarily changing the inside logic.
MPAI plans on completing the development of a 1st generation AI Framework called MPAI-AIF in July 2021.
The MPAI-AIF Architecture is given by Figure 1.
Figure 1 – The MPAI-AIF Architecture
- Management and Control manages and controls the AIMs, so that they execute in the correct order and at the time when they are needed.
- Execution is the environment in which combinations of AIMs operate. It receives external inputs and produces the requested outputs, both of which are application specific, and interfaces with Management and Control and with Communication, Storage and Access.
- AI Modules (AIM) are the basic processing elements receiving processing-specific inputs and producing processing-specific outputs.
- Communication is required in several cases and can be implemented, e.g., by means of a service bus. It may be used to connect with remote parts of the framework.
- Storage encompasses traditional storage and is used, e.g., to store the inputs and outputs of the individual AIMs, data from the AIMs’ state, intermediary results, and data shared among AIMs.
- Access represents the access to static or slowly changing data that are required by the application such as domain knowledge data, data models, etc.
3 Use Cases
3.1 Data Compression and Understanding
A company may need to access the flow of internal (i.e., financial and governance data) and external data to assess and monitor its financial and organizational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.), according to the current regulations (e.g., ISO 31000 on risk assessment and management).
The company generating the data flow may need to perform compression and understanding for its own needs (e.g., to identify core and non-core data). Indeed, the company itself can analyse its financial performance, identifying possible clues to the crisis or risk of bankruptcy years in advance. It may help the board of directors and decision-makers to make the proper decisions to avoid these situations, conduct what-if analysis, and devise efficient strategies.
At the same time, a financial institution that receives a request for financial help from a troubled company can access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of its future performance. This helps the financial institution decide whether to fund that company, based on a broad view of its situation.
4 Functional Requirements
4.1 Data Compression and Understanding
4.1.1 Reference architecture
This Use Case can be implemented as in Figure 2.
Figure 2 – Compression and understanding of Industrial Data
4.1.2 AI Modules and their I/O data
The AI Modules of Figure 2 perform the functions described in Table 1.
Table 1 – AI Modules of Industrial Data Compression and Understanding
|Data Conversion (DC)||Gather the data needed for the assessment from several sources (internal and external), in different formats, and convert them to a single format (JSON).|
|Financial assessment (FA)||To analyse the data generated by the companies (i.e., financial statements) to assess the preliminary financial performances in the form of indexes. To build and extract the financial features for the machine learner.|
|Governance assessment (GA)||To build and extract the features related to the adequacy of the governance asset for the machine learner.|
|Risk matrix (BMR)||To build the risk matrix to assess the impact of vertical risks (i.e., in this use case cyber and seismic).|
|Decision tree (DT)||To create the decision trees for making decisions according to the Random Forest algorithm.|
|Prediction (PRF)||To predict values of the likelihood of company default in a time horizon of 36 months and of the adequacy of the organizational model.|
|Perturbation (PBC)||To perturb the value of the probability of company crisis computed before, considering the impact of vertical risks on company performances|
4.1.3 I/O interfaces of AI Modules
The I/O data of Data Compression and Understanding AIMs are given in Table 2 – I/O data of Use Case AIMs.
Table 2 – I/O data of Use Case AIMs
|AI Module||Input||Output||External data|
|Data Conversion (DC)||Financial statement data, Risk assessment data||Financial statement data (converted), Governance data (converted)|
|Financial assessment (FA)||Financial statement data||Financial features||Standards from knowledgebase|
|Governance assessment (GA)||Governance data||Governance features|
|Risk matrix (BMR)||Technical data from BIM, internal assessment on cyber security||Severity||Socio-economic data from databases, Technical data from KB, Standards from KB|
|Decision tree (DT)||Financial features, Governance features||Ranking of features importance||Data on active and failed companies from back testing|
|Prediction (PRF)||Financial features, Governance features||Probability of company crisis, Adequacy of organizational model (indexes)||Data on active and failed companies from back testing|
|Perturbation (PBC)||Probability of company crisis (index); severity from BMR||Index of business continuity||Expert-based mapping rules|
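The data dependencies in Table 2 imply an order in which the AIMs can run. The following is a minimal sketch, not part of the MPAI-AIF specification, of how such an order could be derived. The dependency map is read off Table 2, under the assumption that the Risk matrix AIM consumes the risk assessment data converted by Data Conversion; the name `execution_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# Data dependencies between AIMs, read off Table 2: each AIM maps to the
# AIMs whose outputs it consumes. Abbreviations are those of Table 1.
# Assumption: BMR consumes the risk assessment data converted by DC.
DEPENDS_ON = {
    "DC": set(),
    "FA": {"DC"},
    "GA": {"DC"},
    "BMR": {"DC"},
    "DT": {"FA", "GA"},
    "PRF": {"FA", "GA"},
    "PBC": {"PRF", "BMR"},
}


def execution_order(depends_on):
    """Return one valid order in which the AIMs can be executed."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
print(order)  # DC comes first, PBC only after both PRF and BMR
```

Several orders are valid (for example, FA and GA are independent of each other), so a real framework could also run such AIMs in parallel.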
4.1.4 Technologies and Functional Requirements
4.1.4.1 Financial statement data
Financial statements (raw data) are produced based on a set of accounting principles driving the maintenance and reporting of company accounts, so that financial statements can be consistent, transparent, and comparable across companies.
A first set of principles (Principles A), identified by the International Accounting Standard/International Financial Reporting Standard (IAS/IFRS), can be taken as “universal”, as they are commonly recognized across all countries:
An example of corresponding digital representations is
A second set of principles (Principles B) is typically jurisdiction dependent. In the case of Europe, example principles are
Respondents are requested to propose
- A set of Principles A
- The corresponding digital representation
- One or more sets of Principles B where applicable jurisdictions are identified
- The corresponding digital representation
Financial statements (raw data) are converted to a standard format by the Data Conversion (DC) AIM.
Respondents are invited to propose a digital representation of financial statement data that is applicable to a minimum set of financial statements whose semantics has universal and local validity. JSON is a primary example of digital representation. However, other representations are possible.
Proponents are invited to comment on this choice and possibly suggest alternative formats. Preference will be given to formats that have been standardised or are in wide use.
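As an illustration of what the Data Conversion step could produce, the sketch below maps source-specific labels onto a common vocabulary and emits JSON. It is only a sketch under stated assumptions: the label map, the key names (`company_id`, `fiscal_year`, `items`, `unmapped`) and the function `convert_statement` are hypothetical and are not defined by this Call.

```python
import json

# Hypothetical mapping from source-specific labels to a common vocabulary.
FIELD_MAP = {
    "Ricavi": "revenue",
    "Revenue": "revenue",
    "Totale attivo": "total_assets",
    "Total assets": "total_assets",
}


def convert_statement(company_id, fiscal_year, raw_items, currency="EUR"):
    """Convert raw (label, value) pairs into a JSON document with common keys."""
    items = {}
    unmapped = {}
    for label, value in raw_items:
        key = FIELD_MAP.get(label.strip())
        if key is None:
            unmapped[label] = value  # keep unknown labels instead of dropping data
        else:
            items[key] = float(value)
    record = {
        "company_id": company_id,
        "fiscal_year": fiscal_year,
        "currency": currency,
        "items": items,
        "unmapped": unmapped,
    }
    return json.dumps(record, sort_keys=True)


print(convert_statement("ACME", 2020, [("Ricavi", "1000.5"), ("Total assets", 8300)]))
```

Keeping unmapped labels in a separate object is one possible design choice: it makes the conversion lossless with respect to the source while still exposing a uniform set of keys to downstream AIMs.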
4.1.4.2 Governance data
By Governance data we mean attributes that indicate the governance structure of a company and the roles of key personnel.
The most basic roles are shareholder, manager, sole administrator, president/member of the board of directors, auditor, president/member of the statutory board of directors. They can be taken as “universal”, as they are commonly recognized across all countries.
Respondents are invited to propose a governance data ontology that captures today’s practice at the global level. How can the data from a specific company be expressed starting from the ontology?
4.1.4.3 Risk assessment technical data
By Risk assessment technical data, we mean attributes that indicate the internal assessment that the company performs to identify and measure potential or existing vertical risks, and their impact on business continuity.
This data contains values of likelihood, impact, gravity, residual risk and treatments. They should be encoded according to ISO 31000 – “Risk management — Principles and guidelines”.
Proponents are invited to comment on this choice.
Respondents are invited to propose?
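To make the discussion concrete, one possible digital representation of a single risk assessment entry is sketched below. It carries the values named above (likelihood, impact, gravity, residual risk and treatments). ISO 31000 does not prescribe this layout: the class name, the field names and the 1 to 5 ordinal scales are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RiskAssessmentRecord:
    """One vertical risk as assessed internally by the company (hypothetical layout)."""
    risk: str            # e.g. "cyber" or "seismic"
    likelihood: int      # assumed ordinal scale, 1 (rare) to 5 (almost certain)
    impact: int          # assumed ordinal scale, 1 (negligible) to 5 (severe)
    gravity: int         # assumed ordinal scale, 1 to 5
    residual_risk: int   # assumed ordinal scale, 1 to 5, after treatments
    treatments: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the assumed 1..5 scale early.
        for name in ("likelihood", "impact", "gravity", "residual_risk"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = RiskAssessmentRecord("cyber", likelihood=3, impact=4, gravity=4,
                              residual_risk=2, treatments=["automatic backup"])
print(record.to_json())
```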
4.1.4.4 Financial features
Financial features are a set of indexes and ratios computed using financial statement data. Examples of financial features are given by Table 3.
Table 3 – Financial features
|Feature||Feature value||Feature type|
Respondents are requested to propose Financial features suitable for financial assessment, e.g., those reported in Table 3. Financial features shall satisfy the following requirements
- Extracted or computed from the financial statement data
4.1.4.5 Governance features
Governance features are a set of indexes/parameters that are used to assess the adequacy of the organizational model. Examples are given by Table 4.
Table 4 – Governance features
|Feature||Feature value||Feature type|
|1||Absolute value||Decision maker data|
|2||Index/Percentage (%)||Shareholder data|
|3||Absolute value||Shareholder data|
|4||Absolute value||Decision maker data|
|5||Absolute value||Decision maker data|
Respondents are requested to propose Governance features suitable for assessing the suitability of governance, e.g., those reported in Table 4. Governance features shall satisfy the following requirements
- Extracted or computed from the Governance data
- Numerical values
A set of values, each reflecting the level of risk for a specific vertical risk evaluated by the company.
Respondents are requested to propose…
4.1.4.7 Decision Tree
A decision tree is a decision support tool that uses a tree-like model of decisions, given the financial and governance features. It is based on the Random Forest supervised learning method to predict the value of the probability of company crisis and bankruptcy.
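The idea of combining many trees trained on resampled data can be illustrated with a toy example. The sketch below is not Random Forest proper and not the proponents' method: it is a simplified bagged ensemble of one-level trees (decision stumps), each trained on a bootstrap sample and one randomly chosen feature. The two features (a debt ratio and a liquidity ratio), the data and all function names are hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that minimises training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # label predicted when value > threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # a stump needs both classes to learn a split
        sample = [rows[i] for i in idx]
        forest.append(fit_stump(sample, sample_labels, rng.randrange(n_features)))
    return forest


def crisis_probability(forest, row):
    """Fraction of stumps voting 'crisis' (label 1) for this company."""
    votes = sum(
        above if row[feature] > threshold else 1 - above
        for feature, threshold, above in forest
    )
    return votes / len(forest)


# Toy data: [debt ratio, liquidity ratio]; label 1 = failed company.
ROWS = [[0.9, 0.5], [0.8, 0.6], [0.85, 0.4], [0.2, 1.8], [0.3, 2.0], [0.25, 1.5]]
LABELS = [1, 1, 1, 0, 0, 0]
forest = fit_forest(ROWS, LABELS)
print(crisis_probability(forest, [0.95, 0.3]), crisis_probability(forest, [0.1, 2.5]))
```

A production system would more likely rely on an established library implementation with full-depth trees and back-tested data on active and failed companies.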
Respondents are requested to propose other learning methods satisfying the following requirements:
4.1.4.8 Expert-based mapping rules KB query
The Expert-based mapping rules KB contains a set of rules established by experts in the field that express the vertical risk-financial feature assignment, i.e., an expression of the impact of a certain vertical risk on the financial performance of a company.
Respondents are requested to propose:
- The list of risks that potentially affect the newly proposed financial features.
- A digital representation of these mapping rules.
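A digital representation of such mapping rules could be as simple as a JSON object keyed by vertical risk. The sketch below is purely illustrative: the feature names, the weights, the convention that weights sum to 1 and the helper functions are assumptions, not content of the knowledge base described above.

```python
import json

# Hypothetical rule set: for each vertical risk, the financial features it is
# assumed to affect and a relative weight chosen by experts.
MAPPING_RULES = json.loads("""
{
  "cyber":   [{"feature": "revenue", "weight": 0.6},
              {"feature": "operating_costs", "weight": 0.4}],
  "seismic": [{"feature": "fixed_assets", "weight": 0.7},
              {"feature": "revenue", "weight": 0.3}]
}
""")


def risks_affecting(feature, rules):
    """List the vertical risks whose rules mention the given financial feature."""
    return sorted(
        risk for risk, targets in rules.items()
        if any(target["feature"] == feature for target in targets)
    )


def validate(rules):
    """Check that the weights of each risk sum to 1 (an assumed convention)."""
    for risk, targets in rules.items():
        total = sum(target["weight"] for target in targets)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights for {risk!r} sum to {total}")


validate(MAPPING_RULES)
print(risks_affecting("revenue", MAPPING_RULES))
```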
 Perboli G., Arabnezhad E., A Machine Learning-based DSS for Mid and Long-Term Company Crisis Prediction. CIRRELT-2020-29. July 2020.
Compression and Understanding of Industrial Data (MPAI-CUI)
Proponents: Guido Perboli (POLITO), Valeria Lazzaroli (Arisk), Mariangela Rosano (POLITO)
Description: Most economic organizations, e.g., companies, produce large quantities of data, often because these are required by regulation. Users of these data may be the company itself or Fintech and Insurtech services that need to access the flow of company data to assess and monitor financial and organizational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.). For example, companies nowadays rely heavily on the security and dependability of their Information System for all categories of workers, including the management of Industrial Control Systems. Adding cybersecurity-related parameters to the risk analysis process will allow a more precise estimation of the actual risk exposure. Cybersecurity data will help reassess financial parameters based on risk analysis data.
The sheer amount of data that need to be exchanged is an issue. Analysing those data by humans is typically onerous and may miss vitally important information. Artificial intelligence (AI) may help reduce the amount of data with a controlled loss of information and extract the most relevant information from the data. AI is considered the most promising means to achieve the goal.
Unfortunately, the syntax and semantics of the flow of data are highly dependent on who has produced the data. The format of the data is typically a text file with a structure not designed for indexing, search and extraction. Therefore, in order to apply AI technologies to meaningfully reduce the data flow, it is necessary to standardize the formats of the components of the data flow and make the data “AI friendly”.
Recent regulations are imposing constant monitoring (ideally monthly). Thus, similar blocks of data are likely to appear in temporally consecutive sequences of data.
The company generating the data flow may need to perform compression and understanding for its own needs (e.g., to identify core and non-core data). Subsequent entities may perform further data compression and transformation.
In general, compressed data should allow for easy data search and extraction.
In a first phase MPAI-CUI is addressing primarily risk identification.
MPAI-CUI may be used in a variety of contexts, such as:
- To support the company’s board in deploying efficient strategies. A company can analyse its financial performance, identifying possible clues to the crisis or risk of bankruptcy years in advance. It may help the board of directors and decision-makers to make the proper decisions to avoid these situations, conduct what-if analysis, and devise efficient strategies.
- To assess the financial health of companies that apply for funds/financial help. A financial institution that receives a request for financial help from a troubled company can access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of its future performance. This helps the financial institution decide whether to fund that company, based on a broad view of its situation.
- To assess the risk in different fields considering non-core data (e.g., non-financial data). This requires accurate and targeted sharing of core and non-core data, ranging from financial and organizational information to other types of risks that affect business continuity (e.g., environmental, seismic, infrastructure, and cyber). As an example, the cybersecurity preparedness status of a company allows a better estimation of the average production parameters that are affected by cyberattacks, like the expected number of days of production lost (e.g., days the industrial plants are stopped, days personnel cannot perform their work due to unavailability of the information system). Several parameters need to be considered, which are obtained by direct acquisition of data from the target companies that perform a cybersecurity risk analysis.
- To analyse the effects of disruptions on the national economy, e.g., performance evaluation by pre/post-pandemic analysis.
Requirements:
- The standards of the data in the data flow should be AI friendly. In other words, the different data required to predict a crisis/bankruptcy of a company should be gathered, carefully selected and, if necessary, completed so as to be suitable for automatic analysis, and then processed by an AI-based algorithm.
- The standard shall ensure efficiency of data structure, indexing and search, according to specific syntax and semantics.
- The standard shall allow the extraction of the main parameters with an indication of their semantics.
- The standard shall support context-based compression (i.e., depending on the sequence of data).
- The standard shall support lossless compression.
- The standard shall support context-based filtering with different levels of details.
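The combination of lossless and context-based compression can be illustrated with a standard technique: DEFLATE with a preset dictionary, where the previous block in a sequence (for example, last month's record) serves as context for the current one. This is only a sketch of the general idea using Python's `zlib` module; it is not a technology proposed by this document, and the record layout and function names are hypothetical.

```python
import zlib


def compress_with_context(current: bytes, previous: bytes) -> bytes:
    """Losslessly compress `current`, using `previous` as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=previous)
    return compressor.compress(current) + compressor.flush()


def decompress_with_context(blob: bytes, previous: bytes) -> bytes:
    """Invert compress_with_context; the same `previous` block is required."""
    decompressor = zlib.decompressobj(zdict=previous)
    return decompressor.decompress(blob) + decompressor.flush()


# Two temporally consecutive records that differ in only a few values.
january = (b'{"company_id": "ACME", "month": "2021-01", "revenue": 120000.0, '
           b'"operating_costs": 95000.0, "total_assets": 830000.0}')
february = january.replace(b"2021-01", b"2021-02").replace(b"120000.0", b"121500.0")

blob = compress_with_context(february, january)
assert decompress_with_context(blob, january) == february  # lossless round trip
print(len(february), len(zlib.compress(february, 9)), len(blob))
```

Because consecutive records share most of their content, the context-aware stream is typically much shorter than compressing each record on its own; the trade-off is that the decoder must hold the same previous block.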
Object of standard:
Two main areas of standardization are identified:
- Input objects: a first set of data input related to:
- Financial data input
- Financial statements and fiscal yearly reports data (usually expressed in XLS or XBRL formats). Their contents follow the accounting standards defined by the Organismo Italiano di Contabilità (OIC) at the Italian level and by the International Accounting Standards Board (IAS/IFRS) at the international level.
- Invoices. In Italy, the FatturaPA format is expressed in XML; more generally, invoices have to be compliant with the European standard EN 16931-1:2017.
- Financial data input
- Semantics of governance elements.
- Other economic data, such as company size as uniformly classified according to the number of employees or to the economic activities (e.g., classifications elaborated by Eurostat and OECD), imports and exports, etc.
- Vertical risks data input: in a preliminary phase, we will consider seismic and cyber risks as the vertical risks of primary interest. In the future, the object of the standard could be extended to cover other risks (e.g., related to infrastructures, sustainability). Generally, at the international level, the ISO 31000 standard defines the principles and guidelines related to the input data to consider for risk assessment and management.
- Seismic risk. AI algorithms may help to define a socio-economic and technological model that will support companies and institutions in properly defining reconstruction plans. In this direction, the data input to assess seismic risk according to ASTM standards will integrate:
- Technical data related to the existing/needed infrastructures (i.e., geolocation coordinates), architectural and urban planning data, as well as output data from the Building Information Modelling;
- Socio-demographic data, i.e., statistical data collected by certified sources (e.g., ISTAT in Italy, World Bank, International Financial Statistics, World Economic Outlook Databases, International Monetary Fund Statistics Data) about population figures and their characteristics and distribution.
- Cyber risks. Considering cybersecurity-related parameters in the risk assessment will help to understand and estimate the impact of the actual risk exposure on the company performance, its financial health and business continuity.
As an example, an effective system to back up sensitive data, whose effectiveness is periodically tested, can be of help. Moreover, having well-defined incident responses and a team prepared to deal with them can help minimize the effects of attacks and the time to recover. Therefore, an initial set of internationally recognised (ISO/IEC 27000 Information security management) inputs to consider is:
- Data related to assessment of organizational cyber management:
- organizational-level incident management (enumerate): no, simple plans available, detailed plan + IR Team, full integrated management (e.g., with a security operations center)
- backup management: no, user requested, automatic, automatic and tested
- vulnerability management (enumerate): no, assessment, management plans, with automatic tools
- enterprise patch management (enumerate): no, manual, automatic, testing
- specific cybersecurity and testing personnel (enumerate): no cybersecurity tasks, IT personnel with cybersecurity tasks, cybersecurity-trained roles
- cybersecurity procedure and mitigation testing (enumerate): no, occasional, planned, planned and frequent
- risk analysis: no, threats identified, assessment available, some mitigations implemented, full (mitigations implemented or justified, risks quantified)
- Data related to prevention of cyber-attacks. Being able to detect anomalies in an information system can help prevent some attacks or discover them at an early stage. This capability can be estimated considering:
- monitoring: no, basic detection, integrated detection, organization-level, Security Information and Event Management (SIEM) software
- reaction: no, planned pre-configured responses, organized (some automatic), integrated tool-based & human supervised
- Data related to training. Studies report that personnel who have followed specific training on the cybersecurity aspects relevant to their roles are more likely to avoid errors that may compromise the information systems of their companies (e.g., they use better passwords and are less likely to click on phishing emails). Training is even more important for personnel with cybersecurity-related responsibilities, hence:
- awareness/training cybersecurity personnel: no, occasional, frequent training
- awareness/training other categories: no, occasional, frequent, frequent and tailored per tasks.
- Data related to legal issues. An additional field where measuring the preparedness is the risk of losses due to legal issues (e.g., GDPR fines for
- availability of cybersecurity certification (enumerate): no, certifications relevant for the company business obtained
- compliance to regulations (enumerate): no, minimum, adequate
- Data for quantifying exposure. Whenever available, the following aggregated values can help to quantify the overall exposure to cybersecurity risks:
- overall risk exposure: monetization value from the risk assessment phase (number, directly obtained by the target company)
- mitigated risk exposure: monetization value after the risk mitigation phase (number, directly obtained by the target company)
- value of assets by security requirement: (Confidentiality-asset value/ Integrity-asset value / Availability-asset value)
- percentage of the value of assets-by-security-requirement on the overall value of the company assets: (Confidentiality-asset value/ Integrity-asset value / Availability-asset value).
- Output objects: represented by the outcome of the AI-based assessment in a format known by the user (e.g., JSON). This format is expressed in terms of a set of indexes that reflect the health of a company, the appropriateness of the governance and the impact of risks on economic-financial parameters. Some of these indexes respond to legal requirements on business bankruptcy and crisis (e.g., DSCR), while others are computed by means of a proprietary machine learning algorithm.
More in detail, the parameters in output are:
- Risk index of the likelihood of company default in a time horizon of 36 months. It reflects the company performances based on the financial data.
- Index of business continuity reflects the impacts of other risks on the previous measure.
- Index of adequacy of the organizational model considers the impact of the governance on the performance of the company, highlighting possible issues in terms of conflict of interest or familiarity.
- Debt service coverage ratio is a measurement of a firm’s available cash flow to pay current debt obligations
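The output parameters listed above could be assembled into a single JSON document, as in the sketch below. The debt service coverage ratio is computed here with a common formulation (net operating income divided by total debt service), which is an assumption since this document does not give a formula; the key names and function names are likewise hypothetical.

```python
import json


def debt_service_coverage_ratio(net_operating_income, total_debt_service):
    """DSCR under a common formulation: net operating income / total debt service."""
    if total_debt_service <= 0:
        raise ValueError("total debt service must be positive")
    return net_operating_income / total_debt_service


def build_output(default_risk, business_continuity, org_adequacy, dscr):
    """Assemble the four output indexes into one JSON document (hypothetical keys)."""
    return json.dumps({
        "default_risk_36_months": default_risk,
        "business_continuity_index": business_continuity,
        "organizational_adequacy_index": org_adequacy,
        "dscr": round(dscr, 4),
    }, sort_keys=True)


dscr = debt_service_coverage_ratio(150_000, 100_000)
print(build_output(0.12, 0.15, 0.8, dscr))
```

Under this formulation, a DSCR below 1 indicates that the available cash flow does not cover current debt obligations.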
This is depicted in Figure 1 where the object of the standard is identified to be the intermediate format and the AI machine output.
Figure 1 – MPAI-CUI model
In some cases, internationally agreed input data formats exist. In several other cases a variety of formats exist. In these cases, meta formats to which existing formats can be converted should be defined.
The current framework has been made as general as possible taking into consideration the wide range of issues related to risk management. We are expecting the architecture to be enriched and extended according to inclusion of other risks and eventually the synergies with other MPAI applications.
Data confidentiality, privacy issues, etc. are for further consideration.
Benefits: MPAI-CUI will bring benefits positively affecting:
- Technology providers need not develop full applications to put their technologies to good use. They can concentrate on gradually introducing AI-based technologies that allow a transition from traditional approaches based on statistical methods, overcoming their limitations. This will enhance the accuracy of prediction and improve user experience.
- Service providers (e.g., Fintech and Insurtech companies, advisors, banks) can deliver accurate products and services supporting the decision-making process while minimising time to market, as the MPAI-CUI framework enables easy combination of internal and third-party components.
- End users, such as companies and local governments, can obtain an AI-based decision support system to assess financial health and deploy efficient strategies and action plans.
- Processing modules can be reused for different risk management applications.
Bottlenecks: The full potential of AI in MPAI-CUI could be limited by the market of AI-friendly data providers and by the need to adopt a vast amount of information and data that is strictly dependent on the company and its context.
Simplified access to the technologies under the MPAI-CUI standard will offer end users AI-based products that are promising for prediction and that support decision making in different contexts, reducing the users’ effort in analysing data and improving their experience, which becomes more personal while retaining a wide view (e.g., through benchmarking).
Moreover, the MPAI-CUI standard and the introduction of AI-based technologies will allow a transition from the present systems which are human readable, to machine readable technologies and services.
At the national level, governments can simulate the effects of public interventions and deploy proper strategies and plans in supporting the companies and economy.
MPAI-CUI becomes the bridge between traditional approaches, in compliance with the current regulation on the prediction of business crisis, and fully AI-based systems.
 Perboli G., Arabnezhad E., A Machine Learning-based DSS for Mid and Long-Term Company Crisis Prediction. CIRRELT-2020-29. July 2020.