Data and data processing

A datum (plural: data; the plural form is often used as a singular) can be defined as information that is available for processing. Our body can feel the temperature of the ambient air, but two people often perceive the same ambient temperature differently. If we use a thermometer, however, we get an “absolute” measure of the temperature taken at a given point and at a given time. The temperature measured by the thermometer is now “data” that can be used, e.g., to correlate with the temperature at other places and at different times. We can say that time and geographical coordinates are “metadata” of the temperature “data”, but, depending on the purpose of the processing, we can equally say that geographical coordinates are data and that time and temperature are metadata.
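The point that the same fields can play the role of data or metadata depending on the purpose can be sketched in a few lines of Python. This is only an illustration: the `Reading` record, the `split_roles` helper and the sample values are hypothetical names and numbers introduced here, not part of any MPAI specification.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One thermometer measurement, with where and when it was taken (hypothetical record)."""
    temperature_c: float
    latitude: float
    longitude: float
    taken_at: datetime


def split_roles(reading, data_fields):
    """Split a reading into (data, metadata) according to the purpose of the processing."""
    fields = asdict(reading)
    unknown = set(data_fields) - set(fields)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    data = {k: v for k, v in fields.items() if k in data_fields}
    metadata = {k: v for k, v in fields.items() if k not in data_fields}
    return data, metadata


# Made-up sample values, for illustration only.
reading = Reading(21.5, 45.07, 7.69, datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc))

# Purpose 1: study temperatures; place and time are metadata.
climate_data, climate_meta = split_roles(reading, {"temperature_c"})

# Purpose 2: track where the sensor is; time and temperature are metadata.
tracking_data, tracking_meta = split_roles(reading, {"latitude", "longitude"})
```

The record itself never changes; only the set of fields declared as "data" does, which is the sense in which the distinction depends on the purpose of the processing.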

Humans are good at processing data. Given a table or a stream of data, an experienced human may discover correlations that other humans do not discern. This type of processing happens at a large scale at a Stock Exchange where experienced traders make decisions to buy, hold or sell company shares, based on the data flowing in and the experience, i.e., data stored in their brains.

No matter how good certain humans can be at processing data, they have limits: they can only work for a certain amount of time with sufficient performance before they need a rest; their capability to ingest different streams of data is limited; their ability to “retrieve” data is constrained by their experience; and their ability to correlate data is limited by the speed of the electro-chemical technology used in their brains (a few tens of kHz). While it is possible in certain cases to parallelise certain types of data processing (just think of the administrative offices of large companies 50 years ago), in other cases, especially time-critical ones, parallelisation is not possible and the only option is to increase the speed of the processing technology.

The mainframe computers of the 1960s and 1970s, with their large storage and processing capabilities – as seen with the eyes of that time – essentially uprooted the way administrative offices processed data.

Figure 1 – Old style administrative office (Photo by Chris Curry on Unsplash)

Humans were required to teach, i.e., program, machines to process data and humans were still needed to process the data resulting from the processing of machines, again to discover correlations. Application programs such as VisiCalc and their successors helped spread the notion and consolidate the practice of machines processing data for further processing by humans.

Photography was a great tool to distribute information, as were telephony, facsimile, radio, and television. Great ingenuity was required, however, to convert such information into data: as long as image, voice, audio and video signals were “analogue”, it was hard to get “data” from them. Some may still remember the different “meters” attached to device clamps, providing data about voltage and current from which humans could infer something of interest to them.

The great phenomenon started in the late 1920s by Harry Nyquist et al. at Bell Labs progressed through a series of milestones that would take too long to recall here, and eventually gave rise to the world of digital media as we know it. If today we are inundated by data, however, it is because the enormous streams of data measured in kbit/s, then Mbit/s, today Gbit/s and tomorrow Tbit/s were, are and will be reduced in size by exploiting the “inner structure” of those data streams.

Human ingenuity achieved that. By resorting to different tools, little by little humans could dig into those data streams and discover ways to avoid sending useless parts, or if necessary, “sacrificing” some parts because their removal did not seriously affect the intelligibility and usability of the resulting data.
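The two ideas above, not sending redundant parts and “sacrificing” parts whose removal does little harm, can be illustrated with a toy Python sketch. It is not how real media codecs work (they rely on far more sophisticated prediction, transform and entropy-coding tools); run-length encoding and coarse quantisation are simply the simplest stand-ins, and the function names and sample values are made up for this example.

```python
def run_length_encode(samples):
    """Lossless: replace each run of equal values with a (value, count) pair."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def run_length_decode(runs):
    """Rebuild the exact sample list from (value, count) pairs."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def quantise(samples, step):
    """Lossy: round every sample down to a multiple of `step`, discarding fine detail."""
    return [(s // step) * step for s in samples]


# Made-up integer samples: small fluctuations around two levels.
samples = [100, 101, 99, 100, 102, 180, 181, 179, 180]

lossless = run_length_encode(samples)        # nothing is lost, but nothing is gained here
coarse = quantise(samples, 8)                # fluctuations smaller than the step disappear
lossy = run_length_encode(coarse)            # the "inner structure" is now easy to exploit

print(len(lossless), "runs without quantisation")
print(len(lossy), "runs after quantisation:", lossy)
```

In this toy case the exact samples cannot be shortened at all, while the quantised version collapses to two runs at the cost of an error smaller than the quantisation step on every sample.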

As soon as media information could be converted into data, humans could apply their ingenuity to teach machines to extract meaningful information or data. This was done for speech, e.g., extracting what the speaking human is saying word by word, but also understanding the “meaning” of the words put together and even attempting to provide a reply to the sentence. Similar levels of human-like understanding have also been achieved for other types of data, such as visual data.

In the old-style administrative offices, humans used to perform tasks that were eventually taken over by mainframes to a large extent. The kind of minute investigations that have allowed the processing of data to achieve the present-day results are in line for a similar eventual fate. Humans are excellent at discovering correlations between data because they have a powerful “processor” and vast amounts of data resulting from years of experience. However, technology makes it possible to develop machines that have a similar learning and processing capability. They are not constrained to operate with a kHz clock but can work (today) with a GHz clock, and they can ingest and process “experience” data without being constrained by working hours or by the opportunities that life gives to humans.

Unlike data processing (DP), i.e., the human programming of machines to perform operations that would be too tedious, impractical or even impossible for humans to do, Artificial Intelligence (AI) is the programming of machines to perform operations that require a human-like level of intelligence and discernment. Machine Learning (ML), a subset of AI, is the programming of machines so that they can learn and adapt without following explicit instructions. A large amount of data is provided to an ML algorithm, which is left to explore the data and search for a model that achieves what the programmers have set out to achieve.
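The contrast between explicitly programmed processing and learning from data can be made concrete with a deliberately tiny Python sketch. In the first function the programmer writes down the Celsius-to-Fahrenheit rule; in the second part the program is only given example pairs and searches for the straight line that best fits them. Ordinary least squares is used here as one of the simplest possible learning procedures, chosen for illustration; the function names are hypothetical, and real ML systems use far richer models and far more data.

```python
def explicit_rule(celsius):
    """Traditional data processing: the programmer encodes the rule directly."""
    return celsius * 9 / 5 + 32


def fit_line(xs, ys):
    """Learn slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Example pairs the learner is shown; it is never told the conversion formula.
examples_c = [-10.0, 0.0, 10.0, 20.0, 30.0]
examples_f = [14.0, 32.0, 50.0, 68.0, 86.0]

slope, intercept = fit_line(examples_c, examples_f)


def learned_model(celsius):
    """Apply the model found from the examples."""
    return slope * celsius + intercept


print(explicit_rule(100.0), learned_model(100.0))
```

Both functions give essentially the same answer for an unseen input, but only the first contains the rule as written by a human; the second recovered it from the data it was given.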

AI is the technology that enables the transformation of data to suit the needs of an application. It can process financial data to provide Key Performance Indicators (KPIs); it can convert data in the form of a stream of speech samples to words that a human can read; it can process characters expressing words and extract the meaning of those words; it can name and characterise the objects in a visual scene; and it can reduce the number of bits needed to transmit a high-definition video stream. It is doing this today and promises to do more in the future thanks to a large investment made by a sizeable share of worldwide research and academia stakeholders.

The same data used by one human are often needed by other humans, who want them in a form that they can understand. The role of MPAI standards is to do exactly that for machines: to define a standard data representation that is suitable for processing by AI technologies.

There is one proviso, though: the data processing technologies that have a track record of decades are still in the process of passing the baton to their successors, the AI technologies. The transition is not going to happen in an instant because there are excellent data processing technologies that have been honed for decades and whose life cycle is not over. MPAI focuses on data coding by AI, but its standards are meant to serve industry needs, not an abstract principle. Whenever desirable, traditional data processing-based standards will be included next to AI-based standards. MPAI standards will also support integration of data processing-based and AI-based data coding.
