Te Kete o Karaitiana Taiuru (Blog)

Is Synthetic Data a Taonga?

This brief post discusses whether synthetic digital data that is used with Māori Data is itself Māori Data, and therefore a Taonga. It applies traditional Māori customary values and beliefs (tikanga) and Māori Data Sovereignty principles from a Māori perspective, giving a non-Western explanation of whether synthetic digital data is a Taonga.

A new post will be created to discuss Synthetic Biology.

What is synthetic data (digital)?

Synthetic data is data (text, media such as video, images and sound, and tabular data) that has been artificially generated by computer algorithms, as opposed to real data that has been collected from real-world events.
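As a rough illustration of the tabular case, the sketch below fits very simple statistics to a small sample and then draws new rows from them. The records, column names and the `fit_and_sample` function are hypothetical and invented for this example. Real synthetic data tools use far more sophisticated models. Note that every synthetic row is still derived from the original sample.

```python
import random
import statistics

def fit_and_sample(real_rows, n, seed=0):
    """Generate n synthetic rows that mimic simple statistics of real_rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ages = [row["age"] for row in real_rows]
    regions = [row["region"] for row in real_rows]
    mean_age = statistics.mean(ages)
    sd_age = statistics.stdev(ages)

    synthetic = []
    for _ in range(n):
        # Numeric column: draw from a normal distribution fitted to the sample
        age = max(0, round(rng.gauss(mean_age, sd_age)))
        # Categorical column: draw in proportion to the observed frequencies
        region = rng.choice(regions)
        synthetic.append({"age": age, "region": region})
    return synthetic

# Hypothetical "real" records, made up for illustration only
real_rows = [
    {"age": 34, "region": "Canterbury"},
    {"age": 52, "region": "Otago"},
    {"age": 27, "region": "Canterbury"},
    {"age": 45, "region": "Waikato"},
]

synthetic_rows = fit_and_sample(real_rows, n=5)
for row in synthetic_rows:
    print(row)
```

None of the printed rows is a real person, but each one only exists because of the sample it was modelled on.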

Synthetic data is used when an organisation doesn’t have the data, or doesn’t have enough of it. This is especially applicable to Māori Data, where gaps are often caused by mistrust of researchers and the Crown, or by Māori being under-represented in research.

Sometimes legacy infrastructure and siloed data systems make it uneconomical or technically too difficult to extract data. In today’s data protection regulatory landscape, it can also be a matter of legal compliance. For example, in New Zealand the Privacy Act 2020 may be too restrictive for an organisation to use real-life data.

Other reasons could include security and sovereignty concerns, such as the data being too sensitive to be migrated to cloud infrastructure or offshore, or simply that it is cheaper to use synthetic data.

The use of generative AI to create synthetic data is one area that is rapidly growing, relieving the burden of obtaining real-world data so that machine learning models can be trained effectively. Gartner predicts that by 2024, 60% of data for AI will be synthetic, used to simulate reality and future scenarios and to de-risk AI, up from 1% in 2021.

Examples of where synthetic data is being used

  • Amazon is using synthetic data to train Alexa’s language system
  • Google’s Waymo uses synthetic data to train its self-driving cars
  • Health insurance company Anthem works with Google Cloud to generate synthetic data
  • American Express and J.P. Morgan are using synthetic financial data to improve fraud detection
  • Roche is using synthetic medical data for clinical research
  • German insurance company Provinzial tests synthetic data for predictive analytics
  • Māori health research often uses synthetic data to adjust for the lack of Māori research participants

Implications for Māori

It is likely that many Māori Data sets are incomplete due to a lack of engagement with, and mistrust of, data collection organisations. The previous two Censuses are one example, where data had to be collected from other sources to be able to produce meaningful results.

Another example is the lack of Māori engagement with health research by academics, where often the only way to provide meaningful analysis of Māori health data is to create and use synthetic data.

There are a number of bias and racism risks to consider. For example, if synthetic video and images are used in advertising because there is a lack of available stock images and videos of Māori, there is a risk that personal bias could portray Māori in a negative manner. One New Zealand example where AI was used responsibly was the National Party's ad campaigns. But this required people to ensure the synthetic images were not using stereotypes of Māori, and it also required the AI to have enough data to produce a realistic image. It is vital that any synthetic image or data is checked by Māori for ethics and to ensure there is no bias or racism associated with it.

Other risks include the fact that much of the Māori Data in the government ecosystem is likely to be biased and/or to require a detailed analysis of the statistics. Police, MSD and the Justice system are likely examples.

With a lack of Māori participation in past research, and with limited data about Māori having been researched by non-Māori, we need to be very cautious about the truth of the data that may be modelled to produce synthetic Māori Data.

As is internationally recognised, there is a high risk that the AI and algorithms used to create synthetic Māori Data contain the human biases of their developers, their country of origin and their organisations.

Taonga or not?

In Te Ao Māori everything is connected to everything else through whakapapa (genealogy), whether it is tangible or intangible, human or non-human. I explored why Māori Data is a Taonga in this paper, and then in a much more detailed analysis in Chapter 11 of Indigenous Research Design: Transnational Perspectives in Practice.

Recognising that Māori Data is a Taonga immediately gives Māori Data protection and rights under Te Tiriti o Waitangi, and international rights under the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP).

The WAI 2255 Waitangi Tribunal decision reinforced my paper above and stated that “Māori Data with Mātauranga is a Taonga”. It is impossible to have Māori Data with no mātauranga, as data does not simply appear from nowhere. This reinforces the findings of the WAI 262 claim about Intellectual Property Rights: anything with whakapapa has mātauranga, making it a Taonga.

The WAI 262 decision also includes a section on Māori art. Synthetic data is another form of art, but machine-generated. The decision in relation to art created by non-Māori is controversial and is being debated, and I also critiqued the specific statements in my PhD research.

The next issue to consider is what counts as Māori. New Zealand legislation recognises tikanga Māori (customary lore), which states that any human being who claims to have whakapapa Māori is Māori. To state otherwise would be against traditional Māori beliefs and customs. There are no Blood Quantum requirements in New Zealand. We also need to consider the personhood of Taonga, and that Māori Data may one day be given personhood.

Scenario 1. The synthetic data is created using Māori Data, anonymised or not. The synthetic data then has a whakapapa and has mātauranga, so it is a Taonga.

Scenario 2. The synthetic data is created using non-Māori Data, but is added to existing Māori Data so that it can produce results. Applying the traditional knowledge and custom that anything and anyone with Māori descent is Māori, the synthetic data is Māori Data and a Taonga.

Scenario 3. If the algorithm or other AI was created by a Māori individual or collective, then the synthetic data is Māori Data, has whakapapa and mātauranga, and is therefore a Taonga.
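For anyone recording provenance alongside a synthetic data set, the three scenarios can be read as a simple check. This is only a sketch of that reading. The `SyntheticDataset` class, its field names and the `is_taonga` function are hypothetical and invented for this example. They are not an established standard, and a flag in a catalogue does not replace assessment by Māori.

```python
from dataclasses import dataclass

@dataclass
class SyntheticDataset:
    name: str
    uses_maori_data: bool        # Scenario 1: generated from Māori Data, anonymised or not
    mixed_with_maori_data: bool  # Scenario 2: non-Māori synthetic data added to Māori Data
    created_by_maori: bool       # Scenario 3: algorithm or AI created by a Māori individual or collective

def is_taonga(ds: SyntheticDataset) -> bool:
    """Return True if any one of the three scenarios applies."""
    return ds.uses_maori_data or ds.mixed_with_maori_data or ds.created_by_maori

# Hypothetical catalogue entries, made up for illustration only
examples = [
    SyntheticDataset("health-cohort-synth", True, False, False),
    SyntheticDataset("overseas-synth-merged", False, True, False),
    SyntheticDataset("iwi-built-generator-output", False, False, True),
    SyntheticDataset("unrelated-synth", False, False, False),
]

for ds in examples:
    print(ds.name, is_taonga(ds))
```

Any single connection is enough in this reading. There is no threshold or proportion to meet.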

Conclusion

Any synthetic data that is created by an individual Māori person or collective, or that uses any amount of Māori Data, whether anonymised or not, is still considered to be Māori Data, as it has a genealogical connection to Māori Data or a Māori person. This still applies if the synthetic data uses non-Māori Data, as it becomes Māori Data when it is mixed with original Māori Data, in the same way that we do not have half-castes or blood quantum for Māori in New Zealand.

DISCLAIMER: This post is the personal opinion of Dr Karaitiana Taiuru and is not reflective of the opinions of any organisation that Dr Karaitiana Taiuru is a member of or associates with, unless explicitly stated otherwise.
