Te Kete o Karaitiana Taiuru (Blog)

How AI can help Indigenous language revitalisation, and why data sovereignty is important

Using the interview with Michael Running Wolf at https://www-cbc-ca.cdn.ampproject.org/c/s/www.cbc.ca/amp/1.7290740, I offer some commentary and key warnings for the Māori language about issues I am already seeing here in Aotearoa New Zealand.

Indigenous language experts working in computer science say artificial intelligence (AI) is a useful tool in language revitalization, but communities must prioritize ownership of their data.

“There are limitations, Running Wolf said, like sparse data and the polysynthetic nature of many Indigenous languages.”

In New Zealand, we are unique in that the Māori language is New Zealand’s first official language. I would also argue that Māori is perhaps one of the most published and digitised Indigenous languages in the world. We cannot claim ownership of much of what has been digitised, but we can create data by Māori, for Māori, and ensure some sovereignty and ownership over those data sets. We also need to be careful about what we digitise from here on, and ensure there are copyright protections and restrictions against unauthorised use.

We have already seen several international companies use digitised Māori language resources to train their own AI and then offer those systems back to companies as AI translation programs. Several non-Māori companies in New Zealand are also using this data to train their own AI systems.

While Māori do have at least one Māori-owned company that is a significant player in AI and the Māori language, the incentives in the current ecosystem to share and licence their data are likely to work against them.

“It’s just going to be like a pencil. It’s useful but it’s not going to save our language.”

Yes, we know from international research by linguists such as Joshua Fishman that revitalisation takes an intergenerational approach. An AI cannot speak on the marae or nurture children in te reo Māori, but it can assist other physical initiatives and support te reo as a living language. I think we are likely to see fewer and fewer Māori language learning resources in the future as they are replaced by generative AI.

“Also, languages such as Cheyenne and Blackfeet are polysynthetic and fusional, meaning prefixes and suffixes blend into words so the roots are not apparent.”

He said he intends to overcome these limitations by working with communities to develop a manageable data set that will train AI.

Running Wolf emphasized the importance of the community’s agency in their language revitalization, particularly when it comes to AI.

Many Māori language speakers tell me their concern with AI is that it can never understand the true intent of some words and phrases, and has no emotional feel for the words. While this is true now, it may change. These issues, and many others yet to be discovered through testing and development, will need to be considered. This cannot and should not be done in isolation by academics and engineers, but within Māori-speaking communities.

“We have to have our own engineers. We need to have our own computer scientists using the software … We need to have sovereignty over our own data, set the terms and that’s the only way to build this AI.”

This advice applies equally to Māori. We need to create pathways for education in IT, and in particular computer science, machine learning and all areas of AI. There are many reasons why, statistically, Māori make up only about 5% of the tech industry and 0.16% of the AI workforce; we need to work on removing those barriers and creating clear pathways.

Sovereignty over our own data is much easier for Māori in New Zealand than for other Indigenous Peoples, due to Te Tiriti o Waitangi and the Waitangi Tribunal findings that Māori Data is a Taonga and subject to Māori Data Governance, as well as the ever-increasing recognition and appreciation of Māori Data Sovereignty within government and among the international cloud providers. It really is up to Māori to use these opportunities and partnerships to seek Data Sovereignty, and to create opportunities in the tech industry that welcome and grow our next generation of digital kaitiaki.

As Māori, we need to take note of Running Wolf’s warnings and experience, and not leave this to language experts or academia alone. Instead, we need a multipronged approach: sovereignty over language data sets, upskilling more Māori into AI jobs, and protecting future data sets of the Māori language.

DISCLAIMER: This post is the personal opinion of Dr Karaitiana Taiuru and is not reflective of the opinions of any organisation that Dr Karaitiana Taiuru is a member of or associates with, unless explicitly stated otherwise.
