How do concepts matter for language in the human-machine era?
by Justyna A. Robinson and Sandra Young
Questions of conceptual content in language are important to applications relying on human-machine models of language. In this context, Concept Analytics Lab was awarded funding from the COST (European Cooperation in Science and Technology) Action Network initiative. The aim of the grant was to explore synergies and ideas through events organised by the LITHME (Language in the Human Machine Era) research programme, such as:
- ‘Language in the Human-Machine Era’ annual conference, University of Groningen, Campus Fryslân, Netherlands, 15-16 May 2023
- Training school in Pristina, Kosovo, 5-9 June 2023
What is LITHME?
LITHME (Language in the Human Machine Era) was launched in 2020 as an EU COST Action network. Its aim is to explore questions relating to the interface between language and technology in what the network calls the human-machine era, so named because of the pervasive nature of new technologies and their disruptive potential. Language is an essential aspect of this technology, but experts across the spectrum from linguistics to computer science tend to work in isolation from each other. The LITHME initiative aims to bridge that gap and bring experts from the different linguistic and computer science realms together to tackle potential issues and amplify the potential benefits of state-of-the-art language technologies. The network does this through eight Working Groups (WGs): WG 1: Computational Linguistics; WG 2: Language and Law; WG 3: Language Rights; WG 4: Language Diversity, Vitality and Endangerment; WG 5: Language Learning and Teaching; WG 6: Language Ideologies, Beliefs and Attitudes; WG 7: Language Work, Language Professionals; WG 8: Language Variation. Members of Concept Analytics Lab collaborate with WGs 1 and 3.
The LITHME Conference brings together researchers and experts from various areas of linguistics and language technology to prepare and to shape language and technology research in the human-machine era. Justyna Robinson represented the research of Concept Analytics Lab at the LITHME conference by presenting a paper entitled ‘Machine-led concept extraction’. The talk instigated further discussions about the relationship between concepts, language, and NLP methodologies.
LITHME training school
The LITHME training school seeks to bring together researchers and professionals working at the intersection of language and technology to share ideas about multiple aspects of this new frontier. Sandra Young attended and shares some highlights below.
The training school is primarily a networking event. It was interactive and provided an excellent opportunity to meet people across a whole spread of research and industry fields. The international nature of the event also provided an array of participants to learn with and from. I found the eclectic nature of people’s backgrounds particularly inspiring: participants ranged from doctoral students to professors, and from sociolinguists looking at ethics and how technologies are changing research methods to computer scientists working on LLMs and using robots and teaching aids for autistic children. It was also enriching to be in a space with people from all over Europe (and beyond), to share different experiences and to see how the same technologies or elements are experienced differently through different linguistic lenses.
The training school gave us a lot of information about different aspects of language technology and our world today. Of this, the information on unequal access to technology and the availability of linguistic data has really stuck in my mind: forty per cent of the world’s population have no access to, or do not use, the internet. That is not far from half. When we talk about ‘today’s data-driven world’ we are excluding nearly half the world population. Then, who writes the internet, and in what language? English represents over 80% of the internet. And the content is written primarily by only certain people within these societies. The question of who language technology serves and who it excludes is a huge issue that is rarely the focus of conversation, and one that needs to be central when we consider that LLMs and the modelling of technologies will shape our society, our thoughts and what people take to be ‘true’.
But on a theoretical level, the question that interests me most, raised right at the beginning by Rui Sousa de Silva and Antonio Pareja Lora, was the question of understanding. I have always been of the mindset that computers don’t understand what they are generating, not in the way we understand things. That is why we need the human element within technologies to provide this real-world view, and why computers produce inconsistencies that strike us as strange. They work in a different and complementary way to us. But what about humans? I have thought about humans and understanding throughout my life as a translator and interpreter, often marvelling that we understand each other at all. But I had not given it much thought specifically in the context of language technology. Does it matter that computers don’t understand? How can the abilities of computers (lots of data, computing power) be leveraged to support humans where they excel (specialist expertise and a real understanding of texts and the world)?
The training school was a melting pot of minds: from tech to human, those embarking on their careers and those reaching the pinnacles of theirs, different languages, experiences and life journeys. The meeting of minds provided by LITHME is also a key element of our work at Concept Analytics Lab: the attempt to build bridges and work together across the language/technology divide through shared experience. In our little corner of work, our aims in that sense align very well with LITHME’s aims, and we look forward to exploring further shared ideas and synergies.