
Survey of English Usage zooms in on Covid-19 concepts

by Caitlin Hogan

 

Lab leader Dr Justyna Robinson gave a talk at University College London (UCL), as part of the Survey of English Usage Seminar Series, about the work of the Concept Analytics Lab. Her talk covered a wide range of issues in the realm of concept analytics, including how to draw out concepts from written accounts via the Mass Observation Archive dataset. She focussed in particular on the role of concept change during the COVID-19 pandemic, when lifestyle changes forced people to adapt their routines, and thus the concepts mentioned in their daily accounts shifted, in some cases drastically.

 

The Mass Observation Archive was founded in 1937 by Tom Harrisson, Charles Madge and Humphrey Jennings, and its original tenure ran until the 1960s, at which point it became defunct. Originally inspired by the founders’ desire to capture public opinion on the abdication of King Edward VIII, by 1939 the project aimed to have ordinary people record the day-to-day experiences of their lives, and nearly 500 did. This created an invaluable documentation of people’s habits, lives, and thoughts, acting almost as a time capsule. In 1981, the project was revived at the University of Sussex and continues to collect qualitative accounts of ordinary people’s lives and opinions to this day. Every 12th of May (chosen as the anniversary of the coronation of King George VI), the project calls for anyone to submit a record of their activity on that day, in honour of the original 1937 call going out on that same day. The 12th May diaries collected during the COVID-19 pandemic were digitised thanks to a grant from the Wellcome Trust. Digitised diaries from the first lockdown in the UK, i.e. 12th May 2020, were the focus of Justyna’s talk.

 

Justyna discussed how records of ordinary people’s activities during lockdown marked a shift towards concepts such as REGULATION, which may be expected, but also towards the discussion of furniture, given the struggles we all had adapting to working from home. Excerpts from the diaries on this theme include the following examples:

 

  • most of the online activities I could cast from my phone to the TV or could be done on my phone, which was vital during the early stages of lockdown, as XXXX was using the home laptop to work remotely, until he received a laptop through work
  • I’m working from home and the work PC is on an old computer desk so giving me a 2foot space to work in. 
  • I can also stretch and do yoga during my working day and sit at a desk that is the right size for me- I am very petite and used to feel uncomfortable in the chairs in meeting rooms, designed for men. 

 

As these examples show, participants mention the struggles of accommodating working from home with limited resources in terms of space and furniture, and of coexisting while some household members work and others use furniture for other purposes. The examples illustrate clearly that we can talk about the same concept without using the exact same words, so this commonality would be lost if we only used simple corpus linguistic techniques in this analysis. As explained in Robinson et al. (2023), terms like restriction, freeze, coordination, and clampdown emerged when people talked about regulations in the COVID-19 pandemic, even though none of them is the word regulation itself. Linking these lexemes together allows a clearer picture to emerge of the topics participants wrote about in their diaries. The insight into which concepts participants found important during lockdown would not have been detectable without concept analysis, and especially without invoking the notion of a keyconcept (Robinson et al., 2023).
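The lexeme-linking step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `CONCEPT_LEXEMES` mapping is an assumption built from the lexemes named in this post, not the lab’s actual keyconcept lexicon, and real diary text would also need proper tokenisation and lemmatisation.

```python
from collections import Counter

# Illustrative lexeme inventory: only the REGULATION lexemes are taken from
# this post; the mapping itself is NOT the lab's actual keyconcept lexicon.
CONCEPT_LEXEMES = {
    "REGULATION": {"regulation", "restriction", "freeze", "coordination", "clampdown"},
}

def concept_frequency(tokens, concept_lexemes):
    """Count mentions of each concept by matching any of its linked lexemes."""
    token_counts = Counter(t.lower() for t in tokens)
    return Counter({
        concept: sum(token_counts[lex] for lex in lexemes)
        for concept, lexemes in concept_lexemes.items()
    })

diary = ("The new restriction meant a clampdown on travel "
         "and every regulation changed weekly").split()
print(concept_frequency(diary, CONCEPT_LEXEMES))  # Counter({'REGULATION': 3})
```

The point of the sketch is the aggregation step: three different surface words are counted as evidence for one concept, which a plain word-frequency count would miss.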

 

 

As the lab continues to refine tools for concept analysis, talks such as this one are key to spreading the word to new and emerging scholars about the role of concepts when surveying English usage.

 

References

Robinson J.A., Sandow R.J. and Piazza R. (2023) Introducing the keyconcept approach to the analysis of language: the case of REGULATION in COVID-19 diaries. Front. Artif. Intell. 6:1176283. doi: 10.3389/frai.2023.1176283 

About Us

We identify conceptual patterns and change in human thought through a combination of distant text reading and corpus linguistics techniques.

Blog

Concept Quest Event, 11th March 2024

by Caitlin Hogan

Concept Quest: Navigating ideas on and through linguistic concepts

Our lab will be part of an exciting event in collaboration with the University of Sussex Digital Humanities Lab and the Mass Observation project. Our session will cover our work on concept analysis through some of our recent projects. The team is excited to attend and present at such a thought-provoking gathering!

 

Be sure to check back here after the event for another blog post and photos! 

Register for the event here:

https://www.ticketsource.co.uk/shl-events-ticket/t-yamopvl

 

 

How do concepts matter for language in the human-machine era?

by Justyna A. Robinson and Sandra Young

Questions of conceptual content in language are important to applications relying on human-machine models of language. In this context, Concept Analytics Lab has been awarded funding from the COST (European Cooperation in Science and Technology) Action Network initiative. The aim of the grant was to explore synergies and ideas through events organised by the LITHME (Language in the Human Machine Era) research programme, such as the LITHME conference and training school described below.

What is LITHME?

LITHME (Language in the Human Machine Era) was launched in 2020 as an EU COST Action network. Its aim is to explore questions relating to the interface between language and technology in what the network calls the human-machine era, because of the pervasive nature of new technologies and their disruptive potential. Language is an essential aspect of this technology, but experts across the spectrum from linguistics to computer science tend to work in isolation from each other. The LITHME initiative aims to bridge that gap and bring experts from the different linguistic and computer science realms together to tackle potential issues and amplify the benefits of state-of-the-art language technologies. The network does this through eight Working Groups, i.e. WG 1: Computational Linguistics; WG 2: Language and Law; WG 3: Language Rights; WG 4: Language Diversity, Vitality and Endangerment; WG 5: Language Learning and Teaching; WG 6: Language Ideologies, Beliefs and Attitudes; WG 7: Language Work, Language Professionals; WG 8: Language Variation. Members of Concept Analytics Lab collaborate with WGs 1 and 3.

LITHME conference

The LITHME Conference brings together researchers and experts from various areas of linguistics and language technology to prepare and to shape language and technology research in the human-machine era. Justyna Robinson represented the research of Concept Analytics Lab at the LITHME conference by presenting a paper entitled ‘Machine-led concept extraction’. The talk instigated further discussions about the relationship between concepts, language, and NLP methodologies.  

LITHME training school

The LITHME training school brings together researchers and professionals working on the interconnection between language and technology to share ideas about multiple aspects of this new frontier. Sandra Young attended and shares some highlights below.

The training school is primarily a networking event. It was interactive and provided an excellent opportunity to meet people across a whole spread of research and industry fields. The international nature of the event also provided an array of participants to learn with and from. I found the eclectic nature of people’s backgrounds particularly inspiring: attendees ranged from doctoral students to professors, from sociolinguists looking at ethics and at how technologies are changing research methods, to computer scientists working on LLMs and using robots as teaching aids for autistic children. It was also enriching to be in a space with people from all over Europe (and beyond), to be able to share different experiences, and to see the same technologies through different linguistic lenses.

 

The training school fed us with a lot of information about different aspects of language technology and our world today. Of this, information relating to the unequal access to technology and availability of linguistic data has really embedded itself into my mind: forty per cent of the world’s population have no access to/do not use the internet. That is not far from half. When we talk about ‘today’s data-driven world’ we are excluding nearly half the world population. Then, who writes the internet, and in what language? English represents over 80% of the internet. And the content is written primarily by only certain people within these societies. The question of who language technology serves and who it excludes is a huge issue that is rarely the focus of conversation, and one that needs to be central when we are thinking that LLMs and the modelling of technologies will shape our society, our thoughts and what people take to be ‘true’.

 

On a theoretical level, though, the question that interests me most, raised right at the beginning by Rui Sousa-Silva and Antonio Pareja-Lora, was the question of understanding. I have always been of the mindset that computers don’t understand what they are generating, not in the way we understand things. It is why we need the human element within technologies to provide this real-world view, and why computers produce inconsistencies that strike us as strange. They work in a different and complementary way to us. But what about humans? I have thought about humans and understanding throughout my life as a translator and interpreter, often marvelling that we understand each other at all. But I had not given it much thought specifically in the context of language technology. Does it matter that computers don’t understand? How can the abilities of computers (lots of data, computing power) be leveraged to support humans where they excel (specialist expertise and a real understanding of texts and the world)?

The training school was a melting pot of minds: from tech to human, those embarking on their careers and those reaching the pinnacles of theirs, different languages, experiences and life journeys. The meeting of minds provided by LITHME is also a key element of our work at Concept Analytics Lab: the attempt to build bridges across the language/technology divide and move forward through shared experience. In our little corner of work, our aims align very well with those of LITHME, and we look forward to exploring further shared ideas and synergies.

What can a conceptual analysis tell us about public attitudes to post-Brexit trade?

by Rhys Sandow

Concept Analytics Lab was commissioned by the Prevention Research Partnership (PETRA) to investigate the attitudes towards trade expressed in responses to the Mass Observation Project’s 2021 UK Trade Policy directive. PETRA is a network of cross-disciplinary academics and other experts from NGOs and charities who are united by their goal of determining how trade agreements can be used to prevent disease and improve health. The long-form narratives from the general public gathered by the Mass Observation’s Trade Policy directive provide accounts of public opinion on trade deals that are unparalleled in scope.

Here at Concept Analytics Lab, we employed both established and bespoke computational linguistic tools to extract key themes (concepts) in the data and then conducted corpus analysis to provide a close reading of these salient topics. Our analysis, which covers 125 responses to the directive totalling 56,840 words, identified the top 100 concepts that are characteristic of the Trade directive (its conceptual fingerprint, see Figure 1).
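Identifying the concepts “characteristic” of a corpus usually rests on a keyness statistic that compares the target corpus against a reference corpus. The sketch below is a generic illustration only, assuming Dunning’s log-likelihood as the measure; the report does not specify which statistic or tools the lab actually used.

```python
import math
from collections import Counter

def log_likelihood(freq_t, total_t, freq_r, total_r):
    """Dunning's log-likelihood keyness for one item: target vs reference corpus."""
    expected_t = total_t * (freq_t + freq_r) / (total_t + total_r)
    expected_r = total_r * (freq_t + freq_r) / (total_t + total_r)
    ll = 0.0
    if freq_t:
        ll += freq_t * math.log(freq_t / expected_t)
    if freq_r:
        ll += freq_r * math.log(freq_r / expected_r)
    return 2 * ll

def key_items(target_tokens, reference_tokens, n=5):
    """Rank items in the target corpus by keyness against the reference."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    total_t, total_r = sum(t.values()), sum(r.values())
    scores = {w: log_likelihood(t[w], total_t, r[w], total_r) for w in t}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy corpora: "trade" is overrepresented in the target, so it ranks first.
target = "trade deal trade standards food".split()
reference = "weather garden weather food".split()
print(key_items(target, reference, n=2))
```

In a real fingerprint the items ranked would be concepts (groups of lexemes) rather than single word forms, and the reference corpus would be a large general-English sample.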

 

Figure 1: The top 100 concepts in the Mass Observation’s Trade directive.

Using these key concepts as our starting point, we conducted corpus analysis, from which we identified a range of themes in relation to trade policy including:

  • Ethical concerns
    • Human rights issues in China
    • Environmental impact of long-distance trade
    • Animal welfare in the USA and Australia
    • Likelihood of standards decreasing post-Brexit
  • A desire for health and environment to be priorities in any future trade deal
  • A perception that EU standards are world-leading
  • Leaving the EU is an opportunity to support local produce to a greater extent
  • A belief that trade deals will not impact individuals or their local communities
  • A belief that while environmental concerns were a priority for respondents, the government’s key concerns were financial
  • A general acknowledgement of a lack of awareness of the intricacies of trade agreements

The findings have implications for our understanding of i) the public’s attitudes towards trade agreements and how these vary between countries, ii) the disconnect between what the public want from a trade deal and the perceived agenda of the UK government, and iii) the value of clear communication between policy makers and the general public about trade deals and their implications. On a more methodological level, we showcase the value of employing a dual approach of conceptual and corpus analysis, providing both an overview of key themes within a data set and a more detailed, contextualised investigation of the stances and attitudes expressed in relation to these topics. But, most importantly, this research supports the PETRA network in its research and in shaping trade-related public policy in the UK.

For a more detailed account of our findings, you can read our full report here.

Bridging the gap between computational linguistics and concept analysis

Better together

"Bridging The Gap" by MSVG is licensed under CC BY 2.0.

One of our priorities at the Concept Analytics Lab is to utilise computational approaches in innovative explorations of linguistics. In this post I explore the disciplinary origins of computer science and linguistics and present areas in which computational methodologies can make meaningful contributions to linguistic studies.

 

The documented origins of computer science and linguistics place the fields in different spheres. Computer science developed out of mathematics and mechanics. Many of the forefathers of the field, the likes of Charles Babbage and Ada Lovelace, were first and foremost mathematicians. Major players in the development of the field in the twentieth century were often mathematicians and engineers, such as John von Neumann and Claude Shannon. Linguistics, on the other hand, developed out of philology and traditionally took a comparative and historical outlook. It was not until the early 20th century and the work of scholars such as Ferdinand de Saussure that major explorations into synchronic linguistics began.

 

The distinct origins of computer science and linguistics are still visible in academia today. For example, in the UK and other western universities, computer science is situated among STEM subjects, while linguistics often finds a home with the humanities and social sciences. The different academic homes given to linguistics and computer science often pose a structural barrier to interdisciplinary study and the creation of synergies between the two disciplines.

 

Recent research shows that the merging of linguistic knowledge with computer science has clear applications for the field of computer science. For example, the language model BERT (Devlin et al., 2018) has been used by Google Search to process almost every English-based query since late 2020. But we are only just beginning to take advantage of computational techniques in linguistic research. Natural language processing harnesses the power of computers and neural networks to swiftly process and analyse large amounts of texts. This analysis complements traditional linguistic approaches that involve close reading of texts, such as narrative analysis of language, discourse analysis, and lexical semantic analysis.

 

One particularly impressive application of computational linguistics to the analysis of semantic relations is the word2vec model (Mikolov et al., 2013). word2vec converts words into numerical vectors and positions them in vector space. This process groups semantically and syntactically similar words together while distancing dissimilar ones. In this way, corpora consisting of millions of words can be analysed within hours to identify semantic relations. However, this information, as rich as it is, still needs to be meaningfully interpreted. This is where the expertise of a linguist comes in. For instance, word2vec may identify pairs of vectors between which the distance increased across different time periods. As linguists, we can infer that the words these vectors represent must have changed semantically or syntactically over time. We can rely on knowledge from historians and historical linguists to offer explanations as to why that change has occurred. We may notice further that similar changes occurred in only one part of speech, or note that the change first occurred in the language of a particular author or group of writers. In this way, the two fields of computer science and linguistics necessarily rely on each other for efficient, robust, and insightful research.
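The drift measurement described above can be sketched minimally as follows, using hand-made toy vectors in place of trained word2vec embeddings. Everything here (the words, the vectors, the `drift` helper) is a hypothetical illustration of the cosine-distance comparison, not output from a real model.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: near 0 for similar vectors, larger for dissimilar ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy vectors standing in for word2vec embeddings trained on two time periods.
# "mouse" drifts (say, animal sense -> computing sense); "cat" stays put.
period_a = {"mouse": [0.9, 0.1, 0.0], "cat": [0.8, 0.2, 0.1]}
period_b = {"mouse": [0.1, 0.2, 0.9], "cat": [0.8, 0.2, 0.1]}

def drift(word):
    """Distance between a word's vectors in the two period models; a large
    value flags the word for a linguist's closer reading."""
    return cosine_distance(period_a[word], period_b[word])
```

One caveat worth noting: embeddings trained independently on two corpora live in different coordinate systems, so in practice the two spaces would first need to be aligned (e.g. with an orthogonal Procrustes rotation) before such distances are comparable.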

 

At the Concept Analytics Lab, we promote the use of computational and NLP methods in linguistic research, exploring benefits brought by the convergence of scientific and philological approaches to linguistics. 

 

References

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018) ‘Bert: Pre-training of deep bidirectional transformers for language understanding.’ Available at https://arxiv.org/abs/1810.04805

 

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013) ‘Efficient estimation of word representations in vector space.’ Available at https://arxiv.org/abs/1301.3781

 

Covid-19 Crisis to Net Zero: A story of dedication and doubt in recycling

In March 2022, the Concept Analytics Lab was awarded a grant from the Higher Education Innovation Fund (HEIF), run in partnership with the Sussex Sustainability Research Programme (SSRP). The call this year was aimed at addressing the critical challenges of Covid-19 recovery and climate change in an integrated way. We partnered with Africa New Energies (ANE), who contributed their visualisation expertise, and the Mass Observation project (MO) provided the data, giving us an opportunity to apply our linguistic and computational analysis techniques to a new dataset.

Mass Observation directives focus on specific themes and ask volunteers, called observers, to write narrative responses to a series of questions on those themes. Every response is stamped with detailed information about its contributor-observer. In this case we studied the 2021 Household Recycling directive. This blog highlights key findings of our report and how they could feed into improving recycling performance in the UK.

 

What is waste?

We applied semantic corpus analysis to the 2021 MO Household Recycling directive to identify key themes within the directive, and then used these as a springboard to understand what people said about those themes and their positions on them. Waste, unsurprisingly, was one of those themes. We identified a number of near synonyms for waste, such as litter, debris and rubbish. We delved deeper by searching on the verbs that appear alongside the various words identified for waste, to identify what other objects are categorised as waste. Here we identified a mix of different objects that are variably reused, recycled and thrown away depending on context:
Figure 1: Collocates of reuse versus recycle.

Figure 1 shows us that many objects, depending on context, can either be reused or recycled: packaging, containers, waste (usually collocated as food, garden, household), pot, paper, plastic, bag.
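Collocate comparisons of the kind shown in Figures 1 to 3 can be approximated with a simple window-based count. This is an outline sketch under assumed parameters (a plain token window of ±3, no lemmatisation or significance testing), not the lab’s actual pipeline:

```python
from collections import defaultdict

def collocates(tokens, node, window=3):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    hits = defaultdict(int)
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    hits[tokens[j]] += 1
    return dict(hits)

text = ("we reuse glass jars and recycle plastic packaging "
        "but throw away crisp packets").split()
# Words that collocate with both verbs, as in a reuse-versus-recycle comparison:
shared = set(collocates(text, "reuse")) & set(collocates(text, "recycle"))
print(sorted(shared))  # ['and', 'glass', 'jars']
```

Intersecting (or differencing) the collocate sets of two verbs is what surfaces objects like jars that sit on the boundary between reusing, recycling and throwing away.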

 

Figure 2: Collocates of throw versus reuse.
Figure 3: Collocates of throw versus recycle.

In Figure 2 we see the perceived use of containers of different types in the way that they collocate solely with reuse and not throw (tubs and pots being types of container). Jar is used in the context of both recycle and throw. When we look at the collocates, this seems to reflect a pattern of keeping jars and then throwing them out if no use is found for them.

Here are some examples of the use of jars in context:

  • I do keep some lidded jars for when I make jams and chutneys and sometimes I keep plastic containers to use for sowing seeds in or if they have lids for storage.
  • Yes, I keep glass jars to reuse for leftovers, gardening, spices, herbs, food prep etc. I keep plastic containers from vegetables to use as dividers in the fridge. I reuse food bags and bread bags for leftovers and portioning items put into the freezer.
  • I regularly hang out a bag of used jam jars for a friend who returns the favour by dropping off delicious home-made preserves, pickles and jams.
  • I have a fondness for jars and biscuit tins, so some may linger in the garage until I find a use for them, or until I throw them out too.

Why is it waste?

Figures 1, 2 and 3 show us that packaging and plastic are important collocates for all three verbs (reuse, recycle and throw). When we look closer at the data we see this seems to be in relation to a number of issues: uncertainty, variation in services and multiplicity of packaging.

Here are some examples of the use of plastic and packaging in context:

  • Items not recycled tend to be plastic film of food packaging, hard plastic and black pastic [sic] which is marked as unrecyclable.
  • We do not recycle black plastic, plastic bags, and polystyrene due to a lack of facilities.
  • Plastic is the most difficult to recycle. Why can’t it all be recycled? (Like most people, we have to throw away as much plastic as we recycle) My wife and daughter are fanatical about it.
  • We can recycle most things but plastics are the most difficult items to get right.
  • I try to recycle as much as possible but also aim to reduce my consumption of goods that are plastic wrapped or in plastic-coated packaging that can’t be recycled.
  • I’m aware that black plastic food boxes or yoghurt pots can’t be recycled but uncertain about some clear plastic packaging for instance meat and fruit boxes.
  • XXXX Council has an excellent Waste Wizard on their website describing exactly what to recycle and how to prepare it. XXXX Council’s webguide is less detailed, and in general they recycle a smaller range of plastics and don’t accept tetrapacks as XXXX does.
  • In our area we are not able to recycle a huge amount of items – only plastic with numbers 1,2 and 3, for instance, so a lot of plastic packaging regrettably cannot be recycled
  • We find it annoying and frustrating to find so much plastic that cannot be recycled – e.g. plastic bags and containers – crisp bags, coffee packs, inner packing of biscuit/confectionery etc.

Where does it go?

Coupled with uncertainty about plastics and other types of recycling, respondents expressed doubt as to what happens to recycling once it is taken away.

  • I can’t say that I know what happens to the recycling once it is collected hence my very general answer but I do think I should find out.
  • I honestly don’t know what happens to our recycling once it is collected.
  • Do not really know what happens to the waste and whether it is put to good use.
  • I don’t know what happens with garden waste in terms of collection.

This is coupled with a desire to know more about what happens:

  • Its till a mystery, I don’t think local councils are transparent to what happens to our discarded rubbish and food waster.
  • I do wish our council would come clean and do a huge item in the local paper (or on video) showing us what happens to all our stuff!
  • Which types and numbers are acceptable is not given by the council, only vague guidance like plastic milk bottles are OK but not yoghurt pots – why?
  • A lot is dependant upon Government funding but there is much more than the council can do in educating the public perhaps by starting with schools.

Our duty

Despite this doubt and cynicism about the process, respondents showed a clear commitment to recycling, accompanied by a feeling of duty.

  • I recycle because it is the right thing to do for environmental reasons.
  • I recycle because I believe it is my duty as a citizen to do so, it is part of my very small contribution to addressing climate change along with a general desire to, where possible, reduce my carbon footprint.
  • I recycle because I believe it is a responsible action to reduce waste to help the environment, wildlife and also less developed countries who are in immediate danger from the effects of climate change.

How can we improve?

The lack of standardisation and information on the part of institutions such as councils, combined with the feeling of duty and responsibility expressed by the Mass Observation respondents, indicates that there might be a window of opportunity to intervene and improve recycling rates and quality through education and information sharing.

The findings in this report support research in other areas (Burgess et al., 2021; Zaharudin et al., 2022) about the need for greater standardisation and reorganisation of recycling networks to maximise adherence to, and performance of, the recycling system. Our findings also suggest that another way to improve would be to increase the information given to citizens about recycling processes, particularly in relation to what happens to recycling once it is collected.

 

References

Burgess, Martin, Helen Holmes, Maria Sharmina and Michael P. Shaver. (2021). The future of UK plastics recycling: One Bin to Rule Them All. Resources, Conservation and Recycling, Vol. 164.

Zaharudin, Zati Aqmar, Andrew Brint and Andrea Genovese. (2022). Multi-period model for reorganising urban household waste recycling networks. Socio-Economic Planning Sciences, July 2022.