How does concept-led research matter?

How does concept-led research matter?

by Caitlin Hogan


Our Concept Analytics Lab (CAL) team LOVES concepts. In our daily work, we keep seeing the value of a concept-based view of language in bringing insight into people’s thinking, attitudes, and behaviours. But how important is concept-based research for the wider linguistic community? Can concept-based research impact other disciplines and industries? Can you commercialise concept-based knowledge?

With the aim of consolidating research on concept-based approaches to text analysis and their applications, we gathered experts in the field for the first Concept Quest conference.


The event Concept Quest: Navigating Ideas on and Through Linguistic Concepts took place in March 2024 at the University of Sussex. It focussed on the work of CAL and other researchers from a range of academic disciplines. We hosted talks and panels from scholars studying everything from AI concepts to the impact of trade deals on the economy and the commercialisation of concepts in wine production.


Justyna Robinson, the Director of the Concept Analytics Lab, started by talking about the aims and advantages of concept mining as a methodology. Concepts are not encapsulated by a single word but are observable through a set of words, phrases, and/or constructions. This allows us to understand how individual terms might be used differently over time, and how they may come to represent different concepts. CAL’s researcher Rhys Sandow then discussed how one can visualise conceptual ontologies and showed how complex sets of lexical relations can be turned into clear diagrammatic representations. Such representations can shed light on conceptual, including socio-conceptual, differences that are inaccessible to more traditional approaches to the analysis of large texts.
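To make the core move of concept mining concrete, tracking a set of words rather than a single term can be sketched in a few lines of Python. The lexeme sets below are invented for illustration and are not CAL’s actual inventories; a real pipeline would also lemmatise the text so that inflected forms are matched.

```python
import re
from collections import Counter

# Invented, illustrative lexeme sets: a concept is observed through
# several words, not one headword.
CONCEPTS = {
    "REGULATION": {"regulation", "restriction", "clampdown", "rule"},
    "TRADE": {"trade", "export", "import", "tariff"},
}

def concept_counts(text):
    """Count hits for each concept's lexeme set in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for concept, lexemes in CONCEPTS.items():
            if token in lexemes:
                counts[concept] += 1
    return counts

counts = concept_counts(
    "A new restriction on trade: the clampdown hit export volumes."
)
# REGULATION registers twice (restriction, clampdown) even though the
# word regulation itself never appears.
```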

Following this, Louise Sylvester (Westminster) talked about how concepts can be incorporated into studies of Medieval English. Her work focuses on the adoption of terms from French into English during this period, and through the use of a semantic hierarchy, she is able to inspect in which cases French pushed out the English variant, and in which cases this did not occur. The use of concepts allows us to see the patterns that emerge in synonym relationships, even from long ago.


Haim Dubossarsky (QMUL) approached the study of concepts from a computational angle, discussing the ways in which we currently carry out computational and corpus linguistics, such as collocation analysis, and how we can improve on these methods. By projecting a word’s usage onto a series of vectors, one is able to map the meanings of the word and their change over time. This technique provides a computational boost to the analysis of meaning and represents an important link between the world of linguistics and that of computer science that the Concept Analytics Lab champions.


The talks on theoretical and methodological aspects of doing concept research were complemented by talks addressing applications of concepts in archival work and in commercial endeavours. 


Piotr Nagórka (Warsaw’s Cultural Terminology Lab) discussed the exploration of communication systems and the terminological sciences. He probed how the terminology we use to refer to types of wine maps onto the production process itself. His work shows how one might commercialise concept research by marrying the study of concepts with processes and techniques within the manufacturing sciences.

Angela Bachini and Kirsty Pattrick, who work on the Mass Observation project, helped us understand how archivists identify important concepts when indexing a new text. We learned a great deal from the Mass Observation team about their workflow and how we as researchers can best help archivists automate indexing via key-concept detection.

The event finished with a panel discussion on why concepts matter, led by Lynne Murphy (Sussex), in which Piotr Nagórka and Kirsty Pattrick were joined by Julie Weeds (Sussex AI) and Alan Winters (Sussex, CITP). Alan reflected on the value of concepts in trade analysis, particularly for understanding the trade-offs that people are willing to make with regard to global trade. These kinds of complex attitudes are difficult to access with other methods, particularly the quantitative methods often used in economics. The advantage of concept analysis, where participants can describe their accounts in rich detail which can then be computationally analysed, is clear in this case. Louise Sylvester added that in her work on Medieval English, concepts help us understand how people living in that era made sense of the world and what categories were meaningful for them. This helps greatly with noticing patterns of use in historical linguistics, and also helps us to understand how the concept of something like a farm has changed from the Middle Ages to the present day.


We continued chatting over some delicious wine (thanks to a generous sponsorship from Mass Observation) and made new connections across institutions and fields.  This is exactly the kind of result we envisage from a successful colloquium, and we were proud to have hosted such a stimulating day. Our gratitude extends to all the wonderful speakers and attendees for making this event so brilliant!


To conclude our reflections, the Concept Quest highlighted the value of concept-based and concept-led research and applications. Researching concepts matters for theories of language and knowledge representation as we consider conceptual hierarchies, lexicalised and non-lexicalised concepts, and the emergence of new concepts and ideas. At a methodological level, concepts pose a challenge for traditional word-based corpus and NLP techniques. Therefore, new ways of extracting conceptual information from big data are needed. At a more applied level, empirical ways of gaining access to conceptual information are invaluable for other sectors and disciplines which use large text data. Thus, strengthening the objectivity and replicability of concept research will open it up to other sectors which seek more expert analyses. That development can also lead to impactful research and even the commercialisation of conceptual research.


Please get in touch here to find out which key concepts and themes are revealed in your data. 



Robinson, J. A., Sandow, R. J., & Piazza, R. (2023). Introducing the keyconcept approach to the analysis of language: the case of REGULATION in COVID-19 diaries. Frontiers in Artificial Intelligence, 6.

Nagórka, P. (2021). Madeira, Port, Sherry. The Equinox Companion to Fortified Wines. Equinox Publishing Limited.

About Us

We identify conceptual patterns and change in human thought through a combination of distant text reading and corpus linguistics techniques.


Identifying key content from surveys

How can you identify key content from surveys?

by Rhys Sandow and Justyna Robinson


A case of responses to the Labour Party’s 2023 Trade Policy Forum


Surveys which collect responses to open questions are a popular and valuable way of gauging people’s attitudes. But they also present specific challenges for keyness analysis in corpus linguistics, as the results can be misleading. For example, a high frequency of term X may be skewed by one or two documents within the corpus, rather than being representative of attitudes among the survey respondents more broadly. In such cases, traditional corpus linguistic measures of difference, such as relative frequencies or keyness, are not appropriate. Instead, we advocate the use of measures of dispersion across a corpus, such as Average Reduced Frequency (ARF) and Document Frequency (DOCF). This distinction between frequency and dispersion is critical for developing meaningful insights into large data sets, particularly in the context of policy consultation, where an understanding of plurality and consensus is highly important.


Let us demonstrate how to solve this problem with examples from data we recently analysed. Concept Analytics Lab (CAL) was tasked by the UK Trade Policy Observatory (UKTPO) to analyse responses to the Labour Party’s Trade Policy Forum in the build-up to the Labour Party’s annual conference in October 2023. The survey gathered 302 answers to seven questions, comprising c. 250,000 words of data. Many of the submissions came from groups with very particular interests, such as specific industries or specific local communities. Therefore, some responses contained detailed discussions of issues critically important to the submitter, but not necessarily widespread among all respondents. For example, when running a keyword analysis, the eighth most key word (with the enTenTen21 corpus as our baseline) was gpi (genuine progress indicator), with 35 hits across the corpus. However, upon closer inspection, these hits are spread across only 2 of the 302 responses. Thus, while gpi has a high keyness score, it cannot be said to be a salient topic across the corpus, as its use is concentrated in just 0.66% of documents.

In order to remedy this limitation of keyness analysis, we considered the spread of terms across the corpus using Sketch Engine’s Average Reduced Frequency (ARF) statistic. ARF is a modified frequency measure that prevents results being skewed by a specific part, or a small number of parts, of a corpus (for more detail on the mathematics behind the measure, see here). Where the ARF and absolute frequency are similar, this suggests a relatively even distribution of a given term across a corpus. However, when there are large discrepancies between the absolute frequency and ARF, this is indicative of a skew towards a small subset of the corpus. For example, while the absolute frequency of gpi in the corpus is 35, the ARF is 2.7 (DOCF, 2), highlighting its lack of dispersion. Similarly, the term gender-just has an absolute frequency of 19 but an ARF of 1.32 (DOCF, 1), highlighting that this term is not characteristic of the data set as a whole, but is highly salient within a small subset of the corpus. By contrast, labour, with an absolute frequency of 1,434, had an ARF of 725.74 (DOCF, 226), highlighting its spread across the corpus.
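For readers who want the mechanics, here is a minimal sketch of one common formulation of ARF (the corpus is treated as circular, and each gap between consecutive occurrences contributes at most the average gap v = N/f). The token positions below are invented for illustration; Sketch Engine’s own implementation details may differ.

```python
def arf(positions, corpus_size):
    """Average Reduced Frequency: ARF = (1/v) * sum(min(d_i, v)),
    where v = N/f is the average gap between the f occurrences of a
    term in a corpus of N tokens, and d_i are the gaps between
    consecutive occurrences (corpus treated as circular)."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f
    positions = sorted(positions)
    gaps = [positions[i] - positions[i - 1] for i in range(1, f)]
    gaps.append(corpus_size - positions[-1] + positions[0])  # wrap-around gap
    return sum(min(d, v) for d in gaps) / v

# Five hits bunched into one short stretch score barely above 1 ...
arf([10, 11, 12, 13, 14], corpus_size=1000)       # 1.02
# ... while five evenly spread hits keep their full frequency.
arf([100, 300, 500, 700, 900], corpus_size=1000)  # 5.0
```

The same contrast drives the gpi example above: a high raw frequency collapses to a low ARF when the hits cluster in a couple of responses.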

When analysing corpus data, methodological decisions can have highly impactful repercussions for the analysis. For example, let’s take the top 10 key multi-word terms from the Labour Party Policy Forum data set ordered by keyness score (see Table 1) and compare it with the top 10 multi-word terms ordered by the highest ARF statistic (see Table 2).

Table 1: The top multi-word terms, ordered by keyness score
Table 2: The top multi-word terms, ordered by ARF

This analysis highlights, in particular, two obvious outliers, namely ‘human rights defender’ and ‘modern slavery’. The low DOCF and ARF scores highlight that they are highly concentrated within a small number of submissions and, so, are not characteristic of the data set more broadly. 

While no multi-word term occurs in the majority of documents, Table 2 provides a perspective on the most broadly dispersed multi-word terms. It is important to note the substantial overlap between the two measurements in Tables 1 and 2: e.g. ‘trade policy’, ‘trade deal’, ‘trade agreement’, ‘international trade’, and ‘labour government’ appear in both. However, the advantage of the ARF-ordered data is that there are no clear outliers skewed by individual responses, or by a very small number of them. This means that it is the second data set which provides a more valid overview of the content of the data.

Under a traditional approach to keyness analysis, conclusions may recommend interventions around trade and human rights defenders or modern slavery. However, an analysis of ARF highlights that this is misleading and does not get to the essence of the data set. What is more, policy recommendations based on the former statistic alone may result in the disproportionate influence of those who lobby in relation to very specific terms, at the expense of more widespread priorities and concerns.


This ARF analysis formed part of our analysis of the 2023 Labour Party’s Policy Forum that we conducted for the UKTPO, which can be accessed here.


If you are interested in our data analysis services or partnering with us in any way, please contact us here



Labour Policy Forum (2023). National Policy Forum Consultation 2023. Britain in the World.

Gasiorek, M. and Robinson, J. A. (2023). What can be learnt from the Labour Party’s consultation on Trade? UKTPO Blog.



Survey of English Usage zooms in on concepts

Survey of English Usage zooms in on Covid-19 concepts

by Caitlin Hogan


Lab leader Dr Justyna Robinson gave a talk at University College London (UCL), as part of the Survey of English Usage Seminar Series, about the work of the Concept Analytics Lab. Her talk covered a wide range of issues in the realm of concept analytics, including how to draw out concepts from written accounts via the Mass Observation Archive dataset. She focussed in particular on the role of concept change during the COVID-19 pandemic, when lifestyle changes forced people to adapt their routines, and thus the concepts they mention in their daily accounts shifted, in some cases drastically.


The Mass Observation Archive began in 1937, founded by Tom Harrisson, Charles Madge and Humphrey Jennings, and its original tenure ran until the 1960s, at which point it became defunct. Originally inspired by the founders’ desire to capture public opinion on the abdication of King Edward VIII, by 1939 the project aimed to have ordinary people record the day-to-day experiences of their lives, and nearly 500 did. This created an invaluable documentation of people’s habits, lives, and thoughts, acting almost as a time capsule. In 1981, the project was revived at the University of Sussex and continues to collect qualitative accounts of ordinary people’s lives and opinions to this day. Every 12th of May (chosen as the anniversary of the coronation of King George VI), the project calls for anyone to submit a record of their activity on that day, in honour of the original 1937 call going out on that same day. The 12th May diaries collected during the COVID-19 pandemic were digitised thanks to a grant provided by the Wellcome Trust. Digitised diaries from the first lockdown in the UK, i.e. 12th May 2020, were the focus of Justyna’s talk.


Justyna discussed how records of ordinary people’s activities during lockdown marked a shift towards concepts such as REGULATION, which may be expected, but also towards the discussion of furniture, given the struggles we all had adapting to working from home. Excerpts from the diaries on this theme include the following examples:


  • most of the online activities I could cast from my phone to the TV or could be done on my phone, which was vital during the early stages of lockdown, as XXXX was using the home laptop to work remotely, until he received a laptop through work
  • I’m working from home and the work PC is on an old computer desk so giving me a 2foot space to work in. 
  • I can also stretch and do yoga during my working day and sit at a desk that is the right size for me- I am very petite and used to feel uncomfortable in the chairs in meeting rooms, designed for men. 


As these examples show, participants mention the struggles of accommodating working from home with limited space and furniture, and of coexisting while some household members work and others use the furniture for other purposes. The examples illustrate clearly that we can talk about the same concept without using the exact same words, so this commonality would be lost if we only used simple corpus linguistic techniques in this analysis. As explained in Robinson et al. (2023), terms like restriction, freeze, coordination, and clampdown emerged in writing about regulations during the COVID-19 pandemic, without being the word regulation itself. Linking these lexemes together allows a clearer picture to emerge of the topics participants wrote about in their diaries. The insight into which concepts participants found important during lockdown would not have been detectable without concept analysis, and especially without invoking the notion of a keyconcept (Robinson et al., 2023).



As the lab continues to refine tools for concept analysis, talks such as this one are key to spreading the word to new and emerging scholars about the role of concepts when surveying English usage.



Robinson J.A., Sandow R.J. and Piazza R. (2023) Introducing the keyconcept approach to the analysis of language: the case of REGULATION in COVID-19 diaries. Front. Artif. Intell. 6:1176283. doi: 10.3389/frai.2023.1176283 



Concept Quest Event, 11th March 2024

Concept Quest Event, 11th March 2024

by Caitlin Hogan

Concept Quest: Navigating ideas on and through linguistic concepts

Our lab will be part of an exciting event in collaboration with the University of Sussex Digital Humanities Lab and the Mass Observation project. Our session will cover our work on concept analysis through some of our recent projects. The team is excited to attend and present at such a thought-provoking gathering!


Be sure to check back here after the event for another blog post and photos! 

Register for the event here:





How do concepts matter for language in the human-machine era?

How do concepts matter for language in the human-machine era?

by Justyna A. Robinson and Sandra Young

Questions of conceptual content in language are important to applications relying on human-machine models of language. In this context, Concept Analytics Lab has been awarded funding from the COST (European Cooperation in Science and Technology) Action Network initiative. The aim of the grant was to explore synergies and ideas through events organised by the LITHME (Language in the Human Machine Era) research programme, such as the conference and training school described below.

What is LITHME?

LITHME (Language in the Human Machine Era) was launched in 2020 as an EU COST Action network. The aim of LITHME is to explore questions relating to the interface between language and technology in what the network calls the human-machine era, because of the pervasive nature of new technologies and their disruptive potential. Language is an essential aspect of this technology, but experts across the spectrum from linguistics to computer science tend to work in isolation from each other. The LITHME initiative aims to bridge that gap and bring experts from the different linguistic and computer science realms together to tackle potential issues and amplify the potential benefits of state-of-the-art language technologies. The network does this through eight Working Groups, i.e. WG 1: Computational Linguistics; WG 2: Language and Law; WG 3: Language Rights; WG 4: Language Diversity, Vitality and Endangerment; WG 5: Language Learning and Teaching; WG 6: Language Ideologies, Beliefs and Attitudes; WG 7: Language Work, Language Professionals; WG 8: Language Variation. Members of Concept Analytics Lab collaborate with WGs 1 and 3.

LITHME conference

The LITHME Conference brings together researchers and experts from various areas of linguistics and language technology to prepare for and shape language and technology research in the human-machine era. Justyna Robinson represented the research of Concept Analytics Lab at the LITHME conference by presenting a paper entitled ‘Machine-led concept extraction’. The talk instigated further discussions about the relationship between concepts, language, and NLP methodologies.

LITHME training school

The LITHME training school brings together researchers and professionals working on the interconnection between language and technology to share ideas about multiple aspects of this new frontier. Sandra Young attended and shares some highlights below.

The training school is primarily a networking event. It was interactive and provided an excellent opportunity to meet people across a whole spread of research and industry fields. The international nature of the event also provided an array of participants to learn with and from. I found the eclectic nature of people’s backgrounds particularly inspiring: there were doctoral students and professors, sociolinguists looking at ethics and at how technologies are changing research methods, and computer scientists working on LLMs and using robots and teaching aids for autistic children. It was also enriching to be in a space with people from all over Europe (and beyond), able to share different experiences and to see the same technologies through different linguistic lenses.


The training school fed us with a lot of information about different aspects of language technology and our world today. Of this, information relating to the unequal access to technology and availability of linguistic data has really embedded itself into my mind: forty per cent of the world’s population have no access to/do not use the internet. That is not far from half. When we talk about ‘today’s data-driven world’ we are excluding nearly half the world population. Then, who writes the internet, and in what language? English represents over 80% of the internet. And the content is written primarily by only certain people within these societies. The question of who language technology serves and who it excludes is a huge issue that is rarely the focus of conversation, and one that needs to be central when we are thinking that LLMs and the modelling of technologies will shape our society, our thoughts and what people take to be ‘true’.


But on a theoretical level, the question that interests me most was raised right at the beginning by Rui Sousa-Silva and Antonio Pareja-Lora: the question of understanding. I have always been of the mindset that computers don’t understand what they are generating, not in the way we understand things. It is why we need the human element within technologies to provide this real-world view, and why computers produce inconsistencies that strike us as strange. They work in a different and complementary way to us. But what about humans? I have thought about humans and understanding throughout my life as a translator and interpreter, often marvelling that we understand each other at all. But I had not given it much thought specifically in the context of language technology. Does it matter that computers don’t understand? How can the abilities of computers (lots of data, computing power) be leveraged to support humans where they excel (specialist expertise and a real understanding of texts and the world)?

The training school was a melting pot of minds: from tech to human, those embarking on their careers and those reaching the pinnacles of theirs, different languages, experiences and life journeys. The meeting of minds provided by LITHME is also a key element of our work at Concept Analytics Lab: the attempt to build bridges and work together across the language/technology divide through shared experience. In our little corner of work, our aims align very well with LITHME’s, and we look forward to exploring further shared ideas and synergies.



What can a conceptual analysis tell us about public attitudes to post-Brexit trade?

What can a conceptual analysis tell us about public attitudes to post-Brexit trade?

by Rhys Sandow

Concept Analytics Lab were commissioned by the Prevention Research Partnership (PETRA) to investigate the attitudes towards trade expressed in responses to the Mass Observation Project’s 2021 UK Trade Policy directive. PETRA is a network of cross-disciplinary academics and other experts from NGOs and charities who are united by their goal of determining how trade agreements can be used to prevent disease and improve health. The long-form narratives from the general public gathered by the Mass Observation Trade Policy directive on the topic of trade deals are unparalleled in scope.

Here at Concept Analytics Lab, we employed both established and bespoke computational linguistic tools to extract key themes (concepts) in the data and then conducted corpus analysis to provide a close reading of these salient topics. Our analysis of the 125 responses to the directive, totalling 56,840 words, identified the top 100 concepts that are characteristic of the Trade directive (its conceptual fingerprint, see Figure 1).


Figure 1: The top 100 concepts in the Mass Observation’s Trade directive.

Using these key concepts as our starting point, we conducted corpus analysis, from which we identified a range of themes in relation to trade policy including:

  • Ethical concerns
    • Human rights issues in China
    • Environmental impact of long-distance trade
    • Animal welfare in the USA and Australia
    • Likelihood of standards decreasing post-Brexit
  • A desire for health and environment to be priorities in any future trade deal
  • A perception that EU standards are world-leading
  • A view that leaving the EU is an opportunity to support local produce to a greater extent
  • A belief that trade deals will not impact individuals or their local communities
  • A belief that while environmental concerns were a priority for respondents, the government’s key concerns were financial
  • A general acknowledgement of a lack of awareness of the intricacies of trade agreements

The findings bear on our understanding of i) the public’s attitudes towards trade agreements and how these vary between countries, ii) the disconnect between what the public want from a trade deal and the perceived agenda of the UK government, and iii) the value of clear communication between policy makers and the general public about trade deals and their implications. On a more methodological level, we showcase the value of employing a dual approach of conceptual and corpus analysis to provide an overview of key themes within a data set as well as a more detailed and contextualised investigation of stances and attitudes expressed in relation to these topics. But, most importantly, this research supports the PETRA network in their research and in shaping trade-related public policy in the UK.

For a more detailed account of our findings, you can read our full report here.




Bridging the gap between computational linguistics and concept analysis

Better together

Bridging the gap between computational linguistics and concept analysis

"Bridging The Gap" by MSVG is licensed under CC BY 2.0.

One of our priorities at the Concept Analytics Lab is to utilise computational approaches in innovative explorations of linguistics. In this post I explore the disciplinary origins of computer science and linguistics and present areas in which computational methodologies can make meaningful contributions to linguistic studies.


The documented origins of computer science and linguistics place the fields in different spheres. Computer science developed out of mathematics and mechanics. Many of the forefathers of the field, the likes of Charles Babbage and Ada Lovelace, were first and foremost mathematicians. Major players in the development of the field in the twentieth century were often mathematicians and engineers, such as John von Neumann and Claude Shannon. On the other hand, linguistics developed out of philology and traditionally took a comparative and historical outlook. It was not until the early 20th century, with the work of philosophers such as Ferdinand de Saussure, that major explorations into synchronic linguistics began.


The distinct origins of computer science and linguistics are still visible in academia today. For example, in the UK and other western universities, computer science is situated among the STEM subjects, while linguistics often finds a home with the humanities and social sciences. The different academic homes given to linguistics and computer science often pose a structural barrier to interdisciplinary study and to the creation of synergies between the two disciplines.


Recent research shows that the merging of linguistic knowledge with computer science has clear applications for the field of computer science. For example, the language model BERT (Devlin et al., 2018) has been used by Google Search to process almost every English-based query since late 2020. But we are only just beginning to take advantage of computational techniques in linguistic research. Natural language processing harnesses the power of computers and neural networks to swiftly process and analyse large amounts of texts. This analysis complements traditional linguistic approaches that involve close reading of texts, such as narrative analysis of language, discourse analysis, and lexical semantic analysis.


One particularly impressive application of computational linguistics to the analysis of semantic relations is the word2vec model (Mikolov et al., 2013). word2vec converts words into numerical vectors and positions them across a vector space. This process groups semantically and syntactically similar words and distances dissimilar ones. Through this process, corpora consisting of millions of words can be analysed to identify semantic relations within hours. However, this information, as rich as it is, still needs to be meaningfully interpreted. This is where the expertise of a linguist comes in. For instance, word2vec may identify pairs of vectors between which the distance increased across different time periods. As linguists, we can infer that the words these vectors represent must have changed semantically or syntactically over time. We can rely on knowledge from historians and historical linguists to offer explanations as to why that change has occurred. We may notice further that similar changes occurred amongst only one part of speech, or note that the change first occurred in the language of a particular author or group of writers. In this way, the two fields of computer science and linguistics necessarily rely on each other for efficient, robust, and insightful research.
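The basic interpretive move, reading a growing distance between a word’s period-specific vectors as meaning change, can be illustrated with a toy example. The three-dimensional vectors below are invented stand-ins; real word2vec embeddings have hundreds of dimensions, and vectors trained on different periods must first be mapped into a common space (commonly via an orthogonal Procrustes alignment) before distances are comparable.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for identical directions, larger
    values for vectors pointing in increasingly different directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Invented vectors standing in for one word's embedding in two periods.
w_period1 = [0.9, 0.1, 0.2]
w_period2 = [0.2, 0.8, 0.5]

drift = cosine_distance(w_period1, w_period2)
# A large drift is the computational signal; the linguist then asks
# why the word's contexts of use changed between the periods.
```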


At the Concept Analytics Lab, we promote the use of computational and NLP methods in linguistic research, exploring benefits brought by the convergence of scientific and philological approaches to linguistics. 



Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018) ‘BERT: Pre-training of deep bidirectional transformers for language understanding.’ arXiv preprint arXiv:1810.04805.


Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013) ‘Efficient estimation of word representations in vector space.’ arXiv preprint arXiv:1301.3781.




Covid-19 Crisis to Net Zero: A story of dedication and doubt in recycling

Covid crisis to net zero: a story of dedication and doubt in recycling

In March 2022, the Concept Analytics Lab was awarded a grant from the Higher Education Innovation Fund (HEIF), run in partnership with the Sussex Sustainability Research Programme (SSRP). The call this year was aimed at addressing the critical challenges of Covid-19 recovery and climate change in an integrated way. We partnered with Africa New Energies (ANE), who contributed their visualisation expertise, and with the Mass Observation project (MO), which provided the data, giving us an opportunity to apply our linguistic and computational analysis techniques to a new dataset.

Mass Observation directives focus on specific themes and ask volunteers, called observers, to write narrative responses to a series of questions on those themes. Every response is stamped with detailed information about its contributor-observer. In this case we studied the 2021 Household Recycling directive. This blog post highlights the key findings of our report and how they could feed into improving recycling performance in the UK.


What is waste?

We applied semantic corpus analysis to the 2021 MO Household Recycling directive to identify key themes within the directive, and then used this as a springboard to understand what people said about those themes and their positions on them. Waste, unsurprisingly, was one of those themes. We identified a number of near synonyms for waste, such as litter, debris and rubbish. We delved deeper by searching for the verbs that appear alongside the various words identified for waste, to identify what other objects are categorised as waste. Here we identified a mix of different objects that are variably reused, recycled and thrown away depending on context:
Figure 1: Collocates of reuse versus recycle.

Figure 1 shows us that many objects, depending on context, can either be reused or recycled: packaging, containers, waste (usually collocated as food, garden, household), pot, paper, plastic, bag.
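Collocate counts of the kind behind Figures 1–3 can be sketched with a simple window-based count. The tokenised sentence and window size below are illustrative only, not the directive data:

```python
from collections import Counter

def collocates(tokens, target, window=3):
    """Count words appearing within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

# Toy tokenised response (illustrative only)
tokens = ("i reuse glass jars and i recycle plastic bottles "
          "but i throw away black plastic and reuse plastic pots").split()

print(collocates(tokens, "reuse").most_common(3))
print(collocates(tokens, "plastic").most_common(3))
```

In practice such counts are run over the full directive and weighted by an association measure (e.g. log-likelihood or mutual information) before being plotted, but the underlying idea is this windowed co-occurrence count.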


Figure 2: Collocates of throw versus reuse.
Figure 3: Collocates of throw versus recycle.

In Figure 2 we see the perceived use of containers of different types (tubs and pots among them) in the way that they collocate solely with reuse and not with throw. Jar is used in the context of both recycle and throw. Looking at the collocates, this seems to reflect a contradiction: jars are kept, then thrown out if no use is found for them.

Here are some examples of the use of jars in context:

  • I do keep some lidded jars for when I make jams and chutneys and sometimes I keep plastic containers to use for sowing seeds in or if they have lids for storage.
  • Yes, I keep glass jars to reuse for leftovers, gardening, spices, herbs, food prep etc. I keep plastic containers from vegetables to use as dividers in the fridge. I reuse food bags and bread bags for leftovers and portioning items put into the freezer.
  • I regularly hang out a bag of used jam jars for a friend who returns the favour by dropping off delicious home-made preserves, pickles and jams.
  • I have a fondness for jars and biscuit tins, so some may linger in the garage until I find a use for them, or until I throw them out too.

Why is it waste?

Figures 1, 2 and 3 show us that packaging and plastic are important collocates for all three verbs (reuse, recycle and throw). When we look closer at the data we see this seems to be in relation to a number of issues: uncertainty, variation in services and multiplicity of packaging.

Here are some examples of the use of packaging and plastic in context:

  • Items not recycled tend to be plastic film of food packaging, hard plastic and black pastic [sic] which is marked as unrecyclable.
  • We do not recycle black plastic, plastic bags, and polystyrene due to a lack of facilities.
  • Plastic is the most difficult to recycle. Why can’t it all be recycled? (Like most people, we have to throw away as much plastic as we recycle) My wife and daughter are fanatical about it.
  • We can recycle most things but plastics are the most difficult items to get right.
  • I try to recycle as much as possible but also aim to reduce my consumption of goods that are plastic wrapped or in plastic-coated packaging that can’t be recycled.
  • I’m aware that black plastic food boxes or yoghurt pots can’t be recycled but uncertain about some clear plastic packaging for instance meat and fruit boxes.
  • XXXX Council has an excellent Waste Wizard on their website describing exactly what to recycle and how to prepare it. XXXX Council’s webguide is less detailed, and in general they recycle a smaller range of plastics and don’t accept tetrapacks as XXXX does.
  • In our area we are not able to recycle a huge amount of items – only plastic with numbers 1,2 and 3, for instance, so a lot of plastic packaging regrettably cannot be recycled
  • We find it annoying and frustrating to find so much plastic that cannot be recycled – e.g. plastic bags and containers – crisp bags, coffee packs, inner packing of biscuit/confectionery etc.

Where does it go?

Coupled with uncertainty about plastics and other types of recycling, respondents expressed doubt as to what happens to recycling once it is taken away.

  • I can’t say that I know what happens to the recycling once it is collected hence my very general answer but I do think I should find out.
  • I honestly don’t know what happens to our recycling once it is collected.
  • Do not really know what happens to the waste and whether it is put to good use.
  • I don’t know what happens with garden waste in terms of collection.

This is coupled with a desire to know more about what happens:

  • Its till a mystery, I don’t think local councils are transparent to what happens to our discarded rubbish and food waster.
  • I do wish our council would come clean and do a huge item in the local paper (or on video) showing us what happens to all our stuff!
  • Which types and numbers are acceptable is not given by the council, only vague guidance like plastic milk bottles are OK but not yoghurt pots – why?
  • A lot is dependant upon Government funding but there is much more than the council can do in educating the public perhaps by starting with schools.

Our duty

Despite this doubt and cynicism about the process, respondents showed a clear commitment to recycling, accompanied by a feeling of duty.

  • I recycle because it is the right thing to do for environmental reasons.
  • I recycle because I believe it is my duty as a citizen to do so, it is part of my very small contribution to addressing climate change along with a general desire to, where possible, reduce my carbon footprint.
  • I recycle because I believe it is a responsible action to reduce waste to help the environment, wildlife and also less developed countries who are in immediate danger from the effects of climate change.

How can we improve?

The lack of standardisation and information on the part of institutions such as councils, combined with the feeling of duty and responsibility expressed by the Mass Observation respondents, indicates that there might be a window of opportunity to intervene and improve recycling rates and quality through education and information sharing.

The findings in this report support research in other areas (Burgess et al., 2021; Zaharudin et al., 2022) about the need for greater standardisation and reorganisation of recycling networks to maximise adherence to, and performance of, the recycling system. Our findings also suggest that another way to improve this would be to increase the information given to citizens about recycling processes, particularly in relation to what happens to recycling once it is collected.



Burgess, Martin, Helen Holmes, Maria Sharmina, and Michael P. Shaver (2021) ‘The future of UK plastics recycling: One Bin to Rule Them All.’ Resources, Conservation and Recycling, 164.

Zaharudin, Zati Aqmar, Andrew Brint, and Andrea Genovese (2022) ‘Multi-period model for reorganising urban household waste recycling networks.’ Socio-Economic Planning Sciences, July 2022.

