Conceptual variation: Gendered differences in the lexicalisation of the concept of COMMODITY in environmental narratives

by Justyna A. Robinson, Rhys J. Sandow, Albertus Andito

 

An updated version of this work can be found as Chapter 10 in: Justyna A. Robinson, Rhys J. Sandow, & Albertus Andito. (2026). “Conceptual variation: Gendered differences in the lexicalization of the concept of COMMODITY in environmental narratives”. In Rhys J. Sandow & Natalie Braber (Eds), Sociolinguistic Approaches to Lexical Variation in English, 173-193. Routledge.

Abstract

Within studies of lexical meaning, conceptual variation has received little attention, possibly due to methodological difficulties with operationalising concepts. In the current paper, we build on the approach developed by Robinson & Weeds (2022) to study gendered variation in the lexicalization of concepts in environmentally-themed directives from the Mass Observation Project. We broadly define a concept as a cluster of (near-)synonymous and hyponymic terms representing a shared meaning. Previous research (Robinson & Weeds 2022) shows that gendered variation exists in collocational patterns of concepts. In the current paper, we focus on the varied ways in which a concept is lexicalized by men and women. A case-study of the concept of commodity shows that taxonomic differences between genders exist, with women using more specific terms to a greater extent than men. We suggest that the socially-variable articulation of concepts represents differences in speakers’ attention afforded to given commodities represented by the concept of commodity.

Keywords: conceptual variation, keyconcept, keysense, lexis, lexicalisation, sociolinguistics, gender, Mass Observation Project

1. Introduction

Variation in lexical meaning is typically investigated from the broad perspectives of semasiology and (formal) onomasiology.1 Semasiology refers to the mapping of a single word form onto multiple senses, for example, wicked ‘evil’ and ‘good’. Formal onomasiological variation investigates the distribution of a plurality of words that are used to lexicalize a given meaning at the level of (near-)synonymy, for example, sofa is lexicalized as sofa, settee, and couch. This latter type of lexical variation is the one that has most widely been studied from a sociolinguistic perspective (see Sandow & Braber, this volume). However, the focus of this chapter is on conceptual onomasiological variation (for example, Grondelaers & Geeraerts 2003), that is, the way in which concepts are distributed and lexicalized in heterogeneous ways.

There is a spectrum of methodological approaches to lexical variation that vary according to the degree of control that the researcher has over the lexical usage in the data that they work with. At one end of the spectrum, where control is highest, researchers use elicitation tasks, including surveys (for example, Britain et al. this volume; Robinson 2010, 2012). Lexical variation can also be attested through semi-structured interviews (for example, Braber, this volume; Bucholtz 2012). At the other end of the spectrum, there are data that have been produced without any involvement of researchers, such as newspaper articles and radio recordings. Work on such data typically engages with methods of Corpus Linguistics (for example, see Wilson this volume).

The development of Corpus Linguistics has allowed for investigating another layer of lexical variation, i.e. variation in meaning between texts more broadly, as opposed to the formal onomasiological variation that relies on functional equivalence between variants. This is typically achieved through a keyword analysis (for example, see Baker 2004). Such analysis involves profiling a target dataset against a baseline and identifying those words (or phrases) that are distinctive of the target data. Keyword analysis offers a powerful approach to identifying the ‘aboutness’ (Kilgarriff 2009) of texts in a bottom-up fashion. However, this method profiles word forms, not their meanings. Consequently, it is up to a researcher to conduct a post hoc, and often ad hoc, interpretation of the meanings from a keyword list in search of meaningful themes in the text.

In this paper, we apply a method for extracting semantic variation between texts by analysing meanings of words, rather than just lexical forms (à la keywords). We access linguistic meaning through the unit of concepts which we broadly define as a cluster of (near-)synonymous and hyponymic terms representing a given meaning. The principle of this method was first introduced by Robinson & Weeds (2022) and Robinson et al. (2023). In that work, Natural Language Processing tools are applied to analyse conceptual variation between texts through introducing the notion of a keyconcept.2 A keyconcept is a concept that occurs more often in a corpus than expected, as compared to a reference corpus (see Section 4 Method for a mathematical clarification of the definition). This approach allows for semantic and statistical interpretation of concepts that are characteristic of a given text.

Previous research shows that the distribution of conceptual themes in the language used by gender groups is not homogeneous. Based on text samples from 70 studies in the United States, New Zealand, and England, Newman et al. (2008: 219–20) found that, compared to women, men were more likely to talk about sports, money, and occupation, and less likely to talk about home, family, and friends. For a more detailed discussion of language and gender, including the related differences in socialisation practices, see Eckert & McConnell-Ginet (2013). Robinson & Weeds (2022) also discovered the existence of variation in concepts used by male and female witnesses in courtrooms as well as differences in conceptual collocation patterns across genders. In the current research, we ask whether gendered language also varies in terms of conceptual taxonomy. More specifically, we consider whether men and women lexicalize concepts differently, that is, whether they engage with specific or general levels of a conceptual hierarchy, i.e. hyponymy vs. hypernymy, to differing extents.

The paper is structured as follows. First, we contextualise the current study within the growing body of work on concepts and lexicalization. Next, we explain the methodological approach that enables a conceptual analysis. Then, we turn to the dataset, which is the Mass Observation Recycling and Environmentalism (MORE) corpus. It consists of responses to three ‘directives’ on the topic of environmentalism collected by the Mass Observation Project. The analysis profiles the concepts within the dataset, including variation between men and women, using the concept of commodity (specifically commodity.n.01) as a case study. The choice of environmental narratives and the concept of commodity is motivated by the desire to understand better the characteristics of populations’ language and thinking in this economically- and socially-important area. We show the ways in which such a conceptual perspective highlights gender-based differences in behavioural practices; for example, women engage more with concepts that pertain to domestic labour. We also identify variation in the lexicalization of concepts, with women typically using more specific (hyponymic) levels of the conceptual hierarchy than men. The analysis benefits from a range of visualisation tools. We conclude by advocating for the value of conceptual analysis and the opportunities it affords for lexical variation research. We note that this research is exploratory and serves as a proof-of-concept approach to the analysis of socially-variable patterns in lexicalization of concepts.

 

2. The Concept and Lexicalization

The last decade has seen an increased focus on concept-led linguistic research. One area that has led investigations into concepts is language change. A body of conceptual research has built on established historical thesauri, such as the Bilingual Thesaurus of Everyday Life in Medieval England (BTh, Sylvester et al. 2017) or the Historical Thesaurus of the Oxford English Dictionary (HT, Kay et al. 2023), or on large corpora as in the Linguistic DNA project (Fitzmaurice et al. 2017).

Variation exists in the way the unit of a concept is operationalised in studies of language. The HT and BTh operationalise a concept as a sense or a group of senses expressed by a term or terms and placed within a taxonomic structure with other meanings. The HT presents a taxonomy which begins with the most general ways of expressing a concept, such as the categories of The World, The Mind, and Society, and moves hierarchically downwards to “the most specific [ones]”.3 The HT takes the structure of Roget’s Thesaurus and imposes it on historical data, with a sensitivity to representing historical senses of words. The BTh draws on the HT structure with modifications that are suitable to mirror the medieval world. However, the BTh classifies vocabulary into semantic roles, rather than a hierarchy.

A departure from this view of concepts is presented by the Linguistic DNA project, which sees concepts as discursive clusters. According to Fitzmaurice et al. (2017: 25), “In any particular historical moment, a concept might not be encapsulated in any single word, phrase or construction; instead it will be observable only via a complete set of words, phrases or constructions in syntagmatic or paradigmatic relations to each other in discourse”. A discursive concept is made up of paradigmatic terms which habitually co-occur in language across large proximity windows. For example, Mehl (2022) shows that the discursive concept diversity-opinion-religion is made up of the terms diversity, opinion, and religion habitually co-occurring around 5000 times in EEBO-TCP. In other words, a frequent and mathematically significant occurrence of the trio diversity-opinion-religion indicates the possible existence of an idea that was expressed by these three nouns in conjunction rather than by an individual term. Close reading of extracts representing the discursive concept allows for tracing the formulation of ideas regardless of whether they ever become encapsulated in a single term.

Language change research also pursues questions of modifications of different levels of conceptual hierarchy that happen as an outcome of language contact. Sylvester et al. (2022) show that terms making up conceptual categories get reorganised when distinct communities come into contact. In research querying the absorption of French-origin borrowings into Middle English, Sylvester et al. (2020: 28) show that these borrowings tended to enter hypernymic (more general) levels of conceptual categories. Surprisingly, these French borrowings tended to occupy semantic spaces where there was more, not less, lexical choice. In research exploring obsolescence, Vogelsanger (2024: 24) finds that most lexical loss also happens at hypernymic levels as “the more specific the concept, the fewer words and senses we find, but in turn they seem to be more resilient, since they show much lower rates of obsolescence”.

A significant area of study focuses on lexicalization of concepts. While lexicalization generally refers to “the assignment of lexeme to a meaning” (Murphy 2010: 16), historical linguists tend to focus on various aspects of this process. Thus, Trousdale (2008) asks how once closed-class words or phrases develop lexical meaning. Alexander (2018) and Dallachy (2024) investigate how words that map new concepts are added to a language’s lexicon. Sylvester and Tiddeman (2024) develop measures of density of lexicalization. These studies show potential for using a conceptual view on language as a way of exploring social cognition, with lexicalization being a measure of “cultural attention” (Alexander 2018, Dallachy 2024) and a “function of speakers’ needs” (Sylvester et al. 2020: 28).

The approach to concept and lexicalization pursued in the current work can broadly be categorised in the tradition of the aforementioned thesauri-based approaches in that we consider a term’s usage, its sense, as a base for a concept. We also consider the concept as belonging to a network of hierarchically-structured meanings, i.e. structured horizontally in terms of (near-)synonymy and co-hyponymy and vertically in terms of hyponymy and hypernymy. We use WordNet (Fellbaum 1998), specifically WordNet 3 (Princeton University 2010), to profile senses and model the conceptual structure including semantic relations. WordNet has the advantage in this respect as it is made up of twenty hierarchical levels (Mohamed & Oussalah 2014), as opposed to, for example, the seven levels of the Historical Thesaurus (Piao et al. 2017), thus it enables greater granularity of analysis when it comes to researching hierarchical semantic relations. 

WordNet is a large lexical database of English that groups words according to their meaning. It enables meaning to be modelled through two main semantic units, senses and concepts. Working at the sense level means that we consider words in a text by their meaning as tagged by WordNet. Concepts additionally include all of the hyponyms of the sense. As an illustration of the difference between a sense and concept, consider the WordNet tag of person.n.01, which is defined as ‘a human being’.4 The sense person.n.01 is lexicalized by words that are sense-tagged as person.n.01, including person, individual, someone, and somebody. These lemmas exist at a single level of the semantic hierarchy, that is, the semantic relationship between them is broadly that of synonymy. Meanwhile the concept person.n.01, refers to the words that are sense-tagged as person.n.01 and to words that are sense-tagged as the hyponyms of person.n.01, which include the co-hyponyms child.n.03 and adult.n.01, hyponyms of these hyponyms, such as woman.n.01, recursively until there are no more hyponyms left. This process is unidirectional, that is while the concept of person.n.01 includes hyponyms such as adult.n.01, it does not include hypernyms such as organism.n.01.

The proposed view of text semantics, in which each word is tagged for its sense and position in the WordNet hierarchy, allows for testing a whole set of hypotheses on categorisation of meaning, concepts, and lexicalization. The current research centres on lexicalization, understood as the ways in which concepts are represented through words or multi-word constructions. We investigate gendered variation in the use of terms at hypernymic and hyponymic levels in the concept of commodity.

In identifying the scope of this research, we are motivated by methods and findings of Robinson & Weeds (2022) on conceptual gendered variation. Robinson & Weeds (2022) discover that in the 19th century concepts varied in terms of their differing collocational patterns across genders. For example, while the concept of woman was used at similar frequency by male and female speakers, the adjectival concepts that the concept woman collocated with demonstrated variation across gender. Women were more likely than men to describe other women using the concept AP.02.b [individual character] and the concept AW.04.a [poverty]. These concepts include adjectives such as single and poor and their usage with the concept woman is evidenced by statements such as ‘She is a married woman’ and ‘I am a poor unfortunate woman’ (Robinson & Weeds 2022: 421). Unlike Robinson & Weeds (2022) who focus primarily on collocation patterns, we focus on the ways in which men and women lexicalize concepts and the different levels of the conceptual hierarchy with which they engage.
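
The distinction between a sense and a concept drawn above for person.n.01 can be illustrated with a minimal sketch. It is not the pipeline used in this research: the hierarchy is a hand-written toy fragment, the tagged tokens are invented, and the function names (concept_members, sense_count, concept_count) are hypothetical. It shows only the unidirectional, recursive collection of hyponyms.

```python
from collections import Counter

# Toy fragment of a WordNet-style hierarchy (assumption: hand-written for
# illustration, not loaded from WordNet). Maps a sense to its direct hyponyms.
HYPONYMS = {
    "person.n.01": ["child.n.03", "adult.n.01"],
    "adult.n.01": ["woman.n.01"],
}


def concept_members(sense, hyponyms=HYPONYMS):
    """Return the sense itself plus all of its hyponyms, collected recursively.

    Hypernyms are never added, so the expansion only runs downwards.
    """
    members = {sense}
    for child in hyponyms.get(sense, []):
        members |= concept_members(child, hyponyms)
    return members


def sense_count(sense, tagged_tokens):
    """Count tokens tagged with exactly this sense."""
    return Counter(tagged_tokens)[sense]


def concept_count(sense, tagged_tokens, hyponyms=HYPONYMS):
    """Count tokens tagged with this sense or with any of its hyponyms."""
    members = concept_members(sense, hyponyms)
    return sum(1 for tag in tagged_tokens if tag in members)


# Invented sense tags standing in for a sense-tagged text.
tokens = ["person.n.01", "woman.n.01", "child.n.03", "organism.n.01", "person.n.01"]

print(sense_count("person.n.01", tokens))    # uses of the sense only
print(concept_count("person.n.01", tokens))  # uses of the sense and its hyponyms
```

In this toy data, the hypernym organism.n.01 is left out of the concept count, while woman.n.01 and child.n.03 are included.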

3. Data

Data used in this research are provided by the Mass Observation Project (Mass Observation 2010, henceforth MOP), a British national life-writing project. Thrice yearly, the MOP issues open-ended questionnaires, or ‘directives’, on a variety of topics, from Royal coronations to attitudes towards gender. These are sent to a panel of c.500 ‘observers’ who are invited to submit their response to the directive. We focus on a collection of three directives with a broad theme of environmentalism, which are titled ‘Future of consumption’ (2018), ‘You and plastics’ (2019) and ‘Household recycling’ (2021). The ‘Future of consumption’ directive asked participants to consider the way in which consumption practices are likely to change for future generations. The ‘You and plastics’ directive asked observers to reflect on their use of plastics, particularly single-use plastics, in the past, present, and future. The ‘Household recycling’ directive asked respondents to consider what and how often they recycle, as well as their motivations for recycling. While each Mass Observation directive receives a small number of handwritten responses, we focus on digitally-submitted files. The responses to the three directives number 395 submissions from ‘observers’, totalling 416,754 words. These responses form the target corpus, henceforth the Mass Observation Recycling and Environmentalism (MORE) corpus.

Senses and concepts become key if they occur frequently enough in a target corpus in comparison to a reference corpus. The choice of a reference corpus depends on a range of criteria (cf. Baker 2004). In the current research, the reference corpus comes from data collected by the MOP. As well as the directives, the MOP has issued calls for ‘Day diaries’ on the 12th of May each year since 2010. These diaries include descriptions of daily activities, thoughts that the writer has throughout the day, and generally provide an insight into the life of the diarists. The digitally-submitted responses to these calls from 2010–2019 form the baseline with 4,101,605 words, from 3,070 diary entries (see Robinson et al. 2023).

Consistently, the respondents to the MOP’s calls are disproportionately women, older, middle-class and from the South-East of England (see Robinson et al. 2023). In terms of gender, participants are asked to self-identify their gender, and some do so with labels outside of the male-female binary.5 While the unbalanced nature of the sample is a limitation, the size of the dataset and subsets for each demographic group still allow for a robust comparative analysis. Excluding those for whom relevant data were not provided, the gender and the decade of birth distribution of the contributors to both the target and reference corpora are presented in Figure 1. The two datasets in Figure 1 are broadly similar in relation to socio-demographic categories, although the MORE corpus (left) has a somewhat higher proportion of male respondents (28.8%) than the baseline (right, 17.7%).

Figure 1: The age (decade of birth) and gender distribution among respondents, with the MORE corpus on the left and reference corpus on the right

4. Method

In the current research, Word Sense Disambiguation (WSD) is employed to determine which sense is the most appropriate for each word in the data based on the word’s context. We use SupWSD (for more detail, including an evaluation of its accuracy, see Papandrea et al., 2017), a WSD tool that uses a machine learning algorithm, i.e. Support Vector Machine, and WordNet (Fellbaum 1998, WordNet 3.0 (2010)) as the sense inventory, taking into account part-of-speech, surrounding words, and local collocations.

Unlike other knowledge bases (for example, the HT), the WordNet hierarchy only applies to nouns and verbs. While other part-of-speech categories, such as adjectives and adverbs, are tagged for senses in WordNet, they do not form a hierarchy, that is, they are represented in a horizontal dimension only. Thus, the current conceptual analysis is limited to verbs and nouns. Verbal and nominal senses are assigned a level, ranging from the broadest concepts at level 0 to steadily more specific concepts at levels with higher numbers. At level 0, the top of the hierarchy, a variety of verbs, such as those tagged with senses trade.v.01 and degrade.v.01, are present, while all nouns converge on entity.n.01. At the other end of the hierarchy, there are much more specific senses, such as cow.n.01 at level 17. That is, there are 16 hypernyms6 that separate cow.n.01 and entity.n.01, for example, physical_entity.n.01 at level 1 and cattle.n.01 at level 16.
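
The notion of a level can be illustrated with a short sketch. It assumes a simplified hierarchy in which every sense has at most one hypernym (WordNet itself allows more than one), and it uses a small hand-written fragment rather than the full chain from cow.n.01 to entity.n.01. The mapping HYPERNYM and the function level are hypothetical names introduced for illustration.

```python
# Toy child -> parent mapping (assumption: hand-written, one hypernym per
# sense; this is a fragment of the noun hierarchy, not a WordNet export).
HYPERNYM = {
    "physical_entity.n.01": "entity.n.01",
    "abstraction.n.06": "entity.n.01",
    "object.n.01": "physical_entity.n.01",
    "matter.n.03": "physical_entity.n.01",
}


def level(sense, hypernym=HYPERNYM):
    """Return the number of hypernym steps between a sense and the root.

    The root (here entity.n.01) has no hypernym and therefore sits at level 0.
    """
    depth = 0
    while sense in hypernym:
        sense = hypernym[sense]
        depth += 1
    return depth


for s in ["entity.n.01", "physical_entity.n.01", "object.n.01", "matter.n.03"]:
    print(s, level(s))
```

Under this counting, a sense at level 17 is reached after 17 hypernym steps from itself to the root, which leaves 16 intermediate hypernyms between it and entity.n.01.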

After each word in the corpora has been tagged with the appropriate sense, we perform analyses of the corpus through a bespoke Application Programming Interface. In order to identify differences in the usage of senses or concepts between the target and reference corpora, we use a measure of Pointwise Mutual Information (PMI; see, for example, Huang et al. 2009; for a discussion of its application to conceptual analysis, see Robinson & Weeds 2022, Robinson et al. 2023). PMI enables the identification of keysenses and keyconcepts, i.e. senses or concepts which appear in the corpus more often than one would expect, given their frequency in a reference corpus. The higher the PMI, the more distinctive the sense or the concept is of the target dataset relative to the reference corpus. The PMI is calculated as presented in Equation 1, where A is a sense or concept, B is a target corpus, P(A|B) is the probability of encountering a sense A or a concept A given a target corpus B, and Pref(A) is the probability of a sense A or concept A in the reference corpus.

\[ \text{PMI}(A, B) = \log \left( \frac{P(A \mid B)}{P_{\text{ref}}(A)} \right) \]
Equation 1
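
Equation 1 can be sketched in a few lines of code. This is only an illustration under stated assumptions: probabilities are estimated as relative frequencies (count divided by corpus size), the logarithm is taken to base 2 (the base is not specified here), and the sense counts in the usage example are invented. Only the two corpus sizes are taken from Section 3. The function name pmi is hypothetical and does not refer to the bespoke API.

```python
import math


def pmi(target_count, target_total, ref_count, ref_total):
    """Estimate PMI(A, B) = log(P(A|B) / Pref(A)) from raw counts.

    Assumptions: probabilities are relative frequencies and the logarithm
    is base 2. Zero counts are rejected because the ratio is undefined
    (or its logarithm is) in that case.
    """
    if target_count <= 0 or ref_count <= 0:
        raise ValueError("PMI is undefined for zero counts")
    p_target = target_count / target_total  # P(A | B)
    p_ref = ref_count / ref_total           # Pref(A)
    return math.log2(p_target / p_ref)


# Corpus sizes as reported in Section 3; the two sense counts are invented.
MORE_WORDS = 416_754
REFERENCE_WORDS = 4_101_605

score = pmi(target_count=200, target_total=MORE_WORDS,
            ref_count=500, ref_total=REFERENCE_WORDS)
print(round(score, 2))
```

A positive value indicates that the sense or concept is relatively more frequent in the target corpus than in the reference corpus, a value of 0 indicates equal relative frequency, and a negative value indicates under-representation in the target.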

In Section 5.1, using the tools discussed here, we explore the semantics of the MORE corpus. More specifically, we provide an overview of the distinctive senses and distinctive concepts.  In Section 5.2, we focus on the case study of the concept of commodity to answer questions pertaining to taxonomic differences in lexicalization patterns between men and women.

5. Results and Analysis

5.1. Semantics of the MORE corpus: Keysenses and keyconcepts

In order to provide a semantic overview of the environmental narratives, we identify top keysenses and a conceptual profile of the data. The top 50 keysenses in the MORE corpus, which are measured against the senses in the reference corpus, are presented in Figure 2. The senses with the highest PMI values appear in the top left with the largest tile size and darkest colour; conversely, the 50th top sense appears in the bottom right with the smallest tile size and palest colour. The sense-level analysis clusters (near-)synonymous lexical items. For example, in some cases the adjectives reusable and recyclable are used synonymously enough to be tagged with the same sense, i.e. reclaimable.s.01.7 Additionally, this analysis disambiguates polysemous words and is preferable when the focus is on meaning rather than form. For instance, it distinguishes between waste.n.01 ‘any materials unused and rejected as worthless or unwanted’ and waste.n.02 ‘useless or profitless activity; using or expending or consuming thoughtlessly or carelessly’.

Figure 2: The top 50 senses in the MORE corpus, with corresponding PMI values

Figure 2 provides a semantic overview of the MORE corpus. The most distinctive sense of the corpus is coronavirus.n.01. This result occurs due to the very low frequency of this sense in the baseline corpus, coupled with its much higher frequency in the target corpus, particularly the ‘Household recycling’ directive, where the directive prompt specifically asked about the effect of Covid‑19 on recycling practices. Other senses provide a largely intuitive account of the content of the responses to the three directives, such as materials, for example, plastic.n.01, cardboard.n.01, and cellophane.n.01, as well as practices associated with environmentalism, such as recycle.v.02 and flatten.v.01.

The keyconcept approach develops a keysense analysis by considering (near-)synonymy alongside hyponymy. One way to visualise the verbal and nominal concepts in the target corpus is by using a sunburst, as in Figure 3.8 This conceptual profile presents the concepts at all levels of the conceptual hierarchy. Each level in the conceptual hierarchy corresponds to one ring on the sunburst, with the highest levels of the conceptual hierarchy, such as entity.n.01, being close to the centre of the figure at level 0, then its daughter nodes physical_entity.n.01 and abstraction.n.06 at level 1, and so on.9 Hyponyms radiate out from their parent node. However, where the sense of the parent node is used, this does not generate a daughter node and is, instead, represented by empty space. For example, in the concept container.n.01 on the right of Figure 3 at level 6, the nodes that radiate from this concept highlight that it has a variety of hyponyms, such as vessel.n.03 and bag.n.01. However, there is also space not occupied by daughter nodes; this is where the sense container.n.01 was used directly. The colour and size of each node are also meaningful dimensions of this visualisation. The size of each node represents the raw frequency of the concept in the target corpus, and the intensity of colour indicates each concept’s PMI value. The bar on the right of the figure provides a guide as to the PMI range. For example, matter.n.03 is less frequent but more distinctive in the target corpus than object.n.01 (both concepts are at level 2). While it is not intended that each node should be readable, the figure serves as a compass, directing the researcher to the most distinctive conceptual areas of the dataset. At the more general levels of the conceptual hierarchy (for example, 0–3), the PMI values are very low, whereas distinctiveness is more likely to be found at lower levels of the hierarchy. For instance, container.n.01 has a particularly high PMI (2.8) at level 6, as a hyponym of instrumentality.n.01. However, this high PMI is not simply a result of the sense container.n.01, but of its hyponyms, too. Within container.n.01, some of its daughter concepts also display high PMIs, such as bottle.n.01 (PMI=4.5) and bin.n.01 (PMI=5.72).

Figure 3: The conceptual profile of the MORE corpus, with the reference corpus as the baseline

Given that the environmentally-themed directives make up the MORE corpus, it is not surprising that the respondents discuss types of containers and their uses, including innovative repurposing, as well as their recycling practices. However, not all daughter concepts of container.n.01 are highly distinctive. For example, bath.n.01 and boiler.n.01 occur less often than in the baseline. Other concepts with particularly high PMI values include use.n.01 (PMI=4.4) and its daughter concept recycling.n.01 (PMI=6.2), and waste.n.01 (PMI=4.8) and its daughter concept rubbish.n.01 (PMI=3.3).

This section illustrates an approach to the semantic characterisation of data. The MORE corpus is semantically described from the perspective of senses and concepts. Figure 2 shows the most distinctive senses of the MORE corpus, including lockdown.n.01, pandemic.n.01, and plastic.s.02. The conceptual approach complements this by considering the taxonomic relation of hyponymy. For example, while a sense-level analysis highlights container.n.01 in the top 50 most distinctive senses, a conceptual analysis shows that not only is this sense distinctive of the corpus, but so are some of its hyponyms, such as bin.n.01 and bottle.n.01.

5.2. Gendered variation in the MORE corpus: Keyconcepts and lexicalization patterns

Having established the semantic characteristics of the MORE corpus through the keysense and keyconcept analysis, we turn to the question of gendered semantic variation. Firstly, we investigate conceptual differences in environmental narratives between male and female respondents. Secondly, we ask how lexicalization of a concept, commodity, varies across gender groups.

Gendered differences in concepts used in the MORE corpus are extracted through a keyconcept analysis. Each concept’s PMI for each gender is measured against the other gender group within the MORE corpus.10 That is, when the female data are the target, the male data are the reference, and vice versa (see Figures 4 and 5).

Figure 4: The conceptual profile of women in the MORE corpus, with men as the baseline

Figure 5: The conceptual profile of men in the MORE corpus, with women as the baseline

Relative to men, women show positive PMIs for charity.n.01 (1.4), husband.n.01 (3.6), and home.n.01 (1.3), highlighting that women engage with these concepts to a greater extent than men do in the MORE corpus. Relative to women, men show positive PMIs for internet.n.01 (1.2), alcohol.n.01 (1.1), and wife.n.01 (4.1), highlighting that men engage with these concepts to a greater extent than women do. Such results testify to heterogeneous behaviours and practices across genders.

Even when concepts have similar overall rates of usage across demographic groups, their internal structure can also differ. This is exemplified through the concept of commodity.n.01 which is selected as a case study. This concept is used similarly by men and women in the dataset as measured by PMI. It is used very slightly more by women with a PMI=0.18, when compared with men. The raw frequency of usage is N=1021 for women, and N=445 for men. While the PMI for commodity.n.01 for both genders is similar, the internal structure of the concept displays a great deal of variation across the two gender groups. The conceptual profile for commodity.n.01 among female writers is presented in Figure 6, while the male equivalent is presented in Figure 7.

Figure 6: The conceptual profile for commodity.n.01 among female observers

Figure 7: The conceptual profile for commodity.n.01 among male observers

The conceptual profiles for males and females display a number of differences in the type of artifacts with which males and females interact. Numerous gendered clothing items are distributed disproportionately across the gender groups, with concepts such as dress.n.01 (PMI=0.3), negligee.n.01 (PMI=0.9), brassiere.n.01 (PMI=1.7), and skirt.n.02 (PMI=0.1) displaying positive PMIs in the female data, and suit.n.01 (PMI=2.2) and jean.n.01 (PMI=1.8) displaying positive PMIs in the male data. Greater attention to domestic labour is also evident in women’s conceptual profile of commodity.n.01. For example, white_goods.n.01 is used more by women (PMI=0.8). Within the concept of white_goods.n.01, the female data have positive PMIs for refrigerator.n.01 (PMI=0.5), dishwasher.n.01 (PMI=2.0), and washer.n.01 (PMI=2.2), while the only hyponym with a positive PMI in the male data is cooler.n.01 (PMI=1.8). Similarly, laundry.n.01 has a positive PMI in the female data (PMI=1.1). There are also examples of the asymmetric distribution of childcare, with diaper.n.01 (PMI=2.8) having a positive PMI in the female data.

One surprising result is that shirt.n.01 is used more by the female observers (PMI=0.82), despite it being an artifact stereotypically associated with men. However, we can account for this result by identifying that many of these examples involve accounts of interactions with male clothing by women, such as Example (1):

  1. I reuse my husband’s cotton shirts in crafting. [Household recycling, female born in the 1960s]

The observed differences in conceptual profiles for males and females suggest that concepts provide a window into community behaviours. The artifacts represented by the concept of commodity, and the asymmetric gender distribution with which they are engaged, are a medium through which the physical world is experienced by men and women.

Another perspective from which the current research highlights gendered conceptual variation is that of taxonomic differences, i.e. the levels of the conceptual hierarchy that men and women typically engage with. This differential engagement is evident in the fact that the sunburst in Figure 6 is noticeably ‘busier’ than the one in Figure 7. That is, the male data in Figure 7 includes more empty space, that is, space not occupied by daughter nodes (hyponyms). Take clothing.n.01 as an example. Proportionally, men are more likely to use the more general level sense clothing.n.01, while women are more likely to conceptualise clothing with a greater degree of specificity. When the concept clothing.n.01 is used, men use the sense clothing.n.01 32.2% of the time, compared to 24.7% for women. Thus, women are more likely to use a hyponym. To illustrate this, Example (2) is more typical of a male conceptualisation of clothing.n.01 at the more general category level, whereas Example (3) is more typical of a female conceptualisation at a greater level of specificity (brassiere.n.01):

  2. Used batteries are taken to local supermarkets where they can be recycled, also clothing can be recycled at various points around the area. [Household Recycling, male born in the 1950s]
  3. I’d like to be able to recycle old bras [Household Recycling, female born in the 1950s]

Similarly, the concept of merchandise.n.01 is more likely to be realised as a hyponym by women than by men. For women, 74.2% of uses of this concept are in the sense merchandise.n.01, whereas for men this value is higher at 81.6%. This is also true of the broader concept of commodity.n.01, with men using the sense commodity.n.01 17.9% of the time, compared with 12.4% for women.11
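The percentages reported here can be read as the share of a concept's uses that are realised at the concept's own sense rather than at one of its hyponyms. The sketch below shows one way such a share could be computed from sense-tagged tokens. It is an illustration under stated assumptions, not the pipeline used in the chapter: the function name `general_sense_share`, the hyponym set, and the token lists are all hypothetical toy data.

```python
from collections import Counter


def general_sense_share(tagged_senses, concept, hyponyms):
    """Share of uses of `concept` realised at the general sense itself.

    `tagged_senses` is a list of sense labels (one per token), and `hyponyms`
    is the set of more specific senses counted as part of the same concept.
    """
    counts = Counter(tagged_senses)
    general = counts[concept]
    specific = sum(counts[h] for h in hyponyms)
    total = general + specific
    if total == 0:
        raise ValueError("concept does not occur in the data")
    return general / total


# Illustrative hyponym set and toy token lists (hypothetical, not corpus data).
CLOTHING_HYPONYMS = {"suit.n.01", "jean.n.01", "brassiere.n.01", "dress.n.01"}
male_tokens = ["clothing.n.01"] * 3 + ["suit.n.01"] * 5 + ["jean.n.01"] * 2
female_tokens = ["clothing.n.01"] * 2 + ["brassiere.n.01"] * 4 + ["dress.n.01"] * 4

male_share = general_sense_share(male_tokens, "clothing.n.01", CLOTHING_HYPONYMS)
female_share = general_sense_share(female_tokens, "clothing.n.01", CLOTHING_HYPONYMS)
print(f"male: {male_share:.1%}, female: {female_share:.1%}")
```

A lower share for one group indicates that the group lexicalizes the concept more often through hyponyms, which is the pattern described for women in the text.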

The internal structural differences in concepts evidence variation in the ways men and women express those concepts by their use of words, that is, in the ways they lexicalize those concepts.  In the analysis of commodity.n.01, differences arise in the levels within the conceptual hierarchy at which concepts are lexicalized, with men lexicalizing concepts at more generic levels, and women lexicalizing concepts at more specific levels.

6. Summary and Conclusions

This research presents a new approach to querying the semantics of texts by engaging with horizontal ((near-)synonymous and co-hyponymic) and vertical (hyponymic and hypernymic) semantic relations. The current approach highlights hyponymy as a critical aspect of language variation alongside more widely researched relations in socio-semantics, such as synonymy and polysemy. Exploring texts through the lenses of keysense and keyconcept enables semantic content to be profiled, which can empirically guide further analysis and close reading. In one way, the concept-driven approach is more specific than more traditional alternatives, such as keyword analysis, in that it distinguishes polysemous senses of the same word form. In another way, it is more general: in a conceptual approach, it is less relevant which (near-)synonym is used; what matters is the meaning expressed. This perspective enables the analysis to centre meaning, while variation in word form is secondary. We advocate using the current conceptual approach, which offers a bird’s-eye view of text meaning, in conjunction with close reading in order to develop the most robust insights into a text’s semantics (for example, Robinson et al. 2023).

The current research demonstrates the ways in which a conceptual perspective can tell stories of socially-asymmetric behavioural practices. Examples of this include the way in which the use of clothing items exhibits gendered patterns, such as brassiere.n.01 being used relatively more by women and suit.n.01 by men. Similarly, there is a higher frequency of concepts pertaining to domestic labour and childcare in the female data. Results that testify to heterogenous behaviours and practices across genders are corroborated by the parallels with other studies, such as those observing gender-based differences in the share of domestic duties (Bianchi et al. 2012; Thébaud et al. 2021). The parallels between such previous research and the current research speak to the validity of this approach.

The current concept-driven approach enables insights into lexicalization. Even when concepts have similar overall rates of usage across demographic groups, their internal structure can differ. We show that men and women differ in the way they use different levels of the conceptual hierarchy when they express or lexicalize the same concept. In the case of commodity.n.01, men are more likely to use general-level terms, such as clothing, whereas women are more likely to engage with more precise classifications in their lexicalization, that is, specific types of clothing, such as bra. This idea is redolent of Lakoff’s (1973) suggestion that women tend to use more specific colour terms, such as lavender, whereas men lexicalize these same colours at a broader level of conceptualisation, such as purple.12 Lakoff’s (1973) observations alongside the current research lead us to hypothesise that differential lexicalisation patterns may hold for gendered or socio-demographic conceptual variation more broadly.

Another question to consider is why lexicalization would take place at different levels of the conceptual hierarchy for men and women. One possibility lies in the notion of cultural and social needs that speakers express via lexicalisation practices (cf. Alexander 2018; Dallachy 2024; Sylvester et al. 2020). Another possibility involves engaging with the cognitive foundations of language and perception biases among men and women. At this stage, we suggest that the lexicalization of commodity.n.01 reflects the attention afforded to the artifacts with which men and women engage. The question of why the lexical representations of objects display differential attention across communities could be explored further via socio-cognitive research frameworks (Pütz et al. 2014).

The findings of the current research have implications for the language used by policy and industry decision-makers in the climate and environmentalism space. By integrating insights from Nudge Theory, especially the idea of “choice architecture” (Thaler and Sunstein 2008), relevant communication and engagement strategies can be optimised for a specific audience. By using gender-characteristic language, such as more general or more detailed terms to describe commodities, populations can be gently steered towards a desired behaviour, such as more accurate sorting of those commodities when they become waste. Policy makers also require precise categorisation in waste management (EU Waste Framework 2008/98/EC)13, which the current research provides through the categorisation of the commodities people handle. By fine-tuning public communication to resonate with socio-demographic groups, and embedding this clarity into policy wording, policy makers can more effectively reduce landfill waste. These strategies work together to create an environment where an individual is gently steered toward more sustainable behaviour without the need for overt financial incentives.

To conclude, the proposed concept-led approach shows potential beyond a case of gendered variation. The current analysis could apply to any demographic category, including cross-sectional categories. The extent to which the presented results hold for other concepts in a systematic way remains to be seen. Also, motivated by studies outlined in Section 2, such as Sylvester et al. (2020) or Vogelsanger (2024), further research could explore the relationship between lexicalization and different levels of the conceptual hierarchy in the context of language change. While it is argued elsewhere (for example, Sandow & Braber, this volume) that lexis can provide a lens into society, we argue here that a conceptual approach lends a particularly felicitous perspective to this endeavour. The proposed conceptual approach affords further methodological, theoretical, and applied opportunities in sociolinguistics and beyond.

Footnotes

  1. Acknowledgements: We would like to thank two anonymous Reviewers for their helpful comments on the earlier draft of this manuscript. We would like to thank the broader team at Concept Analytics Lab at the University of Sussex, particularly Julie Weeds, Willam Kearney, Yassir Laaouach, and Ray Davey for the work on the API that underpins the research presented here. Supported by the Arts & Humanities Research Council (AHRC) Impact Acceleration Account, at the University of Sussex (AH/X003531/1).↩︎
  2. Robinson & Weeds (2022) use the term characteristic concept.↩︎
  3. https://ht.ac.uk/classification/↩︎
  4. In the WordNet sense inventory, the n refers to the nominal part-of-speech category. The 01 identifies this as the first sense of this form in WordNet. By way of example, person.n.01 is defined as ‘a human being’ and person.n.02 is defined as ‘a human body (usually including the clothing)’.↩︎
  5. Figure 1 does include these individuals, but as there are very few in number, they are not clearly visible.↩︎
  6. These are hypernyms from the perspective of cow.n.01, but hyponyms from the perspective of entity.n.01.↩︎
  7. For example, ‘The video I watched suggested that rice and pasta should be stored in reusable glass containers’ [‘Future of consumption’ directive, male born in the 1970s] and ‘some of the plastic containers are recyclable but I guess that several are single-use’ [‘You and plastics’ directive, female born in the 1930s].↩︎
  8. Concepts with a negative PMI also appear in Figures 3 and 4. In terms of the colour coding key on the figures, these are coloured as if their PMI value is 0.↩︎
  9. The highest levels in the conceptual hierarchy correspond to the lowest numbers in the WordNet hierarchy. Thus, the highest levels are entity.n.01 at level 0, physical_entity.n.01 at level 1 and so forth.↩︎
  10. This task requires a modification of the PMI Equation (1) in terms of target and reference datasets.↩︎
  11. Beyond the concept of commodity.n.01, this effect holds for other concepts where men are more likely to use the broader sense and women are more likely to use a hyponym. For example, in the concept of child.n.02, men refer to the specific sense 53.6% of the time, compared to 40.7% for women; in chemical.n.01 the values are 3.9% for men, 1.9% for women, and in waste.n.01, the values are 76.9% for men and 71.2% for women. ↩︎
  12. We thank an anonymous Reviewer for this observation.↩︎
  13. https://eur-lex.europa.eu/eli/dir/2008/98/oj/eng, see especially Paragraph 2.↩︎

References

Alexander, Marc. 2018. Lexicalization Pressure. Plenary lecture delivered at the 20th International Conference on English Historical Linguistics, University of Edinburgh.

Baker, Paul. 2004. Querying keywords: Questions of difference, frequency, and sense in keyword analysis. Journal of English Linguistics, 32: 346-59.

Bianchi, Suzanne M., Liana C. Sayer, Melissa A. Milkie, & John P. Robinson. 2012. Housework: Who did, does or will do it, and how much does it matter? Social Forces, 91: 55-63.

Bucholtz, Mary. 2012. Word Up: Social meanings of slang in California youth culture. In Leila Monaghan, Jane E. Goodman, & Jennifer Meta Robinson (eds.), A Cultural Approach to Interpersonal Communication: Essential Readings, 2nd ed, 274-97. Chichester: Wiley.

Dallachy, Fraser. 2024. A human-scale set of categories for the Historical Thesaurus of English. Dictionaries, 45: 145-68.

Eckert, Penelope & Sally McConnell-Ginet. 2013. Language and Gender, 2nd edition. Cambridge: Cambridge University Press.

Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Fitzmaurice, Susan & Seth Mehl. 2022. Volatile concepts: Analysing discursive change through underspecification in co-occurrence quads. International Journal of Corpus Linguistics, 27: 428-50.

Fitzmaurice, Susan, Justyna A. Robinson, Marc Alexander, Iona C. Hine, Seth Mehl, & Fraser Dallachy. 2017. Linguistic DNA: Investigating conceptual change in Early Modern English Discourse. Studia Neophilologica, 89: 21-38.

Grondelaers, Stefan & Dirk Geeraerts. 2003. Towards a pragmatic model of cognitive onomasiology. In Hubert Cuyckens, René Dirven, & John R. Taylor (eds.), Cognitive Approaches to Lexical Semantics, 67-92. Berlin: Mouton.

Hoang, Hung H., Su N. Kim, & Min-Yen Kan. 2009. A re-examination of lexical association measures. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, 31-9.

Kay, Christian, Marc Alexander, Fraser Dallachy, Jane Roberts, Michael Samuels, & Irené Wotherspoon (eds.). 2023. The Historical Thesaurus of English (2nd edn., version 5.0). University of Glasgow. https://ht.ac.uk/. Accessed September 11, 2024.

Kilgarriff, Adam. 2009. Simple maths for keywords. In Michaela Mahlberg, Victorina González-Díaz, & Catherine Smith (eds.), Proceedings of the Corpus Linguistics Conference CL2009. Available at: https://www.sketchengine.eu/wp-content/uploads/2015/04/2009-Simple-maths-for-keywords.pdf. Accessed 18 July 2024.

Lakoff, Robin. 1973. Language and woman’s place. Language in Society, 2(1): 45-80.

Mass Observation. 2010. Mass Observation Archive. Available online at: http://www.massobs.org.uk/. Accessed August 21, 2024.

Mehl, Seth. 2022. Discursive Quads: New Kinds of Lexical Co-occurrence Data With Linguistic Concept Modelling. Transactions of the Philological Society, 120(3): 474-88.

Mohamed, Muhidin & M. Oussalah. 2014. A comparative study of conversion aided methods for WordNet sentence textual similarity. Proceedings of the AHA! Workshop on Information Discovery in Text, 37-42.

Murphy, Lynne M. 2010. Lexical Meaning. Cambridge: Cambridge University Press.

Newman, Matthew L., Carla J. Groom, Lori D. Handelman, & James W. Pennebaker. 2008. Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes, 45: 211-36.

Papandrea, Simone, Alessandro Raganato, & Claudio D. Bovi. 2017. SupWSD: A flexible toolkit for supervised word sense disambiguation. In Proceedings of the 2017 EMNLP System Demonstrations, 103-8.

Piao, Scott, Fraser Dallachy, Alistair Brown, Jane Demmen, Steve Wattam, Phillip Durkin, James McCracken, Paul Rayson, & Marc Alexander. 2017. A time-sensitive historical thesaurus-based semantic tagger for deep semantic annotation. Computer Speech & Language, 46: 113-35.

Princeton University. 2010. About WordNet. Available at: https://wordnet.princeton.edu/. Accessed 10/12/2024.

Pütz, Martin, Monika Reif, & Justyna A. Robinson (eds.). 2014. Cognitive Sociolinguistics. Amsterdam: John Benjamins.

Robinson, Justyna A. 2010. Awesome insights into semantic variation. In Dirk Geeraerts, Gitte Kristiansen, & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics, 85-109. Berlin: Mouton de Gruyter.

Robinson, Justyna A. 2012. A gay paper: Why should sociolinguistics bother with semantics? English Today, 28: 38-54.

Robinson, Justyna A. & Julie Weeds. 2022. Cognitive sociolinguistic variation in the Old Bailey Voices Corpus: The case for a new concept-led framework. Transactions of the Philological Society, 120: 399-426.

Robinson, Justyna A., Rhys J. Sandow, & Roberta Piazza. 2023. Introducing the keyconcept approach to the analysis of language: The case of regulation in Covid-19 diaries. Frontiers in Artificial Intelligence, 6, doi: https://doi.org/10.3389/frai.2023.1176283.

Sylvester, Louise, Imogen Marcus, & Richard Ingham. 2017. A bilingual thesaurus of everyday life in Medieval England: Some issues at the interface of semantics and lexicography. International Journal of Lexicography, 30: 309-21.

Sylvester, Louise, Megan Tiddeman, & Richard Ingham. 2020. An analysis of French borrowings at the hypernymic and hyponymic levels of Middle English. Lexis: Journal of English Lexicography, 16, doi: 10.4000/lexis.4841.

Sylvester, Louise, Megan Tiddeman, & Richard Ingham. 2022. Semantic Shift in Middle English: Farming and Trade as Test Cases. Transactions of the Philological Society, 120: 427-46.

Sylvester, Louise & Megan Tiddeman. 2024. Lexicalization, polysemy and loanwords in anger: A comparison with non-affective domains in Middle English. Lexis, 3, doi: https://doi.org/10.4000/12ize

Thaler, Richard H. & Cass R. Sunstein. 2008. Nudge: Improving Decisions About Health, Wealth, and Happiness. New Haven: Yale University Press.

Thébaud, Sarah, Sabino Kornrich, & Leah Ruppanner. 2021. Good housekeeping, great expectations: Gender and housework norms. Sociological Methods & Research, 50: 1186-214.

Trousdale, Graeme. 2008. Constructions in grammaticalization and lexicalization: Evidence from the history of a composite predicate construction in English. In Graeme Trousdale & Nikolas Gisborne (eds.), Constructional Approaches to English Grammar, 33-67. Berlin: Mouton.

Vogelsanger, Johanna. 2024. Obsolescence and innovation in the Middle English religious lexicon. To appear in Transactions of the Philological Society. Advance Online Publication, https://doi.org/10.1111/1467-968X.12310.

WordNet. 2010. WordNet 3.0. Available online at: http://wordnet.princeton.edu (accessed February 27, 2023).

“Show me the meaning of being lonely”

by Rhys Sandow

At the turn of the millennium, Backstreet Boys released their hit single “Show me the meaning of being lonely”. The lyrics talk about the realities of heartbreak and the need for connection and understanding associated with losing a loved one. Twenty-five years later, the concept of loneliness carries a far more complex meaning and comes with disturbing statistics. Across the globe, loneliness is rapidly becoming one of the most urgent public health risks (see the new report from the World Health Organization, From Loneliness to Social Connection: Charting the Path to Healthier Societies). Nearly half of adults in the UK feel lonely occasionally, sometimes, often, or always (see here). Some groups are especially at risk of loneliness, particularly young, disabled people (see here).

In order to proactively reduce loneliness, we need a greater understanding of the experiences and feelings that people describe under the concept of LONELINESS and of how these are talked about (see here). This task was explored by Justyna Robinson (Concept Analytics Lab) and Faith Matcham (Psychology) at the University of Sussex, whose work towards creating a personalised chatbot for loneliness intervention was funded by Sussex Digital Humanities Lab. Ultimately, the headline findings are that:

  • Loneliness is generally experienced with similar intensity across demographic groups.
  • But the risk factors differ greatly, particularly according to age.
    • Social media is discussed as a catalyst for loneliness for younger people.
    • Outright social isolation is framed as a more pressing issue for older people.

For this project we explored data on loneliness collected by the Mass Observation Project (MOP). In 2019 MOP issued a survey, the directive on Loneliness and Belonging (see here). The directive consists of two parts. Firstly, participants were asked to provide five words that they associate with loneliness, e.g. despair, fear, frustration, quiet, and sad (see Figure 1). Secondly, they provided long-form narrative responses to a series of questions related to the broader topic of loneliness.

The full analysis of this data is presented in our working paper here.

About Us

We identify conceptual patterns and change in human thought through a combination of distant text reading and corpus linguistics techniques.

Blog

How language reveals different faces of loneliness

by Justyna A. Robinson, Gayathri Sooraj, and Caitlin Hogan

The issue of loneliness has been discussed on public platforms more frequently in recent years. Just within the past week, newspapers have talked about student loneliness, male loneliness, and the loneliness epidemic. A recently published WHO Report on Social isolation shows that 1 in 6 people globally suffers from loneliness. Data on loneliness in the UK show that nearly half of adults feel lonely occasionally, sometimes, often, or always (see here). Some groups are especially at risk of loneliness, such as young, disabled people (see here). However, the available reports rarely explore how people themselves describe loneliness or what language choices people make when they talk about loneliness (but see here). These questions are explored by Justyna Robinson (Concept Analytics Lab) and Faith Matcham (Psychology) at the University of Sussex, whose work towards creating a personalised chatbot for loneliness intervention was funded by Sussex Digital Humanities Lab. Ultimately, our headline findings are that:

  • Loneliness is generally experienced with similar intensity across demographic groups.
  • But the risk factors differ greatly, particularly according to age.
    • Social media is discussed as a catalyst for loneliness for younger people.
    • Outright social isolation is framed as a more pressing issue for older people.  

For this project we explored data on loneliness collected by the Mass Observation Project (MOP). In 2019 MOP issued a survey, the directive on Loneliness and Belonging (see here). The directive consisted of two parts. Firstly, participants were asked to provide five words that they associate with loneliness, e.g. despair, fear, frustration, quiet, and sad (see Figure 1). Secondly, they provided long-form narrative responses to a series of questions related to the broader topic of loneliness.

Loneliness in five words

Seventy writers shared the five words they associate with loneliness. In total, 417 individual words were submitted (sentence responses excluded), of which 206 were different words. The top responses included terms such as isolation, sadness, and alone. All 206 words expressed various aspects of affective meaning, which we measured by adopting the NRC Valence, Arousal, and Dominance (NRC-VAD) Lexicon classification (Mohammad 2025). Each word is assigned a numerical value, from 0 to 1, for each of the three dimensions depending on how strongly it represents a given dimension.

  • Valence refers to the positive (higher score) or negative (lower score) feelings associated with each word.
  • Arousal relates to the intensity of the emotion, with higher scores reflecting greater intensity.
  • Dominance refers to the degree of control exerted by a stimulus, with higher scores corresponding to greater control.

To exemplify this, Figure 1 presents the words provided by a younger female of lower socioeconomic status, alongside their scores across the three dimensions.

Figure 1: The valence, arousal, and dominance scores for five words provided by a younger female of lower socioeconomic status

Findings highlight remarkable consistency across demographic groups in their conceptualisation of loneliness across the dimensions of valence, arousal, and dominance. Figure 2 highlights that the average NRC-VAD scores for male and female respondents are almost identical.  

Figure 2: A scatter diagram for the valence, arousal, and dominance scores for male and female respondents
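The group averages discussed here can be pictured as a simple lexicon lookup followed by a per-dimension mean. The sketch below is illustrative only: `TOY_VAD` is a hypothetical stand-in for the NRC-VAD Lexicon, its scores are invented and are not the published values, and `mean_vad` is a name introduced for this example rather than part of any library.

```python
from statistics import mean

# Hypothetical (valence, arousal, dominance) scores on a 0-1 scale.
# These numbers are invented for illustration, not real NRC-VAD entries.
TOY_VAD = {
    "despair": (0.125, 0.75, 0.25),
    "fear": (0.125, 0.875, 0.25),
    "frustration": (0.25, 0.75, 0.375),
    "quiet": (0.75, 0.0, 0.25),
    "sad": (0.25, 0.5, 0.25),
}


def mean_vad(words, lexicon):
    """Average valence, arousal, and dominance over the words found in the lexicon."""
    scored = [lexicon[w] for w in words if w in lexicon]
    if not scored:
        raise ValueError("none of the words are in the lexicon")
    # zip(*scored) regroups the triples into one sequence per dimension.
    valence, arousal, dominance = (mean(dim) for dim in zip(*scored))
    return valence, arousal, dominance


# One respondent's five words; unknown words would simply be skipped.
respondent_words = ["despair", "fear", "frustration", "quiet", "sad"]
print(mean_vad(respondent_words, TOY_VAD))
```

Averaging the per-respondent triples within each demographic group would then give points comparable to those plotted in Figure 2.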

Indeed, statistical analysis identified very few meaningful differences in the data set. This speaks to a remarkable homogeneity in the conceptualisation of loneliness as measured by the NRC-VAD scores. The one exception to this was that on the dominance dimension there was an interaction between age and socioeconomic status. While age exhibited minimal differences among those in the lower status socioeconomic group, there was a much greater difference in the higher status group (see Figure 3). We consider age as a binary category for the purposes of this analysis, distinguishing between those who qualified for the state pension at the time of data collection (the state pension age was 65 in 2019) and those who did not.

Figure 3: The interaction between age and socioeconomic status for the dominance scores

Figure 3 shows that the effect of age on the experience of loneliness is much greater among higher status respondents, with those who are older in this category conceptualising loneliness as something that exerts a greater degree of control (0.34), relative to their younger counterparts (0.19). There is also a large difference between status groups among younger respondents, with those of lower status (0.28) conceptualising loneliness as exerting greater dominance than their higher status counterparts (0.19).

Loneliness in long-form narratives

The longer, narrative responses to the directive discussed experiences of loneliness both on a personal level and at a broader societal level, and speculated about the causes of loneliness and potential solutions. In order to understand the ‘aboutness’ of the data we look at key terms.

The outputs of a modified ‘keyword’ analysis (see our blog about modified keyword analysis here) are presented in Table 1 and the key multi-word terms in Table 2, with the responses to the directive compared against the enTenTen 2020 corpus as a baseline (see here).

Table 1: The top 25 keywords in the responses to the Loneliness & Belonging directive
Table 2: The top 25 multi-word terms in the responses to the Loneliness & Belonging directive
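The post does not spell out the scoring behind its modified keyword analysis. As a rough illustration of how a focus corpus is compared against a baseline, the sketch below uses Kilgarriff's (2009) ‘simple maths’ score, assumed here purely for illustration: frequencies are normalised per million tokens, a smoothing constant `n` is added to both, and the ratio is taken. The function name `simple_maths_keyness`, the default `n=1.0`, and all counts are assumptions, not details taken from the analysis.

```python
def simple_maths_keyness(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Keyness as (focus per-million frequency + n) / (reference per-million frequency + n).

    A sketch of the 'simple maths' approach; the smoothing constant n is an
    assumed default and also keeps the score defined when ref_count is 0.
    """
    if focus_size <= 0 or ref_size <= 0:
        raise ValueError("corpus sizes must be positive")
    focus_fpm = focus_count / focus_size * 1_000_000
    ref_fpm = ref_count / ref_size * 1_000_000
    return (focus_fpm + n) / (ref_fpm + n)


# Hypothetical counts: a term occurring 50 times in a 100,000-token focus corpus
# and 9 times per million tokens in the baseline corpus.
score = simple_maths_keyness(50, 100_000, 9, 1_000_000)
print(round(score, 1))
```

Scores well above 1 mark terms that are characteristic of the focus corpus relative to the baseline, which is the sense in which the terms in Tables 1 and 2 are ‘key’.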

While a full analysis of these results is far beyond the scope of this blog, here we consider one case study, that of age.

Many of the key terms in Table 2 relate to the theme of age, e.g. young people, old age, old people, and elderly people. We have seen from the quantitative analysis of affective meaning along the valence, arousal, and dominance dimensions that there was very little difference between age groups in the conceptualisation of loneliness (save for the interaction between socioeconomic status and age for the dominance scores). However, the longer-form responses highlight something else.

While both younger and older people are mentioned in the contexts of loneliness, the risk factors associated with these groups differ. Indeed, the discourses in which mentions of these groups appear highlight that the challenges relating to loneliness are not uniform. For example, social media was often mentioned in the context of young people: 

  • In the news it is reported that loneliness is on the increase, particularly amongst young people. This surprises me as young people are constantly communicating with others on social media. However, perhaps seeing what looks like a perfect life on someone’s Facebook account serves to exacerbate feelings of dissatisfaction and loneliness in people whose lives are not running smoothly.  
  • I understand that social media can cause young people to feel lonely and that other people are living a more interesting life. Unfortunately, it is full of lies but people don’t really understand that.

In contrast, discussions of older people’s loneliness were often framed in terms of more absolute isolation, e.g.:

  • I know that there are old people today who feel lonely. Particularly following the death of their spouse.  
  • Inevitably, as people live longer and stay in their own homes, isolation can become a serious problem. I believe the older generation can suffer considerably in this respect, particularly in rural communities where the shop and pub are now closed, the bus service has been withdrawn and children have grown up and live elsewhere. 

Using a suite of analytical tools, from sentiment analysis, to keyword analysis, to discourse analysis, enables one to garner insight into the conceptualisation of loneliness that is often missed in survey-type data. The discursive nature of this data enables lived experiences of loneliness and its perceived risk factors to be unpacked in detail, the sort of detail that computational linguistic methods can cut through to pick out key patterns and affective meanings. Ultimately, the findings from this report will inform the development of a chatbot that will serve as a tool to combat the loneliness crisis.

If you are interested in working with Concept Analytics Lab, please do contact us at Justyna.Robinson@sussex.ac.uk

Accessibility

Effective Date: 28.05.2025

Concept Analytics Lab is committed to ensuring digital accessibility for people with disabilities. We are continually improving the user experience for everyone and applying the relevant accessibility standards in line with our academic values and obligations.

Scope and Commitment

This accessibility statement applies to the Concept Analytics Lab website (https://conceptanalytics.org.uk/). We aim to make our website as accessible and usable as possible, following the principles of the Web Content Accessibility Guidelines (WCAG) 2.2, Level AA.

As a research initiative affiliated with the University of Sussex, we align our accessibility practices with the University’s broader digital inclusion strategy, while reflecting our own commitment to equitable access to research and knowledge.

Accessibility Standards

We aim to:

  • Ensure text is readable and appropriately contrasted
  • Provide alternative text for images
  • Enable navigation using a keyboard
  • Maintain consistent and meaningful HTML structure
  • Use descriptive link text
  • Ensure content is accessible via screen readers

These principles are built into our ongoing design and content updates.

Known Limitations

Some content may not fully meet accessibility standards. For example:

  • Some images do not have alternative (alt) text. This means users of assistive technologies may not have access to information conveyed in images. This fails WCAG 2.2 success criterion 1.1.1 (Non-text Content).
  • Some externally hosted content or links may not conform to the same standards.
  • Some user interface elements do not meet colour contrast standards. This fails WCAG 2.2 success criterion 1.4.3: Contrast (Minimum).

We are working to address these issues as part of our ongoing review.

Feedback and Contact

If you encounter any accessibility barriers or would like to request content in an alternative format, please contact us:

Email: Justyna.Robinson@sussex.ac.uk
Postal Address: Sussex Digital Humanities Lab, Silverstone SB211, Arts Road, Falmer, East Sussex, BN1 9RG

Technical Information and Compliance

This website is partially compliant with the Web Content Accessibility Guidelines version 2.2 AA standard, due to the non-compliances and exemptions listed above.

We periodically test our site using automated tools such as WAVE and manual assessments. We prioritise accessibility fixes based on severity and user impact.

Enforcement

If you are not satisfied with our response to your accessibility concern, you can contact the Equality Advisory and Support Service (EASS) at https://www.equalityadvisoryservice.com/ for further advice.

Compatibility with Browsers and Assistive Technology

The Concept Analytics Lab website is designed to be compatible with the following technologies:

  • The latest versions of Chrome, Firefox, Safari, and Microsoft Edge
  • Screen readers such as NVDA and VoiceOver
  • Keyboard-only navigation
  • Operating systems including Windows, macOS, iOS, and Android

Assessment Approach

The accessibility of the Concept Analytics Lab website is assessed by:

  • Automated testing tools (e.g., WAVE)
  • Axe – Web Accessibility Testing extension for Chrome
  • Internal manual checks during content updates
  • Peer-based usability reviews with screen reader users (where feasible)

Preparation of This Statement

This statement was prepared on 16.05.2025 and last reviewed on 28.05.2025. The accessibility of this website was evaluated using a combination of manual and automated tests.

 

Privacy Policy

Effective Date: 16.05.2025
This Privacy Policy explains how Concept Analytics Lab (“we”, “us”, or “our”) collects, uses, and protects any personal data obtained through our website at https://conceptanalytics.org.uk/. We are an academic research initiative affiliated with the Sussex Digital Humanities Lab at the University of Sussex. As such, our data practices are aligned with the University of Sussex’s policies and responsibilities as a registered data controller under the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018.

Who We Are

Concept Analytics Lab operates within the Sussex Digital Humanities Lab, based at the University of Sussex. The University is the registered data controller (ICO registration number: Z6428144). For questions or concerns please contact:

  • Email: Justyna.Robinson@sussex.ac.uk
  • Postal Address: Sussex Digital Humanities Lab, University of Sussex, Silverstone SB211, Arts Road, Falmer, East Sussex, BN1 9RG

 For more information, refer to the University’s Data Protection Policy.

What Data We Collect

We do not collect personal data through this website unless you choose to contact us directly (e.g. via email or form submissions). In such cases, your data will be processed in line with the University of Sussex’s policies.

We may collect and process the following types of personal data:

  • Your name and email address (e.g. via contact forms or event registrations)
  • Institutional affiliation or research role (if submitted)
  • Content related to research engagement (e.g. blog contributions, survey responses)
  • Technical data such as IP address, browser type, and site usage

We do not knowingly collect special category (sensitive) personal data through the website interface.

How We Use Your Data

We process personal data only for the purpose for which it was provided, including:

  • Responding to enquiries or submissions
  • Managing event participation or collaboration proposals
  • Internal reporting and evaluation
  • Monitoring website usage to improve functionality and accessibility

We do not use your data for marketing purposes or share it with third parties unless explicitly required or permitted by law.

We rely on one or more of the following legal bases for processing data under the UK GDPR:

  • Your consent (e.g. when submitting an enquiry)
  • Legitimate interests (e.g. maintaining academic engagement)
  • Public task (e.g. conducting academic research)

Research Data

As part of our academic mission, we may collect and process personal data for research purposes. This includes, but is not limited to, data collected through:

  • Surveys, interviews, or focus groups
  • Publicly available datasets
  • Third-party data providers or institutions (e.g. Department for Education)

All research involving human participants undergoes ethical review. Participants receive an information sheet outlining the data collected, purposes of use, and their rights. When research involves special category data (e.g. health or ethnicity), processing is done under Article 9(2)(j) UK GDPR for scientific or historical research purposes in the public interest.

Some data subject rights (such as deletion) may be limited in the context of completed or anonymised research to ensure integrity and reproducibility.

Data Sharing

We do not sell or share personal data for commercial purposes. Data may be shared only with:

  • University of Sussex departments or service providers
  • Trusted third-party tools (e.g. Google Analytics), under strict data protection terms
Data shared is limited to what is necessary and is handled securely.

Cookies and Tracking Technologies

Cookies are small files placed on your device to collect standard internet log and visitor behaviour information. This helps us understand user interactions, remember your preferences, and improve website functionality.

We do not use cookies for advertising or personalised tracking. However, our website uses Google Analytics to collect anonymised information about how visitors use the site. This helps us understand user behaviour, such as which pages are most frequently viewed and how users navigate the site, so we can improve functionality and content.

Google Analytics may collect data including:

  • Your IP address (anonymised)
  • Browser type and version
  • Pages visited and time spent
  • Device type and operating system

These cookies do not identify you personally. We do not combine this data with any other personal information.

You can manage or block cookies via your browser settings. For more information on how Google Analytics handles your data, please refer to Google’s Privacy Policy and Google Analytics Cookie Usage.

Your Rights

Under the UK GDPR, you have rights to:

  • Access the personal data we hold about you
  • Request corrections or deletions
  • Object to or restrict certain types of processing
  • Withdraw consent at any time
  • Lodge a complaint with the Information Commissioner’s Office (ICO)
To exercise these rights, please contact: Justyna.Robinson@sussex.ac.uk

Changes To This Policy

We may update this Privacy Policy to reflect changes in law, institutional requirements, or technical developments. Updates will be published on this page with a revised effective date. Continued use of the website implies acceptance of the current version.

The Labour Party’s 2024 Manifesto – The Voice of Members or Lobbyists?

by Justyna A. Robinson and Rhys Sandow

CAL Briefing Paper 1

published online on 3rd July 2024

The Labour Party has been criticised recently for the number of its candidates in the 2024 general election with backgrounds in lobbying [1,2,3]. Indeed, this led Novara Media (2024) to question whether the Labour Party is ‘The Lobbyists’ Party’ [4, also 5]. In this article, we assess this claim by considering the extent to which the Labour Party’s 2024 general election manifesto is consistent with either the desires of party members or of lobby groups, as articulated in the Labour Party’s 2023 National Policy Forum Consultation on progressive trade policy [6].

While Labour’s 2023 National Policy Forum (hereafter, NPF) considered a variety of policy areas, our focus here is its forum on trade. Excluding duplicates, there are 302 submissions to the consultation, with 109 submitted by guests (i.e. business and other lobby interest groups), 187 by Labour Party members and the rest by National Policy Forum representatives. In total, there are 244,894 words submitted to the forum. While the majority of submitters were Labour Party members, they produced a minority of the total word count, with 24.7% of words produced by members, as opposed to 71.2% by guests (the rest by NPF representatives). For a more detailed take on the data, see Gasiorek et al. (2024) [7].

We contrast the top multi-word terms (i.e. phrases) that are used disproportionately by Labour Party guests relative to Labour Party members (see Table 1), and vice versa (Table 2) [8]. Note that the results are ordered by ARF (Average Reduced Frequency), a modified frequency measure that accounts for distribution across the submissions, so that one response does not skew the results (see Sandow & Robinson (2024) [9]).

Table 1 shows that relative to members, guests are more concerned with the mechanisms and process of trade, as highlighted by phrases such as supply chain, due diligence, value chain, and new law. The guests are also more concerned with environmental issues, such as environment act and environmental harm. Additionally, guests focus on human rights, which is a theme represented by phrases such as human rights defender, forced labour, modern slavery, and gender equality.

So, does the Labour Party Manifesto address these considerations?

The Labour Party Manifesto [10] identifies the need for resilient supply chains (resilient supply chains is in the wording of one of the questions in the NPF), as expressed in the following:

“We will ensure a strong defence sector and resilient supply chains, including steel, across the whole of the UK.” (2024 Labour Manifesto)

The environment is a key theme in the Manifesto, with energy and climate occurring 68 and 24 times, respectively. In particular, the ways in which trade agreements can be a vehicle for pursuing a green agenda is recognised in the Manifesto in the following way:

“We will seek a new strategic partnership with India, including a free trade agreement, as well as deepening co-operation in areas like security, education, technology and climate change.” (2024 Labour Manifesto)

Human rights, including gender equality, are also mentioned throughout the Manifesto, although not discussed directly in the context of trade.

“We will use the UK’s unique position in NATO, the UN, G7, G20, and the Commonwealth to address the threats we face, and to uphold human rights and international law.” (2024 Labour Manifesto)

 

“Labour will take action to reduce the gender pay gap, building on the legacy of Barbara Castle’s Equal Pay Act.” (2024 Labour Manifesto)

The Manifesto does not mention due diligence, although it is reasonable to assume that this is implicit under the umbrella of good practice. Nor does it mention any new laws in relation to trade, although it does commit to other new laws, e.g. ‘Martyn’s Law’ to strengthen the security of public events.

 

We now turn to the submissions by Labour Party members. Table 2 shows the phrases that are used disproportionately by Labour Party members, relative to guests.

 

Note, due to the lower quantities of text produced by Party members, here we present the top ten phrases (Table 2), as opposed to the top twenty phrases for the Party guests presented in Table 1.

Party members are more likely to discuss arms, as is evident from their use of phrases such as arms export, arms trade, and counter proliferation, in relation to, but not limited to, the Israeli government. They are also more likely to discuss local government and asylum seekers. In context, these phrases typically reference a desire for greater regulation and moral consideration of those with whom we trade arms, greater power for local governments, and a more welcoming environment for asylum seekers.

In terms of the theme of arms, the Manifesto makes a commitment to upholding international law, but avoids reference to any specific nation or any specific commitments beyond that:

“Labour will support industry to benefit from export opportunities in line with a robust arms export regime committed to upholding international law.” (2024 Labour Manifesto)

While the Manifesto does not mention Israel/Palestine in relation to arms exports, it does address this elsewhere:

“Long-term peace and security in the Middle East will be an immediate focus. Labour will continue to push for an immediate ceasefire, the release of all hostages, the upholding of international law, and a rapid increase of aid into Gaza. Palestinian statehood is the inalienable right of the Palestinian people. It is not in the gift of any neighbour and is also essential to the long-term security of Israel. We are committed to recognising a Palestinian state as a contribution to a renewed peace process which results in a two-state solution with a safe and secure Israel alongside a viable and sovereign Palestinian state.” (2024 Labour Manifesto)

There are multiple commitments in the Manifesto to provide greater autonomy for local and devolved governments, such as:

“Local government is facing acute financial challenges because of the Conservatives’ economic mismanagement which sent interest rates soaring, along with their failures on public services. To provide greater stability, a Labour government will give councils multi-year funding settlements and end wasteful competitive bidding.” (2024 Labour Manifesto)

Lastly, while both the Party members’ responses to the NPF and the Manifesto discuss asylum seekers, the framing of asylum seekers differs in each context. In the NPF, members implore the Labour Party to provide greater support for asylum seekers:

“We must promote Labour values by treating refugees and asylum seekers with dignity and respect. This means providing a safe, legal route for refugees and ending detention in camps while claims are being processed. Asylum claims must be processed more quickly and people should be permitted to work while their claims are being processed.” (response to the 2023 Labour National Policy Forum Consultation on progressive trade)

Whereas, in the Manifesto, asylum seekers are discussed in the context of perceived Conservative failures:

“Rather than a serious plan to confront the crisis, the Conservatives have offered nothing but desperate gimmicks. Their flagship policy – to fly a tiny number of asylum seekers to Rwanda – has already cost hundreds of millions of pounds. Even if it got off the ground, this scheme can only address fewer than one per cent of the asylum seekers arriving.” (2024 Labour Manifesto)

Or in the context of returning failed asylum seekers to safe countries:

“We will negotiate returns arrangements to speed up returns and increase the number of safe countries that failed asylum seekers can swiftly be sent back to.” (2024 Labour Manifesto)

To conclude, the Labour Party Manifesto speaks to the concerns of both lobbyists and Party members, as well as where they intersect, for example by aligning more closely with the EU. However, the concerns highlighted in the forum are not always addressed directly in relation to trade, but in broader policy commitments. Also, the ways in which these topics were addressed were not always consistent between the Forum and the Manifesto. For example, while NPF responses advocated greater dignity and respect for asylum seekers, this was not explicit in the Manifesto. Ultimately, by and large, the Manifesto does not demonstrate a bias towards lobbyists, but manages to find an equilibrium between the desires of two distinct, yet critically important, groups.

References

[1] https://novaramedia.com/2024/06/13/meet-the-labour-candidates-lobbying-for-oil-gas-and-arms-companies/ . Accessed 3rd July 2024.

[2] Concern over ‘corrosive’ impact of Labour candidate working as lobbyist. The National. https://www.thenational.scot/news/24205441.concern-corrosive-impact-labour-candidate-working-lobbyis/. Accessed 3rd July 2024.

[3] Labour’s corporate lobbying links with Polly Smythe. Macrodose Election Economics. https://open.spotify.com/episode/6UmKWzzXx8XwU1Q8iPKH3A . Accessed 3rd July 2024.

[4] Novara Media @novaramedia. (2024). (video). Tik Tok. https://www.tiktok.com/@novaramedia/video/7382148715156376865?lang=en . Accessed 3rd July 2024.

[5] The Labour Party becomes the Lobbyists Party. Morning Star (morningstaronline.co.uk). Accessed 3rd July 2024.

[6] National Policy Forum Consultation (2023) Available via  https://policyforum.labour.org.uk/commissions. Accessed 13th June 2024.

[7] Gasiorek, Michael, Justyna A. Robinson, and Rhys Sandow (2024) Labour’s Progressive Trade Policy: Consultations and policy formulation. UKTPO Briefing Paper 81 – June 2024. Available via https://blogs.sussex.ac.uk/uktpo/publications/labours-progressive-trade-policy-consultations-and-policy-formulation/ Accessed 3rd July 2024.

[8] The analysis is done via SketchEngine. Available via http://www.sketchengine.eu/

[9] Sandow, Rhys and Justyna A. Robinson (2024) How can you identify key content from surveys?  Concept Analytics Lab Blog. Available via https://conceptanalytics.org.uk/identifying-key-content-from-surveys/ Accessed 13th June 2024.

[10] Labour Party Manifesto. (2024). Available via https://labour.org.uk/wp-content/uploads/2024/06/Labour-Party-manifesto-2024.pdf. Accessed 13th June 2024.

How does concept-led research matter?

by Caitlin Hogan

 

Our Concept Analytics Lab (CAL) team LOVES concepts. In our daily work, we keep seeing the value of the concept-based view of language in bringing insight to thinking, attitudes, and behaviours of people. But how important is the concept-based research for a wider linguistic community? Can concept-based research impact other disciplines and industries? Can you commercialise your concept-based knowledge?

With the aim of consolidating research and application of concept-based approaches to text analysis we gathered experts in the field for the first Concept Quest conference.

 

The event Concept Quest: Navigating Ideas on and Through Linguistic Concepts took place in March 2024 at the University of Sussex. It focussed on the work of CAL and other researchers from a range of academic disciplines. We hosted talks and panels from scholars studying everything from AI concepts to the impact of trade deals on the economy and commercialising concepts in the process of wine production.

 

Justyna Robinson, the Director of the Concept Analytics Lab, started by talking about the aims and advantages of concept mining as a methodology. Concepts are not encapsulated by a single word but are observable through a set of words, phrases and/or constructions. This allows us to understand how individual terms might be used differently over time, and how they may come to represent different concepts. CAL’s researcher Rhys Sandow then discussed how one can visualise conceptual ontologies and showed how one can turn complex sets of lexical relations into clear diagrammatic representations. Such representations can shed light on conceptual, including socio-conceptual, differences that are inaccessible to more traditional approaches to the analysis of large texts.

Following this, Louise Sylvester (Westminster) talked about how concepts can be incorporated into studies of Medieval English. Her work focuses on the adoption of terms from French into English during this period, and through the use of a semantic hierarchy, she is able to inspect in which cases French pushed out the English variant, and in which cases this did not occur. The use of concepts allows us to see the patterns that emerge in synonym relationships, even from long ago.

 

Haim Dubossarsky (QMUL) approached the study of concepts from a computational angle, discussing the methods we currently use in computational and corpus linguistics, such as collocation analysis, and how we can improve on them. Through the projection of a word’s usage onto a series of vectors, one is able to map the meanings of the word and their change over time. This technique provides a computational boost to the analysis of meaning and represents an important link between the world of linguistics and that of computer science, a link that the Concept Analytics Lab values.

 

The talks on theoretical and methodological aspects of doing concept research were complemented by talks addressing applications of concepts in archival work and in commercial endeavours. 

 

Piotr Nagórka (Warsaw’s Cultural Terminology Lab) discussed the exploration of communications systems and terminological sciences. He probed how the terminology we use to refer to types of wine maps onto the production process itself. His work shows how one might commercialise concept research by marrying the study of concepts with processes and techniques within the manufacturing sciences.

Angela Bachini and Kirsty Pattrick, who work on the Mass Observation project, helped us understand how archivists arrive at identifying important concepts when indexing a new text. We learned a great deal from the Mass Observation team about their workflow and how we as researchers can best help archivists to automate indexing via key-concept detection.

The event finished with a panel discussion on why concepts matter, led by Lynne Murphy (Sussex), in which Piotr Nagórka and Kirsty Pattrick were joined by Julie Weeds (Sussex AI) and Alan Winters (Sussex, CITP). Alan reflected on the value of concepts in trade analysis, particularly to understand the trade-offs that people are willing to make with regard to global trade. These kinds of complex attitudes are difficult to access with other methods, particularly the quantitative methods often used in economics. The advantage of concept analysis, where participants can describe their accounts in rich detail which can then be computationally analysed, is clear in this case. Louise Sylvester added that in her work on Medieval English, concepts help us understand how people living in that era made sense of the world and what categories were meaningful for them. This helps greatly with noticing patterns of use in historical linguistics, and also helps us to understand how the concept of something like a farm has changed from the Middle Ages to the present day.

 

We continued chatting over some delicious wine (thanks to a generous sponsorship from Mass Observation) and made new connections across institutions and fields.  This is exactly the kind of result we envisage from a successful colloquium, and we were proud to have hosted such a stimulating day. Our gratitude extends to all the wonderful speakers and attendees for making this event so brilliant!

 

To conclude our reflections, the Concept Quest highlighted the value of concept-based and concept-led research and applications. Researching concepts matters for the theory of language and knowledge representation as we consider conceptual hierarchies, lexicalised and non-lexicalised concepts, and the emergence of new concepts/ideas. At a methodological level, concepts pose a challenge for traditional word-based corpus and NLP techniques. Therefore, new ways of extracting conceptual information from big data are needed. At a more applied level, empirical ways of gaining access to conceptual information are invaluable for other sectors and disciplines which use large text data. Thus, strengthening the objectivity and replicability of concept research will open up this research to other sectors which seek more expert analyses. That development can also lead to impactful research and even commercialisation of conceptual research.

 

Please get in touch here to find out which key concepts and themes are revealed in your data. 

 

References

Robinson, J. A., Sandow, R. J., & Piazza, R. (2023). Introducing the keyconcept approach to the analysis of language: the case of regulation in COVID-19 diaries. Frontiers in artificial intelligence, 6.

Nagórka, P. (2021). Madeira, Port, Sherry. The Equinox Companion to Fortified Wines. Equinox Publishing Limited.

Identifying key content from surveys

How can you identify key content from surveys?

by Rhys Sandow and Justyna Robinson

 

A case of responses to the Labour Party’s 2023 Trade Policy Forum

 

Surveys which collect responses to open questions are a popular and valuable way of gauging people’s attitudes. But they also present specific challenges for keyness analysis in corpus linguistics, as the results can be misleading. For example, a high frequency of term X may be skewed by one or two documents within the corpus, rather than being representative of attitudes among the survey respondents more broadly. In such cases, traditional corpus linguistic measures of difference, such as relative frequencies or keyness, are not appropriate. Instead, we advocate the use of measures of dispersion across a corpus, such as Average Reduced Frequency (ARF) and Document Frequency (DOCF). This distinction between frequency and dispersion is critical to developing meaningful insights into large data sets, particularly in the context of policy consultation, where an understanding of plurality and consensus is highly important.

 

Let us demonstrate how to solve this problem on the basis of examples from data we recently analysed. Concept Analytics Lab (CAL) was tasked by the UK Trade Policy Observatory (UKTPO) to analyse responses to the Labour Party’s Trade Policy Forum in the build-up to the Labour Party’s annual conference in October 2023. The survey gathered 302 answers to seven questions comprising c. 250,000 words of data. Many of the submissions came from groups with very particular interests, such as specific industries or specific local communities. Therefore, some responses contained detailed discussions of issues critically important to the submitter, but not necessarily widespread among all respondents. For example, when running a keyword analysis, the eighth most key word (with the enTenTen21 corpus as our baseline) was gpi (genuine progress indicator) with 35 hits across the corpus. However, upon closer inspection, these hits are spread across only 2 of the 302 responses. Thus, while gpi has a high keyness score, it cannot be said to be a salient topic across the corpus, as its use is concentrated in just 0.66% of documents.
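The gap between raw hits and document frequency described above can be checked with a few lines of Python. This is a minimal sketch, not the Sketch Engine implementation: the function name `document_frequency`, the whitespace tokenisation, and the toy responses are all assumptions made for illustration. Only the final line uses a figure from the text (2 of 302 responses).

```python
def document_frequency(term, documents):
    """Return (total_hits, docs_containing_term) for a term.

    Tokenisation is a plain lower-cased whitespace split, which is an
    assumption for this sketch; a real corpus tool tokenises more carefully.
    """
    term = term.lower()
    hits_per_doc = [doc.lower().split().count(term) for doc in documents]
    total_hits = sum(hits_per_doc)
    docf = sum(1 for hits in hits_per_doc if hits > 0)
    return total_hits, docf


# Toy responses, invented for illustration: one response repeats a term heavily.
responses = [
    "gpi gpi gpi should replace gdp as the measure",
    "trade policy should support workers",
    "a fair trade policy matters",
    "trade deals need scrutiny",
]

hits, docf = document_frequency("gpi", responses)
share = 100 * docf / len(responses)
print(f"gpi: {hits} hits in {docf} of {len(responses)} responses ({share:.1f}%)")

# The share reported in the text: 2 of 302 responses.
print(f"{100 * 2 / 302:.2f}%")
```

In the toy data, gpi and trade both have three hits, but gpi sits in one response while trade appears in three, which is the distinction DOCF is meant to expose.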

In order to remedy this limitation of keyness analysis, we considered the spread of terms across the corpus using Sketch Engine’s Average Reduced Frequency (ARF) statistic. ARF is a modified frequency measure that prevents results being skewed by a specific part, or a small number of parts, of a corpus (for more detail on the mathematics behind the measure, see here). Where the ARF and absolute frequency are similar, this suggests a relatively even distribution of a given term across a corpus. However, when there are large discrepancies between the absolute frequency and ARF, this is indicative of a skew towards a small subset of the corpus. For example, while the absolute frequency of gpi in the corpus is 35, the ARF is 2.7 (DOCF, 2), highlighting its lack of dispersion. Similarly, the term gender-just has an absolute frequency of 19 but an ARF of 1.32 (DOCF, 1), highlighting that this term is not characteristic of the data set as a whole, but is highly salient within a small subset of the corpus. By contrast, labour, with an absolute frequency of 1,434, has an ARF of 725.74 (DOCF, 226), highlighting its spread across the corpus.
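For readers who want to see how such a measure behaves, here is a minimal sketch of ARF following the commonly cited definition: with N tokens and f occurrences, set v = N / f, take the distances between consecutive occurrences (treating the corpus as cyclic), and sum min(distance, v) / v. The function name and the toy positions are assumptions for illustration, and Sketch Engine's own implementation may differ in detail.

```python
def average_reduced_frequency(positions, corpus_size):
    """Compute ARF for a term from its token positions in a corpus.

    positions   -- 0-based token offsets at which the term occurs
    corpus_size -- total number of tokens in the corpus (N)
    """
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # average gap if the term were evenly spread
    ordered = sorted(positions)
    # Distances between consecutive occurrences ...
    distances = [ordered[i] - ordered[i - 1] for i in range(1, f)]
    # ... plus the wrap-around distance from the last occurrence to the first.
    distances.append(ordered[0] + corpus_size - ordered[-1])
    # Each gap contributes at most 1, so tight clusters add very little.
    return sum(min(d, v) for d in distances) / v


# Toy example (invented): four hits in a 100-token corpus.
evenly_spread = average_reduced_frequency([0, 25, 50, 75], 100)
clustered = average_reduced_frequency([0, 1, 2, 3], 100)
print(evenly_spread, clustered)
```

With evenly spread hits the ARF equals the absolute frequency (4), whereas four adjacent hits give an ARF close to 1, mirroring the contrast between labour and gpi above.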

When analysing corpus data, methodological decisions can have highly impactful repercussions for the analysis. For example, let’s take the top 10 key multi-word terms from the Labour Party Policy Forum data set ordered by keyness score (see Table 1) and compare it with the top 10 multi-word terms ordered by the highest ARF statistic (see Table 2).

Table 1: The top multi-word terms, ordered by keyness score
 
Table 2: The top multi-word terms, ordered by ARF
 

This analysis highlights, in particular, two obvious outliers, namely ‘human rights defender’ and ‘modern slavery’. The low DOCF and ARF scores highlight that they are highly concentrated within a small number of submissions and, so, are not characteristic of the data set more broadly. 

While no multi-word term occurs in the majority of documents, Table 2 provides a perspective on the most broadly dispersed multi-word terms. It is important to note the substantial overlap between the two measurements in Tables 1 and 2, e.g. ‘trade policy’, ‘trade deal’, ‘trade agreement’, ‘international trade’, and ‘labour government’ appear in both. However, the advantage of the ARF-ordered data is that there are no clear outliers skewed by an individual response or a very small number of responses. This means that it is the second data set which provides a more valid overview of the content of the corpus.

Using a traditional approach to keyness analysis, conclusions may recommend interventions around trade and human rights defenders or modern slavery. However, an analysis of ARF highlights that this is misleading and does not get to the essence of the data set. What is more, policy recommendations based on the former statistic only may result in the disproportionate influence of those who lobby in relation to very specific terms at the expense of more widespread priorities and concerns.
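One simple way to guard against this in practice is to flag terms whose ARF falls far below their absolute frequency. The sketch below uses only the three sets of figures reported earlier in this post; the helper name `dispersion_ratio` and the 0.25 cut-off are arbitrary assumptions for illustration, not part of the analysis we carried out.

```python
# (absolute frequency, ARF, DOCF) as reported in this post.
reported = {
    "gpi": (35, 2.7, 2),
    "gender-just": (19, 1.32, 1),
    "labour": (1434, 725.74, 226),
}

# Hypothetical cut-off: below this ratio a term is treated as concentrated.
THRESHOLD = 0.25


def dispersion_ratio(frequency, arf):
    """ARF as a share of absolute frequency; values near 1 mean even spread."""
    return arf / frequency


flagged = [
    term
    for term, (frequency, arf, _docf) in reported.items()
    if dispersion_ratio(frequency, arf) < THRESHOLD
]

for term, (frequency, arf, docf) in reported.items():
    ratio = dispersion_ratio(frequency, arf)
    print(f"{term}: ratio={ratio:.2f}, DOCF={docf}")
print("Concentrated terms:", flagged)
```

Under this assumed threshold, gpi and gender-just are flagged as concentrated while labour is not, which matches the reading given above.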

 

This ARF analysis formed part of our analysis of the 2023 Labour Party’s Policy Forum that we conducted for the UKTPO, which can be accessed here.

 

If you are interested in our data analysis services or partnering with us in any way, please contact us here

 

References

Labour Policy Forum (2023). National Policy Forum Consultation 2023. Britain in the World.

Gasiorek, M. and Justyna Robinson (2023). What can be learnt from the Labour Party’s consultation on Trade? UKTPO Blog.

Survey of English Usage zooms on concepts

Survey of English Usage zooms on Covid-19 concepts

by Caitlin Hogan

 

Lab leader Dr Justyna Robinson gave a talk at University College London (UCL) as part of the Survey of English Usage Seminar Series about the work of the Concept Analytics Lab. Her talk covered a wide range of issues in the realm of concept analytics, including how to draw out concepts from written accounts via the Mass Observation Archive dataset. She focussed in particular on concept change during the COVID-19 pandemic, when lifestyle changes forced people to adapt their routines and, as a result, the concepts they mention in their daily accounts shifted, in some cases drastically.

 

The Mass Observation Archive was founded in 1937 by Tom Harrisson, Charles Madge and Humphrey Jennings, and its original tenure ran until the 1960s, at which point it became defunct. Originally inspired by the founders’ desire to capture public opinion on the abdication of King Edward VIII, by 1939 the project aimed to have ordinary people record the day-to-day experiences of their lives, and nearly 500 did. This created an invaluable documentation of people’s habits, lives, and thoughts, acting almost as a time capsule. In 1981, it was revived at the University of Sussex and continues to collect qualitative accounts of ordinary people’s lives and opinions to this day. Every 12th of May (chosen as it was the anniversary of the coronation of King George VI), the project calls for anyone to submit a record of their activity on that day, in honour of the original 1937 call going out on that same day. The 12th May diaries collected during the COVID-19 pandemic were digitised with a grant provided by the Wellcome Trust. Digitised diaries from the first lockdown in the UK, i.e. 12th May 2020, were the focus of Justyna’s talk.

 

Justyna discussed how records of ordinary peoples’ activities during lockdown marked a shift towards concepts such as REGULATION, which may be expected, but also the discussion of furniture, given the struggles we all had to adapt to working from home.  Excerpts from the diaries on this theme include the following examples:

 

  • most of the online activities I could cast from my phone to the TV or could be done on my phone, which was vital during the early stages of lockdown, as XXXX was using the home laptop to work remotely, until he received a laptop through work
  • I’m working from home and the work PC is on an old computer desk so giving me a 2foot space to work in. 
  • I can also stretch and do yoga during my working day and sit at a desk that is the right size for me- I am very petite and used to feel uncomfortable in the chairs in meeting rooms, designed for men. 

 

As these examples show, participants mention the struggles of accommodating working from home with limited resources in terms of space and furniture, and the struggles of coexisting while some household members work and others use furniture for other purposes. The examples illustrate clearly that we can talk about the same concept without using the exact same words, so this commonality would be lost if we only used simple corpus linguistic techniques in this analysis. As explained in Robinson et al. (2023), terms like restriction, freeze, coordination, and clampdown emerged in talk about regulations during the COVID-19 pandemic, even though none of them is the word regulation itself. Linking these lexemes together allows a clearer picture to emerge of the topics participants wrote about in their diaries. The insight into which concepts participants found important during lockdown would not have been detectable without concept analysis, and especially without invoking the notion of a keyconcept (Robinson et al., 2023).

 

 

As the lab continues to refine tools for concept analysis, talks such as this one are key to spreading the word to new and emerging scholars about the role of concepts when surveying English usage.

 

References

Robinson J.A., Sandow R.J. and Piazza R. (2023) Introducing the keyconcept approach to the analysis of language: the case of REGULATION in COVID-19 diaries. Front. Artif. Intell. 6:1176283. doi: 10.3389/frai.2023.1176283 

Concept Quest Event, 11th March 2024

by Caitlin Hogan

Concept Quest: Navigating ideas on and through linguistic concepts

Our lab will be part of an exciting event in collaboration with the University of Sussex Digital Humanities Lab and the Mass Observation project. Our session will cover our work on concept analysis through some of our recent projects. The team is excited to attend and present at such a thought-provoking gathering!

 

Be sure to check back here after the event for another blog post and photos! 

Register for the event here:

https://www.ticketsource.co.uk/shl-events-ticket/t-yamopvl