Concept Analytics Lab

Conceptual variation: Gendered differences in the lexicalisation of the concept of COMMODITY in environmental narratives

by Justyna A. Robinson, Rhys J. Sandow, Albertus Andito

 

An updated version of this work can be found as Chapter 10 in: Justyna A. Robinson; Rhys J Sandow, & Albertus Andito. (2026).  “Conceptual variation: Gendered differences in the lexicalization of the concept of COMMODITY in environmental narratives”. In Rhys J. Sandow & Natalie Braber (Eds) Sociolinguistic Approaches to Lexical Variation in English. 173-193. Routledge.

Abstract

Within studies of lexical meaning, conceptual variation has received little attention, possibly due to methodological difficulties with operationalising concepts. In the current paper, we build on the approach developed by Robinson & Weeds (2022) to study gendered variation in the lexicalization of concepts in environmentally-themed directives from the Mass Observation Project. We broadly define a concept as a cluster of (near-)synonymous and hyponymic terms representing a shared meaning. Previous research (Robinson & Weeds 2022) shows that gendered variation exists in collocational patterns of concepts. In the current paper, we focus on the varied ways in which a concept is lexicalized by men and women. A case-study of the concept of commodity shows that taxonomic differences between genders exist, with women using more specific terms to a greater extent than men. We suggest that the socially-variable articulation of concepts represents differences in speakers’ attention afforded to given commodities represented by the concept of commodity.

Keywords: conceptual variation, keyconcept, keysense, lexis, lexicalisation, sociolinguistics, gender, Mass Observation Project

1. Introduction

Variation in lexical meaning is typically investigated from the broad perspectives of semasiology and (formal) onomasiology.1 Semasiology refers to the mapping of a single word form onto multiple senses, for example, wicked ‘evil’ and ‘good’. Formal onomasiology investigates the distribution of a plurality of words that are used to lexicalize a given meaning at the level of (near-)synonymy, for example, the meaning sofa is lexicalized as sofa, settee, and couch. This latter type of lexical variation is the one that has most widely been studied from a sociolinguistic perspective (see Sandow & Braber, this volume). However, the focus of this chapter is on conceptual onomasiological variation (for example, Grondelaers & Geeraerts 2003), that is, the way in which concepts are distributed and lexicalized in heterogeneous ways.

There is a spectrum of methodological approaches to lexical variation that vary according to the degree of control that the researcher has over the lexical usage in the data that they work with. At one end of the spectrum, where control is highest, researchers use elicitation tasks, including surveys (for example, Britain et al. this volume; Robinson 2010, 2012). Lexical variation can also be attested through semi-structured interviews (for example, Braber, this volume; Bucholtz 2012). At the other end of the spectrum, there are data which have been produced without any involvement of researchers, such as newspaper articles and radio recordings. Such work typically engages with methods of Corpus Linguistics (for example, see Wilson this volume).

The development of Corpus Linguistics has allowed for investigating another layer of lexical variation, i.e. variation in meaning between texts more broadly, as opposed to the formal onomasiological variation that relies on functional equivalence between variants. This is typically achieved through a keyword analysis (for example, see Baker 2004). Such analysis involves profiling a target dataset against a baseline and identifying those words (or phrases) that are distinctive of the target data. Keyword analysis offers a powerful approach to identifying the ‘aboutness’ (Kilgarriff 2009) of texts in a bottom-up fashion. However, this method profiles word forms, not their meanings. Consequently, it is up to the researcher to conduct a post hoc, and often ad hoc, interpretation of the meanings from a keyword list in search of meaningful themes in the text.

In this paper, we apply a method for extracting semantic variation between texts by analysing meanings of words, rather than just lexical forms (à la keywords). We access linguistic meaning through the unit of concepts which we broadly define as a cluster of (near-)synonymous and hyponymic terms representing a given meaning. The principle of this method was first introduced by Robinson & Weeds (2022) and Robinson et al. (2023). In that work, Natural Language Processing tools are applied to analyse conceptual variation between texts through introducing the notion of a keyconcept.2 A keyconcept is a concept that occurs more often in a corpus than expected, as compared to a reference corpus (see Section 4 Method for a mathematical clarification of the definition). This approach allows for semantic and statistical interpretation of concepts that are characteristic of a given text.

Previous research shows that the distribution of conceptual themes in the language used by gender groups is not homogeneous. Based on text samples from 70 studies from the United States, New Zealand, and England, Newman et al. (2008: 219–20) found that, compared to women, men were more likely to talk about sports, money, and occupation and less likely to talk about home, family, and friends. For a more detailed discussion of language and gender, including the related differences in socialisation practices, see Eckert & McConnell-Ginet (2013). Robinson & Weeds (2022) also discovered the existence of variation in concepts used by male and female witnesses in courtrooms as well as differences in conceptual collocation patterns across genders. In the current research, we ask whether gendered language also varies in terms of conceptual taxonomy. More specifically, we consider if men and women lexicalize concepts differently, that is, whether they engage with specific or general levels of a conceptual hierarchy, i.e. hyponymy vs. hypernymy, to differing extents.

The paper is structured as follows. First, we contextualise the current study within the growing body of work on concepts and lexicalization. Next, we explain the methodological approach that enables a conceptual analysis. Then, we turn to the dataset, which is the Mass Observation Recycling and Environmentalism (MORE) corpus. It consists of responses to three ‘directives’ on the topic of environmentalism collected by the Mass Observation Project. The analysis profiles the concepts within the dataset, including variation between men and women, using the concept of commodity (specifically commodity.n.01) as a case study. The choice of environmental narratives and the concept of commodity is motivated by the desire to understand better the characteristics of populations’ language and thinking in this economically and socially important area. We show the ways in which such a conceptual perspective highlights gender-based differences in behavioural practices, for example, that women engage more with concepts that pertain to domestic labour. We also identify variation in the lexicalization of concepts, with women typically using more specific (hyponymic) levels of the conceptual hierarchy than men. The analysis benefits from a range of visualisation tools. We conclude by advocating for the value of conceptual analysis and the opportunities it affords for lexical variation research. We note that this research is exploratory and serves as a proof-of-concept approach to the analysis of socially-variable patterns in lexicalization of concepts.

 

2. The Concept and Lexicalization

The last decade has seen an increased focus on concept-led linguistic research. One area that has led investigations into concepts is language change. A body of conceptual research has built on established historical thesauri, such as the Bilingual Thesaurus of Everyday Life in Medieval England (BTh, Sylvester et al. 2017) or the Historical Thesaurus of the Oxford English Dictionary (HT, Kay et al. 2023), or on large corpora as in the Linguistic DNA project (Fitzmaurice et al. 2017).

Variation exists in the way the unit of a concept is operationalised in studies of language. The HT and BTh operationalise a concept as a sense or a group of senses expressed by a term or terms and placed within a taxonomic structure with other meanings. The HT presents a taxonomy which begins with the most general ways of expressing a concept, such as categories of The World, The Mind, Society, and moves hierarchically downwards to the most specific [ones]”.3 The HT takes the structure of Roget’s Thesaurus and imposes it on historical data, with a sensitivity to representing historical senses of words. The BTh draws on the HT structure with modifications that are suitable to mirror the medieval world. However, the BTh classifies vocabulary into semantic roles, rather than a hierarchy.

A departure from this view of concepts is presented by the Linguistic DNA project, which sees concepts as discursive clusters. According to Fitzmaurice et al. (2017: 25) “In any particular historical moment, a concept might not be encapsulated in any single word, phrase or construction; instead it will be observable only via a complete set of words, phrases or constructions in syntagmatic or paradigmatic relations to each other in discourse”. A discursive concept is made up of paradigmatic terms which habitually co-occur in language across large proximity windows. For example, Mehl (2022) shows that the discursive concept diversity-opinion-religion is made up of the terms diversity, opinion, and religion habitually co-occurring around 5000 times in EEBO-TCP. In other words, a frequent and mathematically significant occurrence of the trio diversity-opinion-religion indicates the possible existence of an idea that was expressed by these three nouns in conjunction rather than by an individual term. Close reading of extracts representing the discursive concept allows for tracing the formulation of ideas regardless of whether they ever become encapsulated in a single term.

Language change research also pursues questions about the modification of different levels of the conceptual hierarchy as an outcome of language contact. Sylvester et al. (2022) show that terms making up conceptual categories get reorganised when distinct communities come into contact. In research querying the absorption of French-origin borrowings into Middle English, Sylvester et al. (2020: 28) show that these borrowings tended to enter hypernymic (more general) levels of conceptual categories. Surprisingly, these French borrowings tended to occupy semantic spaces where there was more, not less, lexical choice. In research exploring obsolescence, Vogelsanger (2024: 24) finds that most lexical loss also happens at hypernymic levels as “the more specific the concept, the fewer words and senses we find, but in turn they seem to be more resilient, since they show much lower rates of obsolescence”.

A significant area of study focuses on the lexicalization of concepts. While lexicalization generally refers to “the assignment of lexeme to a meaning” (Murphy 2010: 16), historical linguists tend to focus on various aspects of this process. Thus, Trousdale (2008) asks how once closed-class words or phrases develop lexical meaning. Alexander (2018) and Dallachy (2024) investigate how words that map new concepts are added to a language’s lexicon. Sylvester and Tiddeman (2024) develop measures of density of lexicalization. These studies show the potential of a conceptual view on language as a way of exploring social cognition, with lexicalization being a measure of “cultural attention” (Alexander 2018, Dallachy 2024) and a “function of speakers’ needs” (Sylvester et al. 2020: 28).

The approach to concept and lexicalization pursued in the current work can broadly be categorised in the tradition of the aforementioned thesauri-based approaches in that we consider a term’s usage, its sense, as a base for a concept. We also consider the concept as belonging to a network of hierarchically-structured meanings, i.e. structured horizontally in terms of (near-)synonymy and co-hyponymy and vertically in terms of hyponymy and hypernymy. We use WordNet (Fellbaum 1998), specifically WordNet 3 (Princeton University 2010), to profile senses and model the conceptual structure including semantic relations. WordNet has the advantage in this respect as it is made up of twenty hierarchical levels (Mohamed & Oussalah 2014), as opposed to, for example, the seven levels of the Historical Thesaurus (Piao et al. 2017), thus it enables greater granularity of analysis when it comes to researching hierarchical semantic relations. 

WordNet is a large lexical database of English that groups words according to their meaning. It enables meaning to be modelled through two main semantic units, senses and concepts. Working at the sense level means that we consider words in a text by their meaning as tagged by WordNet. Concepts additionally include all of the hyponyms of the sense. As an illustration of the difference between a sense and concept, consider the WordNet tag of person.n.01, which is defined as ‘a human being’.4 The sense person.n.01 is lexicalized by words that are sense-tagged as person.n.01, including person, individual, someone, and somebody. These lemmas exist at a single level of the semantic hierarchy, that is, the semantic relationship between them is broadly that of synonymy. Meanwhile the concept person.n.01, refers to the words that are sense-tagged as person.n.01 and to words that are sense-tagged as the hyponyms of person.n.01, which include the co-hyponyms child.n.03 and adult.n.01, hyponyms of these hyponyms, such as woman.n.01, recursively until there are no more hyponyms left. This process is unidirectional, that is while the concept of person.n.01 includes hyponyms such as adult.n.01, it does not include hypernyms such as organism.n.01.
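The distinction between a sense and a concept can be illustrated with a minimal sketch. The hyponym map below is a hand-built toy loosely modelled on the person.n.01 example; it is not the real WordNet graph, and the function name `concept_members` is hypothetical rather than part of any tool used in this work.

```python
# Toy hyponym map (hypothetical; only a tiny fragment inspired by the
# person.n.01 example, not the actual WordNet hierarchy).
HYPONYMS = {
    "organism.n.01": ["person.n.01"],
    "person.n.01": ["adult.n.01", "child.n.03"],
    "adult.n.01": ["woman.n.01"],
    "child.n.03": [],
    "woman.n.01": [],
}


def concept_members(sense, hyponyms=HYPONYMS):
    """Return the sense itself plus all of its hyponyms, collected recursively."""
    members = {sense}
    for child in hyponyms.get(sense, []):
        members |= concept_members(child, hyponyms)
    return members


# The concept grows downwards only: hypernyms such as organism.n.01 are excluded.
print(sorted(concept_members("person.n.01")))
# → ['adult.n.01', 'child.n.03', 'person.n.01', 'woman.n.01']
```

In this sketch, the sense person.n.01 corresponds to the single key, whereas the concept person.n.01 corresponds to the whole returned set.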

The proposed view of text semantics, in which each word is tagged for its sense and position in the WordNet hierarchy, allows for testing a whole set of hypotheses on categorisation of meaning, concepts, and lexicalization. The current research centres on lexicalization, understood as the ways in which concepts are represented through words or multi-word constructions. We investigate gendered variation in using terms at hypernymic and hyponymic levels in the concept of commodity.

In identifying the scope of this research, we are motivated by methods and findings of Robinson & Weeds (2022) on conceptual gendered variation. Robinson & Weeds (2022) discover that in the 19th century concepts varied in terms of their differing collocational patterns across genders. For example, while the concept of woman was used at similar frequency by male and female speakers, the adjectival concepts that the concept woman collocated with demonstrated variation across gender. Women were more likely than men to describe other women using the concept AP.02.b [individual character] and the concept AW.04.a [poverty]. These concepts include adjectives such as single and poor and their usage with the concept woman is evidenced by statements such as ‘She is a married woman’ and ‘I am a poor unfortunate woman’ (Robinson & Weeds 2022: 421). Unlike Robinson & Weeds (2022) who focus primarily on collocation patterns, we focus on the ways in which men and women lexicalize concepts and the different levels of the conceptual hierarchy with which they engage.

3. Data

Data used in this research are provided by the Mass Observation Project (Mass Observation 2010, henceforth MOP), a British national life-writing project. Thrice yearly, the MOP issues open-ended questionnaires, or ‘directives’, on a variety of topics, from Royal coronations to attitudes towards gender. These are sent to a panel of c.500 ‘observers’ who are invited to submit their response to the directive. We focus on a collection of three directives with a broad theme of environmentalism, which are titled ‘Future of consumption’ (2018), ‘You and plastics’ (2019) and ‘Household recycling’ (2021). The ‘Future of consumption’ directive asked participants to consider the way in which consumption practices are likely to change for future generations. The ‘You and plastics’ directive asked observers to reflect on their use of plastics, particularly single-use plastics, in the past, present, and future. The ‘Household recycling’ directive asked respondents to consider what and how often they recycle, as well as their motivations for recycling. While each Mass Observation directive receives a small number of handwritten responses, we focus on digitally-submitted files. The digital responses to the three directives comprise 395 submissions from ‘observers’, totalling 416,754 words. These three directives form the target corpus, henceforth the Mass Observation Recycling and Environmentalism (MORE) corpus.

Senses and concepts become key if they occur frequently enough in a target corpus in comparison to a reference corpus. The choice of a reference corpus depends on a range of criteria (cf. Baker 2004). In the current research the reference corpus comes from data collected by the MOP. As well as the directives, the MOP has issued calls for ‘Day diaries’ on the 12th of May each year since 2010. These diaries include descriptions of daily activities and thoughts that the writer has throughout the day, and generally provide an insight into the life of the diarists. The digitally-submitted responses to these calls from 2010–2019 form the baseline, with 4,101,605 words from 3,070 diary entries (see Robinson et al. 2023).

Consistently, the respondents to the MOP’s calls are disproportionately women, older, middle-class and from the South-East of England (see Robinson et al. 2023). In terms of gender, participants are asked to self-identify their gender, and some do so with labels outside of the male-female binary.5 While the unbalanced nature of the sample is a limitation, the size of the dataset and of the subsets for each demographic group still allows for a robust comparative analysis. Excluding those for whom relevant data was not provided, the gender and decade-of-birth distributions of the contributors to both the target and reference corpora are presented in Figure 1. The two datasets in Figure 1 are broadly similar in relation to socio-demographic categories, although the MORE corpus (left) has a higher proportion of male respondents (28.8%) than the baseline (right, 17.7%).

Figure 1: The age (decade of birth) and gender distribution among respondents, with the MORE corpus on the left and reference corpus on the right

4. Method

In the current research, Word Sense Disambiguation (WSD) is employed to determine which sense is the most appropriate for each word in the data based on the word’s context. We use SupWSD (for more detail, including an evaluation of its accuracy, see Papandrea et al., 2017), a WSD tool that uses a machine learning algorithm, i.e. Support Vector Machine, and WordNet (Fellbaum 1998, WordNet 3.0 (2010)) as the sense inventory, taking into account part-of-speech, surrounding words, and local collocations.

Unlike other knowledge bases (for example, the HT), the WordNet hierarchy only applies to nouns and verbs. While other part-of-speech categories, such as adjectives and adverbs, are tagged for senses in WordNet, they do not form a hierarchy, that is, they are represented in a horizontal dimension only. Thus, the current conceptual analysis is limited to verbs and nouns. Verbal and nominal senses are assigned a level, ranging from the broadest concepts at level 0 to steadily more specific concepts at levels with higher numbers. At level 0, the most general (hypernymic) level of the hierarchy, a variety of verbs, such as those tagged with senses trade.v.01 and degrade.v.01, are present, while all nouns converge on entity.n.01. At the other end of the hierarchy, there are much more specific senses, such as cow.n.01 at level 17. That is, there are 16 hypernyms6 that separate cow.n.01 and entity.n.01, for example, physical_entity.n.01 at level 1 and cattle.n.01 at level 16.
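As a rough illustration of how a level can be read off the hierarchy, the sketch below counts the hypernym steps between a sense and the root. The parent map is a hand-built toy containing only nodes whose levels are mentioned in this chapter, and it assumes a single parent per sense, which is a simplification since real WordNet senses can have more than one hypernym. The function name `level` is hypothetical.

```python
# Toy single-parent map (hypothetical simplification of the noun hierarchy).
HYPERNYM = {
    "entity.n.01": None,                      # root, level 0
    "physical_entity.n.01": "entity.n.01",    # level 1
    "abstraction.n.06": "entity.n.01",        # level 1
    "object.n.01": "physical_entity.n.01",    # level 2
    "matter.n.03": "physical_entity.n.01",    # level 2
}


def level(sense, hypernym=HYPERNYM):
    """Count the hypernym steps from a sense up to the root of the hierarchy."""
    depth = 0
    while hypernym[sense] is not None:
        sense = hypernym[sense]
        depth += 1
    return depth


for sense in HYPERNYM:
    print(sense, level(sense))
```

On the same logic, a sense at level 17 is separated from the level-0 root by the 16 intermediate hypernyms at levels 1 to 16.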

After each word in the corpora has been tagged with the appropriate sense, we perform analyses of the corpus through a bespoke Application Programming Interface. In order to identify differences in the usage of senses or concepts between the target and reference corpora, we use a measure of Pointwise Mutual Information (PMI; see, for example, Huang et al. 2009; for a discussion of its application to conceptual analysis, see Robinson & Weeds 2022, Robinson et al. 2023). PMI enables the identification of keysenses and keyconcepts, i.e. senses or concepts which appear in the corpus more often than one would expect, given their frequency in a reference corpus. The higher the PMI, the more distinctive the sense or the concept is of the target dataset relative to the reference corpus. The PMI is calculated as presented in Equation 1, where A is a sense or concept, B is a target corpus, P(A|B) is the probability of encountering a sense A or a concept A given a target corpus B, and Pref(A) is the probability of a sense A or concept A in the reference corpus.

\[ \text{PMI}(A, B) = \log \left( \frac{P(A \mid B)}{P_{\text{ref}}(A)} \right) \]
Equation 1
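Equation 1 can be sketched in a few lines of code. The sketch assumes that the probabilities are plain relative frequencies and that the logarithm is base 2; neither the log base nor any smoothing is specified here, so both are assumptions. The function name `pmi` is hypothetical and does not represent the bespoke API. The corpus totals are the word counts reported in Section 3, while the two sense counts are invented purely for illustration.

```python
import math


def pmi(target_count, target_total, ref_count, ref_total, base=2.0):
    """PMI of a sense or concept A in a target corpus B against a reference corpus.

    Assumes unsmoothed relative frequencies and a base-2 logarithm
    (both are assumptions, not details given in the text).
    """
    if min(target_count, ref_count) <= 0:
        raise ValueError("PMI is undefined when either count is zero")
    p_a_given_b = target_count / target_total  # P(A | B)
    p_ref_a = ref_count / ref_total            # P_ref(A)
    return math.log(p_a_given_b / p_ref_a, base)


# Word totals as reported for the MORE and reference corpora; the counts of
# the sense itself (200 and 400) are hypothetical.
MORE_TOTAL = 416_754
REFERENCE_TOTAL = 4_101_605
score = pmi(target_count=200, target_total=MORE_TOTAL,
            ref_count=400, ref_total=REFERENCE_TOTAL)
print(f"PMI = {score:.2f}")
```

A positive value indicates that the sense or concept is relatively more frequent in the target than in the reference corpus, a value of zero indicates equal relative frequency, and a negative value indicates under-representation in the target. For a keyconcept, the count would be summed over the sense and all of its hyponyms before applying the same formula.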

In Section 5.1, using the tools discussed here, we explore the semantics of the MORE corpus. More specifically, we provide an overview of the distinctive senses and distinctive concepts.  In Section 5.2, we focus on the case study of the concept of commodity to answer questions pertaining to taxonomic differences in lexicalization patterns between men and women.

5. Results and Analysis

5.1. Semantics of the MORE corpus: Keysenses and keyconcepts

In order to provide a semantic overview of the environmental narratives, we identify top keysenses and a conceptual profile of the data. The top 50 keysenses in the MORE corpus, which are measured against the senses in the reference corpus, are presented in Figure 2. The senses with the highest PMI values appear in the top left with the largest tile size and darkest colour; conversely, the 50th top sense appears in the bottom right with the smallest tile size and palest colour. The sense-level analysis clusters (near-)synonymous lexical items. For example, in some cases the adjectives reusable and recyclable are used synonymously enough to be tagged with the same sense, i.e. reclaimable.s.01.7 Additionally, this analysis disambiguates polysemous words and is preferable when the focus is on meaning rather than form. For instance, it distinguishes between waste.n.01 ‘any materials unused and rejected as worthless or unwanted’ and waste.n.02 ‘useless or profitless activity; using or expending or consuming thoughtlessly or carelessly’.

Figure 2: The top 50 senses in the MORE corpus, with corresponding PMI values

Figure 2 provides a semantic overview of the MORE corpus. The most distinctive sense of the corpus is coronavirus.n.01. This result occurs due to the very low frequency of this sense in the baseline corpus, coupled with its much higher frequency in the target corpus, particularly the ‘Household recycling’ directive, where the directive prompt specifically asked about the effect of Covid‑19 on recycling practices. Other senses provide a largely intuitive account of the content of the responses to the three directives, such as materials, for example, plastic.n.01, cardboard.n.01, and cellophane.n.01, as well as practices associated with environmentalism, such as recycle.v.02 and flatten.v.01.

The keyconcept approach develops a keysense analysis by considering (near-)synonymy alongside hyponymy. One way to visualise the verbal and nominal concepts in the target corpus is by using a sunburst, as in Figure 3.8 This conceptual profile presents the concepts at all levels of the conceptual hierarchy. Each level in the conceptual hierarchy corresponds to one ring on the sunburst, with the highest levels of the conceptual hierarchy, such as entity.n.01, being close to the centre of the figure at level 0, then its daughter nodes physical_entity.n.01 and abstraction.n.06 at level 1, and so on.9 Hyponyms radiate out from their parent node. However, where the sense of the parent node is used, this does not generate a daughter node and is, instead, represented by empty space. For example, in the concept container.n.01 on the right of Figure 3 at level 6, the nodes that radiate from this concept highlight that it has a variety of hyponyms, such as vessel.n.03 and bag.n.01. However, there is also space not occupied by daughter nodes; this is where the sense container.n.01 was used directly. The colour and size of each node are also meaningful dimensions of this visualisation. The size of each node represents the raw frequency of the concept in the target corpus, and the intensity of colour indicates each concept’s PMI value. The bar on the right of the figure provides a guide as to the PMI range. For example, matter.n.03 is less frequent but more distinctive in the target corpus than object.n.01 (both concepts are at level 2). While it is not intended that each node should be readable, the figure serves as a compass, directing the researcher to the most distinctive conceptual areas of the dataset. At the more general levels of the conceptual hierarchy (for example, 0–3), the PMI values are very low, whereas distinctiveness is more likely to be found at lower levels of the hierarchy. For instance, container.n.01 has a particularly high PMI (2.8) at level 6, as a hyponym of instrumentality.n.01. However, this high PMI is not simply a result of the sense container.n.01, but of its hyponyms, too. Within container.n.01 some of its daughter concepts also display high PMIs, such as bottle.n.01 (PMI=4.5) and bin.n.01 (PMI=5.72).

Figure 3: The conceptual profile of the MORE corpus, with the reference corpus as the baseline

Given that the environmentally-themed directives make up the MORE corpus, it is not surprising that the respondents discuss types of containers and their uses, including innovative repurposing, as well as their recycling practices. However, not all daughter concepts of container.n.01 are highly distinctive. For example, bath.n.01 and boiler.n.01 occur less often than in the baseline. Other concepts with particularly high PMI values include use.n.01 (PMI=4.4) and its daughter concept recycling.n.01 (PMI=6.2), as well as waste.n.01 (PMI=4.8) and its daughter concept rubbish.n.01 (PMI=3.3).

 

This section illustrates an approach to the semantic characterisation of data. The MORE corpus is semantically described from the perspective of senses and concepts. Figure 2 shows the most distinctive senses of the MORE corpus, including lockdown.n.01, pandemic.n.01, and plastic.s.02. The conceptual approach complements this by considering the taxonomic relation of hyponymy. For example, while a sense-level analysis highlights container.n.01 in the top 50 most distinctive senses, a conceptual analysis shows that not only is this sense distinctive of the corpus, but so are some of its hyponyms, such as bin.n.01 and bottle.n.01.

5.2. Gendered variation in the MORE corpus: Keyconcepts and lexicalization patterns

Having established the semantic characteristics of the MORE corpus through the keysense and keyconcept analysis, we turn to the question of gendered semantic variation. Firstly, we investigate conceptual differences in environmental narratives between male and female respondents. Secondly, we ask how lexicalization of a concept, commodity, varies across gender groups.

Gendered differences in concepts used in the MORE corpus are extracted through a keyconcept analysis. Each concept’s PMI for each gender is measured against the other gender group within the MORE corpus.10 That is, when the female data are the target, the male data are the reference, and vice versa (see Figures 4 and 5).
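
One consequence of this set-up can be made explicit. On the reading of Equation 1 in which the reference probability is estimated from the other gender's subcorpus, and writing W and M for the female and male data (labels introduced here for illustration only), swapping target and reference simply reverses the sign of the score, provided the concept occurs in both subcorpora:

\[ \text{PMI}(A, W) = \log \left( \frac{P(A \mid W)}{P(A \mid M)} \right) = -\log \left( \frac{P(A \mid M)}{P(A \mid W)} \right) = -\text{PMI}(A, M) \]

A concept with a positive PMI in one gender's profile would therefore have a negative PMI of the same magnitude in the other's, which is why each profile is read for its positive values.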

Figure 4: The conceptual profile of women in the MORE corpus, with men as the baseline

Figure 5: The conceptual profile of men in the MORE corpus, with women as the baseline

Relative to men, women show positive PMIs for charity.n.01 (1.4), husband.n.01 (3.6), and home.n.01 (1.3), indicating that women engage with these concepts to a greater extent than men do. Relative to women, men show positive PMIs for internet.n.01 (1.2), alcohol.n.01 (1.1), and wife.n.01 (4.1), highlighting that men engage with these concepts to a greater extent than women do in the MORE corpus. Such results testify to heterogeneous behaviours and practices across genders.

Even when concepts have similar overall rates of usage across demographic groups, their internal structure can differ. This is exemplified through the concept of commodity.n.01, which is selected as a case study. This concept is used similarly by men and women in the dataset as measured by PMI. It is used very slightly more by women, with a PMI of 0.18 when compared with men. The raw frequency of usage is N=1021 for women and N=445 for men. While the PMI for commodity.n.01 is similar for both genders, the internal structure of the concept displays a great deal of variation across the two gender groups. The conceptual profile for commodity.n.01 among female writers is presented in Figure 6, while the male equivalent is presented in Figure 7.

Figure 6: The conceptual profile for commodity.n.01 among female observers

Figure 7: The conceptual profile for commodity.n.01 among male observers

The conceptual profiles for males and females display a number of differences in the type of artifacts with which males and females interact. Numerous examples of gendered clothing items are distributed disproportionately across the gendered groups, with concepts such as dress.n.01 (PMI=0.3), negligee.n.01 (PMI=0.9), brassiere.n.01 (PMI=1.7), and skirt.n.02 (PMI=0.1) displaying positive PMIs in the female data, and suit.n.01 (PMI=2.2) and jean.n.01 (PMI=1.8) displaying positive PMIs in the male data. Greater attention to domestic labour is also evident among women in the conceptual profiles of commodity.n.01. For example, white_goods.n.01 is used more by women (PMI=0.8). Within the concept of white_goods.n.01, the female data has positive PMIs for refrigerator.n.01 (PMI=0.5), dishwasher.n.01 (PMI=2.0), and washer.n.01 (PMI=2.2), while the only hyponym with a positive PMI in the male data is cooler.n.01 (PMI=1.8). Similarly, laundry.n.01 has a positive PMI in the female data (PMI=1.1). There are also examples of the asymmetric distribution of childcare, with diaper.n.01 (PMI=2.8) having a positive PMI in the female data.

One surprising result is that shirt.n.01 is used more by the female observers (PMI=0.82), despite it being an artifact stereotypically associated with men. However, we can account for this result by identifying that many of these examples involve accounts of interactions with male clothing by women, such as Example (1):

  1. I reuse my husband’s cotton shirts in crafting. [Household recycling, female born in the 1960s]

The observed differences in conceptual profiles for males and females suggest that concepts provide a window into community behaviours. The artifacts represented by the concept of commodity, and the asymmetric gender distribution with which they are engaged, are a medium through which the physical world is experienced by men and women.

Another perspective from which the current research highlights gendered conceptual variation is that of taxonomic differences, i.e. the levels of the conceptual hierarchy that men and women typically engage with. This differential engagement is evident in the fact that the sunburst in Figure 6 is noticeably ‘busier’ than the one in Figure 7. That is, the male data in Figure 7 include more empty space, i.e. space not occupied by daughter nodes (hyponyms). Take clothing.n.01 as an example. Proportionally, men are more likely to use the more general level sense clothing.n.01, while women are more likely to conceptualise clothing with a greater degree of specificity. When the concept clothing.n.01 is used, men use the sense clothing.n.01 32.2% of the time, compared to 24.7% for women. Thus, women are more likely to use a hyponym. To illustrate this, Example (2) is more typical of a male conceptualisation of clothing.n.01 at the more general category level, whereas Example (3) is more typical of a female conceptualisation at a greater level of specificity (brassiere.n.01):

  2. Used batteries are taken to local supermarkets where they can be recycled, also clothing can be recycled at various points around the area. [Household Recycling, male born in the 1950s]
  3. I’d like to be able to recycle old bras [Household Recycling, female born in the 1950s]

Similarly, the concept of merchandise.n.01 is more likely to be realised as a hyponym by women than by men. For women, 74.2% of uses of this concept are in the sense merchandise.n.01 itself; for men, this value is higher, at 81.6%. This is also true of the broader concept of commodity.n.01, with men using the sense commodity.n.01 17.9% of the time, compared with 12.4% for women.11
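The percentages above can be read as the share of a concept's tokens that are lexicalized as the concept's own sense, as opposed to any of its hyponyms. The sketch below shows one way such a share could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function `head_sense_share`, the simplified toy hierarchy, and the counts are all hypothetical, and the hierarchy does not reproduce WordNet's actual structure.

```python
def head_sense_share(concept, hyponyms_of, sense_counts):
    """Share of tokens under `concept` that use the concept's own sense.

    `hyponyms_of` maps a sense to its direct hyponyms; `sense_counts`
    maps a sense to its token frequency in one dataset.
    """
    # Collect the concept and every sense below it in the hierarchy.
    members, stack = set(), [concept]
    while stack:
        sense = stack.pop()
        if sense in members:
            continue
        members.add(sense)
        stack.extend(hyponyms_of.get(sense, ()))

    total = sum(sense_counts.get(sense, 0) for sense in members)
    if total == 0:
        raise ValueError(f"no tokens found under {concept}")
    return sense_counts.get(concept, 0) / total


# Hypothetical, simplified hierarchy (not WordNet's real structure).
toy_hierarchy = {
    "clothing.n.01": ["garment.n.01", "dress.n.01"],
    "garment.n.01": ["shirt.n.01"],
}
# Hypothetical token counts for one group of writers.
toy_counts = {
    "clothing.n.01": 3,
    "garment.n.01": 1,
    "shirt.n.01": 4,
    "dress.n.01": 2,
}

share = head_sense_share("clothing.n.01", toy_hierarchy, toy_counts)
print(f"Share lexicalized at the general level: {share:.1%}")
```

Computing this share separately for each group's counts and comparing the results would mirror the comparison made in the text: a lower share means that more of the concept's tokens sit at hyponym level.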

The internal structural differences in concepts evidence variation in the ways men and women express those concepts through their use of words, that is, in the ways they lexicalize those concepts. In the analysis of commodity.n.01, differences arise in the levels within the conceptual hierarchy at which concepts are lexicalized, with men lexicalizing concepts at more generic levels, and women lexicalizing concepts at more specific levels.

6. Summary and Conclusions

This research presents a new approach to querying the semantics of texts by engaging with horizontal ((near-)synonymous and co-hyponymic) and vertical (hyponymic and hypernymic) semantic relations. The current approach highlights hyponymy as a critical aspect of language variation alongside more widely researched relations in socio-semantics, such as synonymy and polysemy. Exploring texts through the lenses of keysense and keyconcept enables semantic content to be profiled, and this profile can empirically guide further analysis and close reading. In one way, the concept-driven approach is more specific than more traditional alternatives, such as keyword analysis, in that it distinguishes polysemous senses of the same word form. In another way, it is more general: in a conceptual approach, it is less relevant which (near-)synonym is used; what matters is the meaning expressed. This perspective enables the analysis to centre meaning, while variation in word form is secondary. We advocate using the current conceptual approach, which offers a bird’s-eye view of text meaning, in conjunction with close reading in order to develop the most robust insights into a text’s semantics (for example, Robinson et al. 2023).

The current research demonstrates the ways in which a conceptual perspective can tell stories of socially-asymmetric behavioural practices. Examples of this include the way in which the use of clothing items exhibits gendered patterns, such as brassiere.n.01 being used relatively more by women and suit.n.01 by men. Similarly, there is a higher frequency of concepts pertaining to domestic labour and childcare in the female data. Results that testify to heterogenous behaviours and practices across genders are corroborated by the parallels with other studies, such as those observing gender-based differences in the share of domestic duties (Bianchi et al. 2012; Thébaud et al. 2021). The parallels between such previous research and the current research speak to the validity of this approach.

The current concept-driven approach enables insights into lexicalization. Even when concepts have similar overall rates of usage across demographic groups, their internal structure can differ. We show that men and women differ in the way they use different levels of the conceptual hierarchy when they express or lexicalize the same concept. In the case of commodity.n.01, men are more likely to use general-level terms, such as clothing, whereas women are more likely to engage with more precise classifications in their lexicalization, that is, specific types of clothing, such as bra. This idea is redolent of Lakoff’s (1973) suggestion that women tend to use more specific colour terms, such as lavender, whereas men lexicalize these same colours at a broader level of conceptualisation, such as purple.12 Lakoff’s (1973) observations alongside the current research lead us to hypothesise that differential lexicalisation patterns may hold for gendered or socio-demographic conceptual variation more broadly.

Another question to consider is why lexicalization would take place at different levels of the conceptual hierarchy for men and women. One possibility lies in the cultural and social needs that speakers express via lexicalisation practices (cf. Alexander 2018; Dallachy 2024; Sylvester et al. 2020). Another possibility lies in the cognitive foundations of language and in perception biases among men and women. At this stage, we suggest that the lexicalization of commodity.n.01 reflects the attention afforded to the artifacts with which men and women engage. The question of why the lexical representations of objects display differential attention across a community could be explored further via socio-cognitive research frameworks (Pütz et al. 2014).

Findings of the current research have implications for the language used by policy and industry decision-makers in the climate and environmentalism space. By integrating insights from Nudge Theory, especially the idea of “choice architecture” (Thaler and Sunstein 2008), relevant communication and engagement strategies can be optimised for a specific audience. By using gender-characteristic language, such as more general or more detailed terms to describe commodities, the population can be gently steered towards a desired behaviour, such as more accurate sorting of those commodities when they become waste. Policy makers also require precise categorisation in waste management (EU Waste Framework 2008/98/EC)13, which the current research provides through the categorisation of commodities people handle. By fine-tuning public communication to resonate with socio-demographic groups, and embedding this clarity into policy wording, policy makers can more effectively reduce landfill waste. These strategies work together to create an environment where an individual is gently steered toward more sustainable behaviour without the need for overt financial incentives.

To conclude, the proposed concept-led approach shows potential beyond a case of gendered variation. The current analysis could apply to any demographic category, including cross-sectional categories. The extent to which the presented results hold systematically for other concepts remains to be seen. Also, motivated by studies outlined in Section 2, such as Sylvester et al. (2020) or Vogelsanger (2024), further research could explore the relationship between lexicalization and different levels of the conceptual hierarchy in the context of language change. While it is argued elsewhere (for example, Sandow & Braber, this volume) that lexis can provide a lens into society, we argue here that a conceptual approach lends a particularly felicitous perspective to this endeavour. The proposed conceptual approach affords further methodological, theoretical, and applied opportunities in sociolinguistics and beyond.

Footnotes

  1. Acknowledgements: We would like to thank two anonymous Reviewers for their helpful comments on the earlier draft of this manuscript. We would like to thank the broader team at Concept Analytics Lab at the University of Sussex, particularly Julie Weeds, Willam Kearney, Yassir Laaouach, and Ray Davey for the work on the API that underpins the research presented here. Supported by the Arts & Humanities Research Council (AHRC) Impact Acceleration Account, at the University of Sussex (AH/X003531/1).
  2. Robinson & Weeds (2022) use the term characteristic concept.
  3. https://ht.ac.uk/classification/
  4. In the WordNet sense inventory, the n refers to the nominal part-of-speech category. The 01 identifies this as the first sense of this form in WordNet. By way of example, person.n.01 is defined as ‘a human being’ and person.n.02 is defined as ‘a human body (usually including the clothing)’.
  5. Figure 1 does include these individuals, but as they are very few in number, they are not clearly visible.
  6. These are hypernyms from the perspective of cow.n.01, but hyponyms from the perspective of entity.n.01.
  7. For example, ‘The video I watched suggested that rice and pasta should be stored in reusable glass containers’ [‘Future of consumption’ directive, male born in the 1970s] and ‘some of the plastic containers are recyclable but I guess that several are single-use’ [‘You and plastics’ directive, female born in the 1930s].
  8. Concepts with a negative PMI also appear in Figures 3 and 4. In terms of the colour coding key on the figures, these are coloured as if their PMI value is 0.
  9. The highest levels in the conceptual hierarchy correspond to the lowest numbers in the WordNet hierarchy. Thus, the highest levels are entity.n.01 at level 0, physical_entity.n.01 at level 1, and so forth.
  10. This task requires a modification of the PMI Equation (1) in terms of target and reference datasets.
  11. Beyond the concept of commodity.n.01, this effect holds for other concepts where men are more likely to use the broader sense and women are more likely to use a hyponym. For example, in the concept of child.n.02, men use the sense child.n.02 itself 53.6% of the time, compared to 40.7% for women; in chemical.n.01, the values are 3.9% for men and 1.9% for women; and in waste.n.01, the values are 76.9% for men and 71.2% for women.
  12. We thank an anonymous Reviewer for this observation.
  13. https://eur-lex.europa.eu/eli/dir/2008/98/oj/eng, see especially Paragraph 2.

References

Alexander, Marc. 2018. Lexicalization Pressure. Plenary lecture delivered at the 20th International Conference on English Historical Linguistics, University of Edinburgh.

Baker, Paul. 2004. Querying keywords: Questions of difference, frequency, and sense in keyword analysis. Journal of English Linguistics, 32: 346-59.

Bianchi, Suzanne M., Liana C. Sayer, Melissa A. Milkie, & John P. Robinson. 2012. Housework: Who did, does or will do it, and how much does it matter? Social Forces, 91: 55-63.

Bucholtz, Mary. 2012. Word Up: Social meanings of slang in California youth culture. In Leila Monaghan, Jane E. Goodman, & Jennifer Meta Robinson (eds.), A Cultural Approach to Interpersonal Communication: Essential Readings, 2nd ed, 274-97. Chichester: Wiley.

Dallachy, Fraser. 2024. A human-scale set of categories for the Historical Thesaurus of English. Dictionaries, 45: 145-68.

Eckert, Penelope & Sally McConnell-Ginet. 2013. Language and Gender, 2nd edition. Cambridge: Cambridge University Press.

Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Fitzmaurice, Susan & Seth Mehl. 2022. Volatile concepts: Analysing discursive change through underspecification in co-occurrence quads. International Journal of Corpus Linguistics, 27: 428-50.

Fitzmaurice, Susan, Justyna A. Robinson, Marc Alexander, Iona C. Hine, Seth Mehl, & Fraser Dallachy. 2017. Linguistic DNA: Investigating conceptual change in Early Modern English Discourse. Studia Neophilologica, 89: 21-38.

Grondelaers, Stefan & Dirk Geeraerts. 2003. Towards a pragmatic model of cognitive onomasiology. In Hubert Cuyckens, René Dirven, & John R. Taylor (eds.), Cognitive Approaches to Lexical Semantics, 67-92. Berlin: Mouton.

Hoang, Hung H., Su N. Kim, & Min-Yen Kan. 2009. A re-examination of lexical association measures. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, 31-9.

Kay, Christian, Marc Alexander, Fraser Dallachy, Jane Roberts, Michael Samuels, & Irené Wotherspoon (eds.). 2023. The Historical Thesaurus of English (2nd edn., version 5.0). University of Glasgow. https://ht.ac.uk/. Accessed September 11, 2024.

Kilgarriff, Adam. 2009. Simple maths for keywords. In Michaela Mahlberg, Victorina González-Díaz, & Catherine Smith (eds.), Proceedings of the Corpus Linguistics Conference CL2009. Available at: https://www.sketchengine.eu/wp-content/uploads/2015/04/2009-Simple-maths-for-keywords.pdf. Accessed 18 July 2024.

Lakoff, Robin. 1973. Language and woman’s place. Language in Society, 2(1): 45-80.

Mass Observation. 2010. Mass Observation Archive. Available online at: http://www.massobs.org.uk/. Accessed August 21, 2024.

Mehl, Seth. 2022. Discursive Quads: New Kinds of Lexical Co-occurrence Data With Linguistic Concept Modelling. Transactions of the Philological Society, 120(3): 474-88.

Mohamed, Muhidin & M. Oussalah. 2014. A comparative study of conversion aided methods for WordNet sentence textual similarity. Proceedings of the AHA! Workshop on Information Discovery in Text, 37-42.

Murphy, M. Lynne. 2010. Lexical Meaning. Cambridge: Cambridge University Press.

Newman, Matthew L., Carla J. Groom, Lori D. Handelman, & James W. Pennebaker. 2008. Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes, 45: 211-36.

Papandrea, Simone, Alessandro Raganato, & Claudio D. Bovi. 2017. SupWSD: A flexible toolkit for supervised word sense disambiguation. In Proceedings of the 2017 EMNLP System Demonstrations, 103-8.

Piao, Scott, Fraser Dallachy, Alistair Brown, Jane Demmen, Steve Wattam, Phillip Durkin, James McCracken, Paul Rayson, & Marc Alexander. 2017. A time-sensitive historical thesaurus-based semantic tagger for deep semantic annotation. Computer Speech & Language, 46: 113-35.

Princeton University. 2010. About WordNet. Available at: https://wordnet.princeton.edu/. Accessed 10/12/2024.

Pütz, Martin, Monika Reif, & Justyna A. Robinson (eds.). 2014. Cognitive Sociolinguistics. Amsterdam: John Benjamins.

Robinson, Justyna A. 2010. Awesome insights into semantic variation. In Dirk Geeraerts, Gitte Kristiansen, & Yves Peirsman (eds.), Advances in Cognitive Sociolinguistics, 85-109. Berlin: Mouton de Gruyter.

Robinson, Justyna A. 2012. A gay paper: Why should sociolinguistics bother with semantics? English Today, 28: 38-54.

Robinson, Justyna A. & Julie Weeds. 2022. Cognitive sociolinguistic variation in the Old Bailey Voices Corpus: The case for a new concept-led framework. Transactions of the Philological Society, 120: 399-426.

Robinson, Justyna A., Rhys J. Sandow, & Roberta Piazza. 2023. Introducing the keyconcept approach to the analysis of language: The case of regulation in Covid-19 diaries. Frontiers in Artificial Intelligence, 6, doi: https://doi.org/10.3389/frai.2023.1176283.

Sylvester, Louise, Imogen Marcus, & Richard Ingham. 2017. A bilingual thesaurus of everyday life in Medieval England: Some issues at the interface of semantics and lexicography. International Journal of Lexicography, 30: 309-21.

Sylvester, Louise, Megan Tiddeman, & Richard Ingham. 2020. An analysis of French borrowings at the hypernymic and hyponymic levels of Middle English. Lexis: Journal of English Lexicography, 16, doi: 10.4000/lexis.4841.

Sylvester, Louise, Megan Tiddeman, & Richard Ingham. 2022. Semantic Shift in Middle English: Farming and Trade as Test Cases. Transactions of the Philological Society, 120: 427-46.

Sylvester, Louise & Megan Tiddeman. 2024. Lexicalization, polysemy and loanwords in anger: A comparison with non-affective domains in Middle English. Lexis, 3, doi: https://doi.org/10.4000/12ize.

Thaler, Richard H. & Cass R. Sunstein. 2008. Nudge: Improving Decisions About Health, Wealth, and Happiness. New Haven: Yale University Press.

Thébaud, Sarah, Sabino Kornrich, & Leah Ruppanner. 2021. Good housekeeping, great expectations: Gender and housework norms. Sociological Methods & Research, 50: 1186-214.

Trousdale, Graeme. 2008. Constructions in grammaticalization and lexicalization: Evidence from the history of a composite predicate construction in English. In Graeme Trousdale & Nikolas Gisborne (eds.), Constructional Approaches to English Grammar, 33-67. Berlin: Mouton.

Vogelsanger, Johanna. 2024. Obsolescence and innovation in the Middle English religious lexicon. To appear in Transactions of the Philological Society. Advance Online Publication, https://doi.org/10.1111/1467-968X.12310.

WordNet. 2010. WordNet 3.0. Available online at: http://wordnet.princeton.edu. Accessed February 27, 2023.