Author Archives: justyna.robinson@gmail.com

How does concept-led research matter?

by Caitlin Hogan

 

Our Concept Analytics Lab (CAL) team LOVES concepts. In our daily work, we keep seeing the value of the concept-based view of language in bringing insight into the thinking, attitudes, and behaviours of people. But how important is concept-based research for the wider linguistic community? Can concept-based research impact other disciplines and industries? Can you commercialise your concept-based knowledge?

With the aim of consolidating research on, and applications of, concept-based approaches to text analysis, we gathered experts in the field for the first Concept Quest conference.

 

The event Concept Quest: Navigating Ideas on and Through Linguistic Concepts took place in March 2024 at the University of Sussex. It focussed on the work of CAL and other researchers from a range of academic disciplines. We hosted talks and panels from scholars studying everything from AI concepts to the impact of trade deals on the economy and commercialising concepts in the process of wine production.

 

Justyna Robinson, the Director of the Concept Analytics Lab, started by talking about the aims and advantages of concept mining as a methodology. Concepts are not encapsulated by a single word but are observable through a set of words, phrases and/or constructions. This allows us to understand how individual terms might be used differently over time, and how they may come to represent different concepts. CAL’s researcher Rhys Sandow then discussed how one can visualise conceptual ontologies and showed how one can turn complex sets of lexical relations into clear diagrammatic representations. Such representations can shed light on conceptual, including socio-conceptual, differences that are inaccessible to more traditional approaches to the analysis of large texts.

Following this, Louise Sylvester (Westminster) talked about how concepts can be incorporated into studies of Medieval English. Her work focuses on the adoption of terms from French into English during this period, and through the use of a semantic hierarchy, she is able to inspect in which cases French pushed out the English variant, and in which cases this did not occur. The use of concepts allows us to see the patterns that emerge in synonym relationships, even from long ago.

 

Haim Dubossarsky (QMUL) approached the study of concepts from a computational angle, discussing the methods we currently use in computational and corpus linguistics, such as collocation analysis, and how we can improve on them. Through the projection of a word’s usage onto a series of vectors, one is able to map the meanings of the word and their change over time. This technique provides a computational boost to the analysis of meaning and represents an important link between the world of linguistics and that of computer science, a link that the Concept Analytics Lab values highly.

 

The talks on theoretical and methodological aspects of doing concept research were complemented by talks addressing applications of concepts in archival work and in commercial endeavours. 

 

Piotr Nagórka (Warsaw’s Cultural Terminology Lab) discussed the exploration of communications systems and terminological sciences. He probed how the terminology we use to refer to types of wine maps onto the production process itself. His work shows how one might commercialise concept research by marrying the study of concepts with processes and techniques within the manufacturing sciences.

Angela Bachini and Kirsty Pattrick, who work on the Mass Observation project, helped us understand how archivists identify important concepts when indexing a new text. We learned a great deal from the Mass Observation team about their workflow and how we as researchers can best help archivists to automate indexing via key-concept detection.

The event finished with a panel discussion on why concepts matter, led by Lynne Murphy (Sussex), in which Piotr Nagórka and Kirsty Pattrick were joined by Julie Weeds (Sussex AI) and Alan Winters (Sussex, CITP). Alan reflected on the value of concepts in trade analysis, particularly to understand the trade-offs that people are willing to make with regard to global trade. These kinds of complex attitudes are difficult to access with other methods, particularly the quantitative methods often used in economics. The advantage of concept analysis, where participants can describe their accounts in rich detail which can then be computationally analysed, is clear in this case. Louise Sylvester added that in her work on Medieval English, concepts help us understand how people living in that era made sense of the world and what categories were meaningful for them. This helps greatly with noticing patterns of use in historical linguistics, and also helps us to understand how the concept of something like a farm has changed from the Middle Ages to the present day.

 

We continued chatting over some delicious wine (thanks to a generous sponsorship from Mass Observation) and made new connections across institutions and fields.  This is exactly the kind of result we envisage from a successful colloquium, and we were proud to have hosted such a stimulating day. Our gratitude extends to all the wonderful speakers and attendees for making this event so brilliant!

 

To conclude our reflections, Concept Quest highlighted the value of concept-based and concept-led research and applications. Researching concepts matters for the theory of language and knowledge representation as we consider conceptual hierarchies, lexicalised and non-lexicalised concepts, and the emergence of new concepts and ideas. At a methodological level, concepts pose a challenge for traditional word-based corpus and NLP techniques. Therefore, new ways of extracting conceptual information from big data are needed. At a more applied level, empirical ways of gaining access to conceptual information are invaluable for other sectors and disciplines which use large text data. Thus, strengthening the objectivity and replicability of concept research will open up this research to other sectors which seek more expert analyses. That development can also lead to impactful research and even commercialisation of conceptual research.

 

Please get in touch here to find out which key concepts and themes are revealed in your data. 

 

References

Robinson, J. A., Sandow, R. J., & Piazza, R. (2023). Introducing the keyconcept approach to the analysis of language: the case of regulation in COVID-19 diaries. Frontiers in artificial intelligence, 6.

Nagórka, P. (2021). Madeira, Port, Sherry. The Equinox Companion to Fortified Wines. Equinox Publishing Limited.

About Us

We identify conceptual patterns and change in human thought through a combination of distant text reading and corpus linguistics techniques.

Blog

Identifying key content from surveys

How can you identify key content from surveys?

by Rhys Sandow and Justyna Robinson

 

A case of responses to the Labour Party’s 2023 Trade Policy Forum

 

Surveys which collect responses to open questions are a popular and valuable way of gauging people’s attitudes. But they also present specific challenges for keyness analysis in corpus linguistics, as the results can be misleading. For example, a high frequency of term X may be skewed by one or two documents within the corpus, rather than being representative of attitudes among the survey respondents more broadly. In such cases, traditional corpus linguistic measures of difference, such as relative frequencies or keyness, are not appropriate. Instead, we advocate for the use of measures of dispersion across a corpus, such as Average Reduced Frequency (ARF) and Document Frequency (DOCF). This distinction between frequency and dispersion is critical for developing meaningful insights into large data sets, particularly in the context of policy consultation, where an understanding of plurality and consensus is highly important.

 

Let us demonstrate how to solve this problem using examples from data we recently analysed. Concept Analytics Lab (CAL) was tasked by the UK Trade Policy Observatory (UKTPO) to analyse responses to the Labour Party’s Trade Policy Forum in the build-up to the Labour Party’s annual conference in October 2023. The survey gathered 302 answers to seven questions comprising c. 250,000 words of data. Many of the submissions came from groups with very particular interests, such as specific industries or specific local communities. Therefore, some responses contained detailed discussions of issues critically important to the submitter, but not necessarily widespread among all respondents. For example, when running a keyword analysis, the eighth most key word (with the enTenTen21 corpus as our baseline) was gpi (genuine progress indicator) with 35 hits across the corpus. However, upon closer inspection, these hits are spread across only 2 of the 302 responses. Thus, while gpi has a high keyness score, it cannot be said that it is a salient topic across the corpus, as its use is concentrated in just 0.66% of documents.
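The gap between raw frequency and document frequency can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used for the report: the function name `frequency_and_docf`, the whitespace tokenisation, and the four toy responses are all assumptions made for the example.

```python
def frequency_and_docf(documents, term):
    """Return (total hits, number of documents containing the term).

    Tokenisation is a plain lowercase whitespace split, which is an
    assumption for this sketch; real corpus tools tokenise more carefully.
    """
    hits_per_doc = [doc.lower().split().count(term) for doc in documents]
    total_hits = sum(hits_per_doc)
    docf = sum(1 for hits in hits_per_doc if hits > 0)
    return total_hits, docf


# Invented toy responses, not taken from the survey data.
documents = [
    "gpi gpi gpi should replace gdp and gpi matters",
    "trade policy should support workers",
    "a fair trade policy helps small firms",
    "trade deals need scrutiny",
]

for term in ("gpi", "trade"):
    hits, docf = frequency_and_docf(documents, term)
    share = docf / len(documents)
    print(f"{term}: hits={hits}, DOCF={docf}, share of documents={share:.0%}")

# The figure quoted above for gpi: 2 of 302 responses.
print(f"{2 / 302:.2%}")
```

In the toy data, gpi has more hits than trade but appears in only one response, which is the same pattern as the gpi result described above.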

In order to remedy this limitation of keyness analysis, we considered the spread of terms across the corpus using Sketch Engine’s Average Reduced Frequency (ARF) statistic. ARF is a modified frequency measure that prevents results being skewed by a specific part, or a small number of parts, of a corpus (for more detail on the mathematics behind the measure, see here). Where the ARF and absolute frequency are similar, this suggests a relatively even distribution of a given term across a corpus. However, a large discrepancy between the absolute frequency and the ARF indicates a skew towards a small subset of the corpus. For example, while the absolute frequency of gpi in the corpus is 35, the ARF is 2.7 (DOCF, 2), highlighting its lack of dispersion. Similarly, the term gender-just has an absolute frequency of 19 but an ARF of 1.32 (DOCF, 1), highlighting that this term is not characteristic of the data set as a whole, but is highly salient within a small subset of the corpus. By contrast, labour, with an absolute frequency of 1,434, has an ARF of 725.74 (DOCF, 226), highlighting its spread across the corpus.
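To make the behaviour of ARF concrete, here is a minimal sketch based on the commonly cited definition of the measure: the corpus of N tokens is treated as a circle, v = N / f is the average gap between the f occurrences of a term, and ARF is the sum of min(distance, v) over the gaps between consecutive occurrences, divided by v. The function name and the toy token lists are assumptions, and Sketch Engine’s own implementation may differ in detail.

```python
def average_reduced_frequency(tokens, term):
    """Compute ARF for `term` in a list of tokens.

    Assumes the standard definition: with N tokens and f hits, v = N / f,
    and ARF = (1 / v) * sum(min(d_i, v)), where d_i are the distances
    between consecutive hits, the corpus being treated as cyclic.
    """
    n = len(tokens)
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    f = len(positions)
    if f == 0:
        return 0.0
    v = n / f
    # Gaps between neighbouring hits, plus the wrap-around gap
    # from the last hit back to the first one.
    distances = [b - a for a, b in zip(positions, positions[1:])]
    distances.append(positions[0] + n - positions[-1])
    return sum(min(d, v) for d in distances) / v


# Two invented 100-token corpora, each with four hits of "x".
evenly_spread = ["x" if i % 25 == 0 else "_" for i in range(100)]
clustered = ["x"] * 4 + ["_"] * 96

print(average_reduced_frequency(evenly_spread, "x"))
print(average_reduced_frequency(clustered, "x"))
```

Both toy corpora have an absolute frequency of 4. The evenly spread term keeps an ARF equal to its frequency, while the clustered term drops to just above 1, which mirrors the contrast between labour and gpi described above.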

When analysing corpus data, methodological decisions can have highly impactful repercussions for the analysis. For example, let’s take the top 10 key multi-word terms from the Labour Party Policy Forum data set ordered by keyness score (see Table 1) and compare it with the top 10 multi-word terms ordered by the highest ARF statistic (see Table 2).

Table 1: The top multi-word terms, ordered by keyness score
 
Table 2: The top multi-word terms, ordered by ARF
 

This analysis highlights, in particular, two obvious outliers, namely ‘human rights defender’ and ‘modern slavery’. The low DOCF and ARF scores highlight that they are highly concentrated within a small number of submissions and, so, are not characteristic of the data set more broadly. 

While no multi-word term occurs in the majority of documents, Table 2 provides a perspective on the most broadly dispersed multi-word terms. It is important to note the substantial overlap between the two measurements in Tables 1 and 2, e.g. ‘trade policy’, ‘trade deal’, ‘trade agreement’, ‘international trade’, and ‘labour government’ appear in both. However, the advantage of the ARF-ordered data is that there are no clear outliers skewed by an individual response or a very small number of responses. This means that it is the ARF-ordered list in Table 2 which provides a more valid overview of the content of the data set.

Using a traditional approach to keyness analysis, conclusions may recommend interventions around trade and human rights defenders or modern slavery. However, an analysis of ARF highlights that this is misleading and does not get to the essence of the data set. What is more, policy recommendations based on the former statistic only may result in the disproportionate influence of those who lobby in relation to very specific terms at the expense of more widespread priorities and concerns.

 

This ARF analysis formed part of our analysis of the 2023 Labour Party’s Policy Forum that we conducted for the UKTPO, which can be accessed here.

 

If you are interested in our data analysis services or partnering with us in any way, please contact us here

 

References

Labour Policy Forum (2023). National Policy Forum Consultation 2023. Britain in the World.

Gasiorek, M. and Robinson, J. (2023). What can be learnt from the Labour Party’s consultation on Trade? UKTPO Blog.

Survey of English Usage zooms on concepts

Survey of English Usage zooms on Covid-19 concepts

by Caitlin Hogan

 

Lab leader Dr Justyna Robinson gave a talk at University College London (UCL) as part of the Survey of English Usage Seminar Series about the work of the Concept Analytics Lab. Her talk covered a wide range of issues in the realm of concept analytics, including how to draw out concepts from written accounts via the Mass Observation Archive dataset. She focussed in particular on concept change during the COVID-19 pandemic, when lifestyle changes forced people to adapt their routines, and thus caused the concepts they mention in their daily accounts to shift, in some cases drastically.

 

The Mass Observation Archive was founded in 1937 by Tom Harrisson, Charles Madge and Humphrey Jennings, and its original phase ran until the 1960s, at which point it became defunct. Originally inspired by the founders’ desire to capture public opinion on the abdication of King Edward VIII, by 1939 the project aimed to have ordinary people record the day-to-day experiences of their lives, and nearly 500 did. This created an invaluable documentation of people’s habits, lives, and thoughts, acting almost as a time capsule. In 1981, it was revived at the University of Sussex and continues to collect qualitative accounts of ordinary people’s lives and opinions to this day. Every 12th of May (chosen because it is the anniversary of the coronation of King George VI), the project calls for anyone to submit a record of their activity on that day, in honour of the original 1937 call going out on that same day. The 12th May diaries collected during the COVID-19 pandemic were digitised with a grant provided by the Wellcome Trust. Digitised diaries from the first lockdown in the UK, i.e. 12th May 2020, were the focus of Justyna’s talk.

 

Justyna discussed how records of ordinary people’s activities during lockdown marked a shift towards concepts such as REGULATION, which may be expected, but also towards the discussion of furniture, given the struggles we all had in adapting to working from home. Excerpts from the diaries on this theme include the following examples:

 

  • most of the online activities I could cast from my phone to the TV or could be done on my phone, which was vital during the early stages of lockdown, as XXXX was using the home laptop to work remotely, until he received a laptop through work
  • I’m working from home and the work PC is on an old computer desk so giving me a 2foot space to work in. 
  • I can also stretch and do yoga during my working day and sit at a desk that is the right size for me- I am very petite and used to feel uncomfortable in the chairs in meeting rooms, designed for men. 

 

As these examples show, participants mention the struggles of accommodating working from home with limited resources in terms of space and furniture, and the struggles of coexisting while some household members work and others use furniture for other purposes. The examples illustrate clearly that we can talk about the same concept without using the exact same words, so this commonality would be lost if we only used simple corpus linguistic techniques in this analysis. As explained in Robinson et al. (2023), terms like restriction, freeze, coordination, and clampdown emerged while talking about regulations in the COVID-19 pandemic, even though none of them is the word regulation itself. Linking these lexemes together allows a clearer picture to emerge of what topics participants wrote about in their diaries. The insight into which concepts participants found important during lockdown would not have been detectable without concept analysis, and especially without invoking the notion of a keyconcept (Robinson et al., 2023).
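The idea of linking different lexemes to one concept can be illustrated with a very small sketch. This is not the keyconcept method from Robinson et al. (2023): the hand-made `CONCEPT_LEXICON`, the function name `count_concepts`, and the sample sentence are all hypothetical, and the lookup does no lemmatisation, so only exact word forms are matched.

```python
import re
from collections import Counter

# Hypothetical hand-made lexicon built from the four terms named above.
# A real analysis would rely on a much richer semantic resource.
CONCEPT_LEXICON = {
    "regulation": "REGULATION",
    "restriction": "REGULATION",
    "freeze": "REGULATION",
    "coordination": "REGULATION",
    "clampdown": "REGULATION",
}


def count_concepts(text, lexicon=CONCEPT_LEXICON):
    """Count concept labels for every token that appears in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(lexicon[tok] for tok in tokens if tok in lexicon)


# Invented sentence, not a quotation from the diaries.
sample = (
    "The clampdown felt like a freeze on normal life, "
    "and every new restriction changed my day."
)

word_hits = re.findall(r"[a-z]+", sample.lower()).count("regulation")
concept_hits = count_concepts(sample)["REGULATION"]
print(f"word 'regulation': {word_hits}, concept REGULATION: {concept_hits}")
```

A word-level count finds no hits for regulation in the sample, while the concept-level count picks up all three related lexemes.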

 

 

As the lab continues to refine tools for concept analysis, talks such as this one are key to spreading the word to new and emerging scholars about the role of concepts when surveying English usage.

 

References

Robinson J.A., Sandow R.J. and Piazza R. (2023) Introducing the keyconcept approach to the analysis of language: the case of REGULATION in COVID-19 diaries. Front. Artif. Intell. 6:1176283. doi: 10.3389/frai.2023.1176283 

Concept Quest Event, 11th March 2024

by Caitlin Hogan

Concept Quest: Navigating ideas on and through linguistic concepts

Our lab will be part of an exciting event in collaboration with the University of Sussex Digital Humanities Lab and the Mass Observation project. Our session will cover our work on concept analysis through some of our recent projects. The team is excited to attend and present at such a thought-provoking gathering!

 

Be sure to check back here after the event for another blog post and photos! 

Register for the event here:

https://www.ticketsource.co.uk/shl-events-ticket/t-yamopvl

 

 
