One of our priorities at the Concept Analytics Lab is to utilise computational approaches in innovative explorations of linguistics. In this post I explore the disciplinary origins of computer science and linguistics and present areas in which computational methodologies can make meaningful contributions to linguistic studies.
The documented origins of computer science and linguistics place the fields in different spheres. Computer science developed out of mathematics and mechanics. Many of the pioneers of the field, the likes of Charles Babbage and Ada Lovelace, were first and foremost mathematicians. Major players in the development of the field in the twentieth century were often mathematicians and engineers, such as John von Neumann and Claude Shannon. Linguistics, on the other hand, developed out of philology and traditionally took a comparative and historical outlook. It was not until the early twentieth century, and the work of scholars such as Ferdinand de Saussure, that major explorations into synchronic linguistics began.
The distinct origins of computer science and linguistics are still visible in academia today. For example, at universities in the UK and other Western countries, computer science is situated among the STEM subjects, while linguistics often finds a home with the humanities and social sciences. The different academic homes given to linguistics and computer science often pose a structural barrier to interdisciplinary study and to the creation of synergies between the two disciplines.
Recent research shows that merging linguistic knowledge with computer science has clear applications for the field of computer science. For example, the language model BERT (Devlin et al., 2018) has been used by Google Search to process almost every English-based query since late 2020. But we are only just beginning to take advantage of computational techniques in linguistic research. Natural language processing harnesses the power of computers and neural networks to swiftly process and analyse large amounts of text. This analysis complements traditional linguistic approaches that involve close reading of texts, such as narrative analysis of language, discourse analysis, and lexical semantic analysis.
One particularly impressive application of computational linguistics to the analysis of semantic relations is the word2vec model (Mikolov et al., 2013). word2vec converts words into numerical vectors and positions them in a vector space. In this process, semantically and syntactically similar words are placed close together, while dissimilar words are placed further apart. In this way, corpora consisting of millions of words can be analysed to identify semantic relations within hours. However, this information, as rich as it is, still needs to be meaningfully interpreted. This is where the expertise of a linguist comes in. For instance, comparing word2vec models trained on texts from different time periods may reveal pairs of vectors between which the distance has increased. As linguists, we can infer that the words these vectors represent are likely to have changed semantically or syntactically over time. We can rely on knowledge from historians and historical linguists to offer explanations as to why that change has occurred. We may notice further that similar changes occurred within only one part of speech, or note that the change first occurred in the language of a particular author or group of writers. In this way, the two fields of computer science and linguistics necessarily rely on each other for efficient, robust, and insightful research.
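To make the idea of vector pairs drifting apart more concrete, here is a minimal sketch in Python. It is an illustration only, not the Lab's own pipeline. The tiny three-dimensional vectors, the example words, and the function names (`cosine_distance`, `distance_change`, `drifting_pairs`) are all invented for this post; real word2vec vectors would come from models trained separately on each period's corpus and would typically have hundreds of dimensions. Because each distance is measured between two words within the same period, the two vector spaces do not need to be aligned with each other.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_change(early, late, word_a, word_b):
    """How much further apart two words are in the later period than in the earlier one."""
    before = cosine_distance(early[word_a], early[word_b])
    after = cosine_distance(late[word_a], late[word_b])
    return after - before


def drifting_pairs(early, late, threshold=0.2):
    """List word pairs whose distance grew by more than `threshold`, largest change first."""
    shared = sorted(set(early) & set(late))
    results = []
    for i, word_a in enumerate(shared):
        for word_b in shared[i + 1:]:
            change = distance_change(early, late, word_a, word_b)
            if change > threshold:
                results.append((word_a, word_b, change))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Hypothetical toy embeddings for two periods (made-up numbers, not real model output).
EARLY = {
    "broadcast": [0.9, 0.1, 0.0],
    "sow": [0.8, 0.2, 0.1],
    "seed": [0.7, 0.3, 0.0],
}
LATE = {
    "broadcast": [0.1, 0.9, 0.2],
    "sow": [0.8, 0.2, 0.1],
    "seed": [0.7, 0.3, 0.1],
}

if __name__ == "__main__":
    for word_a, word_b, change in drifting_pairs(EARLY, LATE):
        print(f"{word_a} / {word_b}: distance grew by {change:.2f}")
```

In this toy data, only the pairs involving "broadcast" are flagged, while "seed" and "sow" stay close together. The code can point to where something has shifted, but explaining why it shifted remains the linguist's task.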
At the Concept Analytics Lab, we promote the use of computational and NLP methods in linguistic research, exploring benefits brought by the convergence of scientific and philological approaches to linguistics.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2018) ‘BERT: Pre-training of deep bidirectional transformers for language understanding.’ Available at https://arxiv.org/abs/1810.04805
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013) ‘Efficient estimation of word representations in vector space.’ Available at https://arxiv.org/abs/1301.3781