The Macroscope is a powerful tool built to interpret the historical structure of language. It is designed to provide comprehensive analyses of historical meaning at the click of a button. This enables quick insights for a curious explorer, requiring little to no knowledge of linguistics. At the same time, the option to download the raw data behind any figure caters to researchers, content creators, and specialists who don't want to design their own solutions from scratch.
We offer a suite of eight analytical packages, each explaining a different piece of the puzzle. Each analysis comes with predefined settings and works out of the box. Where possible, the cogs are exposed to the user, allowing for the fine-tuning of parameters.
We recommend analysing a few words with the default settings first. Go over your results, then tweak the settings where necessary. Many of the settings (e.g. the similarity thresholds of networks) have optimal values that vary on a case-by-case basis. A setting that works for the word risk may not be ideal for a less frequent word like bitter. In such cases, adjust the parameters until you get the insights you are looking for.
- Open the settings panel using the button in the top right. Enable the analyses you are interested in. We advise starting with only a few analyses; enabling everything at once is possible, but will increase your processing time.
- Switch to the results panel. Input a word into the search bar and fire up the Macroscope.
- Your request will travel to our server cluster, where our team of trained hamsters processes your analyses one by one. The website will start displaying the results as soon as they come in.
- Once all the processing is done, the search bar will appear again. Adjust your settings or search a new word.
The Macroscope provides researchers with the ability to examine two distinct but related aspects of linguistic change in individual words over historical time as shown in the conceptual framework above. First, diachronic word embeddings computed from the co-occurrence matrix enable us to discover words that are semantically similar to a given word for a given year (i.e., the semantic or synonym structure surrounding a word). These semantically related words are referred to as synonyms (top half of Figure 2). Second, the co-occurrence matrix provides information regarding the context of a given word at a given year. Words that co-occur with the target word are referred to as context words (bottom half of Figure 2).
On top of being able to "focus" the Macroscope on the semantics and contextual structure of an individual word in a particular year, the true power of the Macroscope is harnessed when the researcher "zooms" out to obtain a bird's eye view of changes in the semantic and contextual structure of words over historical time. The Macroscope can be used to examine the semantic (synonym) and contextual (co-occurrence) structure of individual words for a specific year (i.e., zooming in) and over historical time (i.e., zooming out).
1. Identify Synonyms
We converted the Google Ngram Corpus into a large co-occurrence matrix, in which each word is represented by a long vector containing its co-occurrence counts with all other words in the 50,000-word vocabulary. By transforming the raw counts into pointwise mutual information (PMI) and regularizing the data using singular value decomposition (SVD), we constructed diachronic word vectors spanning 200 years.
We defined synonyms as words used in similar contexts. By computing the cosine similarity between two word vectors (essentially comparing the contexts of the two words), we can quantify semantic similarity on a scale from 0 (not similar at all) to 1 (identical). Identifying the synonyms of a target word then simply amounts to finding the words whose vectors are most similar to the target word's vector.
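The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not the Macroscope's actual implementation; the function names and the dimensionality `k` are our own assumptions.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI from a raw co-occurrence count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0          # zero counts contribute nothing
    return np.maximum(pmi, 0.0)

def svd_vectors(ppmi_matrix, k=300):
    """Dense word vectors: keep the top-k singular dimensions (regularization)."""
    u, s, _ = np.linalg.svd(ppmi_matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def top_synonyms(vectors, vocab, word, n=5):
    """Words whose vectors are most cosine-similar to the target word."""
    idx = vocab.index(word)
    norms = np.linalg.norm(vectors, axis=1) + 1e-12
    sims = (vectors @ vectors[idx]) / (norms * norms[idx])
    order = np.argsort(-sims)
    return [(vocab[j], float(sims[j])) for j in order if j != idx][:n]
```

With the real 50,000-word matrix you would compute this once per year slice, yielding one set of diachronic vectors per year.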
2. Semantic Structure
This analysis helps you understand the extent to which two words are semantically similar to each other.
The figure on the left (a) shows the semantic structure of happy and sad. Each node represents happy, sad, or one of their top-4 synonyms. A link indicates that the similarity between two words exceeds a certain threshold (the default value is 0.6). Despite being antonyms, happy and sad are conceptually associated and therefore often appear in similar contexts. Semantic similarity inferred from contextual structure alone would thus suggest that they are semantically similar. However, as the network shows, their synonyms are not linked with each other: happy and sad are strongly associated, but not semantically similar.
One example of two words that are truly semantically similar to each other is gay and lesbian, also shown in the figure on the left (b).
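As a sketch, the thresholding step that produces the links in such a network might look like the following (pure NumPy; the 0.6 default is the threshold mentioned above, everything else is illustrative):

```python
import itertools
import numpy as np

def similarity_edges(vectors, vocab, threshold=0.6):
    """All word pairs whose cosine similarity exceeds the threshold."""
    norms = np.linalg.norm(vectors, axis=1) + 1e-12
    edges = []
    for i, j in itertools.combinations(range(len(vocab)), 2):
        sim = float(vectors[i] @ vectors[j] / (norms[i] * norms[j]))
        if sim > threshold:
            edges.append((vocab[i], vocab[j], round(sim, 3)))
    return edges
```

Two antonyms with similar contexts would get a direct edge, but, as in panel (a), their respective synonym clusters need not connect to each other.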
3. Contextual Structure
You can visualize the contextual structure of a word in a given year. Contextual structure reveals both the number of senses and the contextual diversity of a word; these two concepts partially overlap but are not identical. For example, the network on the right shows that although the word gay was used in the sense of homosexuality consistently across contexts in the year 2000, it appeared in a number of distinctive contexts, including homosexuality, the political movement for gender equality, association with HIV, and academic interest in gender studies. Try setting the year to 1850 to see how the contextual structure of gay changes dramatically.
Edges (links): A link between two nodes indicates that the pointwise mutual information (PMI) value between the two words is greater than the selected threshold (the default value is 3).
Color: The colors represent the community structure of nodes in the network and each community is represented with a different color. Communities are sub-groupings of nodes that are more likely to be connected to each other than to other nodes within the network.
Size: The size of a node maps to the co-occurrence frequency between that word and the target word (in this example, gay). The size of the target word is manually set to be large enough to be visible.
Note: Community structure in the network is detected with the algorithm introduced by Blondel et al. (2008), which is based on modularity optimization. The algorithm starts by treating each node as its own community and iteratively merges communities until modularity (a measure of the strength of the communities) is optimized.
4. Drift in Semantic Space
You can visualize the semantic drift of a word over a specified historical period. The example on the left shows how the word gay changed its meaning from 1850 to 2000. The semantic space of gay is defined by its top-k (k = 15 by default) synonyms in 1850 and 2000. A longer path indicates greater semantic change.
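Assuming the yearly vectors have already been aligned to a common space, the drift path can be summarized as the summed cosine distance between consecutive snapshots. A hypothetical sketch (the function names and the pre-alignment assumption are ours):

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def drift_path_length(snapshots):
    """Total semantic drift: sum of cosine distances between consecutive
    yearly vectors of the same word (assumes pre-aligned spaces)."""
    return sum(cosine_distance(a, b) for a, b in zip(snapshots, snapshots[1:]))
```

A word whose vector barely moves accumulates a path length near zero, while a word like gay traces a long path between 1850 and 2000.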
Tip: If the annotations overlap with one another, you can drag them apart.
5a. Change of Contextual Structures
You can visualize how each word in a selection co-occurs with the target word over the past 200 years. Alternatively, you can aggregate the co-occurrence frequencies of the whole selection with the target word. Simply click the aggregate button to switch between the two modes.
This function can be used in conjunction with the contextual structure analysis. The contextual structure analysis provides a list of words that define (one of) the meanings of the target word. Tracking the aggregated co-occurrence between those context words and the target word reveals when, and to what extent, the word's meaning changed over history.
The figure on the left shows the historical aggregated co-occurrence between gay and its context words from 1850 and from 2000, plotted separately.
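In sketch form, aggregation simply sums the co-occurrence counts of the chosen context words with the target, year by year. The dict-of-dicts layout below is a hypothetical data structure for illustration, not the Macroscope's own format:

```python
def aggregated_cooccurrence(cooc_by_year, target, context_words):
    """Per-year total co-occurrence of `target` with a set of context words."""
    return {
        year: sum(counts.get(target, {}).get(w, 0) for w in context_words)
        for year, counts in cooc_by_year.items()
    }
```

Feeding in the context words of one sense (say, gay's 1850 context words) traces how that particular meaning waxes or wanes over time.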
5b. Change of Contextual Structures
Another way to think about change in word meaning is to look at which of its context words have been losing or gaining association between two time points. The figure on the left shows the 10 context words whose co-occurrence with gay increased the most (blue bars) or decreased the most (red bars) between 1940 and 2000. Please note that the axes of the increase and decrease panels are scaled independently, so that the two panels occupy the same amount of space on the canvas.
Try another word and different time frame to find out which context words changed the most.
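A minimal sketch of the ranking behind this figure, assuming you have the target word's co-occurrence counts at the two time points (all names here are illustrative):

```python
def top_changes(counts_then, counts_now, n=10):
    """Context words with the largest gains and losses in co-occurrence
    between two time points, returned as (word, delta) pairs."""
    words = set(counts_then) | set(counts_now)
    delta = {w: counts_now.get(w, 0) - counts_then.get(w, 0) for w in words}
    ranked = sorted(delta.items(), key=lambda kv: kv[1], reverse=True)
    gains = [kv for kv in ranked if kv[1] > 0][:n]
    losses = [kv for kv in ranked[::-1] if kv[1] < 0][:n]
    return gains, losses
```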
A cognitive psychologist with a passion for software development. I create data solutions to answer specific questions about human psychology, ranging from improving pilot communication to designing new GDP metrics and describing historical language-use patterns. My recent publication (Humor norms for 4,997 English words) has been picked up by over 50 news outlets worldwide.
My research broadly involves using behavioral and computational methods to investigate the cognitive processes and mechanisms that support lexical processing. In particular I apply the suite of tools offered by network science to study the structure of the mental lexicon and see how that affects the way we produce speech, understand language and learn new words. I am also interested in applying network science approaches to understand other areas of the psychological and cognitive sciences.
I am interested in quantitative approaches to language, wellbeing, memory, and decision making. My work involves using 'big data' to understand psychological change over cultural time; understanding language learning using network analysis; computational modelling of memory representations and age-related cognitive decline; and information search in decision making.