Clif High talks to Language Magazine about his work
LM: Where does your interest in language and/or linguistics originate? And can you explain how it correlates with your love of mathematics?
CH: My interest in language stems from my schooling in the polyglot environment of the occupation forces in Europe following the war with Germany. As with all military dependents, exposure to a multiplicity of languages is part of the experience. Language, for me, matured in its fascination due to personal experience with mind-expanding drugs that had a direct effect on the language centre of my brain. Further, the elucidation of the rich depth of purpose of language in all forms that came from those altered-state experiences was later reinforced by decades working as a software engineer and computer/network systems programmer, in which the human-to-computer interface is, of course, built of words. The attraction of mathematics has always been, at its core, a love of the practical proofs of reality provided to the astute and prepared mind in the chaos of this universe.
LM: Can you briefly explain how language analysis (in terms of bots/data mining/language corpora) can help anticipate future events?
CH: Not simply. Concisely, my process works by way of the delta of language expressed in common contexts over time reduced to archetypes and projected forward based on numeric quantifiers for emotional values assigned to words.
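As an editorial illustration only, and not Clif High's actual code, the idea of a "delta" in emotional values per archetype, projected forward, might be sketched as follows. The lexicon entries, the archetype names, the numeric values, and the choice of a simple linear extrapolation are all assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical lexicon: word -> (archetype, emotional value).
# Every entry and number here is invented for illustration.
LEXICON = {
    "storm": ("weather", 0.8),
    "flood": ("weather", 0.9),
    "calm": ("weather", -0.3),
    "bank": ("finance", 0.4),
    "crash": ("finance", 0.9),
}


def archetype_totals(words):
    """Sum the emotional values per archetype for one time window of text."""
    totals = defaultdict(float)
    for word in words:
        if word in LEXICON:
            archetype, value = LEXICON[word]
            totals[archetype] += value
    return dict(totals)


def project_forward(earlier, later):
    """Assumed projection: next value = later + (later - earlier)."""
    projection = {}
    for archetype in set(earlier) | set(later):
        before = earlier.get(archetype, 0.0)
        after = later.get(archetype, 0.0)
        delta = after - before  # the change between the two windows
        projection[archetype] = after + delta
    return projection


earlier = archetype_totals(["calm", "bank"])
later = archetype_totals(["storm", "flood", "bank", "crash"])
forecast = project_forward(earlier, later)
```

The sketch only shows the shape of the computation: words are reduced to archetypes, totals are compared across two windows, and the change is carried forward one step.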
LM: Is there a correlation between your work and corpus linguistics?
CH: My work is a specifically focused subset of corpus linguistics and uses advanced automata of my own design crafted (primarily) in Prolog, an artificial intelligence language that is able to re-compile (and thus alter its function based on new learning) while in operation.
LM: What languages do you employ in your data searches?
CH: We use C, C++, Prolog, Lisp, Perl, and some small assembly language subroutines in our data searches. The process uses multibyte character encoding, so the human language is determined at the time the word or corpora is encountered... thus, to a certain extent, given automated translation (especially useful at the corpora level) from API calls, all the language samples (*in the main*) are reduced to one of five human languages that we use, with the American expression of English being the most common.
LM: Can it be explained how algorithms are used on the data?
CH: Not easily. There are over 300 operating executables in my system (programs). At its base, I created a lexicon that functions as a link between the words and an array of emotional values that I *think* represents that word in its relationship to time, intensity and duration of impact (on our emotional state) and other emotional quantifiers.
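A lexicon of this kind, linking each word to an array of emotional values, could be represented roughly as below. This is a minimal sketch under stated assumptions: the field names `onset`, `intensity` and `duration`, the two sample words, and all numbers are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotiveValues:
    """Hypothetical emotional quantifiers attached to one word."""
    onset: float      # how quickly the impact is felt (assumed "time" axis)
    intensity: float  # strength of the emotional reaction
    duration: float   # how long the impact is assumed to last


# Invented entries; a real lexicon would hold many thousands of words.
LEXICON = {
    "beautiful": EmotiveValues(onset=0.2, intensity=0.1, duration=0.6),
    "riot": EmotiveValues(onset=0.9, intensity=0.8, duration=0.3),
}


def lookup(word):
    """Return the emotive values for a word, or None if it is not in the lexicon."""
    return LEXICON.get(word.lower())


values = lookup("Beautiful")
```

Keeping each entry as a small immutable record makes it easy to add further quantifiers later without changing how words are looked up.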
LM: How do you test/measure the validity/accuracy of your findings?
CH: There are spyders (programs that read the net for me) which match the appearance of corpora in MSM (mainstream media) and our forecasts. These are my only continual reference of accuracy. There are a number of readers who have maintained accuracy spreadsheets over the years. These may or may not be available, but my focus is to mature, or refine, my system via other routes. Even though I am producing work with my system, I am still in my exploration period with this particular corpus linguistics tool set, so, to a certain extent, I am less concerned with accuracy of 'hits', and more focused on perfecting my understanding of how words and humans and time all interact.
LM: It's easy to see how changes in language could precede changes in behaviour; however, how can we bridge the gap between linguistic analysis and natural disasters or events beyond our control?
CH: All humans are psychic. It stems from our living in an energy-based universe, as energy beings, who happen to think themselves solid. As energy beings, we are 'standing waves' that can be considered to be 'antenna'. As antenna, we receive psychic energy floating through our energetic universe as our particular tuning and local consciousness may allow.
These psychic impressions of universe unfolding arrive with the 'template layer' of information that exists prior to actual 'manifestation'. This information is received, but most frequently NOT understood by the recipient. Their conscious mind *must* express this information somehow (just how consciousness works here in matterium), and therefore it 'leaks out' in many ways. One manner is in word choices that are outside the typical range of less than 11,000 words used in any given year by the average (industrialized = cube farmed) human. Thus the events outside our control are not outside our perception, however much any given individual human may or may not be able to express this perception of the near term future. For further, more scientific validation, we have the work of Dr Raiden among others.
LM: Can you see your method being used more widely?
CH: It is being used by corporations nascently now, and I expect a greater level of adoption of the techniques over time. However, the process is expensive, and tedious, and requires years of effort just to get 'settled in'...
LM: And furthermore, can it be taught to other people?
CH: Yes, sure.
LM: Or used destructively?
CH: Yes, sure. Thus I keep my techniques close.
LM: How do you feel you are received?
CH: Received? By the thinking public and the powers that be - quite well. My track record speaks for itself. From those who do not possess critical thinking skills, or are dominated by religious constraints on their point of view, my work is received poorly - as was to be expected.
LM: And whose work do you admire?
CH: Buckminster Fuller, Michael Tsarion (great critical thinker, especially these last few years), and David Icke.
LM: What are your predictions for the future of language usage?
CH: The future of language will, this year, reach the point of a complexity increase providing new areas of exploitation in communication and understanding (of universe and ourselves), both intra and inter species. By this I mean that language, in all its forms from DNA through to macroscopically burst bloops and dense form glumps, will be actually propelling human civilization changes this year. The data sets that I analyse suggest that from Summer (northern hemisphere) of this year linguists will be frequently commenting on their 'wonder' and 'awe' over the developing nature of language in our solar system.
LM: You mentioned American English, being used in your data searches, what are the other four human languages?
CH: Actually there are over 40 human languages being sampled now. The issue is transliteration.
So we are using English in all its flavours in multibyte expression, all the Cyrillics, Greek, and some Snskrtta variants out to Indonesian. Some extra-mainland Chinese variants are also thinly represented. This is because the Chinese Government blocks my access, maintaining that my site supports 'superstition'. This is beyond hypocritical, as I have encountered much evidence that they are also doing as I do in the interpretation of language at radical levels.
LM: Can you give us an example of this emotional evaluation of a word or group of words to better understand the process? For instance, what is the value of the word "beautiful" or is it relative to the data sets? Or is such a question even too simplistic to be appropriately applied to the mathematics in question?
CH: It is simplistic. The process reduces everything down to a thick, syrupy mass based on archetypes. Noting that there are thousands of overlapping word usages, obviously, archetypes proceed in a taxonometric fashion reflecting word usage and historical precedent. Also, note that 'beautiful' as a word, is low emotive value. Think of things the other way...all words of high emotive value are in what I have labelled 'immediacy values.' Such words, in and of themselves, expressed within the current context generated by the current milieu, will produce an immediate, emotive reaction. We can demonstrate, without intending offense, in a rather shamanistic fashion, by drawing your attention to the slightly redacted word 'c*nt'.
To all English speaking women, this word is immediately emotive. So you see, as per your question below, it is the 'bleeding edge' of language that has the most emotive power, and this is usually expressed in the area that we call 'slang.'
Such words as 'beautiful' are within the far edge of the shorter term value set. Within my model space they have very little 'propulsion' power.
It is my contention that, due to the powers that be and their stranglehold on the English speaking media, the most powerfully emotive words are overwhelmingly negative, and that this is an engineered result of the efforts of the mainstream media. It is also my contention that this state has now shifted, and that the mainstream media is in the process of failing as an 'institution' such that a new flow of language is emerging globally…
LM: How does the AI program allow for such linguistic factors as in the creative invention of new word/phrase usages and/or the accounting for the complexities of contextual reference of grammatical structure in language phrasing?
CH: Basically, the processing kicks out references that are not within its understanding and waits for human assistance in further categorizing the anomaly. Within a limited range it learns over time, but humans are a shifty lot, continually increasing their inventive use of language.
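The "kick out and wait for a human" behaviour described here can be pictured with a small triage loop. This is only an illustrative sketch: the class name `Triage`, its methods, and the sample words are hypothetical, and the real system is described as being written primarily in Prolog, not Python.

```python
from collections import deque


class Triage:
    """Pass known words through; queue unknown ones for human review."""

    def __init__(self, known_words):
        self.known = set(known_words)
        self.review_queue = deque()

    def process(self, tokens):
        """Return the tokens the lexicon covers and queue the rest once each."""
        accepted = []
        for token in tokens:
            word = token.lower()
            if word in self.known:
                accepted.append(word)
            elif word not in self.review_queue:
                self.review_queue.append(word)
        return accepted

    def resolve_next(self, accept):
        """A human reviews the oldest queued word; accepted words become known."""
        word = self.review_queue.popleft()
        if accept:
            self.known.add(word)
        return word


triage = Triage(["storm", "bank"])
first_pass = triage.process(["Storm", "glump", "bank", "glump"])
pending = list(triage.review_queue)
reviewed = triage.resolve_next(accept=True)
second_pass = triage.process(["glump"])
```

Once a reviewer accepts a queued word, later passes treat it as known, which is one simple way to model learning "within a limited range" over time.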
LM: And how do you algorithm into value, for instance, the meaning of satire and sarcasm within the context of pure data of word usage counts? In other words, to what extent does the AI "read" text?
CH: The processing I use does not do word counts. Well, it does, but only to keep track of where it is, not as a part of our end result. Bear in mind, it is not the words being used today that are the real key to my forecasting; rather, it is the delta, that is, why this particular word appeared (outside of the typical lexical framework for that conversation), that is of importance. Most of my most accurate forecasts are worked around single word instances that have very low frequency counts. As a case in point, the recent forecast, by weeks ahead of the event, of the 'blondes on boats' incident off Italy. This was worked up off of a single occurrence of the word 'blondes' that was totally outside the normal range for the data set in which it appeared. Make sense? As this word had never until that time shown up in that data set, it was intriguing, and led directly to the forecast, which was later manifest in reality. The count of the word 'blondes' in the total data set of the larger entity structure was within nominal range, but just due to this single instance occurring in an exceptional relationship to other words, I was aware that I was actually viewing a 'future leak'... then the issue became proper interpretation. It is usually here that I screw up.
The word counts are used internally in assaying changing states of words as regards to their existent emotive tagging in our lexicon. Basically, the ebb and flow of words from slang out toward 'old age' for words.
Yes, the Prolog program does read text. In no sense does it 'understand' any meanings, but it is perfectly capable of encountering a new word, seeking, via API calls, definitions, and then providing preliminary categorization based on the aggregate of the definitions. It takes a human to assign emotive values within the context.
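The 'blondes' example in this answer, a word that is ordinary across the whole corpus but has never appeared in one particular data set, can be illustrated with a short sketch. The function name, the context labels, and the sample words are hypothetical, and "within nominal range" is simplified here to "already seen somewhere in the corpus".

```python
from collections import Counter


def context_novel_words(history_by_context, context, new_words):
    """Flag words that are new to `context` but already present in the corpus."""
    seen_here = set(history_by_context.get(context, []))
    corpus_counts = Counter(
        word for words in history_by_context.values() for word in words
    )
    flagged = []
    for word in new_words:
        is_new_here = word not in seen_here
        is_known_elsewhere = corpus_counts[word] > 0
        if is_new_here and is_known_elsewhere and word not in flagged:
            flagged.append(word)
    return flagged


# Invented history: words previously seen in each data set.
history = {
    "shipping": ["port", "cargo", "boats", "cargo"],
    "fashion": ["blondes", "dress", "blondes"],
}

flagged = context_novel_words(history, "shipping", ["boats", "blondes", "cargo"])
```

A flagged word is only a candidate; as the answer notes, interpreting what such an out-of-place word means is left to a human.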
LM: From what you have described, regarding the future of language usage, this sounds like the makings of quantum linguistics. Do you have any thoughts on the subject?
CH: None that can be let out into the wild at this time.
Readers can decide for themselves what to think of Clif High's work, found at halfpasthuman.com.
Photo courtesy of © ATOUT FRANCE/Cédric Helsly.