Semantic search algorithm, behaviorism and the fairy tale Snow White and the seven dwarfs. Would SEO behave like Grumpy?


How does semantic search work? What are the implications for SEO tactics and for the behavior of users and customers?

Google search is not unlike the “Mirror, mirror on the wall, who’s the fairest of them all?” where the question asked reveals (in the fairy tale) the Evil Queen’s narcissistic obsession: what a great metaphor to explain how semantic search works! (see Google Search and the Racial Bias).

I will take an assist from David Amerland to help me better understand the SEO world (something still unknown to me), while also revisiting childhood memories of the fairy tale “Snow White and the seven dwarfs”.

So, let’s have a look at the characters of the famous fairy tale:

The mirror is the result of the search engine. From what I’ve understood about semantic search, the mirror reflects back a result that is contextualized according to the user and his/her relationships across social networks, as well as through the analysis of past behaviours.

Snow White is the most beautiful creature in the WEB forest. She publishes smart content and establishes such trusted relationships on social media that the mirror (the semantic search engine) reflects back a beautiful princess… according to the algorithm, I would say.

The evil queen is the bad guy, attempting to be seen as the most beautiful in the WEB forest when she is not. The evil queen struggles and suffers a lot for that, since the mirror always suggests Snow White as the best result… life in the digital jungle is not so easy for the evil queen!

The poisoned apple represents a trick, a negative SEO attack whose objective is either to game the search engine (the mirror) or to compromise the reputation of Snow White. Fake reviews, and negative or positive SEO tactics in general, are just examples of how an apple could be poisoned in order to digitally kill a competitor and game the search engine algorithm (see the case of TripAdvisor).

The seven dwarfs are data scientists and SEO experts who mine the WEB forest in order to extract valuable and reliable information from the WEB. Usually they are well-intentioned and thus willing to protect the beauty of Snow White from negative SEO (the poisoned apple and the evil queen).

The charming Prince represents all the users, companies and individuals who go deeper and deeper into the WEB forest in order to discover the truth. The mirror’s result aside: who is really the fairest in the WEB forest? Encountering a few smart dwarfs might be useful for the charming Prince, both in the forest to discover the beauty of Snow White and in the WEB to find great content and reputations according to personal impressions rather than relying only on algorithms.

…so, what is the moral of the fairy tale “Snow White and the seven dwarfs” applied to modern semantic search and SEO?

An interesting point has been raised by D. Amerland in his article “How semantic search is changing end-user behaviour”. In particular:

The fact remains that the web is changing, search has changed and the way we operate as individuals, as well as marketers, has changed with it.

Since semantic search is powerful enough to influence the behaviour of the end user (individuals, companies, …), the point is: what kind of algorithm is behind the mirror on the wall? What are the criteria behind the result that identifies the fairest princess in the WEB?

A more interesting doubt: what happens if the criteria behind the search algorithm (the mirror) change so that the fairest in the WEB becomes Grumpy, one of the seven dwarfs? Would all end users and SEO experts really want to become and behave like Grumpy?

[Picture: SEO mirror on the wall]


My Issue with BigData Sentiment Bubble: Sorry, Which Is the Variance of the Noise? (NON Verbal Communication)


Why is sentiment analysis so hard? How should the word “crush” in a tweet be interpreted? Crush as in “being in love” or crush as in “I will crush you”? According to Albert Mehrabian’s communication model and its statistics, I would say that on average a sentimenter working on a tweet has an accuracy of 7%. Not such a big deal, is it?

Let’s think about it by considering, as an example, the case of sentiment analysis described in My issues with Big Data: Sentiment: crush as in “being in love” (positive) or crush as in “I will crush you” (negative)?

What is a sentimenter? As a process, it is a tool that takes an input (tweets) and produces an output like “the sentiment is positive” or “the sentiment is negative”. Many sentimenters are even supposed to estimate how positive or negative the mood is: cool!
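To make the idea concrete, here is a minimal sketch of a sentimenter, assuming the simplest possible design: a word-score lexicon averaged over the tweet. The lexicon, its scores and the function names are all hypothetical, invented for illustration; real tools are far more elaborate.

```python
# Toy word-score lexicon. All values are hypothetical, chosen for illustration.
LEXICON = {
    "love": 1.0,
    "happy": 1.0,
    "hate": -1.0,
    "sad": -1.0,
    # "crush" has to receive ONE score, although its meaning depends on context.
    "crush": -0.5,
}


def sentiment_score(tweet: str) -> float:
    """Average lexicon score of the known words in the tweet (0.0 if none)."""
    words = [w.strip(".,!?;:\"'").lower() for w in tweet.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sentiment_label(tweet: str) -> str:
    """Map the numeric score to a coarse label."""
    score = sentiment_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ("I have a crush on you", "I will crush you", "I love this song"):
    print(tweet, "->", sentiment_label(tweet), sentiment_score(tweet))
```

With this toy lexicon both “crush” tweets receive the same label, although a human reader would take the first one as positive: the text alone does not carry the information needed to tell them apart.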

Paraverbal and non-verbal communication

Anyhow, according to Albert Mehrabian the information transmitted in a communication process is 7% verbal, 38% paraverbal (tone of voice) and the remaining 55% non-verbal (facial expressions, gestures, posture, …).

In a tweet, as well as in an SMS or e-mail, neither paraverbal nor non-verbal communication is transmitted. Therefore, from a single tweet it is possible to extract only 7% of the information available: the text (verbal communication).

So, what about paraverbal and non-verbal communication? During a real-life conversation they play a key role, since they account for 93% of the whole message. Moreover, since paraverbal and non-verbal messages are strictly connected with emotions, they are exactly what we need: sentiments!

Emotions are also transmitted and expressed through words, such as “crush” in the example mentioned. However, within a communication process, the verbal and the non-verbal are not always consistent. That’s the case when we talk with a friend and he/she says that everything is ok, while we perceive, more or less consciously, something different from his/her tone or expressions. Thus we might ask: are you really sure that everything is ok? As a golden rule, also for everyday life, I would recommend using non-verbal signals as an opportunity to ask questions rather than to infer misleading answers (see also: A good picture for Acceptance: feel the divergences & think how to deal with).

For these reasons, the non-verbal messages are a kind of noise that interferes with verbal communication. In a tweet, it is a noise that interferes with the text. The more sensitive the transmitter and the receiver are to non-verbal communication, the more disturbing this noise can be. It might be disturbing enough to completely change the meaning of the message received.

Statistics and Information Theory

From a statistical point of view, the noise might be significantly reduced by collecting more samples. In Twitter, a tweet is one sample and each tweet has 7% of available information (text) and 93% of noise (non-verbal communication), that is, the unknown information.
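The claim that collecting more samples reduces the noise can be checked with a small simulation. This is only a sketch under strong assumptions that are mine, not the post’s: each tweet’s score is a fixed “true mood” plus independent Gaussian noise, and the names `spread_of_average`, `true_mood` and `noise_sd` are hypothetical.

```python
import random
import statistics


def spread_of_average(n_tweets, noise_sd=1.0, true_mood=0.2, trials=200, seed=0):
    """Standard deviation, across repeated trials, of the average score
    computed from n_tweets noisy tweets (assumed model: mood + Gaussian noise)."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        scores = [true_mood + rng.gauss(0.0, noise_sd) for _ in range(n_tweets)]
        averages.append(statistics.fmean(scores))
    return statistics.stdev(averages)


# The spread of the average shrinks roughly like noise_sd / sqrt(n_tweets).
for n in (1, 10, 100, 1000):
    print(n, round(spread_of_average(n), 3))
```

Under these assumptions the average over many tweets becomes steadier as the number of tweets grows, which is the sense in which more samples reduce the noise.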

From a prediction/estimation point of view, no noise means no errors.

Thus, thanks to BigData, if the sentimenter analyzes all the tweets, it is theoretically possible to reduce the noise to zero and thus have no prediction error about sentiments… WRONG!!!

Even if the sentimenter is able to provide a result by analyzing all the BigData tweets (see Statistical Truisms in the Age of Big Data Features):

“the final error in our predictive models is likely to be irreducible beyond a certain threshold: this is the intrinsic sample variance”.

The variance is an estimate of how much the samples differ from each other. In the case of a communication process, that means how changeable emotions are over time. Just for fun, next time try talking to a friend while randomly changing your mood (happy, sad, angry, …) and see what happens with him/her (just in case, before fighting, tell him/her that it is part of an experiment that you’ve read about in this post).

In Twitter, the variance of the samples is an estimate of how differently emotions affect the use of certain words in a tweet, from person to person at a specific time. Or, similarly, considering one person, how differently emotions affect the use of words over time.

Like in a funnel (see picture), the sentimenter can eliminate the noise and thus reduce the size of the tweet bubbles (the larger the bubble, the higher the noise) down to a fixed limit that depends on the quality of the sample: its variance.

[Picture: Sentimenter Twitter funnel]
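The bottom of the funnel can be illustrated numerically. The sketch below uses an assumed model, not taken from the post: every tweet’s sentiment is a common “true mood” plus Gaussian noise standing in for the unobserved non-verbal part, and the sentimenter predicts the sentiment of a new tweet with the average of the tweets seen so far. The function and parameter names are hypothetical.

```python
import random
import statistics


def prediction_mse(n_tweets, noise_sd=1.0, true_mood=0.2, trials=1000, seed=0):
    """Mean squared error when the sentiment of a NEW tweet is predicted
    with the average of n_tweets previously observed tweets."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        observed = [true_mood + rng.gauss(0.0, noise_sd) for _ in range(n_tweets)]
        prediction = statistics.fmean(observed)
        new_tweet = true_mood + rng.gauss(0.0, noise_sd)
        squared_errors.append((new_tweet - prediction) ** 2)
    return statistics.fmean(squared_errors)


# In theory the error is noise_sd**2 * (1 + 1/n_tweets): it falls as more
# tweets are collected, but never below the sample variance noise_sd**2.
for n in (1, 10, 100, 500):
    print(n, round(prediction_mse(n), 3))
```

In this toy setting, adding tweets narrows the funnel quickly at first, then the error settles near the variance of a single tweet (here 1.0) and stays there, no matter how many more tweets are analyzed.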

So, I have a question for bigdata sentimenters: what is the sample variance of tweets due to non-verbal communication? Once the sample variance is known, the prediction error of the best sentimenter ever is also given:

error of prediction (size of the sentiment bubble) = sample variance of tweets…

…with the assumption that both the samples and the algorithm used by the sentimenter are not slanted/biased. If this is not the case, the sentiment bigdata bubble might be even larger and the prediction less reliable. Anyhow, that is another story, another issue for BigData sentimenters (coming soon, here in this blog. Stay tuned!).

Feelink – Feel & Think approach for doing life!