New IT Innovation Development (N.IT.I.D) method Step 5: Select Solutions (Tripadvisor Case Study)


NITID-Select_Solutions

After brainstorming, simplifying, organizing and evaluating all the ideas, it is finally time to select the BigData solution in which to invest.

Thanks to the QFD matrix, the key parameters and competencies required to fulfil the customer needs have been identified.

Moreover, the QFD assesses how the development of a new idea will change the strategic position of the company in the market through a gap analysis.

But which idea should be chosen among all the BigData IT initiatives?

The question can be addressed by using the Pugh matrix in the table below.

Pugh_Matrix

In particular, in our Pugh matrix for TripAdvisor, the BigData initiatives are the new functions (needs) to implement and are placed in rows: facilitate the process, establish an integrated supply chain, create customer experiences, engage customers and provide statistics and reports.

Meanwhile, the IT solutions (requirements) are placed in columns: WEB user interface, restaurateur interface, suppliers interface, social networks, visual charts, external company’s interface, data analytics and predictive tools.

In the Pugh matrix, a final benchmark is made by assigning to each intersection a score of -1, 0 or +1, depending on whether the proposed solution is worse than, equal to or better than a reference solution (market) at satisfying the specific need.

The reference solution in the Pugh matrix should be either one of the best solutions on the market (e.g.: Booking) or the AS-IS solution currently provided to the customer.

The Pugh result for each solution is the sum of its scores over all the rows (needs), and it indicates how innovative and useful the solution is.
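The scoring rule can be sketched in a few lines of Python. This is only an illustration: the scores below are invented and are not the values of the TripAdvisor matrix, only four of the solutions are shown, and the names `NEEDS`, `SCORES`, `pugh_totals` and `ranked` are hypothetical.

```python
# Needs (rows) as listed in the text.
NEEDS = [
    "facilitate the process",
    "establish an integrated supply chain",
    "create customer experiences",
    "engage customers",
    "provide statistics and reports",
]

# Hypothetical scores versus the reference solution:
# -1 = worse, 0 = equal, +1 = better.
# One list per solution (column), one entry per need (row).
SCORES = {
    "WEB user interface": [1, 0, 1, 1, 0],
    "social networks": [1, 0, 1, 1, 1],
    "visual charts": [0, 0, 1, 1, 1],
    "suppliers interface": [0, 1, 0, -1, 0],
}


def pugh_totals(scores):
    """Return {solution: total}, summing each solution's scores over all needs."""
    for name, column in scores.items():
        if len(column) != len(NEEDS):
            raise ValueError(f"{name}: expected {len(NEEDS)} scores")
        if any(s not in (-1, 0, 1) for s in column):
            raise ValueError(f"{name}: scores must be -1, 0 or +1")
    return {name: sum(column) for name, column in scores.items()}


def ranked(totals):
    """Sort solutions from the highest to the lowest Pugh total."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in ranked(pugh_totals(SCORES)):
        print(f"{name:22s} {total:+d}")
```

The ranking is only a starting point for the discussion: the totals say nothing about investment, complexity or staging.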

As an example, according to the results in the Pugh matrix, the most innovative solutions are the creation of a social network and the external company interface (EDI).

However, within the definition of a strategy plan, such solutions might be risky due to the investments involved, the greater IT complexity to manage and the agreements needed with stakeholders.

By contrast, the WEB user interface and Visual Charts, which have the second-highest score (3), might be easily implemented and quickly made available to TripAdvisor users. Thus, these solutions might be developed first in order to gain a competitive advantage quickly.

The solution with the highest Pugh score is not necessarily the one that must be chosen. In this sense, rather than univocally determining one IT initiative, the Pugh matrix is useful for encouraging and stimulating a well-defined strategy with proper staging.

In particular, in a strategy plan based on Pugh’s results it might be stated that a social network should be created only when the customer engagement has been improved enough thanks to WEB user interface and Visual Charts. By doing so, it is possible to leverage the customer commitment for achieving the network effect when promoting the social network initiative.

Similarly, if becoming a platform for an integrated supply chain is one of the long-term objectives, a proper staging and pace of the related IT initiatives should be defined in a way that fits the company DNA (see A question about IT change management: does the DNA of the company fit your IT vendor?).

For example, if TripAdvisor is a “cautious adopter”, it would be better to first implement the restaurateur interface IT initiative in order to engage restaurateurs, and only then involve suppliers with the suppliers interface.

Furthermore, as an example, specifically for the data analytics and predictive tool IT solutions (requirements), the KPIs to adopt are linked with the key performances evaluated in the QFD matrix through the need “providing statistics and reports” (see Table in New IT Innovation Development (N.IT.I.D) method Step 4: Evaluate Solutions).

In particular, the key parameters are data redundancy, correlation, representativeness, etc. as a measure of the quality of the data to gather and collect. In this way, reliable information and insights will be ensured by considering all the relevant aspects: information theory, statistics, control theory and psychology (see Caution!!! BigData S.L.I.P.S.: five tips when using analytics).

With Step 5 (Select Solutions), the N.IT.I.D method for developing a new business model thanks to BigData is complete.

So… Would Tripadvisor adopt new BigData initiatives?

Feelink – Feel & Think approach for doing life!

New IT Innovation Development (N.IT.I.D) method Step 4: Evaluate Solutions (Tripadvisor Case Study)


NITID-Evaluate_Solutions

After brainstorming new ideas and organizing them, the next step of the N.IT.I.D. method is to evaluate the solutions proposed for the TripAdvisor case study.

Thanks to the KJ method in Step 3 (Organize), it has been possible to organize the unstructured brainstormed ideas in a structured way. The next phase is to connect the ideas grouped with the KJ method to the key performances needed to satisfy the new functionalities.

In order to evaluate the solutions, all the key performances must be measurable. For example, the “facilitate the process” BigData initiative needs a new WEB interface to be developed in order to create a list of pending reviews. In this case, some key measurable parameters for the interface would be the delay and the inconsistency between the pending reviews shown and the incoming new receipts.

The Table below shows all the BigData ideas, the IT infrastructures required and the key parameters, which should be considered respectively as:

  • new needs
  • requirements
  • and Key Performance Indicators (KPIs)

BigData_Initiative - needs, requirements and KPIs

By putting the needs into rows and the KPIs listed in the Table into columns, it is possible to create the QFD (Quality Function Deployment) matrix (see Figure below).

BigData Initiative QFD Matrix

In this example, each intersection between a need and a KPI has been assigned a score of X (no relation), 3 (weak relation) or 9 (strong relation), and each new need has been given an importance rating (e.g. from 0 to 9) according to the priorities and objectives of the company.

For example, the importance rating should be assessed by considering factors such as the customer value curve, the opportunity to innovate in order to gain a competitive advantage, or risks (e.g.: disruptive innovations from competitors). Finally, the QFD score for each KPI is given by the sum of the scores in its column, each weighted (multiplied) by the importance rating of the corresponding need.
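As a rough sketch of this calculation, the Python snippet below computes the weighted sum per KPI. The needs, KPIs, relationship scores and importance ratings are hypothetical (they do not reproduce the figure), "X" is encoded as 0, and the names `KPIS`, `NEEDS` and `qfd_scores` are assumptions made for the example.

```python
# KPIs (columns). Hypothetical subset, for illustration only.
KPIS = ["delay", "inconsistencies", "data redundancy"]

# Needs (rows): need -> (importance rating 0-9, [relationship score per KPI]).
# Relationship scores: 0 stands for "X" (no relation), 3 = weak, 9 = strong.
NEEDS = {
    "facilitate the process": (8, [9, 9, 0]),
    "engage customers": (9, [3, 9, 0]),
    "provide statistics and reports": (5, [0, 3, 9]),
}


def qfd_scores(needs, kpis):
    """For each KPI, sum importance * relationship score over all the needs."""
    totals = dict.fromkeys(kpis, 0)
    for importance, row in needs.values():
        if len(row) != len(kpis):
            raise ValueError("each need must have one score per KPI")
        for kpi, score in zip(kpis, row):
            totals[kpi] += importance * score
    return totals


if __name__ == "__main__":
    for kpi, total in qfd_scores(NEEDS, KPIS).items():
        print(f"{kpi:16s} {total}")
```

With these made-up numbers the "inconsistencies" KPI comes out on top, because it is strongly related to the two needs with the highest importance ratings.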

The aim of the QFD matrix is to link the ideas (needs) with the key performances in a structured way. Moreover, by assigning to each KPI a score that measures its importance for satisfying each need, it is possible to obtain an indicator of relevance for each key performance.

In the example, the KPI inconsistencies (fake reviews and Negative SEO) is the most relevant KPI to consider, since it got the highest score (117).

Meanwhile, according to the importance rating, the most relevant ideas are Engage Customers and Facilitate the Process.

However, are they the ones that should be really implemented by TripAdvisor?

To be continued…

New IT Innovation Development (N.IT.I.D) method Step 3: Organize (Tripadvisor Case Study)


NITID-Organize

A new business model for Tripadvisor has been developed thanks to BigData IT Innovations.

After brainstorming (Step 1) and then simplifying (Step 2) the new opportunities that emerged from such a new business model, the next step of the N.IT.I.D method is: Organize.

The purpose of organizing is to obtain a set of ideas and innovations that are structured according to a criterion that will be useful during the remaining two steps, 4 and 5: Evaluate and Select.

So, after Step 2 – Simplify, the following opportunities for Tripadvisor thanks to BigData Innovation are on the table:

A) Facilitate the Process

B) Establish an Integrated Supply Chain

C) Create Customer Experiences

D) Engage Customers

E) Provide Statistics and Reports

Step 3 – Organize consists of the following procedure:

  1. unify the remaining ideas into rational associations (affinity groups);
  2. rank the groups;
  3. group the unified ideas into rational associations again (2nd level affinity groups), as done at point 1;
  4. assign to each 2nd level group a general topic (avoid a list of ideas) or a common theme;
  5. identify relations among groups (correlations, dependencies, etc.).

In detail:

1) Unify the remaining ideas into rational associations (affinity groups)

In order to create affinity groups, it is necessary to define a criterion. Let’s take as the criterion the company function (value chain) that would be involved in each idea.

With such a criterion the affinity groups are as follows:

Affinity_Groups

In particular, within the organization of Tripadvisor, A) facilitating the process and B) establishing an integrated supply chain are about its Operations’ activities.

Meanwhile, C) creating customer experiences and D) engaging customers are responsibilities of the Sales & Marketing department.

Finally, E) providing statistics and reports might be considered as an additional Service (e.g.: after sales).

2) Rank the groups

As it has been done during the previous N.IT.I.D Step 2 – Simplify, it might be useful to further reduce the number of new initiatives by ranking the affinity groups and eliminating those groups that are less important.

For example, each team member should give a vote from 1 to 6 according to his or her perception of importance (1 = low, 6 = very high), and then only the highest-scoring groups are kept.

In this case, since there are only three groups, it does not make sense to eliminate any of them.

It is better to keep all three groups for the next N.IT.I.D. step: Evaluate.

3) Group the unified ideas into rational associations again (2nd level affinity groups)

As at point 1, group the groups of ideas that are similar again, this time according to another criterion.

Using as a second criterion the stakeholders potentially involved, namely Customers, Suppliers (Restaurateurs and Hotels) and Collaborators, here are the 2nd level affinity groups:

2nd_Level_Groups

In particular, Customers & Suppliers are engaged in ideas A) facilitating the process and B) establishing an integrated supply chain.

Meanwhile, Customers & Collaborators are involved in C) creating customer experiences and D) engaging customers.

Finally, E) providing statistics and reports might be considered as an additional Service (e.g.: after sales) for Suppliers.

4) Assign to each 2nd level group a general topic or common theme

As in N.IT.I.D Step 2 – Simplify, define a general topic or common theme that labels each 2nd level group.

As labels for 2nd level groups just keep the stakeholder involved so that the general topics are:

  • Customers & Suppliers
  • Customers & Collaborators
  • Suppliers

5) Identify relations among groups (correlations, dependencies,…).

The last point is to identify relations among the 2nd level groups. In particular, as it is shown in the Figure below, there are two relations: one that is related to customers and one that is related to suppliers.

Precisely, the customer as a stakeholder creates a link between the “facilitate the process” and “engage customers” functionalities, while suppliers enable a connection between the “integrated supply chain” and “provide statistics” ideas.

Groups_of_Ideas

An example of how it looks…

Step 3 – Organize is actually the so-called KJ Method developed by Jiro Kawakita.

Writing all the ideas on cards and posting them on a whiteboard is useful for keeping the “big picture” in view throughout the process.

Below is an example of what a N.IT.I.D process might look like after Organize (Step 3).

KJ Method

To be continued…

New IT Innovation Development (N.IT.I.D) method Step 2: Simplify (Tripadvisor Case Study)


NITID-Simplify

See also: New IT Innovation Development (N.IT.I.D) method Step 2: Simplify (Tripadvisor Case Study).

Thanks to the new data available for Tripadvisor, the N.IT.I.D. process has been applied, and after the first step (Brainstorming) the following opportunities are on the table:

1) Facilitate the process of the reviews for users by updating a list of “pending reviews” to fill once a new bill in a restaurant\hotel has been recorded.

2) Create “customer experiences” through new visualization charts such as a map of visited restaurants and hotels and\or a social network that connects users together in order to share the experiences.

3) Engage customers by creating synergies with loyalty programs.

4) Provide statistics and reports that might be useful to restaurateurs and hotel managers.

5) Establish an integrated supply chain by providing further additional services to restaurateurs and hotels.

All these ideas require investments in terms of IT infrastructures, human resources and agreements with stakeholders or third parties. Will such investments have a return?

Hard to say now. It is better to move first to the second Step of the N.IT.I.D. process: Simplify.

A brainstorming session usually produces a redundant, inconsistent and sometimes irrelevant set of ideas. Thus, a rough simplification could be done as follows:

1. Give each idea a sentence\image

The sentence\image should briefly identify the concept. In this case, let’s simply keep as images the opening phrases of the opportunities mentioned above: facilitate the process, create customer experiences, engage customers, provide statistics and reports, establish an integrated supply chain.

2. Remove redundant ideas

After giving each idea a simplified sentence, some ideas might turn out to be redundant. In this case, no redundancies emerged and thus all the ideas should be kept on the table.

3. Rank each idea

With a set of ideas that are unique (no redundancies), a vote among all the team members should rank the opportunities. For example, each member can give each idea a score from 1 to 6: 1 = not relevant at all, 6 = extremely relevant.

The sum of the votes will then rank all the opportunities by their relevance.
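A minimal sketch of this voting step in Python follows. The three ballots are invented for illustration, and `IDEAS`, `VOTES` and `rank_ideas` are hypothetical names.

```python
IDEAS = [
    "facilitate the process",
    "create customer experiences",
    "engage customers",
    "provide statistics and reports",
    "establish an integrated supply chain",
]

# Hypothetical ballots: one list per team member, one score (1-6) per idea,
# in the same order as IDEAS.
VOTES = [
    [5, 4, 6, 3, 2],
    [6, 3, 5, 4, 2],
    [4, 5, 5, 3, 3],
]


def rank_ideas(ideas, votes):
    """Sum the votes per idea and return (idea, total) pairs, best first."""
    totals = [0] * len(ideas)
    for ballot in votes:
        if len(ballot) != len(ideas):
            raise ValueError("each ballot needs one score per idea")
        if any(not 1 <= score <= 6 for score in ballot):
            raise ValueError("scores must be between 1 and 6")
        for index, score in enumerate(ballot):
            totals[index] += score
    return sorted(zip(ideas, totals), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for idea, total in rank_ideas(IDEAS, VOTES):
        print(f"{total:3d}  {idea}")
```

Note that the function only adds up scores: no justification is collected, in line with the recommendation to vote on perceptions alone.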

An important recommendation: when ranking, each team member should evaluate ideas with a score between 1 and 6, without providing any justification. The vote should be only about perceptions, intuitions, and feelings. As Edward de Bono would say: “wear the red hat!” (see 6 Thinking Hats).

4. Keep only the most relevant ideas

After ranking all the ideas, the group might decide to keep only the most relevant ones (e.g.: the top ten ideas). However, do not throw away the removed ideas. Keep them in a temporary basket because they might be useful during steps 4 and 5 (Evaluate and Select solutions).

In our case, since there are only five items, all the ideas will be kept as important for the next step: Organize.

To be continued…

New IT Innovation Development (N.IT.I.D) method Step 1: Brain Storming (Tripadvisor Case Study)


 

NITID-BrainStorming

Imagine that the business model described in “New IT Innovation Development NITID: Tripadvisor Case Study” is in place, so that new data is available for Tripadvisor:

1) the receipt (WHAT)

2) the client user ID (WHO)

3) the GEO position of the restaurant (WHERE)

The purpose of collecting such data is to ensure a better service for customers by ensuring an effective countermeasure against Negative SEO tactics.

But is this really all? How might such new data availability be exploited? This is the aim of the NITID method: to find new business opportunities that might arise from data.

Commonly, in a creative process, the first step is brainstorming. By putting all the ideas on the table without any kind of criticism, judgment or filter, here is a list of new business opportunities that a brainstorming session applied to the Tripadvisor case might generate:

1) Facilitate the process of the reviews for users by updating a list of “pending reviews” to fill once a new bill in a restaurant\hotel has been recorded. In this way the user, when logging in to TripAdvisor’s website, doesn’t need to search for the restaurateur to be reviewed, since a pending review connected to the restaurant\hotel is already available. Furthermore, the total number of reviews will increase since the review process has been simplified for customers.

2) Create “customer experiences” through new visualization charts such as a map of visited restaurants and hotels and\or a social network that connects users together in order to share the experiences.

3) Engage customers by creating synergies with loyalty programs and companies (e.g.: Nectar) and provide gifts to customers who deliver more reviews.

4) Provide statistics and reports that might be useful to restaurateurs and hotel managers: trends, demand forecast, seasonality, sentiment, etc.

5) Establish an integrated supply chain by providing additional services to restaurateurs and hotels. In particular, demand forecasts could be shared with the suppliers of restaurateurs and hotels by recording all the food\goods consumed and linking them with the receipt. For example, a week in a room requires three units of soap, and cooking a steak in a restaurant requires 100 grams of meat plus 1 mg of salt and 5 grams of olive oil. By using this data it’s possible to establish a logistics platform that might provide a competitive advantage both for buyers (restaurateurs and hotels) and suppliers. In particular, buyers, and especially small entrepreneurs that cannot invest in and maintain complex IT infrastructures, might improve their inventory turnover by having a more granular and detailed view of material consumption. Meanwhile, local suppliers might have additional information regarding the consumption and inventory levels of their clients, and thus enable more frequent, on-time and\or smaller deliveries. Such a solution might be useful for enabling an advanced Demand Driven Supply Chain (DDSC). eBay has already shown this with its “same day delivery” business model, where small retailers have become a network of distributed inventory (see: “EBay expands same-day delivery in local battle with Amazon“).
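To make the idea of linking receipts to material consumption concrete, here is a small Python sketch. The per-item quantities come from the soap and steak examples above; the receipt lines, the material labels and the names `BILL_OF_MATERIALS` and `consumption` are hypothetical.

```python
from collections import Counter

# Material needed per item sold, taken from the examples in the text.
# Soap is counted in units, food in grams (1 mg of salt = 0.001 g).
BILL_OF_MATERIALS = {
    "room week": {"soap (units)": 3},
    "steak": {"meat (g)": 100, "salt (g)": 0.001, "olive oil (g)": 5},
}


def consumption(receipt_lines):
    """Turn (item, quantity) receipt lines into total material consumed."""
    used = Counter()
    for item, quantity in receipt_lines:
        if item not in BILL_OF_MATERIALS:
            raise KeyError(f"no bill of materials for {item!r}")
        for material, amount in BILL_OF_MATERIALS[item].items():
            used[material] += amount * quantity
    return dict(used)


if __name__ == "__main__":
    # Hypothetical receipts: 40 steaks served and 2 room-weeks sold.
    for material, amount in consumption([("steak", 40), ("room week", 2)]).items():
        print(f"{material:14s} {amount}")
```

Aggregated over many receipts, totals of this kind are what a supplier would need in order to plan smaller and more frequent deliveries.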

Maybe, or rather for sure, there are many issues regarding those new business opportunities:

Why change and expand the business model of Tripadvisor beyond its core activities?

Why invest in new relations, IT infrastructure, and skills?

What about the issue of data privacy?

However, asking questions, having doubts and putting up barriers are not allowed when brainstorming.

As when dreaming:

Logic will get you from A to B. Imagination will take you everywhere (Albert Einstein).

TV Audience and Tweets Flow: a great beauty or bigdata SLIP n.1 for marketing communication strategists (statistic)?


TV_Audience and Tweets: a big beauty or bigdata SLIP n.1 (statistic)?

After winning Best Foreign Language Film (Italy) at the 2014 Academy Awards, The Great Beauty, directed by Paolo Sorrentino, drew an outstanding audience last week when it was broadcast in prime time on Italian TV.

Comments and opinions about the movie aside (I would recommend seeing it), publishing trends and flows from social media is becoming more frequent every day. A few days ago, the Italian TV network that broadcast the movie posted a “statistic” (here) about the flow of tweets, with the purpose of explaining when the Twitter peaks happened as well as identifying the main influencers.

According to a third-party analysis, the Twitter peaks happened at specific moments: 1) a meaningful sentence by Jep Gambardella, the protagonist, 2) when Sabrina Ferilli (a famous Italian actress) showed up in the movie with all her beauty and 3) at the end of the movie.

Very interesting. However, looking carefully at the charts (see figure above) I have noticed two things:

  1. Twitter peaks happen concurrently with a temporary decline of the TV audience (share). Thus, a (negative) correlation between peaks in Twitter and TV share exists.
  2. The Twitter peaks and the audience downturns occur with perfect timing: one every 30 minutes.

Since advertising breaks during TV shows, and radio broadcasts as well, are scheduled in advance according to a specific TV time clock…

…well, I am wondering: is there also a cause-effect relationship between advertising breaks during TV programs and the peaks registered in Twitter?

Who knows. An answer can be provided only by analyzing data and real facts carefully. For example, why not put chips in our homes that also register and transmit when the refrigerator has been opened to get something to eat, or even when a WC has just been flushed? Other stimulating correlations might be found by gathering such kinds of data.

Anyhow, finding correlations is quite easy: just observe what happens. Finding causal relationships is definitely much trickier (see also BigData S.L.I.P.S. n.1: statistic), since a deep knowledge of what is being analyzed is required and it is quite easy to fall into wrong assumptions. In this case, the beauty of human behaviours.
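A toy simulation in Python can show how a hidden common cause produces such a correlation. Everything in it is an assumption made for illustration: the numbers are synthetic (not the broadcaster's data), an advertising break is placed in the last five minutes of every half hour, and during a break the share drops while the tweets rise. Neither series is computed from the other, yet their Pearson correlation comes out strongly negative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def simulate(minutes=120):
    """Synthetic minute-by-minute TV share (%) and tweet counts.

    Both series depend only on the hypothetical advertising schedule,
    plus a small deterministic wiggle so the data is not perfectly clean.
    """
    share, tweets = [], []
    for minute in range(minutes):
        ad_break = (minute % 30) >= 25  # assumed break: last 5 min of each half hour
        share.append((24.0 if ad_break else 30.0) + 0.2 * (minute % 7))
        tweets.append((400 if ad_break else 100) + 10 * (minute % 5))
    return share, tweets


if __name__ == "__main__":
    share, tweets = simulate()
    print(f"correlation between share and tweets: {pearson(share, tweets):.2f}")
```

Looking only at the two series, one could conclude that tweeting drives viewers away; in the simulation the only cause is the break schedule.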

By the way, concerning the connection between Tweets and TV shows, last year Twitter and BBC America have established a partnership for advertising (see Mashable, Twitter Partners With BBC America to Promote Branded Videos).

Maybe it’s just a coincidence… or maybe Twitter and BBC have the information that when people go to the toilet it is just to post a tweet and not because of a TV break 😉

Caution!!! BigData S.L.I.P.S.: five tips when using analytics


BigData_SLIPS

During my brief research on BigData, I’ve found 5 types of S.L.I.P.S that a data scientist might encounter along the way: Statistic, Learning, Information, Psychology and Sources.

1) Statistic (Left Foot)

This is without any doubt the main and best-known technical aspect. The most common slip concerning statistics is mistaking correlation for causation. In other words, discovering correlations among variables doesn’t necessarily imply a cause-effect relation. Mathematically speaking, correlation is a necessary but not sufficient condition for a cause-effect relationship.

(see also K. Borne: Statistical Truisms in the Age of BigData).

2) Learning (Right Foot)

OK, let’s assume that a cause-effect relationship exists: which model\algorithm should be chosen to describe the relationship? There are many: ARMA, Kalman filter, neural networks, customized ones, etc. Which one fits best? A model that has been validated with the data available now might not be valid anymore in the future. So, constantly monitor and measure the error between the values predicted by the model and the actual values.

Choosing a model implies making assumptions. In other words, never stop learning from data and be open to breaking assumptions, otherwise predictions and analysis will be slanted.
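One simple way to keep an eye on the prediction error, sketched here in Python, is to track the mean absolute error over a sliding window and raise a flag when it exceeds a threshold. The window size, the threshold and the `ErrorMonitor` class are hypothetical choices for illustration, not a prescribed method.

```python
from collections import deque


class ErrorMonitor:
    """Track the mean absolute prediction error over a sliding window."""

    def __init__(self, window=5, threshold=2.0):
        self.errors = deque(maxlen=window)  # keeps only the latest errors
        self.threshold = threshold

    def mean_error(self):
        """Mean absolute error over the window (0.0 when no data yet)."""
        if not self.errors:
            return 0.0
        return sum(self.errors) / len(self.errors)

    def update(self, predicted, actual):
        """Record one observation; return True when the model looks stale."""
        self.errors.append(abs(predicted - actual))
        return self.mean_error() > self.threshold


if __name__ == "__main__":
    monitor = ErrorMonitor(window=3, threshold=2.0)
    # Hypothetical (predicted, actual) pairs: the model drifts at the end.
    for predicted, actual in [(10, 10.5), (10, 11), (10, 15), (10, 15)]:
        stale = monitor.update(predicted, actual)
        print(f"mean error {monitor.mean_error():.2f}  stale={stale}")
```

A flag raised by the monitor is a prompt to revisit the model and its assumptions, not a verdict on which model to use instead.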

3) Information (Right Hand)

Which information is really meaningful? That’s the first point to clarify before implementing a bigdata initiative or any new BI tool for your business.

Another point is confusing information with data. According to information theory, and well-grounded common sense as well, data are facts while information is an interpretation of facts based upon assumptions (see also the D.A.I. model).

(see also: D. Laney & M. Beyer: BigData Strategy Essentials for Business and IT).

4) Psychology (the Head… of course!)

Have you ever heard about echo chamber effects and social influence? Well, what happens is that social media might amplify irrational behaviours, where individuals (me included) base their decisions, more or less consciously, not only on their knowledge or values but also on the actions of those who act before them.

In particular, whenever dealing with tricky, slippery tools such as bigdata sentiments, it is better to consider carefully the relevance and impact of psychology and behaviours. The risk is to gather data that is intrinsically biased (see also My Issue with BigData Sentiments.)

(see also:

D. Amerland: How Semantic Search is changing end-user behaviour

C. Sunstein: Echo Chambers: Bush v. Gore, Impeachment, and Beyond – Princeton University Press

e! Science News: Information technology amplifies irrational group behavior).

5) Sources (Left Hand)

Variety!!! That is one of the three Vs suggested by D. Laney: Volume, Velocity and Variety. Choosing the right model is not the only thing that matters for avoiding biased predictions and insights: what about the reliability of the data sources used for the analysis? If the data is biased, predictions and insights will be biased as well. In particular, any series of data has a variance and a bias that cannot be eliminated.

How to mitigate such a risk? By gathering data from different sources and weighting them according to their reliability: the variance.
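One common way to weight sources by their variance, which the text does not prescribe and is used here only as an illustration, is inverse-variance weighting: each source gets a weight of 1/variance, so noisier sources count less. The sketch below assumes the sources are independent and that their variances are known; the function name `combine` and the sample numbers are hypothetical.

```python
def combine(estimates):
    """Inverse-variance weighted mean of (value, variance) pairs.

    Returns (weighted mean, variance of the weighted mean), assuming
    independent sources. This reduces variance only: a bias shared by
    all the sources is not removed.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    weights = []
    for _, variance in estimates:
        if variance <= 0:
            raise ValueError("variances must be positive")
        weights.append(1.0 / variance)
    total_weight = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    return mean, 1.0 / total_weight


if __name__ == "__main__":
    # Hypothetical ratings of the same restaurant from two sources:
    # the second one is noisier (variance 4.0) and therefore counts less.
    mean, variance = combine([(4.0, 1.0), (2.0, 4.0)])
    print(f"combined rating {mean:.2f} (variance {variance:.2f})")
```

Two equally reliable sources are simply averaged, while a noisy source pulls the result only slightly towards its own value.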

Moreover, as a bigdata scientist and as a consumer as well, never forget positive and negative SEO tactics. There is a social-digital jungle there! (see Tripadvisor: a Case Study to Think Why BigData Variety matters).

The D.A.I. model to better understand different mindsets and cultural values: why social responsibility means higher prices?


A few weeks ago, I received a direct message from a new Twitter follower with the following question: “Do you spend more money with a brand that you think is socially responsible?”. I immediately felt that it could be either market research or a way to create awareness of something; nothing bad in it, whatever it is.

Anyhow, the aim of a question is to gather information. So what is the information that the question above wants to address? A principle from information theory suddenly came to my mind: information is an interpretation of data based on assumptions (see figure). Usually assumptions are due to culture, mindset and context in general. Think, as an example, how the same gesture of moving the head up and down (data) means yes for Europeans and Westerners, while for Indians it means exactly the opposite.

information_assumption

So, why not apply such a principle from information theory to everyday life as well, in order to better understand ourselves as well as others? Let’s analyze more deeply the question “Do you spend more money with a brand that you think is socially responsible?”

First of all, the question is a closed one, since the answer must be yes or no. When I realized that, I felt uncomfortable… why? I thought about it and realized that it is due to the value of “social responsibility”, which in the question is forced to be set against “price” (money).

Having acknowledged that, I unconsciously inferred that answering YES would mean that social responsibility is priceless and thus more important than money, and vice versa if the answer were NO.

…however, why infer such considerations? What is the assumption behind them? That was my doubt, and my hypothesis was that the assumption behind the tricky question “Do you spend more money with a brand that you think is socially responsible?” is: being socially responsible costs!

…wow, eureka! So, why not create conditions in which pursuing social responsibility intrinsically implies cheaper products?

That was the question I sent to the owner of the research… and, to my great surprise, I received the following answer: “The impression is socially responsible = higher product cost to the consumer.”

Bingo! The assumption that I inferred is right. There is a kind of cultural impression, suggestion and mindset that unconsciously leads us to think (me included) that if you want socially responsible products there is no other way: you have to pay more! Why?

Paradoxically, since people behave according to incentives, if social responsibility intrinsically implied cheaper prices instead, a virtuous circle would be established!

How to create a context where the assumption “socially responsible = higher product cost” is replaced with “socially responsible = cheaper product”?

…I don’t know, any ideas?

Meanwhile, why not apply the DAI (Data, Assumption, Information) model whenever we infer quick answers?

Behind each information there is an unknown world of undisclosed assumptions.

Semantic search algorithm, behaviorism and fairy-tale Snowwhite with the seven dwarfs. Would SEO behave like Grumpy?


How does semantic search work? Which are the implications regarding SEO tactics and users/customers’ behaviors?

Google search is not unlike the “Mirror, mirror on the wall, who’s the fairest of them all?” where the question asked reveals (in the fairy tale) the Evil Queen’s narcissistic obsession. What a great metaphor to explain how semantic search works! (see Google Search and the Racial Bias).

I will take the assist from David Amerland to help me better understand the SEO world (something still unknown to me), as well as to remember childhood times with the fairy tale “Snow White and the seven dwarfs“.

So, let’s have a look at the characters of the famous fairy-tale:

The mirror is the result of the search engine. According to what I’ve understood about semantic search, the mirror reflects back a result that is contextualized according to the user and his/her relationships across the social networks, as well as through the analysis of past behaviours.

Snow White is the most beautiful creature in the WEB forest. She publishes smart content and establishes such trusted relationships on social media that the mirror (the semantic search engine) reflects back a beautiful princess… according to the algorithm, I would say.

The evil queen is the bad guy, attempting to be viewed as the most beautiful in the WEB forest while she is not. The evil queen struggles and suffers a lot for that, since the mirror always suggests Snow White as the best result… life in the digital jungle is not so easy for the evil queen!

The poisoned apple represents a trick, a negative SEO attack where the objective is either to game the search engine (the mirror) or to compromise the reputation of Snow White. Fake reviews, negative or positive SEO tactics, are just an example of how an apple could be poisoned in order to digitally kill a competitor and game the search engine algorithm (see the case of Tripadvisor).

The seven dwarfs are data scientists and SEO experts that are mining the WEB forest in order to get some valuable and reliable information from the WEB. Usually they are well-intentioned and thus willing to protect the beauty of Snow White from negative SEO (the poisoned apple and the evil queen).

The charming Prince represents all the users, companies and individuals, that go deeper and deeper into the WEB forest in order to discover the truth. Mirror’s result aside: who is really the fairest in the WEB forest? Encountering a few smart dwarfs might be useful for the charming Prince, both in the forest to discover the beauty of Snow White and on the WEB to find great content and reputations according to personal impressions rather than relying only on algorithms.

…so, what is the moral of the fairy tale “Snow White and the seven dwarfs” applied to modern semantic search and SEO?

An interesting point has been made by D. Amerland in his article “How semantic search is changing end-user behaviour“. In particular:

The fact remains that the web is changing, search has changed and the way we operate as individuals, as well as marketers, has changed with it.

Since the semantic search is so powerful to influence the behaviour of the end-user (individuals, companies,…), the point is: what kind of algorithm there is behind the mirror on the wall? Which are the criteria behind the result that identify the fairest princess in the WEB?

An even more interesting doubt: what happens if the criteria behind the search algorithm (the mirror) change so that the fairest in the WEB would be Grumpy, one of the seven dwarfs? Would all the end-users and SEO experts really want to become and behave like Grumpy?

seo_mirror_on_the_wall

My Issue with BigData Sentiment Bubble: Sorry, Which Is the Variance of the Noise? (NON Verbal Communication)


Why is sentiment analysis so hard? How should the word “Crush” in a tweet be interpreted? Crush as in “being in love” or Crush as in “I will crush you”? According to Albert Mehrabian’s communication model and statistics, I would say that on average a tweet gives a sentimenter access to only 7% of the message. Not such a big deal, is it?

Let’s think about it by considering, as an example, the case of the sentiment analysis described in My issues with Big Data: Sentiment: crush as in “being in love” (positive) or crush as in “I will crush you” (negative)?

What is a sentimenter? As a process, it is a tool that takes an input (tweets) and produces an output like “the sentiment is positive” or “the sentiment is negative”. Many sentimenters are even supposed to estimate how positive or negative the mood is: cool!
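To make the “crush” problem concrete, here is a minimal sketch of a toy sentimenter. It assumes a naive word-counting approach with a hypothetical hand-made lexicon (the words and scores below are illustrative assumptions, not taken from any real tool); real sentimenters are far more sophisticated, but the ambiguity is the same.

```python
# Hypothetical toy lexicon: words and scores are illustrative assumptions.
# "crush" gets 0 because, taken alone, it could be positive or negative.
LEXICON = {
    "love": 1, "great": 1, "happy": 1,
    "hate": -1, "awful": -1, "sad": -1,
    "crush": 0,
}


def sentiment(tweet):
    """Label a tweet by summing the lexicon scores of its words."""
    words = [w.strip(".,!?").lower() for w in tweet.split()]
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ("I love this place!",
                  "I have a crush on you",
                  "I will crush you"):
        print(tweet, "->", sentiment(tweet))
```

With only the text available, this toy sentimenter gives the same label to both “crush” tweets: the words alone do not tell it which of the two moods is meant.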

Paraverbal and non-verbal communication

Anyhow, according to Albert Mehrabian, the information transmitted in a communication process is 7% verbal, 38% paraverbal (tone of the voice) and the remaining 55% is non-verbal communication (facial expressions, gestures, posture, …).

In a tweet, as well as in an SMS or e-mail, neither paraverbal nor non-verbal communication is transmitted. Therefore, from a single tweet it is possible to extract only 7% of the information available: the text (verbal communication).

So, what about paraverbal and non-verbal communication? During a real-life conversation, they play a key role since they account for 93% of the whole message. Moreover, since paraverbal and non-verbal messages are strictly connected with emotions, they are exactly what we need: sentiments!

Emotions are also transmitted and expressed through words such as “crush” in the example mentioned. However, within a communication process, the verbal and the non-verbal are not always consistent. That’s the case when we talk with a friend who says that everything is ok while we perceive, more or less consciously, something different from his/her tone or expressions. Thus we might ask: are you really sure that everything is ok? As a golden rule, also for everyday life, I would recommend using non-verbal signals as an opportunity to ask questions rather than to infer misleading answers (see also: A good picture for Acceptance: feel the divergences & think how to deal with).

For this reason, non-verbal messages are a kind of noise that interferes with verbal communication. In a tweet, it is a noise that interferes with the text. The more sensitive the transmitter and the receiver are to non-verbal communication, the more disturbing this noise becomes. It might be disturbing enough to completely change the meaning of the message received.

Statistics and Information Theory

From a statistical point of view, the noise might be significantly reduced by collecting more samples. In Twitter, a tweet is one sample, and each tweet has 7% of available information (text) and 93% of noise (non-verbal communication), which is the unknown information.
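As a rough illustration of why more samples help, here is a minimal sketch based on the textbook standard error of a mean. It assumes that tweets are independent and that each one carries noise with the same standard deviation `sigma` (both are simplifying assumptions, and `standard_error` is just an illustrative name).

```python
import math


def standard_error(sigma, n_tweets):
    """Standard error of the average of n_tweets independent samples,
    each affected by noise with standard deviation sigma."""
    return sigma / math.sqrt(n_tweets)


# The noise on the *average* sentiment shrinks as more tweets are collected.
for n in (1, 100, 10_000):
    print(n, standard_error(1.0, n))
```

Under these assumptions, one hundred times more tweets give an average that is ten times less noisy.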

From a prediction/estimation point of view, no noise means no errors.

Thus, thanks to BigData, if the sentimenter analyzes all the tweets, it is theoretically possible to reduce the noise to zero and thus have no prediction error about sentiments… WRONG!!!

Even if the sentimenter is able to provide a result by analyzing all the BigData tweets (see Statistical Truisms in the Age of Big Data Features):

“the final error in our predictive models is likely to be irreducible beyond a certain threshold: this is the intrinsic sample variance”.

The variance is an estimate of how much the samples differ from one another. In the case of a communication process, that means how changeable emotions are over time. Just for fun, next time, try to talk to a friend while randomly changing your mood (happy, sad, angry, …) and see what happens with him/her (just in case, before fighting, tell him/her that it is part of an experiment that you’ve read about in this post).

In Twitter, the variance of the samples is an estimate of how differently emotions affect the use of certain words in a tweet, from person to person at a specific time. Or, similarly, considering one person, how differently emotions affect the use of words over time.
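The irreducible threshold can be sketched with a small simulation. It assumes a deliberately simple model that is not taken from the quoted article: the sentiment of each tweet is a fixed mood plus Gaussian noise of standard deviation `sigma`, and the sentimenter predicts the sentiment of a new tweet with the average of the tweets it has already seen. All names and numbers are illustrative.

```python
import random
import statistics


def expected_squared_error(n_tweets, sigma):
    """Theoretical mean squared error when one new tweet is predicted
    with the average of n_tweets past tweets (independent, same variance)."""
    return sigma ** 2 * (1 + 1 / n_tweets)


def simulated_squared_error(n_tweets, sigma, true_mood=0.0, trials=2000, seed=0):
    """Estimate the same error by simulation with Gaussian moods."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(true_mood, sigma) for _ in range(n_tweets)]
        prediction = statistics.fmean(sample)      # what the sentimenter learned
        new_tweet = rng.gauss(true_mood, sigma)    # the tweet to be predicted
        errors.append((new_tweet - prediction) ** 2)
    return statistics.fmean(errors)


if __name__ == "__main__":
    sigma = 1.0
    for n in (1, 10, 100):
        print(n,
              round(expected_squared_error(n, sigma), 3),
              round(simulated_squared_error(n, sigma), 3))
```

In this model the theoretical error is sigma² · (1 + 1/n): the 1/n part vanishes as tweets accumulate, while the sigma² part, the sample variance, stays no matter how many tweets are analyzed.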

As in a funnel (see picture), the sentimenter can eliminate the noise and thus reduce the size of the tweet bubbles (the bigger the bubble, the higher the noise) down to a fixed limit that depends on the quality of the sample: its variance.

Sentimenter_Twitter_Funnel

So, I have a question for bigdata sentimenters: what is the sample variance of tweets due to non-verbal communication? Once the sample variance is known, the prediction error of the best sentimenter ever is also given:

error of prediction (size of the sentiment bubble) = sample variance of tweets…

…with the assumption that both the samples and the algorithm used by the sentimenter are not slanted/biased. If this is not the case, the bigdata sentiment bubble might be even larger and the prediction less reliable. Anyhow, that is another story, another issue for BigData sentimenters (coming soon, here in this blog. Stay tuned!).

Feelink – Feel & Think approach for doing life!