Social data is honest.

For decades, physicists have been looking for a universal theory that unifies fundamental theories such as general relativity and quantum mechanics. Do engineers, especially in computing and information systems, have a universal model that explains all tasks, techniques, models, and problems?

I believe the answer is filter theory. I have not been able to find a single counterexample. From data transmission, processing, and storage to encryption, recommendation, and prediction, everything can be explained by filter theory.

If filtering is the innate characteristic of all our models, then our main focus is on channels. A channel could be a transmission line, a memory, a learning machine, or a compression algorithm. We spend a great deal of effort, and of course academic publishing, on designing trustworthy channels or filters. Trustworthy channels don't lie, don't betray the source, and don't mislead the destination. In other words, our focus has been on designing honest channels. But all of this rests on one strong assumption: the source is good, trustworthy, and honest.

In almost all models, the honesty of the source is left out of scope. For natural and man-made sources such as sensors, engineers model and estimate sampling and measurement error. But dishonesty is clearly different from error. In a social context, an error is an honest mistake made in good faith. It does not stem from bad intention; it is mostly caused by poor judgment.

If honesty means "a sincere intention to deal fairly with others," then dishonesty has two features to detect: insincere intention and lack of fairness. Unfortunately, in many studies of social and human behavior that rely on data collected through questionnaires and surveys, these two variables are hard to measure. In fact, we humans reveal our true intentions and opinions when we behave or express our views unintentionally, that is, when we are not manipulating our thoughts to serve our interests or to avoid threats. But most lab experiments, surveys, and questionnaires involve participants who know they are being studied.

When we know the consequences of our actions, our behavior changes dramatically. This is why we rely on body language more than on spoken language. Body language is an honest language. And this is why politicians always stand behind a podium: to hide their true intentions and express "what they have to say," not "what they really want to say."

Social media provide a unique opportunity to collect "honest data."
The honesty of social data can be assessed by the following criteria:

1. In most cases, users publish their opinions, express their intentions, and behave without calculating the consequences. Note that social data is based not on a single incident but on the frequency of a behavior and on similar patterns repeated again and again. A good example is the "like" feature on Facebook. A user may "like" tens of items per day and thousands each year.
2. Social data is, and has to be, big. One reason was given in the first criterion. The second is that as the size of the data increases, the influence of noise and outliers decreases.
3. Social data is, and has to be, based on heavy users. Infrequent users are usually either very conservative (calculating consequences and hiding their true intentions) or not very serious.
4. Social data is interactive: through interaction, more hidden behaviors are revealed.
5. Social data has to be collected from independent sources. Not all social data is honest. If opinions do not come from independent sources, the credibility and honesty of the data are questionable.

 
