"Data analysis 101"
In a world where seemingly “everything” is automated, we tend to take data analysis for granted. We imagine it as something only computers do in a distant server farm, quietly crunching numbers as we go about our daily lives. Or we envision specialists hunched over their desks, reviewing row after row of Excel spreadsheets.
That picture is partly true for quantitative data. Textual data, such as tweets and other social media posts, is a different story. Most of the data collected from unstructured sources still requires human analysis (see the prior post What is Data).
What people say to each other about your business can have a deep impact on your bottom line. All it takes is one negative image to sully consumers’ impression of your brand; think of the United Airlines public relations disaster in the first half of 2017.
For this reason, the more information you have about your consumer segments, the more likely you are to create products and services that meet consumer needs or address pain points that no other business is currently attending to.
Do I Need to Analyze Data?
The short answer is – yes. Even if you are using software with a slick dashboard that displays the colorful pie charts and bar graphs for you, it’s prudent to understand the basics of data analytics.
Why?
The bulk of your data isn’t numerical; it’s the written or spoken word. Machines (computers) aren’t up to speed on accurately reading and interpreting the semantics – or meaning – of human language. Therefore, small businesses are going to need to figure out a way to gather and analyze large data sets that aren’t immediately quantifiable. Certainly, sentiment analysis is readily available to you. However, determining whether a customer is saying something positive or negative about a business continues to be primarily a human interpretation. It’s all about context!
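To make the context problem concrete, here is a minimal sketch of a word-counting sentiment scorer. The word lists and the `naive_sentiment` function are hypothetical, invented for illustration; real sentiment tools are more sophisticated, but they can stumble on the same kinds of phrasing.

```python
# Hypothetical word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"love", "great", "good", "fast", "friendly"}
NEGATIVE = {"hate", "bad", "slow", "rude", "broken"}

def naive_sentiment(text):
    """Count positive words minus negative words, ignoring context."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

# A plain compliment is scored as positive.
print(naive_sentiment("The staff were friendly and the service was fast"))

# "Not bad" is mild praise, but the scorer only sees the word "bad".
print(naive_sentiment("Not bad at all"))

# Sarcasm: "great" cancels out "broken", so the complaint looks neutral.
print(naive_sentiment("Oh great, another broken order"))
```

A human reader gets the second and third sentences right immediately, which is the gap the paragraph above describes.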
Cleaning and Transforming Data
It’s almost impossible to derive meaning from data that’s unstructured and missing information. As such, cleaning the data is imperative. Data scientists clean data (also called data scrubbing) for both human and machine processing. One fundamental purpose of cleaning a dataset is to gain an understanding of its relevance and type.
During this process, data scientists review both quantitative and qualitative data for typographical errors and standardize the values. For now, all data has to be translated into a consistent form for easier interpretation. A simple example is shown below:
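As one possible illustration of standardizing values, the sketch below cleans a few made-up survey rows. The rows, the alias tables, and the `clean_row` function are all assumptions for the sake of the example, not a prescribed cleaning method.

```python
# Hypothetical raw rows: inconsistent case, stray spaces,
# several spellings of one state, and a missing rating.
raw_rows = [
    {"name": "  alice SMITH ", "state": "ca", "rating": "5"},
    {"name": "Bob Jones", "state": "Calif.", "rating": ""},
    {"name": "carol  lee", "state": "CA ", "rating": "four"},
]

# Hypothetical lookup tables used to standardize values.
STATE_ALIASES = {"ca": "CA", "calif.": "CA", "california": "CA"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def clean_row(row):
    """Return a copy of the row with every field in one consistent form."""
    # Collapse repeated spaces and normalize capitalization.
    name = " ".join(row["name"].split()).title()
    # Map every known spelling of a state to a single code.
    state_key = row["state"].strip().lower()
    state = STATE_ALIASES.get(state_key, state_key.upper())
    # Turn digits or number words into integers; None marks a missing value.
    rating_text = row["rating"].strip().lower()
    if rating_text.isdigit():
        rating = int(rating_text)
    else:
        rating = NUMBER_WORDS.get(rating_text)
    return {"name": name, "state": state, "rating": rating}

cleaned = [clean_row(row) for row in raw_rows]
for row in cleaned:
    print(row)
```

After cleaning, every name, state, and rating follows the same convention, and the missing rating is flagged explicitly rather than left as an empty string.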
Models Aren’t Just for High Fashion Runways and Magazines
Every human being, and every computer, has a “model” in place for classifying and interpreting all of the information received each day. In psychology, we call these “schemas.” But it may be easier to picture them as internal spreadsheets where we store information and then create rules for our own behavior.
For example, if you dislike chocolate cake but love ice cream, then your mental spreadsheet may be something as simple as:
Type: Dessert | Eat? |
Chocolate Cake | No |
Ice Cream | Yes |
Meanwhile, because the primary language of computers is math, the construct might look more like:
Type: Dessert | Eat? |
Chocolate Cake | 0 |
Ice Cream | 1 |
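The translation from the first table to the second can be sketched in a few lines. The variable names here are hypothetical, chosen only for illustration.

```python
# The human-readable "mental spreadsheet".
preferences = {"Chocolate Cake": "No", "Ice Cream": "Yes"}

# The rule that translates words into numbers.
ENCODING = {"No": 0, "Yes": 1}

# The machine-friendly version of the same table.
encoded = {dessert: ENCODING[answer] for dessert, answer in preferences.items()}
print(encoded)
```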
What’s missing from both models is the “why.” Data modeling assists in understanding the reason behind your dislike for the chocolate cake by revealing the underlying relationship between “dislike” and “chocolate cake.” Data scientists use statistical models (those treacherous mathematical algorithms that cause math-averse people to shriek and run away) to assist in the data modeling process. So, while you’re thinking “yes” or “no,” the visual model produced through computation may appear like this:
Using our case of chocolate cake versus ice cream, the blue circles would represent “no” (or “false” if you were posed the question, “Do you like chocolate cake?”), and the red crosses would mean “yes” (meaning “true” with regard to the same question).
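For readers curious about what such a statistical model looks like in practice, here is a minimal sketch of one common choice, logistic regression, written from scratch. The two features (richness and coldness), the training data, and the settings are all assumptions made up for this example; label 0 plays the role of the blue circles (“no”) and label 1 the red crosses (“yes”).

```python
import math

# Hypothetical training data: (richness, coldness) scores on a 0-10 scale.
# Label 0 = "no" (would not eat), label 1 = "yes" (would eat).
samples = [
    ((8.0, 2.0), 0), ((9.0, 1.0), 0), ((7.0, 3.0), 0), ((8.0, 3.0), 0),
    ((3.0, 8.0), 1), ((2.0, 9.0), 1), ((4.0, 7.0), 1), ((3.0, 7.0), 1),
]

def sigmoid(z):
    """Numerically stable logistic function, mapping any number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(data, learning_rate=0.05, epochs=500):
    """Fit logistic regression weights with batch gradient descent."""
    w1 = w2 = bias = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), label in data:
            # Difference between the predicted probability and the true label.
            error = sigmoid(w1 * x1 + w2 * x2 + bias) - label
            g1 += error * x1
            g2 += error * x2
            gb += error
        # Step against the average gradient.
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        bias -= learning_rate * gb / n
    return w1, w2, bias

def predict(model, x1, x2):
    """Return 1 ("yes") if the predicted probability is at least 0.5, else 0."""
    w1, w2, bias = model
    return 1 if sigmoid(w1 * x1 + w2 * x2 + bias) >= 0.5 else 0

model = train(samples)
print(predict(model, 8.5, 2.0))  # a rich, warm dessert
print(predict(model, 2.5, 8.5))  # a light, cold dessert
```

The learned weights hint at the “why”: a negative weight on richness and a positive weight on coldness would suggest which traits drive the dislike, which a bare yes/no table cannot show.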
There are several different models available in the data science toolbox. Additional models are continually being developed to increase the accuracy of predicting what your customers mean when they send you an email stating they loved one aspect of your product (or service) but greatly disliked another. Human sentiment isn’t always a binary, yes-or-no communication.
To be completely forthcoming, what has been presented above is a simplified version of what truly occurs during the data analysis process. But it’s a solid start to furthering your understanding of what data “is” and how it’s transformed into actionable information.
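A rough sketch of why mixed feedback is hard to boil down to one number: the snippet below scores a whole message, then scores each clause separately. The word lists, the contrast words, and both functions are hypothetical simplifications, not how any particular product works.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"loved", "great", "easy"}
NEGATIVE = {"disliked", "slow", "confusing"}

def score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def score_by_clause(text):
    """Split on contrast words and score each clause on its own."""
    clauses = re.split(r"\b(?:but|however|although)\b", text, flags=re.IGNORECASE)
    return [(c.strip(" ,.;"), score(c)) for c in clauses if c.strip(" ,.;")]

email = "I loved the setup process, but I disliked the slow support."

# One overall number hides the praise entirely.
print(score(email))

# Scoring each clause keeps the praise and the complaint separate.
print(score_by_clause(email))
```

The single score reads as a purely negative message, while the clause-level view shows a customer who is happy with one aspect and unhappy with another.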