Understanding the 3 Types of Big Data

Home Stager with Caprice Weston's Home Staging

Credit: rawpixel.com Via Freepik

If you’ve recently scoured the pages of a tech magazine or combed through a business blog, chances are you’ve come across the term “big data.”

No, it doesn’t refer to the consolidated market forces behind data (the way some pundits use “big pharma” or “big ag” to drum up tension). Instead, big data is far more literal: it simply refers to data sets too complex, fast-moving and vast for everyday computing software to store, manage or interpret.

These unwieldy data sets are invaluable to companies, provided they know how to use them. When a company mines big data with machine learning and predictive analytics models, it can uncover highly valuable information. Hidden within these enormous reams of data are insights into consumer behavior, anomalies, risk and reward, fraud and much more.

In this post, let’s delve deeper into the complex world of big data by looking at its three types: structured, unstructured and semi-structured data.

Structured Data

Structured data is easily quantifiable and often numeric. It’s the kind of data that’s been around for decades. The information is often neatly compartmentalized in rows and columns, as you might find on a spreadsheet or online form.

Financial statements are a good example of structured data; they often contain numerically expressed transactions that you can easily organize.
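To make the rows-and-columns idea concrete, here is a minimal Python sketch. The transaction records are invented for illustration and are not drawn from any real statement. Because every row has the same named columns, a few lines of standard-library code can total the amounts.

```python
import csv
import io

# Hypothetical transaction records, invented for illustration only.
RAW = """date,description,amount
2024-01-03,Office supplies,-42.50
2024-01-05,Client payment,1200.00
2024-01-09,Software subscription,-15.99
"""

def total_amount(csv_text):
    """Sum the 'amount' column of a CSV string, rounded to cents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return round(sum(float(row["amount"]) for row in reader), 2)

print(total_amount(RAW))
```

The point is not the arithmetic but the predictability: since each field sits in a known column, no interpretation is needed before the data can be analyzed.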

Structured data was once the dominant form of recorded data, but by some estimates it now accounts for roughly one-fifth of the world’s data output.

Unstructured Data

By contrast, unstructured data is complex, difficult to analyze and cannot sit neatly in the cells of a spreadsheet. In the greater context of historical data analysis, it's a relatively recent type of data. It’s also the biggest form of data we produce, encompassing videos, photos, PDFs and free-form text.

A helpful example of unstructured data is consumer feedback or online reviews without star ratings. They contain language indicating sentiment, which traditionally requires human intelligence to analyze. Therefore, machine learning and AI algorithms are essential for interpreting large sets of unstructured data.

Semi-structured Data

Semi-structured data is the “Goldilocks” of data, containing elements of the above two. A starred review is a good example here, as it contains information expressed numerically (four out of five stars indicates an 80% approval, for example), but it also contains qualitative, language-dependent data. Emails are also a good example, as they may contain text alongside quantifiable information like addresses, names and reference numbers.
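The starred review described above can be sketched in a few lines of Python. The record below is hypothetical, and its field names (`stars`, `max_stars`, `text`) are assumptions made for illustration. The numeric part converts directly into a percentage, while the text part stays as free-form language that would need further analysis.

```python
import json

# Hypothetical review record; the field names are assumed for this sketch.
REVIEW_JSON = '{"stars": 4, "max_stars": 5, "text": "Friendly agent, but slow to reply."}'

def approval_percent(review):
    """Convert a star rating into a percentage of the maximum rating."""
    return 100 * review["stars"] / review["max_stars"]

review = json.loads(REVIEW_JSON)

# Structured part: a number that can be computed on directly.
print(approval_percent(review))  # 80.0

# Unstructured part: plain language that still needs interpretation.
print(review["text"])
```

A format like JSON suits semi-structured data because it labels the quantifiable fields without forcing the free text into a fixed shape.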

The Three Vs of Big Data

The rule of threes doesn’t just apply to types of big data; it also applies to big data characteristics. Specifically, you can define big data by “the three Vs”: velocity, variety and volume. Some experts also add “veracity” and “value” to the mix.

As mentioned, big data is massive in volume, often comprising petabytes or even exabytes of information. The data sets also come in a variety of forms and formats, including text, graphics, video, PDFs and data from IoT devices. Finally, big data comes hot and fast, barreling at a company with real-time velocity.

What Big Data Means for Consumers

It’s tempting to think that big data represents a cynical ploy for businesses to gain more information on you as a consumer. (And, yes, it’s sometimes used like that.) But structured, unstructured and semi-structured data can also benefit consumers.

Take Nobul as an example. The real estate digital marketplace leverages AI algorithms to comb through information on real estate agents, like location, language, sales histories and verified reviews. The company then passes that information to consumers so they can make informed choices. Speaking to BNN Bloomberg, CEO Regan McGee said: “We’re building a full ecosystem: end-to-end real estate, and consumer-centric. Buyers and sellers never pay us anything, and they never see an ad. It’s a competitive process throughout the whole buying and selling of real estate.”

To summarize, big data comes in structured, unstructured and semi-structured forms, which differ mainly in how complex they are to organize and analyze. It’s characterized by its velocity, variety and volume. And, as Nobul illustrates, big data has the potential to help consumers leverage valuable insights when making important transactions.

