Category Archives: Data Science

Entity–attribute–value model

Entity–attribute–value model (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema. Read more at Wikipedia …
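To make the idea concrete, here is a minimal sketch of EAV storage in Python. The patient rows, attribute names, and the pivot helper are all made up for illustration; they are not from the Wikipedia article. Each fact is stored as one (entity, attribute, value) triple, so an entity only carries the attributes that actually apply to it.

```python
from collections import defaultdict

# Hypothetical EAV rows: one (entity, attribute, value) triple per fact.
rows = [
    ("patient_1", "blood_pressure", "120/80"),
    ("patient_1", "allergy", "penicillin"),
    ("patient_2", "heart_rate", 72),
]


def pivot(eav_rows):
    """Group EAV triples into one attribute dictionary per entity."""
    entities = defaultdict(dict)
    for entity, attribute, value in eav_rows:
        entities[entity][attribute] = value
    return dict(entities)


records = pivot(rows)
```

Attributes that do not apply to an entity are simply absent rather than stored as empty columns, which is where the sparse matrix comparison comes from.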

What is Big Data?

Big Data is a buzzword that few people can agree on. So if someone starts spewing on about Big Data, ask them “What is your definition of Big Data?”

Here’s mine.

From a technical standpoint, Big Data is an extension of data warehousing and business intelligence. Data warehousing has always used large data sets, and data mining has always been an advanced analytical method for business intelligence. These technologies have been around since the 1990s and are well-developed and mature.

So, what is Big Data, truly? Early adopters like Yahoo were trying to apply data warehousing techniques to social media data sets, which turned out to be considerably larger than most previous data sets. They solved their storage issues by mimicking Google’s approach of using large server farms to store the data. There is now a packaged approach for this called Hadoop. Hadoop allows you to manage the servers, while the data is processed with Google’s MapReduce technique.
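For readers who have not seen MapReduce, here is a toy, single-process sketch of the idea in Python using the usual word count example. It is not Hadoop's API, and the function names (map_phase, shuffle, reduce_phase, word_count) are my own. On a real cluster the map and reduce steps run in parallel across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is data"])
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines without the pieces needing to talk to each other.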

Because this social media data was difficult to store in a traditional relational database (RDBMS), early adopters turned to alternatives. These databases are now typically called NoSQL databases, but the term is too vague to be useful. The types of databases are many and varied, so find out the name and type of your database, and always use them in a discussion to avoid confusion.

The type of analysis done on this data is typical of business intelligence: basic reporting, probability, statistics, and data mining. Although these techniques are not new, the labor force for them is scarce. Big Data projects often require analysts with advanced skill sets along with additional creative skills to work outside the box.

Who is using Big Data? Primarily the retail and telco sectors, with some new adopters in the financial and health sectors.

In summary, Big Data uses a tool like Hadoop to store and process data, uses a non-traditional database like MongoDB to give structure to the data, and uses advanced analytical techniques like data mining to make sense of the data.



Hans Rosling on TED

Stats that reshape your worldview.

You’ve never seen data presented like this. With the drama and urgency of a sportscaster, statistics guru Hans Rosling debunks myths about the so-called “developing world.”

In Hans Rosling’s hands, data sings. Global trends in health and economics come to vivid life. And the big picture of global development, with some surprisingly good news, snaps into sharp focus.