BI Insights

Big Data 101: Intro To Probabilistic Data Structures

17 April, 2017

When analyzing big data, we often need to check properties of the dataset, such as the number of items, the number of unique items, and how frequently each item occurs. Hash tables or hash sets are usually employed for this purpose. But when the dataset becomes so large that it cannot fit in memory all at once, we need a special kind of data structure known as a probabilistic data structure. Streaming applications usually require data to be processed in one pass and then updated incrementally, and probabilistic data structures fit that processing model very well. Such data structures do not resolve hash collisions, but the resulting error is kept below a specified threshold. They trade a small margin of error for a considerably smaller memory footprint and constant query time. This article discusses some commonly used probabilistic data structures.
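
As one illustration of the trade-off described above, here is a minimal sketch of a Bloom filter, a widely used probabilistic structure for set-membership checks. The list of structures covered by the full article is not reproduced here, so the choice of a Bloom filter is an assumption, and the class name `BloomFilter`, the method `might_contain`, and the parameters `num_bits` and `num_hashes` are hypothetical names picked for this example.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter (illustrative sketch, not a tuned implementation).

    Lookups may return false positives but never false negatives.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Memory is fixed up front and does not grow with the number of items.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Collisions are not resolved: several items may set the same bit.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# One pass over a (tiny) stream of items, with incremental updates.
seen = BloomFilter()
for word in ["alpha", "beta", "gamma"]:
    seen.add(word)
```

Every item that was added is reported as possibly present, while an item that was never added is usually, but not always, reported as absent. Increasing `num_bits` lowers the false-positive rate at the cost of more memory, which is the error-for-memory trade described in the paragraph above.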
