The world today is producing far more data than ever before. IBM estimates that 90% of the total data in the world today has been created in the past 2 years. It isn’t that data didn’t exist before. Devices, sensors, financial services firms and many others have long been generating vast amounts of data, and companies have been storing, processing and analyzing it as well. So what has happened?

There has been a fundamental shift in the last few years, driven by the consumerization of technology and the emergence of social media channels. Enterprises are now confronted with new forms of data, classified as “Unstructured Data”. Unstructured data accounts for 80 percent of all enterprise data, according to Gartner, Forrester and IDC. Gartner predicts that unstructured data will grow a whopping 800 percent over the next five years and that 80 percent of new data will be unstructured.

Enterprises will have to find the right tools and technologies to store, process and analyze this form of data. A new class of databases, widely known as No-SQL databases, has emerged to address this problem. Popular categories are column databases, document databases, key-value databases and graph databases.

Without going into a technical evaluation, we will describe the business value that column and document databases bring to the table. Both categories include some very popular open source databases such as Cassandra, HBase and MongoDB. Each boasts a flexible schema, horizontal scalability on commodity hardware, zero licensing costs and the ability to handle big volumes of data. Let’s have a look at how some of these attributes can add value to enterprises:

Flexible Schema

Store records with different properties/columns in the same table or collection.

Use Case

  • A retailer can fit its entire product inventory into one table, without creating a long list of database fields to accommodate every product type or separate schemas for different product categories.
  • Building a 360-degree view of the customer across all business functions. Each customer may have different engagement channels, product portfolios and attributes, all of which can be built into a single collection and analyzed.

Benefits

  • Shorter time to market, due to significant time savings in database management.
  • Cost savings from little or no database administration.
  • Increased developer productivity.
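
The retailer use case under Flexible Schema can be sketched in a few lines of Python. This is only an illustration of the idea, using plain dictionaries in place of a real document database; the product records, field names and the `find` and `field_names` helpers are all hypothetical.

```python
# Hypothetical product records: every field name and value here is illustrative.
# Each category carries its own attributes, yet all records share one collection.
inventory = [
    {"sku": "TV-100", "category": "electronics", "price": 499.0, "screen_inches": 55},
    {"sku": "SH-200", "category": "apparel", "price": 29.0, "size": "M", "color": "blue"},
    {"sku": "BK-300", "category": "books", "price": 12.5, "author": "A. Writer", "pages": 320},
]


def find(collection, **criteria):
    """Return the records that contain every given field with the given value."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


def field_names(collection):
    """List every field that appears in at least one record."""
    return sorted({field for doc in collection for field in doc})


blue_items = find(inventory, color="blue")   # only apparel records carry "color"
all_fields = field_names(inventory)          # union of fields across categories
```

Records that lack a queried field are simply skipped, so a new product category with new attributes can be added without altering the existing records.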

Horizontally Scalable

Supports auto-scaling using commodity hardware. Simply add more instances and the data automatically spreads across servers. For most systems, performance scales roughly linearly with the number of servers.

Use Case

  • Storing huge web logs or device data logs.
  • Data warehouse for storing and archiving transactional records.

Benefits

  • Cost savings from using commodity hardware rather than specialized servers.
  • Can easily be hosted in the cloud, without the need for an in-house data center.
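
How data can "spread across servers" when an instance is added may be easier to see with a small sketch. The Python below uses consistent hashing, one common technique for this; it is an assumption chosen for illustration, not a description of how any particular product is implemented. The `HashRing` class, the server names and the log keys are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each server owns several points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place `vnodes` points for this server at pseudo-random ring positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # A key belongs to the first point at or after its own position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


keys = [f"log-{i}" for i in range(1000)]
ring = HashRing(["server-a", "server-b", "server-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("server-d")  # scale out by one instance
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, only part of the keys change owner when `server-d` joins, and every key that moves lands on the new server, so the existing servers do not have to reshuffle data among themselves.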

Auto-Sharding

Partitions data across nodes automatically and aggregates the results back for any query.

Use Case

  • Distributed storage architectures for storing huge sets of data.
  • Read/Write Intensive Applications

Benefits

  • Spares the complexity of developing and managing a separate data partitioning and aggregation layer, as would be needed with SQL systems.
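
The partition-and-aggregate behaviour described under Auto-Sharding can be sketched as follows. This is a minimal in-memory model, assuming simple hash-modulo placement; real databases use their own partitioning schemes. The `ShardedStore` class and the check-in records are hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that splits records across a fixed number of shards."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key and use the remainder to pick a shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self._shard_for(key).get(key)

    def query(self, predicate):
        # Scatter the query to every shard, then gather the partial results.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results


store = ShardedStore(shard_count=3)
for i in range(100):
    store.put(f"user-{i}", {"user": f"user-{i}", "checkins": i % 5})

inactive = store.query(lambda r: r["checkins"] == 0)
```

The caller only sees `put`, `get` and `query`; which shard holds which record stays hidden, which is the layer an application would otherwise have to build itself.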

Popular Adoptions

  • MetLife uses the MongoDB document database to provide a 360-degree view of its customers, including policy details and transactions.
  • Foursquare uses MongoDB to store user check-ins across nodes. It has allowed them to scale and handle traffic for their fast-growing application in a cost-effective and productive manner.
  • eBay uses DataStax’s Cassandra to turn volumes of data into useful insights for its customers.
  • Craigslist uses MongoDB for its flexible document-based storage and built-in scalability. As the schema changes on the live database, MongoDB can accommodate these changes without costly schema migrations.

The choice of category, and of a specific database within that category, will be dictated by the type of input data, data access patterns, ease of adoption, learning curve and enterprise support.
