Title: New Big Database Technologies; A Market Overview of Technologies and Products

Introduction

The rise of big data and cloud platforms has brought a tsunami of new technologies and products for data storage, processing, and analytics. Hadoop, Spark, NoSQL, NewSQL, triplestores, and SQL-on-Hadoop are just a few of the countless technologies that have become available for developing big data systems. Many powerful new database engines have also entered the market, including Amazon Athena, Cloudera, Exasol, Google BigQuery, Microsoft Synapse, MongoDB, Neo4j, SingleStore, SnowflakeDB, Splice Machine, and Starburst.

Most organizations have many questions. How mature are all these new technologies? Are they worthy replacements for the more traditional SQL products? How should they be incorporated into existing data warehouse architectures? Should they be used to develop data lakes? Are they the perfect platforms for data science, or for operational BI?

This seminar gives a clear, extensive, and critical overview of all the new key technologies for storing, processing, and analyzing big data. Technologies are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given. It’s the perfect update for those interested in the new market of big data technology.

Subjects

1. Big Data: State of the Art

  • What exactly do we mean by big data?
  • The key application area of big data: business analytics
  • Differences between semi-structured, poly-structured, multi-structured, and unstructured data
  • Big data systems require specialization of database engines

2. Analytical SQL Database Servers

  • Classification of analytical SQL database servers: can they compete with NoSQL products?
  • Techniques to improve performance and scalability, including column-based storage, sharding, in-memory analytics, and query compilation
  • How important is in-database analytics?
  • Is loading databases into internal memory the solution? Is it feasible?
  • Market overview, including Amazon Athena, Exasol, Google BigQuery, HP Vertica, Microsoft Synapse, SingleStore, SnowflakeDB, Splice Machine, and Starburst

3. The World of Hadoop and Spark

  • The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
  • Characteristics and consequences of HDFS and file formats
  • Alternative implementations by Amazon, Google, and Microsoft
  • Kafka for fast messaging

4. NoSQL Database Stores

  • Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
  • It’s all about data scalability and performance
  • Why is schema-on-read more flexible than schema-on-write?
  • Are NoSQL products really database servers?
  • Market overview, including Apache HBase, Apache CouchDB, Cassandra, Cloudera, DataStax, InfiniteGraph, MongoDB, and Neo4j

5. Exploring Data in Hadoop Using SQL

  • Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
  • Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, Pivotal HAWQ, Spark SQL, and Splice Machine
  • Data virtualization for unleashing the information hidden in NoSQL and SQL systems

6. NewSQL Database Servers for Transaction Workloads

  • NewSQL database servers are designed for high-performance transactional systems
  • Simpler transaction mechanisms
  • The challenge of multi-table joins
  • Market overview, including CitusDB, Clustrix, and SingleStore

7. Concluding Remarks

What You Will Learn:

  • Why traditional database technology is not “big” enough
  • How analytical SQL engines can help to simplify data architectures
  • How Hadoop and NoSQL differ from traditional technology
  • How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
  • How to embed Hadoop technologies in existing BI systems
  • How Spark can boost performance for analytics
  • How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
  • Why graph databases are very different from all other systems
  • When to use NewSQL or NoSQL for developing transactional systems
  • How to simplify data access through SQL-on-Hadoop engines
  • When to use which new data storage technology and the pros and cons of each solution
  • Which products and technologies are winners and which are losers

Geared to: IT architects; database specialists; big data specialists; BI specialists; data warehouse designers; technology planners; technical architects; enterprise architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects.

Related Articles and Blogs:

 Interview with Rick van der Lans: New Technologies Complementing Traditional BI

Related Whitepapers:

 SQL Syntax for Apache Drill; Using SQL for the SQL-on-Everything Engine; December 2015; sponsored by DZone

 InfiniteGraph: Extending Business, Social, and Government Intelligence with Graph Analytics; September 2010; sponsored by InfiniteGraph