Contact

R20/Consultancy

+31 252-514080

info@r20.nl

Title: New Big Database Technologies; From NoSQL to NewSQL, and from Hadoop to Spark

Introduction

Big Data, Hadoop, NoSQL, analytical database servers, MapReduce, and appliances are all immensely popular terms in the IT industry. For most organizations the questions are: How mature are all these new technologies? Are they worthy replacements for the more traditional SQL products? How should they be incorporated in the existing Data Warehouse architecture? This seminar discusses these new data storage technologies clearly and explains why and how they can be relevant for any organization.

This seminar is intended for anyone who has to stay up to date with and implement the new developments, including data warehouse designers, business intelligence experts, database specialists, database experts, consultants, and technology planners.

Subjects

1. Big Data: State of the art

  • What exactly do we mean by big data?
  • The key application area of big data: business analytics
  • Differences between semi-structured, poly-structured, multi-structured, and unstructured data
  • Examples of big data: sensor data, (micro-)event data, textual data, and clickstream data

2. Analytical SQL Database Servers

  • Classification of analytical SQL database servers: can they compete with NoSQL products?
  • The advantages and disadvantages of column-based database servers
  • How important is in-database analytics?
  • Is loading databases into internal memory the solution? Is it feasible?
  • Market overview, including Exasol, HP/Vertica, IBM PureData Systems for Analytics, Actian Matrix and Vector, Kognitio WX2, Oracle Exalytics, SAP HANA, Teradata Appliances, and Teradata Aster Database

3. The World of Hadoop

  • The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
  • Characteristics and consequences of HDFS and file formats
  • Alternative implementations by MapR, Amazon, and ScaleOut (Hadoop in-memory)
  • Use of MapReduce for analytics and reporting
  • Storm for streaming data
  • The role of Cloudera, Hortonworks, and MapR

4. NoSQL Database Stores

  • Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
  • It’s all about data scalability and performance
  • Why is schema-on-read more flexible than schema-on-write?
  • Are NoSQL products really database servers?
  • Market overview, including Apache HBase and CouchDB, Cassandra, Cloudera, DataStax, InfiniteGraph, Riak, MongoDB, and Neo4j

5. Exploring Data in Hadoop Using SQL

  • Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
  • Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, JethroData, MemSQL, Pivotal HAWQ, Spark SQL, and Splice Machine
  • Data virtualization for unleashing the information hidden in NoSQL and SQL systems

6. NewSQL Database Servers for Transaction Workloads

  • NewSQL database servers are designed for high-performance transactional systems
  • Simpler transaction mechanisms
  • The challenge of multi-table joins
  • Market overview, including Akiban, CitusDB, Clustrix, MariaDB, NuoDB, TransLattice, VMware SQLFire and VoltDB

7. Concluding Remarks

What You Will Learn:

  • Why traditional database technology is not “big” enough
  • How Hadoop and NoSQL differ from traditional technology
  • How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
  • How to embed Hadoop technologies in existing BI systems
  • How Spark can boost performance for analytics
  • How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
  • Why graph databases are very different from all other systems
  • When to use NewSQL or NoSQL for developing transactional systems
  • How to simplify data access through SQL-on-Hadoop engines
  • When to use which new data storage technology and the pros and cons of each solution
  • Which products and technologies are winners and which are losers

Geared to: IT architects; database specialists; big data specialists; BI specialists; data warehouse designers; technology planners; technical architects; enterprise architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects.

Related Articles and Blogs:

 Interview with Rick van der Lans: New Technologies Complementing Traditional BI

 The Next Stage of Hadoop and Big Data is all About Simplification

 Polyglot Persistence and Future Integration Costs

Related Whitepapers:

 SQL Syntax for Apache Drill; Using SQL for the SQL-on-Everything Engine; December 2015; sponsored by DZone

 How Drill Enriches Self-Service Analytics; The Added Value of a SQL-on-Everything Engine; November 2015; sponsored by MapR Technologies

 Mixed, Shifting, and High-Concurrency Workloads in Data Warehouse Systems; July 2012; sponsored by Teradata Corporation

 InfiniteGraph: Extending Business, Social, and Government Intelligence with Graph Analytics; September 2010; sponsored by InfiniteGraph