Contact

R20/Consultancy

+31 252-514080

info@r20.nl

Title: Incorporating Big Data, Hadoop, and NoSQL in Business Intelligence Systems and Data Warehouses

Introduction

Big data, Hadoop, in-memory analytics, Spark, Kafka, self-service BI, fast data, data warehouse automation, analytical database servers, data virtualization, data vault, operational intelligence, predictive analytics, and NoSQL are just a few of the new technologies and techniques that have become available for developing BI systems. Most of them are very powerful and allow for the development of more flexible and scalable BI systems. But which ones do you pick?

Due to this waterfall of new developments, it’s becoming harder and harder for organizations to select the right tools. Which technologies are relevant? Are they mature? What are their use cases? These are all valid questions, but they are difficult to answer.

This seminar gives a clear, extensive, and critical overview of all the new developments and their inter-relationships. Technologies and techniques are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given.

The biggest revolution in BI is evidently big data. Therefore, considerable time in the seminar is reserved for this intriguing topic. Hadoop, Spark, MapReduce, Kafka, Hive, NoSQL, and SQL-on-Hadoop are all explained. In addition, the relationship with analytics is discussed extensively.

This seminar gives you a unique opportunity to see and learn about all the new BI developments. It’s the perfect update for those interested in knowing how to make BI systems ready for the coming ten years.

Subjects

1. The Changing World of Business Intelligence

  • Big data: hype or reality?
  • Operational intelligence: does it require online data warehouses?
  • Fast data is the next frontier of big data
  • Data warehouses in the cloud
  • Self-service BI
  • The business value of analytics

2. Hadoop Explained

  • The relationship between big data and analytics
  • The Hadoop software stack explained, including HDFS, MapReduce, YARN, Kudu, Hive, Impala, Storm, Sqoop, Flume, and HBase
  • The balancing act: productivity versus scalability
  • Making big data available to a larger audience with SQL-on-Hadoop engines, such as Apache Drill, Apache Hive, Apache Impala, Apache Phoenix, HP Vertica, IBM BigSQL, JethroData, MemSQL, SparkSQL, and Splice Machine

3. Spark Explained

  • Spark is about in-memory analytical processing
  • The interfaces: SQL, R, Scala, Python
  • Does Spark need Hadoop?
  • The relationship between Spark and data science
  • Examples of use cases of Spark

4. NoSQL Explained

  • Classification of NoSQL database servers: key-value stores, document stores, column-family stores and graph data stores
  • Market overview: CouchDB, Cassandra, Cloudera, MongoDB, and Neo4j
  • Strong consistency or eventual consistency?
  • Why an aggregate data model?
  • Use cases of NoSQL products
  • How to analyze data stored in NoSQL databases

5. Overview of Analytical SQL Database Servers

  • Are classic SQL database servers more suitable for data warehousing?
  • Important performance-improving features: column-oriented storage, in-database analytics
  • The new generation of GPU-based database servers: BlazingDB, Kinetica, MapD, and SQream
  • Market overview of analytical SQL database servers: Apache Greenplum, Edge Intelligence, Exasol, HP Vertica, IBM PureData Systems for Analytics, InfoBright, Kognitio WX2, Microsoft PDW, Oracle In-Memory, SAP HANA, Sybase IQ, SnowflakeDB, Teradata Appliance, and Teradata Aster Database

6. Technologies for Fast Data and Streaming Analytics

  • The key use case for fast data: the Internet of Things (IoT)
  • IoT implies streaming data and fast analysis of data: analytics at the speed of business
  • IoT devices: smartphones, smartwatches, RFID sensors, machines, general sensors, cameras, pacemakers, and so on
  • The challenge: reacting to streaming data in real time
  • The difference between big data and fast big data
  • Technologies for streaming data: Apache Kafka, Apache ActiveMQ, Amazon Kinesis, Kestrel, RabbitMQ, and ZeroMQ
  • Differences between these new technologies and traditional message queuing products
  • Products for big data streaming: Apache Storm and Flink, IBM InfoSphere Streams, Informatica for Streaming Analytics, Software AG Apama, and Spark Streaming
  • How to integrate fast data with the enterprise data warehouse?

7. Data Virtualization for Agile BI Systems and Lean Integration

  • Data virtualization offers on-demand data integration
  • Seamlessly integrating big data and the data warehouse
  • Market overview: AtScale, Denodo Platform, Red Hat JBoss Data Virtualization, Rocket DV, Stone Bond Enterprise Enabler, and TIBCO Data Virtualization
  • Importing non-relational data, such as XML documents, web services, NoSQL and Hadoop data, and unstructured data
  • Differences between data virtualization and data blending

8. New Business Intelligence Architectures

  • Discussion of different BI architectures, including Kimball’s Data Warehouse Bus Architecture, Inmon’s Corporate Information Factory, DW 2.0, the Federated Architecture, the Centralized Warehouse Architecture, the Data Virtualization Architecture, and the BI in the Cloud Architecture
  • Do we still need data marts?
  • What is the role of master data management in BI architectures?
  • Using data vault to create more flexible data warehouses
  • Data warehouse automation to create data warehouses and data marts faster

9. NewSQL Database Servers

  • NewSQL stands for high-performance transactional SQL database servers
  • Simpler transaction mechanisms to implement scale-out
  • What does the term geo-compliancy mean?
  • Market overview: Clustrix, GenieDB, NuoDB, and VoltDB

10. Data Modeling for Big Data, Hadoop, and NoSQL

  • Explanation of non-relational concepts, such as column families, hierarchies, sets, and lists
  • Is storing unstructured and semi-structured data really more flexible?
  • The differences between schema-on-read and schema-on-write
  • Rules for transforming classic data models to NoSQL concepts
  • Application needs influence database design

11. Closing Remarks

Learning Objectives

In this seminar, Rick van der Lans will help you:

  • Learn about the trends and the technological developments related to business intelligence, analytics, data warehousing, streaming analytics, and big data.
  • Discover the value of big data and analytics for organizations.
  • Learn which products and technologies are winners and which ones are losers.
  • Learn how new and existing technologies, such as Hadoop, NoSQL, and NewSQL, will help you create new opportunities in your organization.
  • Learn how more agile business intelligence systems can be designed.
  • Learn how to embed big data and analytics in existing business intelligence architectures.

Intended Audience:

Business Intelligence Specialists, Data Warehouse Designers, Business Analysts, Technology Planners, Technical Architects, Enterprise Architects, IT Consultants, IT Strategists, Systems Analysts, Database Developers, Database Administrators, Solutions Architects, Data Architects, IT Managers

Related Whitepapers:

 SQL Syntax for Apache Drill; Using SQL for the SQL-on-Everything Engine; December 2015; sponsored by DZone

 How Drill Enriches Self-Service Analytics; The Added Value of a SQL-on-Everything Engine; November 2015; sponsored by MapR Technologies

 SQL-on-Hadoop Engines Explained; May 2014; sponsored by MapR Technologies

 SAP HANA and Data Virtualization: Competitors or Complements?; September 2012; sponsored by Cisco (Composite Software)

 Mixed, Shifting, and High-Concurrency Workloads in Data Warehouse Systems; July 2012; sponsored by Teradata Corporation

 Using SQL-MapReduce for Advanced Analytical Queries - Second Edition; September 2011; sponsored by Teradata

 InfiniteGraph: Extending Business, Social, and Government Intelligence with Graph Analytics; September 2010; sponsored by InfiniteGraph