Title: Practical Guidelines for Designing Modern Data Architectures
tips, experiences, guidelines, do’s and don’ts
Digital transformation, the data-driven organization, and the ‘data economy’ are popular topics in boardrooms today, because organizations understand the need to do more with data. They want to exploit the strengths and benefits of data science, self-service BI, embedded BI, edge analytics, and customer-driven BI. The consequence is that data needs to be deployed more widely, more efficiently, and more effectively. Unfortunately, current IT systems, such as the data warehouse and transactional systems, can no longer cope with the ever-increasing workload; they are already overstretched. For many organizations, it is time for a new and future-proof data architecture.
But designing new data architectures is not something you do every day. This two-day seminar answers many common questions raised when designing a new data architecture. Practical guidelines, tips, design criteria, design principles, use cases, and real-life examples are discussed extensively. Concepts and technologies, such as data lakes, big data, data vault, cloud, data virtualization, Hadoop, NoSQL, and data warehouse automation, are explained.
The seminar is based on practical experiences with designing and implementing modern data architectures for numerous organizations.
Digital transformation, the data-driven organization, and the ‘data economy’ are popular topics in boardrooms today. Whatever these terms mean exactly, they signal that organizations want to do more with data. Data has to be deployed more widely, more efficiently, and more effectively to improve business and decision-making processes and to increase competitive power. Technically, this implies that new forms of data usage must be deployed, such as data science, self-service BI, embedded BI, edge analytics, and customer-driven BI.
Unfortunately, current IT systems, such as the data warehouse and the transactional systems, can no longer cope with these new, more intense, and resource-intensive forms of data usage. The current data architecture for data delivery is already overstretched. Some of these systems are over twenty years old, and many changes and extensions have been applied to them. They cannot process the ever-increasing workload. Additionally, because they have become static and inflexible, implementing new reports and executing new forms of analytics have become very time-consuming. In other words, the current data architecture cannot keep up with today’s ‘speed of business change’.
The effect is that, understandably, countless organizations have decided to develop a new and future-proof data architecture. However, this is easier said than done. You don’t design data architectures every day. Which new technologies are available today? What is the influence of new technologies, such as Hadoop, NoSQL, big data, data warehouse automation, and data streaming, on the architecture? Which new architectural principles should be applied? How do we handle the new rules and regulations for data storage and analysis? And what is the influence of cloud platforms?
This two-day seminar answers most of the common questions architects have when designing a modern data architecture. This is done through guidelines, tips, and design rules. Concepts and technologies, such as data lakes, big data, data vault, cloud, data virtualization, Hadoop, NoSQL, data warehouse automation, and anonymization of data, are discussed. The seminar is based on practical experiences with designing and implementing modern data architectures. The relationship between a modern data architecture and organizational aspects is also addressed, including data quality, data governance, data strategy, and the migration to a new architecture.
Part 1: Introduction - what is a data architecture?
Part 2: Overview of new technologies for data storage, data processing, and data analytics
Part 3: Design aspects of data architectures
Part 4: Innovative new data architectures
Part 5: Action plan for developing a complete and correct data architecture
Part 6: Closing remarks
Part 1: Introduction - What is a Data Architecture?
Why a new data architecture?
Examples of real-life data architectures
What are the key elements of a data architecture?
What are the differences between a data architecture and a solution architecture?
From batch via Lambda to the Kappa architecture
Benefits, drawbacks, and shortcomings of well-known reference architectures, such as the classic data warehouse architecture, the data lake, and transactional systems
From vision to implementation plan
Part 2: Overview of New Technologies for Data Storage, Data Processing, and Data Analytics
Benefits, drawbacks, features, and use cases of each technology
Data storage: analytical SQL, NoSQL, Hadoop, cubes
Data integration: ETL, data virtualization, data replication, data warehouse automation, enterprise service bus, API gateway
Data cleansing: home-grown versus professional tools
Data streaming: messaging, Kafka, streaming SQL
Data documentation: data glossary, data catalog, metadata management
Reporting tools: self-service BI, dashboards, embedded BI
Data science tools: programming languages, such as R and Python, machine learning automation tools, data science workbenches
Data security: anonymization, authorization
Part 3: Design Aspects for Data Architectures
First the technology or first the data architecture?
The importance of reusable transformation specifications, e.g. for integrating, filtering, correcting, and aggregating data
Influence of specialized technology on data architectures
Why migrate to the cloud: unburdening, high performance, scalability, software availability?
Are all software products suitable for the cloud?
Design principles for dealing with data history and data cleansing
Modernization of a classic data warehouse architecture
Generating a data warehouse architecture with data warehouse automation tools
New requirements for transactional systems, such as storing historic data and continuous logging
The influence of GDPR: deleting customer data
Responsibility for data quality
Part 4: Innovative New Data Architectures
The logical data warehouse architecture as an agile alternative
Design rules, do’s and don’ts for a logical data warehouse architecture
From a single-purpose to a multi-purpose data lake
Requirements for implementing data science models, such as transparency, immutability, and version control
The changing role of the data lake: from data delivery system for data scientists to a platform for storing all the enterprise and external data
A data streaming architecture; when every microsecond counts
Technical challenges: performance, inconsistent data streams, storing massive amounts of messages for analytics afterwards
Operationalization of data science models
Merging data architectures into one unified data delivery platform
Differences between data hub and data warehouse
The data marketplace: from tailor-made to ready-made
Part 5: Action Plan for Developing a Complete and Correct Data Architecture
What is the business motivation for a new data architecture: ICT cost reduction, competitive improvement, new business model, new laws and regulations, improving reaction speed to business demands, or a more efficient exploitation of available data?
The importance of a business strategy and data strategy and the relationship with the data architecture
Who are the stakeholders and what is the C-level support?
Maturity level of the ICT organization
Description of the current data architecture: data flows, data storage, quantities, and technologies in use
Stock-taking of current bottlenecks: business and ICT, performance, functionality, costs, the ICT organization, and its immediate environment
Constraining rules, such as laws and regulations, budget restrictions, software limitations, and legacy systems
Requirements and needs of the new data architecture: financial, available expertise, software, quantities, uptime, speed of data delivery, and level of unburdening
Architecture and design principles
Current and future forms of data usage: standard reports, self-service BI, data science, customer-driven, mobile apps
Forms of data usage: batch, manual internal, manual external, and sensors
Data types in use, including structured, unstructured, audio, video, text, and geo/GIS
Setting up the data architecture project: which choices must be made, which steps to take, is a PoC or pilot required, what are the key questions in an RFI, and how to convince the organization
Part 6: Closing Remarks
What are the steps to take to arrive at the right data architecture? From requirements analysis via proofs of concept to a data architecture.
Why is a holistic approach important, in which technology, organization, and architecture are analyzed in conjunction?
What are real life examples of new data architectures?
How can new technology be used optimally within a new data architecture?
How do you develop a data architecture?
Which components make up a data architecture?
What are the use cases, pros and cons of new technologies and how do they influence data architectures?
What is the value of well-known reference architectures, such as the Lambda architecture, the logical data warehouse architecture, and the data lake?
What are the right criteria for a data architecture?
Books:
Data Virtualization: Selected Writings by Rick F. van der Lans
Data Virtualization for Business Intelligence Systems by Rick F. van der Lans
Related Articles and Blogs:
Part 1: Drowning in Data Delivery Systems, May 2018
Part 2: Key Benefits of a Unified Data Delivery Platform, June 2018
Part 3: How Siloed Data Delivery Systems Were Born, June 2018
Part 4: Big Data is Not the Biggest Change in IT, June 2018
Part 5: Requirements for a Unified Data Delivery Platform, June 2018
Part 6: A Unified Data Delivery Platform - A Summary, June 2018
The Fusion of Distributed Data Lakes - Developing Modern Data Lakes; February 2019;
sponsored by TIBCO Software
Unifying Data Delivery Systems Through Data Virtualization; October 2018;
sponsored by fraXses
Architecting the Multi-Purpose Data Lake With Data Virtualization; April 2018;
sponsored by Denodo Technologies
The Next Wave of Analytics - At the Edge; December 2017;
sponsored by Edge Intelligence Software
Developing a Data Delivery Platform with Composite Information Server; June 2010; sponsored by Cisco (Composite Software)
Geared to: Business intelligence specialists; data analysts; data warehouse designers; business analysts; data scientists; technology planners; technical architects; enterprise architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects; IT managers.