Posts

Showing posts from March, 2024

Slowly changing dimensions

  Slowly changing dimensions (SCD)    Dimensions in data warehousing typically contain relatively static data about entities such as geographical locations, customers, or products.  Given that modifications in dimensions happen much less frequently, in unpredictable manner and outside of the regular update schedule that applies to fact tables, these types of dimension tables are called slowly changing dimensions (SCD).     Type 0: History isn't preserved. Dimension attributes reflect original values.     Type 1: Dimension attributes reflect latest values (previous values are overwritten)     Type 2: Every version of dimension member represented with separate row in the table usually with columns that represent period of validity     Type 3: Keeping limited history for selected attribute(s) using additional columns in the same row     Type 4: Keeping history in the separate table while original...

Temporal Tables

Temporal Tables Temporal Tables - use "AS OF" with date/time.   In SQL Server since 2016.   Usually table is built with ValidFrom and ValidTo. Why temporal - Use Cases?     1) Auditing all data changes and performing data forensics when necessary     2) Reconstructing state of the data as of any time in the past     3) Calculating trends over time     4) Maintaining a slowly changing dimension for decision support applications     5) Recovering from accidental data changes and application errors How to create:   CREATE TABLE dbo.Employee (     [EmployeeID] INT NOT NULL PRIMARY KEY CLUSTERED,      ... Employee properties ...     [ValidFrom] DATETIME2 GENERATED ALWAYS AS ROW START,     [ValidTo] DATETIME2 GENERATED ALWAYS AS ROW END,     PERIOD FOR SYSTEM_TIME(ValidFrom, ValidTo)   )   WITH (SYSTEM_VERSIONING =...

Immutability Talks

 Immutability Talks   LEE BRYON of Facebook Immutability - copy rather than change   power (goto) vs. principles (loops, even reduce, etc.)   declarative (what not how) Show changes by making more changes Latency, Intermittency, and Failure REST - fires loading of subtables Architecture:   old: MVC and REST   new: Immutable, Declarative, Executable, Architecture Views:   View Trees that are mutatable     Models are data (via server) => (old) Components => (old) render into Views   React Native, Component Kit, Litho:    Pure Functional - Javascript functions                                          then Mutable Objects - DOM elements (View)   new is GraphQL and fragments Structural sharing for performance Persistence...

Immutable Databases

 Immutable Databases   Fluree Amazon QLDB - Ledger, hosted, non-open source that is "document-oriented data model". The query language is PartiQL, a new open source document query engine. Oracle offers blockchain tables with its Oracle 21c database immudb -  Ledger, immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history, open source database written in Golang. Run client-server or embedded. Uses a subset of the PostgreSQL SQL dialect. Has a separate auditor to ensure your database has not been tampered with. TerminusDB is a Version Controlled Database with immutable graph or document database with full schema and data versioning capability. Custom query language called Web Object Query Language (WOQL). Trillian - open source database Dolt - Git for Data. Version Controlled Database. In SQL, Dolt implements Git read operations (ie. diff, log) as system tables and write operations (ie. commit, merge) as functions. The only SQL...

Immutability

 Immutability How can data be immutable? Don’t facts change? They don’t, in fact, when you incorporate time in the data. For instance, when Obama became president, it didn’t mean that Bush was never president. So you just have "As Of" time column(s) in addition to the usual fact columns. Reasons to make all types immutable:     1) Do not change.     2) Immutable objects are simpler to construct, test, and use     3) Thread-safe. Most thread safety problems are due to writes on one thread and reads on another; immutable objects don't have writes.     4) Avoid temporal coupling     5) Usage is side-effect free (no defensive copies)     6) always have failure atomicity     7) much easier to cache     8) prevent NULL references     Immutable objects can be taken apart and re-used. For example, if you have an immutable binary tree then you can use its le...