Databaseteori

En database er en måte å strukturere store datamengder på slik at det er mulig å finne fram raskt og effektivt.

Typiske databasemotorer er Oracle (verdensledende på databaser, men veldig dyr), Postgres, Mysql og mange fler.
Felles for disse er at de er relasjonsdatabaser - dvs at du lager tabeller og beskriver koblinger mellom dem. Databasemotoren vil siden passe på at disse reglene (relasjonene) alltid er oppfylt.

Det finnes også andre databasemotorer som ikke er relasjonsbasert, eksempler er MongoDB og Firebase.
Vi skal bruke Firebase som vår databasemotor, men teorien vil basere seg mye på typiske relasjonsdatabaser.

Normalformer

I relasjonsdatabaser vil vi vanligvis kreve at alle tabeller er på normalform.

Normalform for en tabell innebærer at følgende krav er oppfylt:

Alle felt i tabellen er atomiske, dvs de kan ikke splittes opp videre uten å miste mening,
f.eks et telefonnummer som 047 55667788 kan splittes opp i retningsnummer:047
og nummer:55667788. Vi kan ikke splitte opp videre uten at feltene mister mening.
Alle tabeller har et entydig nøkkelfelt som identifiserer datasettet. Det skal ikke være felt i tabellen som er avhengige av andre felt enn nøkkelen

En enkel regel for normalformene er slik:

Alle felt i en tabell skal være avhengig av nøkkelen, hele nøkkelen og ingenting anna enn nøkkelen.

For å beskrive databaser tegner vi et klassediagram, vanligvis brukes UML til dette.

Relasjonsdatabaser og sql

Relasjonsdatabaser har en klar fordel over NoSQL, de har et standardisert spørrespråk, kalt Structured Query Language (SQL) - eller Strukturert Spørre-Språk på norsk. Du kan stort sett skrive en spørring i SQL og kjøre den på de aller fleste relasjons-databaser. F.eks en spørring som virker i postgres kan kjøres i mysql, oracle, sybase eller microsoft sin løsning (access eller iis).

Typiske spørringer ser slik ut:

Select navn,alder,klasse from Elev where klasse="3a";

Nosql databaser ( json, objektorientert)

mongodb, firebase

Dokument-databaser hver nøkkel er kobla til en sammensatt datastruktur (et dokument) som kan inneholde flere
(nøkkel:verdi) par eller andre strukturer (lister, tabeller osv).
Graph stores brukes til å lagre nettverk (koblinger mellom ting) f.eks venner på facebook. Graph stores eksempler er Neo4J og Giraph.
Nøkkel-Verdi stores er de enkleste NoSQL databasene. Hvert element er lagra som et (nøkkel:verdi) par, noen lar
verdien ha forskjellig type (Redis og Firebase).
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

The Benefits of NoSQL

Under er forklaringen fra et firma som lager noSQL databaser på fordelene med NoSQL: (en smule biased)

When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

Large volumes of rapidly changing structured, semi-structured, and unstructured data
Agile sprints, quick schema iteration, and frequent code pushes
Object-oriented programming that is easy to use and flexible
Geographically distributed scale-out architecture instead of expensive, monolithic architecture

Valg av databasetype

relasjoner mellom data, enkle datastrukturer, osv

Projects where SQL is ideal:

logical related discrete data requirements which can be identified up-front
data integrity is essential
standards-based proven technology with good developer experience and support.

Projects where NoSQL is ideal:

unrelated, indeterminate or evolving data requirements
simpler or looser project objectives, able to start coding immediately
speed and scalability is imperative.

In the case of our book store, an SQL database appears the most practical option — especially when we introduce ecommerce facilities requiring robust transaction support. In the next article, we’ll discuss further project scenarios, and determine whether an SQL or NoSQL database would be the best solution.

Google ansatte med tanker om noSQL vs SQL

"The original API of Spanner provided NoSQL methods for point lookups and range scans of individual and interleaved tables. While NoSQL methods provided a simple path to launching Spanner, and continue to be useful in simple retrieval scenarios, SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data."

En vurdering av mongoDB

A couple of years ago, we moved our code base to a monorepo, which helped us scale tremendously in terms of code reuse and overall speed of development. We are extremely proud of our ability to run a resilient service that has 99.99% availability with zero downtime upgrades.

From the beginning of this journey, I made a decision to go all-in on JavaScript as our default coding language. The most important reason for this was that I wanted to hire full-stack developers who could work on every aspect of the product, so we chose Angular.js for UI, Node.js for API, and a schema-less JSON database (NoSQL MongoDB). We made all technology decisions based on this one philosophy (another blog coming about what I learned and why I am no longer a fan of full-stack development) and it worked beautifully...for a while.

It Started With Small Problems... Even though we had the ability to add features at a lightning fast pace, we started seeing occasional downtimes which always seemed to come down to MongoDB. For instance:

We were very happy to have 24x7 availability with primary and secondary instances of MongoDB. However, our performance suddenly deteriorated one day and retrieval started taking more than a second per document. We tried using many tools and profilers, but could not figure out what was happening. Finally, we rebuilt a new server, switched that over as primary, and rebuilt our secondary. Retrieval times dropped to 150ms again. This is still an unsolved mystery! Our Mongo instance reached 4TB and we were proud of our growing adoption. Due to the lack of tooling around managing large DBs, we relied on indexes to keep the search times low. When NoSQL DBs first became popular, there was no way to create uniqueness, so these features were built as an afterthought. Some of the bloating of our MongoDB was actually due to indexes, but rebuilding them was primitive and the entire DB would lock down. At one point, we needed to reboot our DB server — and it took MongoDB four hours to come back online. This led to an extended downtime for our service, and we had very little visibility into the MongoDB process and status. And Then Came the Knockout Punch! The biggest advantage — and disadvantage — of MongoDB is that it has a flexible schema. This means that documents in the same collection (AKA "table" in the old world) do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data. In a nutshell, there are no strict schema rules and this opens it up to a lot of cowboy tinkering.

While many developers love the flexibility, it also puts a very high degree of responsibility on their shoulders to get things right.

For example, let's consider a simple schema that stores information about a Git repository:

Field name added on provider 12/1/2012 repoOrg 12/1/2012 repoName 12/1/2012 isPrivate 7/17/2014 hasTeams 2/23/2016 As you could guess, the schema is updated over a period of time as fields are added to meet new requirements of an evolving product. This means that depending on when a repository was added to this document, it may or may not have the isPrivate and hasTeams fields. Our backend and frontend services needed to handle both cases gracefully, which led to code like this:

if exists(repo.hasTeams) and repo.hasTeams === true
{
 do something
} else {
 do something
}

In every single place where repo.hasTeams is used, we needed to add this code. With many microservices, many schemas and 40+ developers adding these blocks all over the place, our codebase was starting to look ugly. Also, every time we saw failures across our system, it was always a spectacular crash with no easy way to recover. I would wager that 90% of our crashes were due to the fact that some piece of code expected a field that didn't exist for that document. Internal brainstorming brought up the idea of building a schema validator and all sorts of hacks, but isn't this what a database should provide out-of-the-box?

One big black mark against Mongo (or any equivalent NoSQL database)!

The straw that broke the camel's back was when we introduced a critical field that absolutely needed to be present for each document in our most important collection. To ensure that every document included the field, we had to retrieve every single document one by one, update it, and then put it back. With millions of documents in the collection, this process caused the database performance to degrade to an unacceptable degree and we had to accept 4+ hours of downtime.

And that's when I decided that NoSQL wasn't going to work for us. To each his own, but we were done struggling with it and causing our customers (and ourselves) unnecessary heartache.

Postgres to the Rescue! After this last incident, which happened about a year ago, we migrated to PostgreSQL. I can explain the step-by-step process of migration in another blog if you're interested. We have no regrets, and the following factors have greatly improved our availability and resiliency:

Postgres has a strongly typed schema that leaves very little room for errors. You first create the schema for a table and then add rows to the table. You can also define relationships between different tables with rules so that you can store related data across several tables and avoid data duplication. All this means someone on the team can act as a database architect and control the schema which acts as a standard for everyone else to follow. You can change tables in PostgreSQL without requiring to lock it for every operation. For example, you can add a column and set the value to NULL quickly without locking the entire table. Postgres also supports JSONB, which lets you create unstructured data — but with data constraint and validation functions to help ensure that JSON documents are more meaningful. The folks at Sisense have written a great blog with a detailed comparison of Postgres vs. MongoDB for JSON documents. Our database size reduced by 10x since Postgres stores information more efficiently and data isn't unnecessarily duplicated across tables. As was shown in previous studies, we found that Postgres performed much better for indexes and joins, and our service became faster and snappier as a result. We have been very happy with Postgres since we migrated and we are no longer struggling with managing our database. As a result, we have seen our NPS go up significantly as customers are happier with a platform that "always works."

Modellering

Uavhengig av hvilken databsetype du har valgt for prosjektet ditt bør du lage en databasemodell.

Modellen under viser tre tabeller: Gruppe, Medlem og Elev. Nøkkelfeltet i hver tabell er markert med grønn runding.
Tabellen på mange-sida av en kobling er markert med «» (en diamant).
Fremmednøkler er markert med gul firkant.

Databaser