Jed Rembold
February 19, 2025
```json
{
  "OrderID": 100,
  "OrderItems": [
    {
      "sku": 1,
      "price": 50,
      "quantity": 1,
      "name": "Thingmajig"
    },
    {
      "sku": 2,
      "price": 25,
      "quantity": 2,
      "name": "Whatchamacalit"
    }
  ],
  "CustomerID": 5,
  "CustomerName": "Jed Rembold",
  "OrderDate": "2022-11-09"
}
```
| OrderID | OrderItems | CustomerID | CustomerName | OrderDate |
|---|---|---|---|---|
| 100 | | 5 | Jed Rembold | 2022-11-09 |

| OrderID | Sku | Price | Quantity | ProductName | CustomerID | CustomerName | OrderDate |
|---|---|---|---|---|---|---|---|
| 100 | 1 | 50 | 1 | Thingmajig | 5 | Jed Rembold | 2022-11-09 |
| 100 | 2 | 25 | 2 | Whatchamacalit | 5 | Jed Rembold | 2022-11-09 |

| OrderID | ItemNum | Sku | Price | Quantity | ProductName | CustomerID | CustomerName | OrderDate |
|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 1 | 50 | 1 | Thingmajig | 5 | Jed Rembold | 2022-11-09 |
| 100 | 2 | 2 | 25 | 2 | Whatchamacalit | 5 | Jed Rembold | 2022-11-09 |

| OrderID | CustomerID | CustomerName | OrderDate |
|---|---|---|---|
| 100 | 5 | Jed Rembold | 2022-11-09 |

| OrderID | ItemNum | Sku | Price | Quantity | ProductName |
|---|---|---|---|---|---|
| 100 | 1 | 1 | 50 | 1 | Thingmajig |
| 100 | 2 | 2 | 25 | 2 | Whatchamacalit |

ProductName depends on Sku
CustomerName depends on CustomerID

| Sku | ProductName |
|---|---|
| 1 | Thingmajig |
| 2 | Whatchamacalit |

| CustomerID | CustomerName |
|---|---|
| 5 | Jed Rembold |

| OrderID | ItemNum | Sku | Price | Quantity |
|---|---|---|---|---|
| 100 | 1 | 1 | 50 | 1 |
| 100 | 2 | 2 | 25 | 2 |

| OrderID | CustomerID | OrderDate |
|---|---|---|
| 100 | 5 | 2022-11-09 |
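The final decomposition above could be created in Postgres with DDL along these lines (a sketch: the table names and column types here are assumptions, not from the slides):

```sql
-- Hypothetical schema for the 3NF decomposition above
CREATE TABLE products (
    sku          INT PRIMARY KEY,
    product_name TEXT NOT NULL
);

CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers,
    order_date  DATE
);

CREATE TABLE order_items (
    order_id INT REFERENCES orders,
    item_num INT,
    sku      INT REFERENCES products,
    price    INT,
    quantity INT,
    PRIMARY KEY (order_id, item_num)
);
```

Note how ProductName and CustomerName now each live in exactly one place, keyed by Sku and CustomerID.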

Think back to the Spotify modeling that we were playing around with last week.
Suppose you have the following information you need to store for each track:

| Info | Description |
|---|---|
| Track Name | The name of the track |
| Artists | Comma separated list of contributing artists |
| Album | The album this specific track appears on |
| Duration | The duration of the track in seconds |
| Genre | The general genre of the track |
| Track Num | The placement of this track on the album |
Your task is to model this information purely as tables in 3NF:

| TName | Artists | Album | Duration | Genre | TNum |
|---|---|---|---|---|---|
| Don’t Stop Believin’ | Journey | Greatest Hits | 249 | Rock | 2 |
| Don’t Stop Believin’ | Journey | Escape | 251 | Rock | 1 |
| Fifteen | Taylor Swift | Fearless (Taylor’s Version) | 294 | Country Pop | 2 |
| Under Pressure | Queen,David Bowie | Greatest Hits | 238 | Rock | 7 |
Postgres will automatically index any column that is a primary key or which has the UNIQUE constraint.
You can choose to set up indexes on other columns as well, but do so outside of the table creation:

```sql
CREATE INDEX |||index name||| ON |||table name||| (|||column|||);
```

If you decide you want to remove an index, you do so using the index name:

```sql
DROP INDEX |||index name|||;
```

How can we objectively test this?
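As a concrete sketch (the table and index names here are made up), indexing a tracks table's genre column would look like:

```sql
CREATE INDEX tracks_genre_idx ON tracks (genre);

-- ...and later, once it is no longer needed:
DROP INDEX tracks_genre_idx;
```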
Postgres provides the EXPLAIN keyword to give you information about what the database is doing in the background.
EXPLAIN ANALYZE will also give you timing information about how long it took a query to run.
EXPLAIN always comes at the start of your query:

```sql
EXPLAIN ANALYZE
SELECT * FROM |||table name|||
WHERE |||condition|||;
```

Other SQL variants have their own versions, but most have some method of getting information back about execution time or what is being done under the hood.
Note that EXPLAIN ANALYZE reports to you the time it took the server to process the query, not necessarily the time it took your client to finish getting and rendering the response!
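For instance, with the hypothetical tracks table and genre index from before, you could run the same lookup before and after creating the index and compare the plans and timings reported:

```sql
EXPLAIN ANALYZE
SELECT * FROM tracks
WHERE genre = 'Rock';
```

Before the index you would expect a sequential scan in the plan; afterwards, on a large enough table, an index scan with a smaller execution time.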

It is frequently the case that values in a particular table column all belong to a smaller subset of categories or options.
With current methods, if you want to compare some sort of aggregate between those categories or options, you need to do it in multiple queries:

```sql
SELECT AVG(age) FROM voters WHERE party = 'D';
SELECT AVG(age) FROM voters WHERE party = 'R';
```

This rapidly becomes intractable if you want to compare across many categories.
SQL offers the GROUP BY command. GROUP BY gathers together all rows with matching values from a particular column.
On its own, this looks a lot like DISTINCT on a column:

```sql
SELECT |||grouped column|||
FROM |||table name|||
GROUP BY |||grouped column|||;
```

Groups by themselves are not that useful, since we already had DISTINCT.
The prime use-case of GROUP BY is to be able to run aggregates across all potential groups simultaneously for comparison.
```sql
SELECT
    |||grouped column|||,
    min(|||some other column|||),
    avg(|||some other column|||)
FROM |||table name|||
GROUP BY |||grouped column|||;
```

This causes any aggregate function to just aggregate over the smaller, per-group tables.
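Using the voters table from earlier, a single query now compares every party at once:

```sql
SELECT party, min(age), avg(age)
FROM voters
GROUP BY party;
```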
Looking back at the cereal table, how can we answer:
As with DISTINCT or UNIQUE, you can group by multiple columns:

```sql
SELECT
    |||grouped col1|||,
    |||grouped col2|||,
    min(|||other col|||),
    avg(|||other col|||)
FROM |||table|||
GROUP BY |||grouped col1|||, |||grouped col2|||;
```
WHERE: WHERE actions happen before any aggregates can be computed.
HAVING: HAVING filters take place after groups and aggregates have been computed:

```sql
SELECT
    |||grouped col|||,
    min(|||other col|||),
    avg(|||other col|||)
FROM |||table|||
GROUP BY |||grouped col|||
HAVING min(|||other col|||) > 50;
```
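For example, with the voters table again, a hypothetical filter keeping only parties whose youngest voter is over 25:

```sql
SELECT party, avg(age)
FROM voters
GROUP BY party
HAVING min(age) > 25;
```

A WHERE clause could not express this, since min(age) does not exist until after the groups have been formed.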

Postgres has two json types:
- json: stores data as text
- jsonb: stores data in a binary format

jsonb is probably preferable unless you really need what the base json type offers.
JSON is frequently given with the outermost type being a list, and then each record inside separated by commas.
Postgres will not innately be able to import this type of document.
Instead, Postgres COPY works best when the JSON is a newline delimited JSON file (sometimes called jsonlines), where each new record is on a single line, and there is no surrounding []:

```sql
COPY |||table name|||
FROM '|||your newline json file|||.json';
```

Want to follow along? A jsonlines file is here!
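For reference, a jsonlines file is simply one complete JSON object per line (contents here invented for illustration):

```json
{"name": "Ada", "age": 36, "siblings": [{"name": "Grace"}]}
{"name": "Linus", "age": 28, "siblings": []}
```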
You can drill down and select specific values from json or jsonb types using two different syntaxes:
- raw_json -> 'key'/index, where a quoted 'key' selects a field from an object and an integer index selects an element from an array
- raw_json['key'/index] with the same conventions as above

These can be “chained” to “drill down” to the desired info:
raw_json -> 'siblings' -> 0 -> 'name'

These selections return more json/jsonb objects! Convert as needed.
Using ->> will instead return a TEXT type object.
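A quick sketch of the difference, assuming a table people with a jsonb column raw_json (names invented):

```sql
SELECT raw_json -> 'siblings' -> 0 -> 'name'  FROM people;  -- jsonb result
SELECT raw_json -> 'siblings' -> 0 ->> 'name' FROM people;  -- TEXT result
```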
Recall that JSON does not enforce a strict schema, so some entries may have keys or values that others are missing.
It can thus be useful to check for the existence of certain keys or values.
You can check if an entire chunk is contained within a block of JSON using @>:
raw_json @> |||sub-snippet of json|||

You can also check if a key or value shows up in the top level of the JSON block using ?:
raw_json ? |||top-level key|||

While you can create a default B-tree index for jsonb columns, they generally wouldn't be very useful.
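The two checks in use, again assuming a hypothetical people table with a jsonb column raw_json:

```sql
SELECT * FROM people WHERE raw_json @> '{"age": 36}';  -- containment
SELECT * FROM people WHERE raw_json ? 'siblings';      -- top-level key exists
```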
Instead, we can use a different type of index called a generalized inverted index, or GIN
This type of index works much better when the data is comprised of nested composite items (we’ll see it again when we look at text searching)
Just add USING GIN to your normal command to make an index:

```sql
CREATE INDEX |||my index||| ON |||mytable||| USING GIN (|||colname|||);
```

Postgres has several types for dates and times:
- DATE: holds a single individual day
- TIME: holds a single individual time
- TIMESTAMP, or the variant TIMESTAMPTZ: holds a combination of date and time, along with a potential time zone
- INTERVAL: holds a duration of time

To pull out a particular part of a date or time value, use extract( |||part||| FROM |||datetime_value||| ) or date_part( |||part|||, |||datetime_value||| ). Both return a DOUBLE PRECISION value of whatever part was requested.

| text | Description |
|---|---|
| century | What century the date is in. 1st century starts 0001-01-01★ |
| day | What day of the month |
| decade | The year divided by 10 |
| dow | The day of the week (0-6, starting with Sunday) |
| doy | The day of the year |
| epoch | Number of seconds since 1970-01-01 |
| hour | The current hour (0-23) |
| microseconds | The number of microseconds |
| milliseconds | The number of milliseconds |
| minute | The minute |
| month | The month (1-12) |
| quarter | What quarter of the year (1-4) |
| second | The number of seconds |
| timezone | The timezone offset in seconds |
| timezone_hour | The timezone offset in hours |
| week | What week of the year. ISO weeks start on Monday |
| year | The year |
★ – If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.
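For instance, using the order date from earlier:

```sql
SELECT extract(year FROM DATE '2022-11-09');  -- 2022
SELECT date_part('dow', DATE '2022-11-09');   -- 3 (a Wednesday)
```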
- make_date( |||year|||, |||month|||, |||day||| ): Returns a new DATE type value
- make_time( |||hour|||, |||minute|||, |||seconds||| ): Returns a new TIME type value (with no timezone)
- make_timestamptz( |||year|||, |||month|||, |||day|||, |||hour|||, |||minute|||, |||second|||, |||time zone||| ): Returns a new TIMESTAMPTZ type value
- make_timestamp and make_interval also exist

Subtracting DATE type values will give just an INT (in days).
Subtracting TIMESTAMP type values will give an INTERVAL, with the biggest “unit” in days.
The age() function can smooth over both and give units larger than days:
age( datetime1, datetime2 ): Subtracts datetime2 from datetime1

There is also justify_interval( interval ), which breaks intervals into divisions that don't exceed a category's max.
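A few of these in action (dates picked arbitrarily):

```sql
SELECT make_date(2022, 11, 9);                 -- 2022-11-09
SELECT DATE '2022-11-09' - DATE '2022-01-01';  -- 312 (days, as an INT)
SELECT age(TIMESTAMP '2022-11-09', TIMESTAMP '2022-01-01');
SELECT justify_interval(INTERVAL '400 days');
```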
| function | description |
|---|---|
| current_date | Returns the current date |
| current_time | Returns the current time with timezone |
| localtime | Returns the current time without timezone |
| current_timestamp★ | Returns the current date and time with timezone |
| localtimestamp | Returns the current date and time without timezone |
★ – Postgres also offers the shorter now() function to do the same thing.
To see your current time zone: SHOW timezone;
To list available abbreviations: SELECT * FROM pg_timezone_abbrevs;
To list available full names: SELECT * FROM pg_timezone_names;

To change the time zone, you can:
- Change your postgresql.conf file, which controls your Postgres server. Only recommended if you have permanently moved elsewhere and the database time zone has not updated appropriately.
- Set future queries in a single session to be from a new timezone:
  SET timezone TO time_zone_name_or_abbrv;
  This changes what localtime or localtimestamp report!
- Transform a single query to be reported in a different time zone:
  SELECT |||datetime column||| AT TIME ZONE |||tz name or abbr|||
  FROM |||table name|||;
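For example, converting a UTC timestamp for display on the US west coast:

```sql
SELECT TIMESTAMPTZ '2022-11-09 12:00:00+00' AT TIME ZONE 'America/Los_Angeles';
-- 2022-11-09 04:00:00 (PST is UTC-8 in November)
```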