第1页
Making Data Dance
@barrettclark
第2页
Harness Heretical Antipatterns
For Fun And Profit
第3页
My Resume
第4页
}
My Resume
DATA + +
第5页
ActiveRecord
第6页
Raw SQL
• Complicated joins • Deeper database functionality
第7页
But database portability?!
第8页
https://flic.kr/p/eg4hih
第9页
It just doesn’t work like that.
Database Portability Pipe Dream
第10页
You chose the ___ database for a reason
Use it.
第11页
Caveats
• 500ms • Database connection pool
第12页
Let’s get our hands dirty
第13页
Window Functions
第14页
A window function performs a calculation across a set of table rows that are somehow related to the
current row.
–PostgreSQL 9.4.0 Documentation
第15页
Fold data from other rows into this row.
–Barrett Clark
第16页
Window Function Greatest Hits
•lead() •lag() •first_value() •last_value() •row_number()
第17页
id fruit 1 apple 2 apple 3 apple 4 pear 5 apple 6 pear 7 pear 8 pear 9 pear 10 banana
第18页
SELECT *, LEAD(id) OVER(), LAG(id) OVER() FROM fruits;
id | fruit | lead | lag
----+--------+------+-----
1 | apple | 2 |
2 | apple | 3 | 1
3 | apple | 4 | 2
4 | pear | 5 | 3
5 | apple | 6 | 4
6 | pear | 7 | 5
7 | pear | 8 | 6
8 | pear | 9 | 7
9 | pear | 10 | 8
10 | banana |
|9
(10 rows)
http://sqlfiddle.com/#!15/230c3b/2
第19页
A window function performs a calculation across a set of table rows that are somehow related to the
current row.
–PostgreSQL 9.4.0 Documentation
第20页
https://flic.kr/p/7KGqEY
第21页
SELECT id, fruit, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id)
FROM fruits;
id | fruit | first_value
----+--------+-------------
1 | apple |
2 | apple |
3 | apple |
5 | apple |
10 | banana |
4 | pear |
6 | pear |
7 | pear |
8 | pear |
9 | pear |
(10 rows)
http://sqlfiddle.com/#!15/230c3b
第22页
SELECT *, FIRST_VALUE(id) OVER(PARTITION BY fruit ORDER BY id), ROW_NUMBER() OVER(PARTITION BY fruit ORDER BY id)
FROM fruits;
id | fruit | first_value | row_number
----+--------+-------------+------------
1 | apple |
1| 1
2 | apple |
1| 2
3 | apple |
1| 3
5 | apple |
1| 4
10 | banana |
10 |
4 | pear |
4| 1
6 | pear |
4| 2
7 | pear |
4| 3
8 | pear |
4| 4
9 | pear |
4| 5
(10 rows)
http://sqlfiddle.com/#!15/230c3b
第23页
Practical Example
Chat App
第24页
SELECT id, message, room_id, user_id FROM messages ORDER BY id;
id |
message
| room_id | user_id
----+-----------------------------+---------+---------
1 | Leaving for the airport now |
1 | 1337
2 | Flight just landed!
| 1 | 1337
3 | waiting on an uber
| 2 | 1337
4 | omw y'all!
| 1 | 1337
9 | hello, world
| 1 | 27
(5 rows)
第25页
Look ahead
SELECT id, message, room_id, user_id, LEAD(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS next_user_id
FROM messages ORDER BY 3, 1;
Defines the groups of rows to evaluate against the current row
第26页
SELECT id, message, room_id, user_id, LEAD(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS next_user_id
FROM messages ORDER BY 3, 1;
id |
message
| room_id | user_id | next_user_id
----+-----------------------------+---------+---------+--------------
1 | Leaving for the airport now |
1 | 1337 |
2 | Flight just landed!
| 1 | 1337 |
4 | omw y'all!
| 1 | 1337 |
9 | hello, world
| 1 | 27 |
3 | waiting on an uber
| 2 | 1337 |
(5 rows)
第27页
12 3 4 5
SELECT id, message, room_id, user_id, LEAD(user_id) OVER(PARTITION BY room_id ORDER BY room_id, id) AS next_user_id
FROM messages ORDER BY 3, 1;
ProTip™
第28页
class Message < ActiveRecord::Base def self.window_example sql = <<-SQL SELECT id, message, room_id, user_id, LEAD(user_id) OVER(PARTITION BY room_id ORDER BY id) AS next_user_id FROM messages ORDER BY 3, 1; SQL connection.execute(sql) end
end
Returns array of Message objects with additional calculated fields
第29页
Too anti pattern for you?
第30页
module PGConnection
def conn
config = YAML.load_file(File.open('config/database.yml'))['development']
@conn ||= PG.connect(
:dbname => config['database'],
:user
=> config['username'],
:password => config['password'],
:host
=> config['host'] || 'localhost'
)
end
end
class Antipattern extend PGConnection
def self.window_example
sql = <<-SQL
SELECT id, message, room_id, user_id,
LEAD(user_id) OVER(PARTITION BY room_id ORDER BY id) AS next_user_id
FROM messages
ORDER BY 3, 1;
SQL
conn.exec(sql).values
end end
Returns array of arrays of values
第31页
Models do not have to be backed by ActiveRecord.
Or even persisted in the database.
第32页
Questions We Can Now Answer
• When did something change? • When did someone leave a place?
• How long did each thing last? • How long did people stay in which places?
• Enter / Exit events
第33页
Subqueries
第34页
Subquery
• Subquery to filter and group • Subquery for a field value • Subquery in a join
第35页
Base Query
第36页
SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor
FROM beacon_readings ORDER BY 2, 5;
“For a given phone, pull in the next minor field”
第37页
When did something change?
第38页
SELECT major, minor, next_minor, COUNT(*) FROM (
SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp) AS next_minor
FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor;
第39页
Yo dawg, I heard you like queries
第40页
SELECT major, minor, next_minor, COUNT(*) FROM (
SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp) AS next_minor
FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor;
第41页
major | minor | next_minor | count
-------+-------+------------+-------
1| 0|
1 | 18
1| 0|
2| 3
1| 0|
3 | 23
1| 0|
4| 7
1| 1|
0 | 19
1| 1|
2 | 59
1| 1|
3 | 158
1| 1|
4 | 44
1| 2|
0| 5
1| 2|
1 | 59
1| 2|
3 | 85
1| 2|
4 | 40
1| 3|
0 | 21
1| 3|
1 | 154
1| 3|
2 | 85
1| 3|
4 | 52
1| 4|
0| 6
1| 4|
1 | 49
1| 4|
2 | 42
1| 4|
3 | 45
(20 rows)
第43页
Refactor
第44页
https://flic.kr/p/9tT9Yh
第45页
CTE
Common Table Expression
第46页
SELECT major, minor, next_minor, COUNT(*) FROM (
SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp) AS next_minor
FROM beacon_readings ORDER BY 2, 5 ) AS beacon_readings_lead WHERE major = 1 AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor;
第47页
WITH beacon_readings_lead_lag AS ( SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY 2, 5
) SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead_lag WHERE major = 1
AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor;
第48页
WITH interval_query AS ( SELECT (ts ||' hour')::INTERVAL AS hour_interval FROM generate_series(0,23) AS ts
), time_series AS ( SELECT DATE_TRUNC('hour', NOW()) + INTERVAL '60 min' * ROUND(DATE_PART('minute', NOW()) / 60.0) interval_query.hour_interval AS start_time FROM interval_query
), time_intervals AS ( SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time FROM time_series ORDER BY start_time
), reading_counts AS ( SELECT f.start_time, f.end_time, br.minor, COUNT(DISTINCT br.phone) readings FROM beacon_readings br RIGHT JOIN time_intervals f ON br.reading_timestamp >= f.start_time AND br.reading_timestamp < f.end_time AND br.major = 1 GROUP BY f.start_time, f.end_time, br.minor ORDER BY f.start_time, br.minor
) SELECT * FROM reading_counts;
第50页
Repetition Repetition RepetitionRepetition
Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition
Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition Repetition
第51页
View
Stored Query
第52页
CREATE OR REPLACE VIEW beacon_readings_lead_view AS
SELECT id, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor
FROM beacon_readings ORDER BY 2, 5 ;
第53页
class CreateLeadView < ActiveRecord::Migration def up execute <<-SQL.strip_heredoc CREATE OR REPLACE VIEW beacon_readings_lead_view AS SELECT ID, phone, major, minor, reading_timestamp, LEAD(minor) OVER ( PARTITION BY phone ORDER BY phone, reading_timestamp ) AS next_minor FROM beacon_readings ORDER BY phone, reading_timestamp; SQL
end
def down sql = "DROP VIEW IF EXISTS beacon_readings_lead_view;" execute(sql)
end end
第54页
SELECT major, minor, next_minor, COUNT(*) FROM beacon_readings_lead_view WHERE major = 1
AND minor != next_minor GROUP BY major, minor, next_minor ORDER BY major, minor, next_minor;
第55页
Business Logic
In The Database?!?
😱😱😱
第56页
YOU
Get to decide where your code runs
第57页
https://flic.kr/p/2MW7SY
第58页
My query is too slow 😢
第59页
Do the heavy lifting in a separate process
第60页
Materialized View
Stored Query And Data
第61页
CREATE MATERIALIZED VIEW beacon_readings_lead_lag_mv AS SELECT * FROM beacon_readings_lead_lag_view WHERE reading_timestamp >= NOW() - '1 day'::INTERVAL;
CREATE INDEX index_beacon_readings_mv_on_major ON beacon_readings_lead_lag_mv USING btree(major ASC NULLS LAST);
CREATE INDEX index_beacon_readings_mv_on_phone ON beacon_readings_lead_lag_mv USING btree(phone COLLATE "default" ASC NULLS LAST);
CREATE INDEX index_beacon_readings_mv_on_reading_timestamp ON beacon_readings_lead_lag_mv USING btree(reading_timestamp ASC NULLS LAST);
第62页
DATA
https://flic.kr/p/khS6jn
第63页
REFRESH MATERIALIZED VIEW CONCURRENTLY beacon_readings_lead_lag_mv;
New in 9.4!
第64页
namespace :db do namespace :heroku do # https://devcenter.heroku.com/articles/scheduler desc 'Update the materialized view(s)' task :update_materialized_view => :environment do sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;' ActiveRecord::Base.connection.execute(sql) end end
end
第65页
namespace :db do namespace :heroku do desc "Load the PROD database from Heroku to the local dev database" task :load => :environment do if Rails.env == 'development' Bundler.with_clean_env do config = Rails.configuration.database_configuration[Rails.env] system <<-CMD heroku pgbackups:capture --expire curl -o latest.dump `heroku pgbackups:url` pg_restore --verbose --clean --no-acl --no-owner -h localhost \ -U #{config["username"]} -d #{config["database"]} latest.dump rm -rf latest.dump CMD end end end end
end
https://gist.github.com/barrettclark/c94467e3872d16b3f8b0
第66页
https://flic.kr/p/bVUL7
第67页
Subquery For A Field Value
第68页
SELECT
id,
phone,
( SELECT MIN(id)
FROM beacon_readings_lead_lag_mv
WHERE
major
= br1.major
AND minor = 0
AND phone = br1.phone
AND id >= br1.id
) AS session_close_id,
minor,
next_beacon_minor,
reading_timestamp
FROM beacon_readings_lead_lag_mv br1
WHERE major = 2
AND minor != next_beacon_minor
ORDER BY phone, reading_timestamp;
第69页
SELECT
id,
phone,
( SELECT MIN(id)
FROM beacon_readings_lead_lag_mv
WHERE
major
= br1.major
AND minor = 0
AND phone = br1.phone
AND id >= br1.id
) AS session_close_id,
minor,
next_beacon_minor,
reading_timestamp
FROM beacon_readings_lead_lag_mv br1
WHERE major = 2
AND minor != next_beacon_minor
ORDER BY phone, reading_timestamp;
第71页
Other Useful Things
第72页
Datatypes
• Array • DateRange and TSRange • JSON • JSONB (9.4) • UUID
http://edgeguides.rubyonrails.org/active_record_postgresql.html
第73页
Future You Will Thank You
第74页
Postgres
• Heroku’s Postgres.app (ships with PostGIS baked in) • Current Version = 9.4
• Database Tools • pgAdmin3 (free) • Navicat (several offerings, free trials)
• Visual Queryplan: http://explain.depesz.com/
第76页
Recap
第77页
It’s OK to write SQL
第78页
Refactor
• Common Table Expression (CTE) • View • Materialized View
第79页
Not saying do everything in the database
第80页
PostgreSQL is awesome
第81页
Thank You.
第82页
Barrett Clark
@barrettclark