Slide 1
PostgreSQL & Docker
digoal.zhou 11/29/2014
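Slide 2
About the speaker
Handle: 德哥 (digoal)
Employer: (Nasdaq: MOBI). Started working with PostgreSQL in 2008, leads the group's DBA team, and completed the migration away from Oracle in 2010.
Member of the PostgreSQL China community and moderator of the administration board of the PostgreSQL Chinese community.
Spends spare time helping users solve PostgreSQL problems in the PostgreSQL community and on Stack Overflow.
Follows PostgreSQL-related projects on git.postgresql.org, commitfest.postgresql.org, Google Summer of Code, GitHub, and elsewhere.
Enjoys writing technical articles (site: http://blog.163.com/digoal@126): more than 1,800 so far, covering PostgreSQL internals, development, administration, tuning, security, architecture, and horizontal scaling, as well as hosts, storage, programming, operating systems, Oracle, MySQL, MongoDB, Redis, and virtualization.
Organizes free training sessions, community meetings, and regional conferences.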
Slide 3
Contents
Shortcomings of traditional database deployment
Databases in containers
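Slide 4
Shortcomings of traditional database deployment
Traditional database deployments are expensive to manage.
Examples: version upgrades, capacity expansion, hardware replacement, migration.
Replacing a server means redeploying the Pg software.
Adding a node (such as a disaster-recovery node or a read-only node) means redeploying the environment.
Single-machine resources are wasted, and migration is costly.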
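[Diagram: five hosts, each with the Pg software installed. Data files sit either on shared storage, replicated to disaster-recovery storage, or on local storage, replicated to another local storage.]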
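Slide 5
Shortcomings of traditional database deployment
Virtual machines can solve the software deployment problem.
However:
Virtual machine images are large.
The performance overhead is significant.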
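[Diagram: virtual machines started from a VM image. Hosts (compute nodes) run virtual machines whose data files sit on shared storage, replicated to disaster-recovery storage; other hosts run virtual machines whose data files sit on local storage, replicated to another local storage.]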
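Slide 6
Shortcomings of traditional database deployment
Running multiple instances on one machine
Instances compete for resources.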
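Slide 7
Shortcomings of traditional database deployment
Virtualization
http://blog.163.com/digoal@126/blog/static/163877040201322815029796/
Problems with traditional virtualization
Example of the performance overhead
Test results on the same host:
KVM: 14514 TPS
Bare metal: 23657 TPS
Performance loss: 38.65%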
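The 38.65% figure follows from the two TPS values above as (bare metal - KVM) / bare metal. A minimal sketch of that arithmetic, assuming awk is available; the function name overhead_pct is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# overhead_pct BASELINE_TPS MEASURED_TPS
# Prints the throughput lost relative to the baseline, as a percentage
# with two decimal places. (Hypothetical helper for illustration.)
overhead_pct() {
    awk -v base="$1" -v got="$2" 'BEGIN { printf "%.2f\n", (base - got) / base * 100 }'
}

# Bare metal vs. KVM, using the TPS values from this slide.
overhead_pct 23657 14514
```

With the slide's numbers this prints 38.65.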
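Slide 8
Docker
Solves resource contention
Solves the deployment problem
Solves the high performance overhead of virtual machines
Underlying technologies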
cgroup
chroot
namespace
[Diagram: process tree. system pid -> docker server pid -> container pids -> processes]
Slide 9
Control Groups
Resource control groups
blkio
Read/write throughput, IOPS, queueing, merges, wait time, ...
cpuset
CPU core affinity
memory
Limits, swappiness, OOM control, ...
hugetlb
Limits on huge-page usage
devices
...
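On Linux, the control groups a process belongs to can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. A small sketch that prints the controller list and path for a process; the function names format_cgroups and show_cgroups are made up for illustration, and the sample input line is illustrative rather than taken from a real system:

```shell
#!/bin/sh
# format_cgroups: read /proc/PID/cgroup style lines on stdin and print one
# "controller-list  path" pair per line. Entries with an empty controller
# list (the cgroup v2 form "0::/path") are shown as "(unified)".
format_cgroups() {
    awk -F: '{ printf "%-20s %s\n", ($2 == "" ? "(unified)" : $2), $3 }'
}

# show_cgroups [PID]: same, for a live process (default: the current shell).
show_cgroups() {
    format_cgroups < "/proc/${1:-$$}/cgroup"
}

# Illustrative input only; a real container line will differ.
printf '4:memory:/docker/abc123\n' | format_cgroups
```

Running show_cgroups with the pid of a container process on the host shows which blkio, cpuset, memory, and other groups the container was placed in.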
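Slide 10
chroot
Isolates paths, devices, network devices, processes, user space, and so on, both between the host and containers and between containers.
[Diagram: path namespace, net namespace, and process tree (docker server pid -> container pids -> processes)]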
Slide 11
namespace
Supported network modes
container
host
none
bridge
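[Diagram: bridge mode. The container's interface v1 and the host-side interface vp1 form a peer pair; vp1 is attached to the bridge, which connects to eth0.]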
Slide 12
Docker usage examples
Main components
image (read only, running environment)
registry (image storage)
docker (server daemon, client manage)
container (process running based on image)
volume (dynamic data storage)
network (bridge(peer), host, container)
gateway (sometimes for bridge mode network)
distributed file system (shared dynamic data storage)
Slide 13
Docker usage examples
Overall architecture
[Diagram: two server pools (infrastructure 1 and infrastructure 2). Each host keeps local images and uses a TCP connection to the Docker registry to search, pull, and push images. In each pool, the PGDATA directories sit on shared storage reached over a storage network, with snapshots and archiving. Streaming replication runs per PG cluster between the two pools.]
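Slide 14
Docker usage examples
Compute node architecture
[Diagram: compute node (server). Labels: gateway; switch (VLAN isolation per Docker tenant); bond over eth0 and eth1; bridges (br0) with peers vp1 and vp2 paired with container interfaces v1 and v2; OVS bridge obr0; OVS bridge obr1; bond over eth2 and eth3 to a switch; distributed storage.]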
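Slide 15
[Diagram: gateway server and tenant servers. On the gateway server, each network namespace (netns) holds a peer interface with address 192.168.1.1/24 (v1, v2, ... vxx) and a peer interface with a public IP (v3 with public IP 3, v4 with public IP 4, ... vx with public IP x). The public-side peers vp3, vp4, ... vpx attach to bridge br2 with eth1, toward the ISP. The tenant-side peers vp1 (vlan10), vp2 (vlan11), ... vpxx (vlanx) attach to OVS bridge ob0 with eth0, toward the physical switch (ports p0 vlan10, p10 vlan11, px, pxx vlanxx). Each tenant server has eth0 192.168.1.2/24. vlan1 on ob0 carries the management IP.]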
Slide 16
registry, docker server, manage API
Image service
docker server
docker client (manage)
Start the docker server with docker -d. Options: -d, --daemon; -b, --bridge=""; -g, --graph; -H, --host; --tls; --tlsverify; ...
[Diagram: the Docker registry holds official images (centos:5, postgres:9.3.5) and user images (digoal/centos:5, digoal/postgres:9.3.5). The Docker server searches, pulls, and pushes images against the registry. The Docker client (manage) controls the Docker server.]
Slide 17
container
pull image
run as docker server's child process
Start a container: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
docker run -t -i centos:5 /bin/bash
[Diagram: the Docker client (manage) asks the Docker server to run image centos:5. The server pulls the image from the Docker registry if it is not available locally (the registry holds official images centos:5 and postgres:9.3.5 and user images digoal/centos:5 and digoal/postgres:9.3.5), then starts container1 and container2 from centos:5, each with docker run -t -i centos:5 /bin/bash.]
Slide 18
run [opts]
--dns=[]                  Set custom DNS servers
-e, --env=[]              Set environment variables
--entrypoint=""           Overwrite the default ENTRYPOINT of the image
--expose=[]               Expose a port from the container without publishing it to your host
-h, --hostname=""         Container host name
-i, --interactive=false   Keep STDIN open even if not attached
--link=[]                 Add link to another container in the form of name:alias
-m, --memory=""           Memory limit (format: <number><optional unit>, where unit = b, k, m or g)
--name=""                 Assign a name to the container
--net="bridge|none|host|container:<name|id>"
--rm=false                Automatically remove the container when it exits (incompatible with -d)
Slide 19
run [opts]
-P, --publish-all=false   Publish all exposed ports to the host interfaces
-p, --publish=[]          Publish a container's port to the host
                          ip:hostPort:containerPort | ip::containerPort | hostPort:containerPort
-t, --tty=false           Allocate a pseudo-TTY
-u, --user=""             Username or UID
-v, --volume=[]           Bind mount a volume (e.g., from the host: -v /host:/container, from Docker: -v /container)
--volumes-from=[]         Mount volumes from the specified container(s)
-w, --workdir=""          Working directory inside the container
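Several of the run options listed here are typically combined when starting a database container. The sketch below only assembles and prints such a command, so it can be inspected without a Docker daemon. The function name pg_run_cmd, the container name pg1, and the 4g memory limit are illustrative assumptions; the image, paths, and port are the ones used elsewhere in this talk.

```shell
#!/bin/sh
# pg_run_cmd NAME HOST_DIR
# Prints a docker run command line that:
#   -d   runs the container in the background
#   -m   caps memory (4g is an illustrative value)
#   -p   publishes the PostgreSQL port to the host
#   -v   bind-mounts the host data directory at /pgdata
#   -u   runs the process as the postgres user
pg_run_cmd() {
    name=$1
    host_dir=$2
    image=172.16.3.221:5000/digoal/postgresql:9.3.5
    echo docker run -d --name "$name" -m 4g -p 6543:6543 \
        -v "$host_dir:/pgdata" -u postgres "$image" \
        /usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root
}

pg_run_cmd pg1 /data01/pgdata
```

Dropping the echo, or piping the output to sh, would run the command for real.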
Slide 20
run command
process tree
A container stops when its process exits.
Store data outside the container; keeping data inside the container is not recommended.
[Diagram: process tree. system pid -> docker server pid -> container pids -> processes]
Slide 21
Building an image
based on base image
[root@localhost ~]# docker run --rm -t -i --name digoal --hostname="postgres" centos:centos6 /bin/bash
[root@postgres /]# yum install -y http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-server-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-contrib-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-libs-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-pltcl-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-plperl-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-plpython-9.3.5-1PGDG.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgfincore93-1.1.1-1.rhel6.x86_64.rpm \
http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/postgresql93-devel-9.3.5-1PGDG.rhel6.x86_64.rpm
Slide 22
Building an image
[root@localhost ~]# docker commit -a "digoal" -m "this is postgresql on centos6" -p digoal 172.16.3.221:5000/digoal/postgresql:9.3.5
[root@localhost ~]# docker images
REPOSITORY                            TAG     IMAGE ID       CREATED          VIRTUAL SIZE
172.16.3.221:5000/digoal/postgresql   9.3.5   d70b9ae9e9ff   11 seconds ago   330.3 MB
[root@localhost ~]# docker push 172.16.3.221:5000/digoal/postgresql:9.3.5
The push refers to a repository [172.16.3.221:5000/digoal/postgresql] (len: 1)
Sending image list
Pushing repository 172.16.3.221:5000/digoal/postgresql (1 tags)
Image 511136ea3c5a already pushed, skipping
Image 5b12ef8fd570 already pushed, skipping
Image 70441cac1ed5 already pushed, skipping
d70b9ae9e9ff: Image successfully pushed
Pushing tag for rev [d70b9ae9e9ff] on {http://172.16.3.221:5000/v1/repositories/digoal/postgresql/tags/9.3.5}
Slide 23
Building an image
based on Dockerfile
Create a new, empty directory.
Create a Dockerfile in that directory:
FROM <image>:<tag>
MAINTAINER <name>
RUN ["executable", "param1", "param2"] (exec form)
RUN ["/bin/bash", "-c", "yum install -y postgresql"]
CMD ["/usr/bin/wc","--help"] (can be overridden)
VOLUME ["/data"]
USER daemon
WORKDIR /path/to/workdir
ONBUILD [INSTRUCTION]
EXPOSE <port> [<port>...]
ENV <key> <value>
ADD <src>... <dest>
COPY <src>... <dest>
ENTRYPOINT ["executable", "param1", "param2"] (exec form, the preferred form)
...
Example: http://blog.163.com/digoal@126/blog/static/1638770402014102711413675/
docker build -t digoal/sshd .
docker tag [OPTIONS] IMAGE[:TAG] [REGISTRYHOST/][USERNAME/]NAME[:TAG]
docker push digoal/sshd
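Slide 24
Performance test
Docker performance test
[Diagram: two test setups. Bare metal: pgbench connects to a postmaster running on the host, with $PGDATA on the host; the host has bridge docker0 and eth0, connected to a switch. Docker: pgbench connects to a postmaster running in a container, with $PGDATA on the host; the container interface v1 is paired with vp1 on bridge docker0, and the host's eth0 is connected to a switch.]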
Slide 25
Performance test
Create the data directory (the database files are stored on an external volume)
[root@localhost data01]# mkdir /data01/pgdata
Start the container and mount the external volume
[root@localhost data01]# docker run --rm -t -i --name digoal --hostname="postgres" -v /data01/pgdata:/pgdata 172.16.3.221:5000/digoal/postgresql:9.3.5 /bin/bash
Inside the container, create the data directory
# cd /pgdata/
# mkdir pg_root
[root@postgres pgdata]# chown postgres:postgres pg_root
Initialize the database
[root@postgres pgdata]# su - postgres
-bash-4.1$ /usr/pgsql-9.3/bin/initdb -D /pgdata/pg_root -E UTF8 --locale=C -U postgres -W
Slide 26
Performance test
Configure the database
pg_hba.conf
host all all 0.0.0.0/0 trust
postgresql.conf
listen_addresses = '0.0.0.0'  # what IP address(es) to listen on;
port = 6543  # (change requires restart)
max_connections = 100  # (change requires restart)
superuser_reserved_connections = 13  # (change requires restart)
unix_socket_directories = '.'  # comma-separated list of directories
unix_socket_permissions = 0700  # begin with 0 to use octal notation
tcp_keepalives_idle = 60  # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 10  # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 10  # TCP_KEEPCNT;
shared_buffers = 2048MB  # min 128kB
maintenance_work_mem = 512MB  # min 1MB
Slide 27
Performance test
vacuum_cost_delay = 10  # 0-100 milliseconds
vacuum_cost_limit = 10000  # 1-10000 credits
bgwriter_delay = 10ms  # 10-10000ms between rounds
wal_level = minimal  # minimal, archive, or hot_standby
synchronous_commit = off  # synchronization level;
wal_sync_method = fdatasync  # the default is the first option
wal_buffers = 16384kB  # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms  # 1-10000 milliseconds
checkpoint_segments = 128  # in logfile segments, min 1, 16MB each
log_destination = 'csvlog'  # Valid values are combinations of
logging_collector = on  # Enable capturing of stderr and csvlog
log_directory = 'pg_log'  # directory where log files are written,
log_truncate_on_rotation = on  # If on, an existing log file with the
log_rotation_age = 1d  # Automatic rotation of logfiles will
log_rotation_size = 0  # Automatic rotation of logfiles will
Slide 28
Performance test
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose  # terse, default, or verbose messages
log_lock_waits = on  # log lock waits >= deadlock_timeout
log_statement = 'ddl'  # none, ddl, mod, all
autovacuum = on  # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 0  # -1 disables, 0 logs all actions and
datestyle = 'iso, mdy'
lc_messages = 'C'  # locale for system error message
lc_monetary = 'C'  # locale for monetary formatting
lc_numeric = 'C'  # locale for number formatting
lc_time = 'C'  # locale for time formatting
default_text_search_config = 'pg_catalog.english'
Exit the container
Slide 29
Performance test
Start the container again
[root@localhost ~]# docker run -d --name digoal --hostname="postgres" -u postgres -v /data01/pgdata:/pgdata 172.16.3.221:5000/digoal/postgresql:9.3.5 /usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root
(slightly lower performance)
or
[root@localhost ~]# docker run -d --name digoal --hostname="postgres" -v /data01/pgdata:/pgdata 172.16.3.221:5000/digoal/postgresql:9.3.5 su - postgres -c "/usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root"
(slightly better performance)
Slide 30
Performance test
Process tree (container started with -u postgres)
[root@localhost ~]# ps -ewf|grep postg
postgres 6386 3216 0 19:20 ? 00:00:00 /usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root
postgres 6425 6386 0 19:20 ? 00:00:00 postgres: logger process
postgres 6427 6386 0 19:20 ? 00:00:00 postgres: checkpointer process
postgres 6428 6386 0 19:20 ? 00:00:00 postgres: writer process
postgres 6429 6386 0 19:20 ? 00:00:00 postgres: wal writer process
postgres 6430 6386 0 19:20 ? 00:00:00 postgres: autovacuum launcher process
postgres 6431 6386 0 19:20 ? 00:00:00 postgres: stats collector process
[root@localhost ~]# ps -ewf|grep docker
root 3216 1 1 17:48 ? 00:00:31 /usr/bin/docker -d --selinux-enabled=false -g /data01/docker
Slide 31
Performance test
Process tree (container started with su - postgres -c)
[root@localhost ~]# ps -few|grep postgres
root 6614 3216 0 19:23 ? 00:00:00 su - postgres -c /usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root
postgres 6643 6614 0 19:23 ? 00:00:00 /usr/pgsql-9.3/bin/postgres -D /pgdata/pg_root
postgres 6659 6643 0 19:23 ? 00:00:00 postgres: logger process
postgres 6661 6643 0 19:23 ? 00:00:00 postgres: checkpointer process
postgres 6662 6643 0 19:23 ? 00:00:00 postgres: writer process
postgres 6663 6643 0 19:23 ? 00:00:00 postgres: wal writer process
postgres 6664 6643 0 19:23 ? 00:00:00 postgres: autovacuum launcher process
postgres 6665 6643 0 19:23 ? 00:00:00 postgres: stats collector process
[root@localhost ~]# ps -ewf|grep docker
root 3216 1 1 17:48 ? 00:00:31 /usr/bin/docker -d --selinux-enabled=false -g /data01/docker
Slide 32
Performance test
Get the container IP
[root@localhost data01]# docker inspect -f '{{.NetworkSettings.IPAddress}}' digoal
172.17.0.11
Slide 33
Performance test
Container performance
Stress test 1
[root@localhost ~]# /usr/pgsql-9.3/bin/pgbench -M prepared -n -r -f ./test.sql -c 12 -j 4 -T 30 -p 6543 -h 172.17.0.11 -U postgres postgres
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 2501548
tps = 83379.360813 (including connections establishing)
tps = 83412.391302 (excluding connections establishing)
statement latencies in milliseconds:
0.142535 select 1;
Slide 34
Performance test
Container performance
Stress test 2
http://blog.163.com/digoal@126/blog/static/16387704020141013115219217/
[root@localhost ~]# /usr/pgsql-9.3/bin/pgbench -M prepared -n -r -f ./test0.sql -f ./test1.sql -f ./test2.sql -f ./test3.sql -f ./test4.sql -c 12 -j 4 -T 30 -h 172.17.0.11 -p 6543 -U postgres postgres
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 218251
tps = 7271.434700 (including connections establishing)
tps = 7274.639775 (excluding connections establishing)
Slide 35
Performance test
Local test (bare metal)
[root@localhost ~]# docker stop digoal
[root@localhost ~]# su - postgres -c "/usr/pgsql-9.3/bin/postgres -D /data01/pgdata/pg_root" &
Stress test 1
[root@localhost ~]# /usr/pgsql-9.3/bin/pgbench -M prepared -n -r -f ./test.sql -c 12 -j 4 -T 30 -p 6543 -h 172.17.42.1 -U postgres postgres
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 2713456
tps = 90444.831125 (including connections establishing)
tps = 90482.232687 (excluding connections establishing)
statement latencies in milliseconds:
0.131289 select 1;
Slide 36
Performance test
Stress test 2
[root@localhost ~]# /usr/pgsql-9.3/bin/pgbench -M prepared -n -r -f ./test0.sql -f ./test1.sql -f ./test2.sql -f ./test3.sql -f ./test4.sql -c 12 -j 4 -T 30 -h 172.17.42.1 -p 6543 -U postgres postgres
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 4
duration: 30 s
number of transactions actually processed: 234822
tps = 7826.373962 (including connections establishing)
tps = 7829.520104 (excluding connections establishing)
Slide 37
Performance test
http://blog.163.com/digoal@126/blog/static/16387704020141013115219217/
Test results on the same host:
                    Stress test 1   Stress test 2
Docker (TPS)        83379           7271
Bare metal (TPS)    90444           7826
Performance loss    7.81%           7.09%
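The loss percentages can be reproduced from the pgbench output itself. A rough sketch, assuming the "tps = ... (including connections establishing)" line format shown in the earlier runs; the function names tps_of and loss_pct are made up for illustration:

```shell
#!/bin/sh
# tps_of: read pgbench output on stdin and print the TPS value from the
# "including connections establishing" line.
tps_of() {
    awk '/^tps = .*including connections establishing/ { print $3 }'
}

# loss_pct BASELINE_TPS MEASURED_TPS: percentage lost relative to baseline.
loss_pct() {
    awk -v base="$1" -v got="$2" 'BEGIN { printf "%.2f\n", (base - got) / base * 100 }'
}

# The stress test 1 result lines from this talk (container, then bare metal).
docker_tps=$(echo 'tps = 83379.360813 (including connections establishing)' | tps_of)
host_tps=$(echo 'tps = 90444.831125 (including connections establishing)' | tps_of)
loss_pct "$host_tps" "$docker_tps"
```

In practice the output of a full pgbench run would be piped into tps_of instead of a single echoed line.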
Slide 38
Example: running your own registry
docker pull registry
mkdir /data01/registry_sto
mkdir /data01/registry_conf
Create the configuration file
cd /data01/registry_conf
vi config_sample.yml
https://github.com/docker/docker-registry/blob/master/config/config_sample.yml
Change a few settings:
issue: '"Digoal`s docker-registry server"'
local: &local
<<: *common
storage: local
storage_path: _env:STORAGE_PATH:/registry_sto
search_backend: _env:SEARCH_BACKEND:sqlalchemy
sqlalchemy_index_database: _env:SQLALCHEMY_INDEX_DATABASE:sqlite:////tmp/docker-registry.db
[Diagram: the Docker server can search, push, and pull against both the public Docker registry (hub.docker.com) and the private registry; each has its own image storage.]
Slide 39
Example: running your own registry
Start
docker run -d --net="host" -p 5000:5000 -v /data01/registry_conf:/registry_conf -v /data01/registry_sto:/registry_sto -e DOCKER_REGISTRY_CONFIG=/registry_conf/config_sample.yml -e STORAGE_PATH=/registry_sto -e SETTINGS_FLAVOR=local --name=registry registry:0.8.1
search
curl -X GET 'http://172.16.3.221:5000/v1/search?q=centos'
pull
docker pull 172.16.3.221:5000/centos:centos6
push
docker tag 504a65221a38 172.16.3.221:5000/digoal/centos:centos5
docker push 172.16.3.221:5000/digoal/centos:centos5
Slide 40
Example: building an sshd image
Dockerfile:
FROM centos:centos7
MAINTAINER digoal.zhou
RUN yum install -y openssh-server
RUN yum install -y openssh-clients
RUN mkdir /var/run/sshd
RUN echo 'UseDNS no' >> /etc/ssh/sshd_config
# Set the default password
RUN echo 'root:Digoal_sshd_1999' | chpasswd
RUN /usr/bin/ssh-keygen -A
# To allow access from other hosts, expose the port.
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
docker build -t digoal/sshd .
Slide 41
Example: building an sshd image
[root@localhost ~]# docker run -d --name digoal digoal/sshd
486381d4428e917b6572eb1a802972eb576b0fa3731178c2cd055a5def9a02ea
[root@localhost ~]# docker inspect -f '{{.NetworkSettings.IPAddress}}' digoal
172.17.0.13
[root@localhost ~]# ssh root@172.17.0.13
The authenticity of host '172.17.0.13 (172.17.0.13)' can't be established.
ECDSA key fingerprint is 76:34:4f:98:d5:56:cd:2c:e4:f8:9c:14:5a:82:f6:bf.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.17.0.13' (ECDSA) to the list of known hosts.
root@172.17.0.13's password: (enter the default password)
Slide 42
Example: custom bridge
Specify the bridge name when starting the docker server
-b="" Attach containers to a pre-existing network bridge; use 'none' to disable container networking
brctl addbr br0
ip link set br0 up
ip addr add 172.17.42.1/16 dev br0
brctl delif docker0 eth0
brctl addif br0 eth0
docker -d -b="br0"
Slide 43
Example: custom container networking
Start the container in bridge mode
--net="bridge"
In bridge mode the container IP is assigned automatically and cannot be specified.
Workaround:
net namespace
docker run -i -t --rm --net=none base /bin/bash
63f36fc01b5f
docker inspect -f '{{.State.Pid}}' 63f36fc01b5f
2778
pid=2778
mkdir -p /var/run/netns
ln -s /proc/$pid/ns/net /var/run/netns/$pid
ip link add A type veth peer name B
brctl addif docker0 A
ip link set A up
ip link set B netns $pid
ip netns exec $pid ip link set dev B name eth0
ip netns exec $pid ip link set eth0 up
ip netns exec $pid ip addr add 172.17.42.99/16 dev eth0
ip netns exec $pid ip route add default via 172.17.42.1
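The steps above can be collected into a small script. The sketch below only prints the commands for a given container pid, address, and gateway, so they can be reviewed before being run as root. The function name netns_cmds is made up for illustration; the interface names A and B and the default bridge docker0 mirror the slide.

```shell
#!/bin/sh
# netns_cmds PID ADDR/PREFIX GATEWAY [BRIDGE]
# Prints the commands that give a --net=none container a fixed address:
# expose its network namespace to "ip netns", create a veth pair, attach
# one end to the bridge, and move the other end into the container as eth0.
netns_cmds() {
    pid=$1 addr=$2 gw=$3 bridge=${4:-docker0}
    cat <<EOF
mkdir -p /var/run/netns
ln -s /proc/$pid/ns/net /var/run/netns/$pid
ip link add A type veth peer name B
brctl addif $bridge A
ip link set A up
ip link set B netns $pid
ip netns exec $pid ip link set dev B name eth0
ip netns exec $pid ip link set eth0 up
ip netns exec $pid ip addr add $addr dev eth0
ip netns exec $pid ip route add default via $gw
EOF
}

# Values from the slide: pid 2778, address 172.17.42.99/16, gateway 172.17.42.1.
netns_cmds 2778 172.17.42.99/16 172.17.42.1
```

Piping the output to sh -e as root would apply the commands; with more than one container on the same host, the fixed names A and B would need to be made unique.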
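Slide 44
Docker shortcomings
In NAT mode, packet forwarding causes a significant performance loss.
[Diagram: external process -> iptables DNAT -> container]
Assigning a custom IP to a container is cumbersome.
Docker currently supports only bridge and has no OVS interface support; tenant network isolation requires running a bridge on top of OVS.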
Slide 45
Q&A
Thank you
More material on PostgreSQL development, administration, and tuning:
http://blog.163.com/digoal@126/blog/static/16387704020141229159715/
http://blog.163.com/digoal@126/blog/static/163877040201172183022203/
http://blog.163.com/digoal@126
QQ: 276732431
Email: digoal@126.com