第1页
Redis Trouble Shooting
Clark.kang charsyam@naver.com
第2页
Contents
• Redis
– Single Threaded
• Redis Trouble Shooting
• Redis Security Issue
第3页
Single Threaded #1
[Diagram: Client #1, Client #2, ..., Client #N send packets (Packet #1, Packet #2, ...) to the Redis event loop, which does I/O multiplexing and then processes each command.]
第4页
Single Threaded #2
• Redis processes only one command at a time.
• If you run a long-running command, all other commands wait until it finishes.
– KEYS, FLUSHALL, FLUSHDB, Lua scripts, MULTI/EXEC
• Redis uses a few other threads, but only to keep fsync calls off the main thread.
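The queueing effect of a single-threaded loop can be sketched with a toy model. This is not Redis code: the function name `run_event_loop` and the command costs are made up for illustration, and all commands are assumed to arrive at time 0.

```python
from collections import deque


def run_event_loop(commands):
    """Process (name, cost_ms) commands one at a time, in arrival order.

    Returns a list of (name, wait_ms, finish_ms). All commands are assumed
    to arrive at time 0, so wait_ms is pure queueing delay.
    """
    queue = deque(commands)
    clock_ms = 0
    timeline = []
    while queue:
        name, cost_ms = queue.popleft()
        wait_ms = clock_ms        # time spent behind earlier commands
        clock_ms += cost_ms       # only one command runs at a time
        timeline.append((name, wait_ms, clock_ms))
    return timeline


# Hypothetical costs: one slow KEYS call followed by two cheap GETs.
timeline = run_event_loop([("KEYS *", 1000), ("GET a", 1), ("GET b", 1)])
for name, wait_ms, finish_ms in timeline:
    print(f"{name:8s} waited {wait_ms:5d} ms, finished at {finish_ms:5d} ms")
```

In this model the two GETs each cost 1 ms but wait a full second, because they are queued behind the slow command.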
第5页
How slow?
Command: flushall
Item count: 1,000,000
Time: 1000 ms (1 second)
第6页
Recommended Redis Version #1
• Latest stable version
– 3.0.x is also good.
– Use a version after 2.8.13.
• Behavior differs between Redis versions
– config set client-output-buffer-limit is accepted with redis-cli in 2.6.x
– config set client-output-buffer-limit does not accept size expressions such as 1GB or 1MB in 2.8.20
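When a version rejects size expressions in config set, one workaround is to convert them to plain byte counts first. The sketch below is a hypothetical helper, not part of any Redis client; `to_bytes` and `output_buffer_limit_args` are names invented here. The multipliers follow the unit table in the comments at the top of redis.conf (1k = 1000 bytes, 1kb = 1024 bytes, and so on).

```python
import re

# Multipliers from the redis.conf header comments; units are case insensitive.
_UNITS = {
    "": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def to_bytes(size):
    """Convert a size expression such as '1GB' or '64mb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", size)
    if not match or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def output_buffer_limit_args(client_class, hard, soft, soft_seconds):
    """Build the value for 'config set client-output-buffer-limit' in plain bytes."""
    return f"{client_class} {to_bytes(hard)} {to_bytes(soft)} {soft_seconds}"


print(output_buffer_limit_args("slave", "256mb", "64mb", 60))
```

The resulting string can be passed as the single value argument of config set client-output-buffer-limit.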
第7页
Memory Fragmentation #1
• Redis versions from before jemalloc 3.6.0 was adopted
– In 2.8.6, Redis uses just 2.4G but RSS is 12G
第8页
Memory Fragmentation #2
• Redis versions that use jemalloc 3.6.0 or later
– 2.8.20 shows a smaller gap between memory usage and RSS
– But you should still check RSS.
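One way to check RSS against used memory is to compare the `used_memory` and `used_memory_rss` fields from the INFO command. The sketch below parses INFO-style text; the sample values are illustrative numbers chosen to mirror the 2.4G versus 12G case, not real output, and `parse_info` and `rss_ratio` are names invented here.

```python
def parse_info(text):
    """Parse 'key:value' lines of INFO output, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def rss_ratio(fields):
    """Ratio of resident set size to the memory Redis believes it is using."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# Illustrative sample, not captured from a real server.
SAMPLE_INFO = """\
# Memory
used_memory:2400000000
used_memory_rss:12000000000
"""

ratio = rss_ratio(parse_info(SAMPLE_INFO))
print(f"RSS is {ratio:.1f}x used_memory")
```

A ratio far above 1 means the process holds much more physical memory than the data set accounts for, which is why RSS, not just used memory, is worth watching.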
第9页
Recommended Client (for Management)
• Use redis-cli
– It is the best option.
• You can also use telnet
– Redis supports inline commands
– Twemproxy doesn’t support inline commands
– Some commands can’t be used in old versions.
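The difference between the two wire formats can be shown in a few lines. An inline command is plain space-separated text ending in a line break, which is what you type over telnet; client libraries and redis-cli send the same command as a RESP array of bulk strings. The helper names below are invented for this sketch.

```python
def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def encode_inline(*args):
    """Encode a command as a space-separated inline command."""
    return " ".join(args).encode("utf-8") + b"\r\n"


print(encode_inline("GET", "key"))
print(encode_resp("GET", "key"))
```

A proxy that only parses the RESP form will reject the first encoding, which matches the note that Twemproxy doesn’t support inline commands.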
第10页
If You Support a Service Team That Uses Redis
• Check whether they want a cache or a store
– If it is a cache, turn off the SAVE option
– Even if it is a store, set appropriate values for SAVE
• Run multiple Redis instances on one physical server.
• Set the maxmemory option.
第11页
Using Multiple Redis Instances.
• Server with a 4-core CPU and 32G of memory
– Three Redis instances of 8G each are better than one 26G Redis instance
[Diagram: one instance with Mem: 26G, compared with three instances with Mem: 8G each]
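A rough worked calculation shows why the smaller instances are safer. It rests on two assumptions that are not stated on the slide: in the worst case a forked child causes every page of the forking instance to be copied (so that instance doubles), and only one instance forks at a time. The function name is made up for this sketch.

```python
def worst_case_peak_gb(instance_sizes_gb, forking_index):
    """Peak memory if one instance forks and every one of its pages is copied."""
    return sum(instance_sizes_gb) + instance_sizes_gb[forking_index]


HOST_MEMORY_GB = 32

one_big = worst_case_peak_gb([26], 0)            # 26 + 26
three_small = worst_case_peak_gb([8, 8, 8], 0)   # 24 + 8

print(f"one 26G instance:    peak {one_big}G of {HOST_MEMORY_GB}G")
print(f"three 8G instances:  peak {three_small}G of {HOST_MEMORY_GB}G")
```

Under these assumptions the single large instance can need far more than the host has, while the three small instances stay at the host limit even in the worst case. Real copy-on-write usage depends on the write rate during the fork, so this is an upper bound, not a prediction.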
第12页
Replication
第13页
Redis Replication
• Redis is single threaded
– It forks for replication
• Chained replication is supported
– Multi-master is not supported
– Check RSS when replicating
第14页
Replication
• Chained replication is supported
Master → 1st Slave → 2nd Slave
The 1st slave is the master of the 2nd slave
第15页
Replication
[Diagram: Master and Slave. replicationCron runs a health check on the connection between them.]
第17页
Replication
[Diagram: Master and Slave, with replicationCron]
When the master restarts, the slave resyncs with the master.
第18页
Mistake: Replication
If the master has no data:
[Diagram: Master and Slave, with replicationCron]
The slave will also have no data after resyncing.
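The mistake can be reproduced with a toy model in which a full resync replaces the slave's data set with a copy of the master's. The `Node` class is invented for this sketch and stands in for a Redis instance; it is not how Redis is implemented.

```python
class Node:
    """Toy stand-in for a Redis instance holding a key/value data set."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def full_resync_from(self, master):
        # A full resync discards the slave's data and loads the master's snapshot.
        self.data = dict(master.data)


master = Node({"user:1": "alice"})
slave = Node()
slave.full_resync_from(master)
data_before_restart = dict(slave.data)

# The master restarts without persisted data, so it comes back empty.
master = Node()
slave.full_resync_from(master)

print(data_before_restart)
print(slave.data)
```

The slave had a good copy until it resynced with the empty master, which is why a master without persistence should not be restarted automatically while slaves are attached.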
第19页
Persistence
第20页
RDB/AOF
• RDB and AOF are independent of each other.
• RDB
– Snapshot of the current memory state
– Forks and dumps its memory to disk
– In a write-heavy system, it can use double the memory
• AOF
– Saves update (create, update, delete) commands to disk after the event loop, in Redis protocol format.
– The disk sync option affects performance (default: everysec)
– Less disk I/O than RDB (except during AOF rewriting)
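The AOF idea, logging only the commands that change data and replaying them to rebuild state, can be sketched with a toy store. `TinyStore` and `replay` are invented for illustration; the in-memory list stands in for the file on disk, and fsync policy and AOF rewriting are left out.

```python
class TinyStore:
    """Toy key/value store that appends write commands to a log."""

    WRITE_COMMANDS = {"SET", "DEL"}

    def __init__(self):
        self.data = {}
        self.aof = []  # stands in for the append-only file on disk

    def execute(self, *args):
        name = args[0].upper()
        if name == "SET":
            self.data[args[1]] = args[2]
            result = "OK"
        elif name == "DEL":
            result = 1 if self.data.pop(args[1], None) is not None else 0
        elif name == "GET":
            result = self.data.get(args[1])
        else:
            raise ValueError(f"unsupported command: {name}")
        if name in self.WRITE_COMMANDS:
            self.aof.append(args)  # reads are never logged
        return result


def replay(aof):
    """Rebuild a store by re-executing every logged command in order."""
    store = TinyStore()
    for args in aof:
        store.execute(*args)
    return store


store = TinyStore()
store.execute("SET", "a", "1")
store.execute("GET", "a")
store.execute("SET", "b", "2")
store.execute("DEL", "a")

restored = replay(store.aof)
print(len(store.aof), restored.data)
```

Because every write is appended, the log keeps growing even when the data set does not, which is the reason AOF rewriting exists.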
第21页
Even if you turn off persistence, you can’t avoid fork (for example, for migration)
第22页
Migration
第23页
Migration Order
1. Prepare a new instance (B) to replace the old instance (A)
2. Send “slaveof A ip A port” to B
3. Wait for replication to finish
   1. A forks and uses more memory
4. Turn on the writable option for B
   1. config set slave-read-only no
5. Connect clients to B
6. Send “slaveof no one” to B
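The order above can be written down as an explicit plan before running anything. The sketch below only builds the list of steps; it does not talk to a server. The function name, the address 10.0.0.1:6379, and the use of `info replication` with `master_link_status:up` as the "replication finished" check are assumptions added for illustration.

```python
def migration_plan(old_host, old_port):
    """Return ordered (target, action) steps for moving from instance A to instance B."""
    return [
        ("B", f"slaveof {old_host} {old_port}"),
        # Assumed wait condition: poll until the reply shows master_link_status:up.
        ("B", "info replication"),
        ("B", "config set slave-read-only no"),
        ("clients", "reconnect to B"),
        ("B", "slaveof no one"),
    ]


# Hypothetical address for the old instance A.
plan = migration_plan("10.0.0.1", 6379)
for step, (target, action) in enumerate(plan, start=1):
    print(f"{step}. [{target}] {action}")
```

Making B writable before clients move, and detaching it from A only afterwards, keeps B in sync with A up to the moment the last client switches.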
第24页
Partial Sync
第25页
Redis Replication Mechanism
1. The slave sends the SYNC command to the master
2. The master forks and creates an RDB
3. After the RDB is created, the master sends the RDB data to the slave
4. While sending the RDB data, the master saves new commands into a memory buffer
5. After receiving the RDB, the slave starts loading it
6. After the slave loads the RDB, the master sends the buffered data to the slave
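The steps above can be traced with a small simulation. Dictionaries stand in for the data sets and a list stands in for the master's memory buffer; `full_sync` is a name invented here, and networking, forking, and the RDB file format are all abstracted away.

```python
def full_sync(master_data, writes_during_transfer):
    """Simulate a full sync; returns the slave's data set afterwards.

    master_data is updated in place by writes_during_transfer, a list of
    (key, value) writes that arrive while the snapshot is being sent.
    """
    snapshot = dict(master_data)        # step 2: fork and create the RDB
    buffered = []
    for key, value in writes_during_transfer:
        master_data[key] = value        # the master keeps serving writes
        buffered.append((key, value))   # step 4: remember them for the slave
    slave_data = dict(snapshot)         # step 5: the slave loads the RDB
    for key, value in buffered:         # step 6: apply the buffered commands
        slave_data[key] = value
    return slave_data


master = {"a": "1"}
slave = full_sync(master, [("b", "2"), ("a", "3")])
print(master)
print(slave)
```

The snapshot alone would leave the slave stale; it is the buffered commands that bring it level with the master.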
第26页
Problem with Redis Replication
• When the connection between master and slave is broken
– The slave tries to start a FULL sync.
– But a full sync is very expensive.
第27页
Partial Sync
• If the difference is small enough to be recovered from the memory buffer
– The slave can request “PSYNC”
– The master sends just the small memory buffer to the slave
– And syncing finishes.
• But if the master has been replaced by another server
– Only a FULL sync is possible.
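The choice between partial and full sync can be modeled as a simple check. This is a simplified sketch, not the Redis source: it assumes the slave reports the run ID of the master it was following plus the offset it wants to resume from, and that the master's buffer covers a contiguous offset range. All names and numbers are hypothetical.

```python
def choose_sync(slave_run_id, slave_offset, master_run_id, backlog_start, backlog_end):
    """Return 'PARTIAL' if the master can resume the stream, else 'FULL'."""
    if slave_run_id != master_run_id:
        return "FULL"      # a different master: its buffer is unrelated
    if backlog_start <= slave_offset <= backlog_end:
        return "PARTIAL"   # the missing bytes are still in the buffer
    return "FULL"          # the slave fell too far behind


# Hypothetical run IDs and offsets.
print(choose_sync("abc", 1500, "abc", 1000, 2000))
print(choose_sync("abc", 500, "abc", 1000, 2000))
print(choose_sync("abc", 1500, "xyz", 1000, 2000))
```

A larger buffer widens the window in which a short disconnection can be healed without a fork on the master.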
第28页
Trouble Shooting
第29页
T Service
• Condition
– Only for cache
• Redis Configuration
– stop-writes-on-bgsave-error yes
• Failure
– Writes are rejected after an RDB creation failure
– Reads are OK.
• Solution
– Config set stop-writes-on-bgsave-error no
第30页
G Service
• Condition
– Only for cache
– Using default options
• Redis Conf
– SAVE 900 1
– SAVE 300 10
– SAVE 60 10000
• Failure
– Performance degradation because of heavy disk I/O in a short time
• Solution
– config set save “”
– Remove the SAVE option
第31页
S Service
• Condition
– Some data is cache and some is stored data
– One instance holding 28GB of data on a 32GB machine
– It also had a disk failure.
• Failure
– Latency is high because swap memory is used
– Creating the RDB takes a long time
  • Normally, it takes 5~6 minutes for 10G of memory
  • It took over 8 hours to dump 28G
• Solution
– Drop the server
– Sometimes it is better to drop the data than to drag out a failure.
第32页
P Service #1
• Condition
– Using AOF as a store
– 8 instances on a 256GB server, each instance using 26GB
• Failure
– All 8 Redis instances tried to start an AOF rewrite
– Heavy disk I/O and heavy memory use
– They all started serving at the same time, so their AOF rewrites were triggered at about the same time
• Solution
– Stop automatic AOF rewrite and run it from a batch job
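A batch job for this can be as simple as a staggered schedule. The sketch below only computes the schedule and assumes automatic rewriting has already been disabled on each instance (for example with `auto-aof-rewrite-percentage 0`); the ports, the start time, and the 30-minute gap are hypothetical values, and `staggered_rewrite_schedule` is a name invented here.

```python
def staggered_rewrite_schedule(ports, start_minute, gap_minutes):
    """Return (minute, port, command) entries so that rewrites never start together."""
    return [
        (start_minute + index * gap_minutes, port, "bgrewriteaof")
        for index, port in enumerate(ports)
    ]


# Hypothetical layout: 8 instances on consecutive ports, one rewrite every 30 minutes.
schedule = staggered_rewrite_schedule(range(6379, 6387), start_minute=0, gap_minutes=30)
for minute, port, command in schedule:
    print(f"t+{minute:3d} min  port {port}  {command}")
```

The gap should be longer than the slowest observed rewrite, so that at most one instance forks and writes to disk at any moment.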
第33页
P Service (#2) – Not Actually a Failure
• Condition
– All Redis master/slave connections were broken by a network issue.
• Failure possibility
– When the network recovers, all Redis slaves will try to sync with their masters
– All Redis masters will fork and use a lot of memory.
– This can trigger a big service failure
• Solution
– Disconnect all M/S connections using “slaveof no one”
– Then recover the network.
– Then re-establish replication connections one at a time on each physical server.
第34页
Replication Failure
• Condition
– 20GB data
– Some write operations
• Failure
– Master/slave replication fails.
• Solution
– Check client-output-buffer-limit
– Default: “client-output-buffer-limit slave 256mb 64mb 60”
  • Hard limit 256mb
  • Soft limit 64mb for 60 seconds.
– The default is OK for 10G of data, but if you have 20G of data
  • Increase it to 512mb or 1024mb
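How the hard and soft limits interact can be sketched as a small decision function. This is a simplified model of the rule described above, not the Redis implementation: the master is assumed to drop the slave's connection as soon as its output buffer reaches the hard limit, or once the buffer has stayed at or above the soft limit for longer than the configured number of seconds. The function name and sample readings are invented.

```python
MB = 1024 ** 2


def should_disconnect(samples, hard_limit, soft_limit, soft_seconds):
    """samples: (time_seconds, buffer_bytes) readings ordered by time."""
    soft_since = None
    for now, size in samples:
        if size >= hard_limit:
            return True                      # hard limit: drop immediately
        if size >= soft_limit:
            if soft_since is None:
                soft_since = now             # soft limit first exceeded here
            elif now - soft_since > soft_seconds:
                return True                  # over the soft limit for too long
        else:
            soft_since = None                # dropped back under: reset the timer
    return False


limits = dict(hard_limit=256 * MB, soft_limit=64 * MB, soft_seconds=60)
print(should_disconnect([(0, 300 * MB)], **limits))
print(should_disconnect([(0, 100 * MB), (30, 100 * MB)], **limits))
print(should_disconnect([(0, 100 * MB), (61, 100 * MB)], **limits))
```

With a larger data set the RDB transfer takes longer, so more writes pile up in the slave's output buffer during the sync; if the connection is dropped mid-sync, the slave starts over and replication never completes.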
第35页
Redis Monitoring
第36页
Redis Monitoring
Name                                        Host or Redis (info)
CPU usage, load                             Host
Network inbound, outbound                   Host
Current client count, max client setting    Redis
Key count, commands processed               Redis
Memory usage, RSS                           Redis
Disk usage, I/O                             Host
Expired keys, evicted keys                  Redis
第37页
Redis Security Issue
第38页
Redis Security is very weak.
• ACL
– Not supported
– Never open the Redis port to the public
  • Only use Redis in a private network
– Don’t run Redis as root.
第39页
Redis hacking
• If the Redis port is open to the public, anyone can run:
– config set dir “/root/.ssh”
– config set dbfilename “authorized_keys”
– save
• So that user can log in to this server as root.
第40页
Thank you.