Redis Troubleshooting by Clark.kang

Publisher: dber

Published: 1450832417824 · Keywords: Redis, NoSQL, English

Slide 1

Redis Troubleshooting

Clark.kang charsyam@naver.com

Slide 2

Contents

• Redis
  – Single Threaded
• Redis Troubleshooting
• Redis Security Issues

Slide 3

Single Threaded #1

[Diagram: Client #1, Client #2, …, Client #N send packets (Packet #1, Packet #2) to the Redis event loop, which performs I/O multiplexing and then processes each command.]

Slide 4

Single Threaded #2

• Redis processes only one command at a time.
• If you run a long-running command, all other commands wait until it finishes.
  – KEYS, FLUSHALL, FLUSHDB, Lua scripts, MULTI/EXEC
• Redis does use some other threads, but only to keep fsync calls off the main thread.
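To make the head-of-line blocking concrete, here is a minimal sketch that models a single-threaded loop as a plain queue. It is not Redis code: `completion_times` is a hypothetical helper, all commands are assumed to arrive at the same moment, and the 0.1 ms cost for GET is an assumption. The 1000 ms figure for FLUSHALL is taken from the "How slow?" slide.

```python
from typing import List, Tuple


def completion_times(commands: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (name, finish_time_ms) for commands executed strictly one at a time.

    Simplification: every command is already queued at t=0 and runs in list order.
    """
    clock = 0.0
    finished = []
    for name, cost_ms in commands:
        clock += cost_ms  # nothing else can run while this command executes
        finished.append((name, clock))
    return finished


if __name__ == "__main__":
    # Hypothetical workload: two cheap reads around one expensive command.
    queue = [("GET a", 0.1), ("FLUSHALL", 1000.0), ("GET b", 0.1)]
    for name, done_at in completion_times(queue):
        print(f"{name}: done at {done_at:.1f} ms")
```

In this model the second GET costs only 0.1 ms of work but completes after roughly one second, because it waits behind FLUSHALL.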

Slide 5

How slow?

Command: FLUSHALL
Item count: 1,000,000
Time: 1000 ms (1 second)

Slide 6

Recommended Redis Version #1

• Latest stable version
  – 3.0.x is also good.
  – Use a version after 2.8.13.
• Behavior differs between Redis versions
  – In 2.6.x, config set client-output-buffer-limit is accepted from redis-cli.
  – In 2.8.20, config set client-output-buffer-limit does not accept unit expressions such as 1GB or 1MB.
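One way to cope with a version that rejects unit suffixes is to convert sizes to plain bytes before sending the command. The sketch below assumes the multiplier convention documented in redis.conf (1k = 1000 bytes, 1kb = 1024 bytes, and so on). The helper names `to_bytes` and `output_buffer_limit_value` are hypothetical.

```python
import re

# Multipliers as documented in redis.conf: "k" is decimal, "kb" is binary, etc.
_UNITS = {
    "": 1, "b": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def to_bytes(size: str) -> int:
    """Convert a size such as '256mb' or '1GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", size)
    if not match:
        raise ValueError(f"cannot parse size: {size!r}")
    number, unit = match.groups()
    unit = unit.lower()  # units are case insensitive
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {size!r}")
    return int(number) * _UNITS[unit]


def output_buffer_limit_value(client_class: str, hard: str, soft: str, soft_seconds: int) -> str:
    """Build the value string for CONFIG SET client-output-buffer-limit, in bytes."""
    return f"{client_class} {to_bytes(hard)} {to_bytes(soft)} {soft_seconds}"


if __name__ == "__main__":
    value = output_buffer_limit_value("slave", "256mb", "64mb", 60)
    print(f'CONFIG SET client-output-buffer-limit "{value}"')
```

The printed line can then be pasted into redis-cli. Whether a given version needs this conversion should be checked against that version.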

Slide 7

Memory Fragmentation #1

• Older Redis versions, from before jemalloc 3.6.0 was used
  – In 2.8.6, Redis uses just 2.4G but RSS is 12G.

Slide 8

Memory Fragmentation #2

• Redis versions that use jemalloc 3.6.0 or later
  – 2.8.20 shows a smaller gap between memory usage and RSS.
  – But you should still check RSS.
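The gap between memory usage and RSS can be expressed as a ratio of the `used_memory_rss` and `used_memory` fields that INFO reports. The following is a small sketch of that arithmetic. The function names and the 1.5 alert threshold are assumptions for illustration, not values from the slides.

```python
def fragmentation_ratio(used_memory: int, used_memory_rss: int) -> float:
    """RSS divided by the memory Redis believes it is using (both in bytes)."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def needs_attention(used_memory: int, used_memory_rss: int, threshold: float = 1.5) -> bool:
    """True when RSS exceeds used memory by more than the (assumed) threshold."""
    return fragmentation_ratio(used_memory, used_memory_rss) > threshold


if __name__ == "__main__":
    # Figures from the 2.8.6 example: 2.4G used, 12G RSS (decimal units for simplicity).
    used, rss = 2_400_000_000, 12_000_000_000
    print(fragmentation_ratio(used, rss), needs_attention(used, rss))
```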

Slide 9

Recommended Client (for Management)

• Use redis-cli
  – It is the best option.
• You can also use telnet
  – Redis supports inline commands.
  – Twemproxy doesn't support inline commands.
  – You can't use some commands in old versions.
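The difference between what telnet sends and what a regular client sends can be shown by encoding the same command both ways. This sketch builds the bytes only and opens no connection. `encode_resp` and `encode_inline` are hypothetical helper names, and the inline encoder assumes arguments contain no spaces or quotes.

```python
from typing import Sequence


def encode_resp(args: Sequence[str]) -> bytes:
    """Encode a command as a RESP array of bulk strings (what client libraries send)."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_inline(args: Sequence[str]) -> bytes:
    """Encode a command in inline form: space-separated words ending in CRLF.

    Assumption: no argument contains whitespace or quotes.
    """
    return " ".join(args).encode("utf-8") + b"\r\n"


if __name__ == "__main__":
    command = ["GET", "foo"]
    print(encode_resp(command))    # framed form, with explicit lengths
    print(encode_inline(command))  # what you would type in a telnet session
```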

Slide 10

If you support a service team that uses Redis

• Check whether they want a cache or a store
  – If it is a cache, turn off the SAVE option.
  – Even if it is a store, set a proper value for SAVE.
• Run multiple Redis instances on one physical server.
• Use the maxmemory option.

Slide 11

Using Multiple Redis Instances

• 4-core CPU, 32G memory
  – Three Redis instances of 8G each are better than one 26G Redis instance.

[Diagram: one instance with Mem: 26G compared with three instances with Mem: 8G each]
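A rough way to see why the smaller instances are safer is to estimate peak memory during a fork. The sketch below assumes the worst case mentioned on the RDB slide, where a forked child can end up doubling the memory of its instance, and it assumes forks are staggered so that only `concurrent_forks` instances fork at once. `worst_case_peak_gb` is a hypothetical helper.

```python
from typing import List


def worst_case_peak_gb(instance_sizes_gb: List[float], concurrent_forks: int = 1) -> float:
    """Total memory if the largest `concurrent_forks` instances fork at once
    and every page of each forking instance gets copied."""
    forking = sorted(instance_sizes_gb, reverse=True)[:concurrent_forks]
    return sum(instance_sizes_gb) + sum(forking)


if __name__ == "__main__":
    machine_gb = 32
    for layout in ([26], [8, 8, 8]):
        peak = worst_case_peak_gb(layout)
        print(layout, "peak:", peak, "GB, fits:", peak <= machine_gb)
```

Under these assumptions the single 26G instance can need up to 52G, while three 8G instances forking one at a time peak at 32G.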

Slide 12

Replication

Slide 13

Redis Replication

• Redis is single threaded
  – It forks for replication.
• Chained replication is supported
  – Multi-master is not supported.
  – Check RSS when replicating.

Slide 14

Replication

• Chained replication is supported

[Diagram: Master → 1st Slave → 2nd Slave]

The 1st slave is the master of the 2nd slave.

Slide 15

Replication

[Diagram: Master and Slave, with replicationCron running a health check.]

Slide 16

Replication

[Diagram: same as Slide 15 (Master, Slave, replicationCron, health check).]

Slide 17

Replication

[Diagram: Master, Slave, replicationCron]

When the master restarts, the slave resyncs with the master.

Slide 18

Mistake: Replication

If the master has no data:

[Diagram: Master, Slave, replicationCron]

The slave will have no data after resyncing.

Slide 19

Persistence

Slide 20

RDB/AOF

• RDB and AOF are independent of each other.
• RDB
  – A snapshot of the current memory state
  – Redis forks and dumps its memory to disk.
  – In a write-heavy system, it can use double the memory.
• AOF
  – Saves update (create, update, delete) commands to disk after the event loop, in Redis protocol format.
  – The disk sync option affects performance (default: everysec).
  – Less disk I/O than RDB (except during AOF rewriting)

Slide 21

Even if you turn off persistence, you can't avoid fork (migration).

Slide 22

Migration

Slide 23

Migration Order

1. Prepare a new instance (B) for the old instance (A).
2. Send “slaveof <A's IP> <A's port>” to B.
3. Wait for replication to finish.
   – This forks and uses more memory.
4. Turn on the writable option for B.
   – config set slave-read-only no
5. Connect clients to B.
6. Send “slaveof no one” to B.
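The migration order above can be written down as an ordered checklist so the steps are not run out of sequence. This is only a sketch that produces the commands as text. `migration_plan` is a hypothetical helper, the address in the example is made up, and waiting on `master_link_status:up` in INFO replication is one possible way to decide that replication has finished.

```python
from typing import List, Tuple


def migration_plan(a_host: str, a_port: int) -> List[Tuple[str, str]]:
    """Return (who, action) pairs for moving from old instance A to new instance B."""
    return [
        ("B", f"SLAVEOF {a_host} {a_port}"),
        ("operator", "wait until INFO replication on B shows master_link_status:up"),
        ("B", "CONFIG SET slave-read-only no"),
        ("operator", "connect clients to B"),
        ("B", "SLAVEOF NO ONE"),
    ]


if __name__ == "__main__":
    # Hypothetical address for the old instance A.
    for step, (who, action) in enumerate(migration_plan("10.0.0.1", 6379), start=1):
        print(f"{step}. [{who}] {action}")
```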

Slide 24

Partial Sync

Slide 25

Redis Replication Mechanism

1. The slave sends the sync command to the master.
2. The master forks and creates an RDB.
3. After the RDB is created, the master sends the RDB data to the slave.
4. While sending the RDB data, the master saves new commands into a memory buffer.
5. After the RDB has been sent, the slave starts loading it.
6. After the slave has loaded the RDB, the master sends the buffered data to the slave.
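The six steps can be replayed in memory to see why the buffer is needed. The sketch below is a toy model, not the real implementation: datasets are plain dictionaries, the RDB is a dictionary copy, writes are reduced to key/value sets, and `full_sync` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value), a stand-in for a write command


def full_sync(master: Dict[str, str], writes_during_transfer: List[Write]) -> Dict[str, str]:
    """Replay steps 2 to 6 and return the slave's dataset after the sync."""
    snapshot = dict(master)          # step 2: fork + RDB is a point-in-time copy
    buffered: List[Write] = []
    for key, value in writes_during_transfer:
        master[key] = value          # the master keeps accepting writes
        buffered.append((key, value))  # step 4: and remembers them for the slave
    slave = dict(snapshot)           # steps 3 and 5: the slave receives and loads the RDB
    for key, value in buffered:      # step 6: the buffered commands are replayed
        slave[key] = value
    return slave


if __name__ == "__main__":
    master = {"a": "1"}
    slave = full_sync(master, [("b", "2"), ("a", "3")])
    print(slave == master, slave)
```

Without the replay in step 6 the slave would stop at the snapshot and miss every write made during the transfer.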

Slide 26

Problem of Redis Replication

• When the connection between master and slave is broken
  – The slave tries to start a FULL sync.
  – But a full sync is very expensive.

Slide 27

Partial Sync

• If the difference is small enough to be recovered from the memory buffer
  – The slave can request “PSYNC”.
  – The master sends just the small memory buffer to the slave.
  – Syncing then finishes.
• But if the master has changed to another server
  – Only a FULL sync is possible.
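The two conditions on this slide can be summarized as a single decision: the slave must be talking to the same master as before, and the data it is missing must still be inside the master's memory buffer. The sketch below is a simplified model of that decision. The function and parameter names are hypothetical and the exact offset rules in Redis are more detailed.

```python
def choose_sync(
    master_run_id: str,
    backlog_first_offset: int,
    backlog_last_offset: int,
    slave_master_run_id: str,
    slave_offset: int,
) -> str:
    """Return 'partial' when a PSYNC request can be served from the buffer, else 'full'."""
    same_master = slave_master_run_id == master_run_id
    in_backlog = backlog_first_offset <= slave_offset <= backlog_last_offset
    return "partial" if same_master and in_backlog else "full"


if __name__ == "__main__":
    # Hypothetical run ids and offsets.
    print(choose_sync("m1", 1000, 5000, "m1", 4200))  # short disconnect, same master
    print(choose_sync("m1", 1000, 5000, "m1", 200))   # slave fell too far behind
    print(choose_sync("m2", 1000, 5000, "m1", 4200))  # master was replaced
```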

Slide 28

Troubleshooting

Slide 29

T Service

• Condition
  – Only for cache
• Redis Configuration
  – stop-writes-on-bgsave-error yes
• Failure
  – Writes are rejected after RDB creation fails.
  – Reads are OK.
• Solution
  – config set stop-writes-on-bgsave-error no

Slide 30

G Service

• Condition
  – Only for cache
  – Using default options
• Redis Conf
  – SAVE 900 1
  – SAVE 300 10
  – SAVE 60 10000
• Failure
  – Performance degradation caused by heavy disk I/O in a short time
• Solution
  – config set save ""
  – This removes the SAVE option.

Slide 31

S Service

• Condition
  – Some cache data and some stored data
  – One instance holding 28GB of data on a 32GB machine
  – The machine also had a disk failure.
• Failure
  – Latency is high because swap memory is in use.
  – Creating the RDB takes a long time.
    • Normally, it takes 5~6 minutes for 10G of memory.
    • It took over 8 hours to dump 28G.
• Solution
  – Drop the server.
  – Sometimes it is better to drop the data than to drag out a failure.

Slide 32

P Service #1

• Condition
  – Using AOF as a store
  – 8 instances on a 256GB machine, each instance using 26GB
• Failure
  – All 8 Redis instances tried to start an AOF rewrite.
  – Heavy disk I/O and heavy memory use
  – They all started serving at the same time, so their AOF rewrite timing was also similar.
• Solution
  – Stop AOF rewrite and run it from a batch job instead.
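A batch job for this could rewrite one instance at a time with a gap between them. The sketch below only computes such a schedule. It assumes automatic rewriting has been disabled on each instance (redis.conf documents `auto-aof-rewrite-percentage 0` for that), and the ports and the 30-minute gap are made-up values. `rewrite_schedule` is a hypothetical helper.

```python
from typing import List, Tuple


def rewrite_schedule(ports: List[int], gap_minutes: int) -> List[Tuple[int, int, str]]:
    """Return (start_minute, port, command) so that instances rewrite one after another."""
    return [(index * gap_minutes, port, "BGREWRITEAOF") for index, port in enumerate(ports)]


if __name__ == "__main__":
    # Hypothetical layout: 8 instances on consecutive ports.
    ports = list(range(6379, 6387))
    for start, port, command in rewrite_schedule(ports, gap_minutes=30):
        print(f"t+{start:3d} min  port {port}: {command}")
```

The gap should be longer than the slowest rewrite observed on the machine, otherwise two rewrites can still overlap.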

Slide 33

P Service #2 – Not Actually a Failure

• Condition
  – All Redis master/slave connections are broken because of a network issue.
• Failure Possibility
  – When the network recovers, all Redis slaves will try to sync with their masters.
  – All Redis masters will fork and use a lot of memory.
  – This can trigger a big service failure.
• Solution
  – Disconnect all master/slave connections using “slaveof no one”.
  – Then recover the network.
  – Then re-create the replication connections one at a time on each physical server.

Slide 34

Replication Failure

• Condition
  – 20GB data
  – Some write operations
• Failure
  – Master/slave replication fails.
• Solution
  – Check client-output-buffer-limit.
  – Default: “client-output-buffer-limit slave 256mb 64mb 60”
    • Hard limit: 256mb
    • Soft limit: 64mb for 60 seconds
  – The default is OK for 10G of data, but if you have 20G of data
    • Increase it to 512mb or 1024mb.
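To estimate whether a limit is large enough, one can model the slave's output buffer as growing at a constant write rate while the full sync runs. The sketch below does that under loud assumptions: the rate is constant, both limits are non-zero, the hard limit disconnects immediately, and the soft limit disconnects after the buffer has stayed above it for the given number of seconds. The function names, the 1 MB/s rate and the 200-second sync are made up.

```python
from typing import Optional

MB = 1024 * 1024


def seconds_until_disconnect(write_rate: float, hard: int, soft: int, soft_seconds: int) -> Optional[float]:
    """When a slave is dropped if its buffer grows by `write_rate` bytes per second.

    Returns None when the buffer does not grow at all.
    """
    if write_rate <= 0:
        return None
    hit_hard = hard / write_rate                 # dropped as soon as the hard limit is reached
    hit_soft = soft / write_rate + soft_seconds  # or after staying above the soft limit long enough
    return min(hit_hard, hit_soft)


def sync_survives(sync_seconds: float, write_rate: float, hard: int, soft: int, soft_seconds: int) -> bool:
    """True when the sync finishes before the slave would be disconnected."""
    deadline = seconds_until_disconnect(write_rate, hard, soft, soft_seconds)
    return deadline is None or sync_seconds < deadline


if __name__ == "__main__":
    # Hypothetical: a sync that takes 200 s while clients write 1 MB/s.
    print(sync_survives(200, 1 * MB, 256 * MB, 64 * MB, 60))    # default limits
    print(sync_survives(200, 1 * MB, 1024 * MB, 512 * MB, 60))  # raised limits
```

In this model the default limits drop the slave after 124 seconds (64 seconds to reach the soft limit plus 60 seconds above it), which is why a longer sync on a larger dataset can fail repeatedly.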

Slide 35

Redis Monitoring

Slide 36

Redis Monitoring

Name                                          Host or Redis (info)
CPU usage, load                               Host
Network inbound, outbound                     Host
Current client count, max clients setting     Redis
Key count, commands processed                 Redis
Memory usage, RSS                             Redis
Disk usage, I/O                               Host
Expired keys, evicted keys                    Redis
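The Redis-side rows of this table come from the text that the INFO command returns, which is a list of `name:value` lines grouped under `#` section headers. The sketch below parses such text and picks out a few counters. The sample values are invented, and `parse_info`, `redis_metrics` and `WATCHED` are hypothetical names.

```python
from typing import Dict

# INFO fields matching the Redis rows of the table.
WATCHED = [
    "connected_clients",
    "used_memory",
    "used_memory_rss",
    "total_commands_processed",
    "expired_keys",
    "evicted_keys",
]


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output into a dict, skipping blank lines and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def redis_metrics(text: str) -> Dict[str, int]:
    """Extract the watched counters that are present in the INFO text."""
    info = parse_info(text)
    return {name: int(info[name]) for name in WATCHED if name in info}


# Invented sample, shaped like INFO output.
SAMPLE = """\
# Clients
connected_clients:12
# Memory
used_memory:1048576
used_memory_rss:2097152
# Stats
total_commands_processed:5000
expired_keys:7
evicted_keys:0
"""

if __name__ == "__main__":
    for name, value in redis_metrics(SAMPLE).items():
        print(f"{name} = {value}")
```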

Slide 37

Redis Security Issue

Slide 38

Redis security is very weak.

• ACL
  – Not supported
  – Never open the Redis port to the public.
    • Use Redis only in a private network.
  – Don't run Redis as root.

Slide 39

Redis hacking

• If the Redis port is open to the public, an attacker can run:
  – config set dir “/root/.ssh”
  – config set dbfilename “authorized_keys”
  – save
• The attacker can then use this server as root.
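The two preconditions for this attack, a publicly reachable port and a server process running as root, can be checked before an instance goes live. The sketch below is a minimal, assumption-heavy audit: the configuration is passed in as a plain dictionary of directive names to values (a hypothetical format), an empty `bind` is treated as listening on all interfaces, and `audit` is a hypothetical name.

```python
from typing import Dict, List


def audit(config: Dict[str, str], running_uid: int) -> List[str]:
    """Return warnings for the two risks named on the security slides."""
    warnings = []
    bind = config.get("bind", "")
    if bind == "" or "0.0.0.0" in bind.split():
        warnings.append("listening on all interfaces; bind to a private address")
    if running_uid == 0:
        warnings.append("running as root; CONFIG SET dir plus SAVE can write under /root")
    return warnings


if __name__ == "__main__":
    # Hypothetical inputs: an unconfigured instance started by root.
    for warning in audit({}, running_uid=0):
        print("WARNING:", warning)
```

On a real host the uid could come from `os.getuid()` in the process that starts Redis. It is left as a parameter here so the check stays testable.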

Slide 40

Thank you.