I Borked Prod: Initial troubleshooting of distributed systems in 5 minutes or less

A presentation at 12 Clouds of Christmas 2019 in December 2019 in Austin, TX, USA by Laura Santamaria

Slide 1

I Borked Prod: Initial troubleshooting of distributed systems in 5 minutes or less @nimbinatus | #12Clouds

Slide 2

My ‘Patented’ 5-Step System

Slide 3

Step 0: Panic

Slide 4

Step 0: Panic

Slide 5

Step 1: Get Data

Slide 6

Logs

Slide 7

Alerts

Slide 8

Monitoring

Slide 9

Hit It

Slide 10

Drill In

Slide 11

Build Envs

Slide 12

Step 2: Analyze

Slide 13

Patterns

Slide 14

History

Slide 15

Step 3: Act

Slide 16

Step 3: Act

Slide 17

Step 3: Act

Slide 18

Step 4 (optional): Fail

Slide 19

Step 5: Repeat

Slide 20

Unborked (Thanks)