Production Ready Ruby
Increasing MTBF & Decreasing MTTR
29 January 2017
Adam Hawkins
SRE Team Lead, Saltside
Code structured such that:
1. Mean time between failures (MTBF) is increased
2. Mean time to resolve (MTTR) is decreased
First, start your process using the same command you would use in production.
Example Makefile
# Sentinel artifact representing that some commands have been run.
# Every non-phony make target must create a file.
ENVIRONMENT:=tmp/environment

# Boot everything for testing.
# -D daemonizes rackup so make can continue to the next command.
$(ENVIRONMENT):
	bundle exec rackup -D -p 9292
	mkdir -p $(@D)
	touch $@

.PHONY: test-smoke
# Run a smoke test; depend on the $(ENVIRONMENT)
test-smoke: $(ENVIRONMENT)
	env SERVER_URL=http://localhost:9292 bats smoke_test.bats
Next, run some commands to test the process that make sense in production.
I like bats.
Bats is a Bash test framework with simple assertions via `test` and TAP output.
Example smoke_test.bats
#!/usr/bin/env bats

@test "liveness probe" {
  run curl -f "${SERVER_URL}/probe/liveness"
  [ $status -eq 0 ] # $status populated by bats run command
}

@test "readiness probe" {
  run curl -f "${SERVER_URL}/probe/readiness"
  [ $status -eq 0 ]
}
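The smoke test assumes the server answers on `/probe/liveness` and `/probe/readiness`. As a rough sketch of what that could look like, here is a Rack-compatible app. `ProbeApp` and the `ready_check` callable are hypothetical names for illustration, not something from the talk.

```ruby
# Minimal Rack-compatible app exposing the two probe paths the smoke test hits.
# ProbeApp and ready_check are hypothetical names used only for this sketch.
class ProbeApp
  def initialize(ready_check:)
    @ready_check = ready_check
  end

  def call(env)
    case env['PATH_INFO']
    when '/probe/liveness'
      # The process is up and able to answer HTTP requests.
      respond(200, 'alive')
    when '/probe/readiness'
      # The process can do real work (e.g. its dependencies are reachable).
      @ready_check.call ? respond(200, 'ready') : respond(503, 'not ready')
    else
      respond(404, 'not found')
    end
  end

  private

  def respond(status, body)
    [status, { 'content-type' => 'text/plain' }, [body]]
  end
end
```

In a `config.ru` this might be mounted with `run ProbeApp.new(ready_check: -> { true })`. Because `curl -f` exits non-zero on HTTP error statuses, a 503 from the readiness probe fails the smoke test.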
Move your utilities out of rake and into thor (or anything else, really).
Define a test task in the Makefile
.PHONY: test-util
test-util: $(ENVIRONMENT)
	bundle exec util reset -f
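To make the `util reset -f` call concrete, here is a sketch of such a utility written with Ruby's standard `OptionParser` rather than thor, so it runs without extra gems. The `Util` module, its messages, and its exit codes are assumptions for illustration; only the `reset` command and `-f` flag come from the Makefile above.

```ruby
require 'optparse'

# Hypothetical stand-in for the `util` executable: one `reset` subcommand
# that accepts -f/--force. Returns the process exit status as an Integer.
module Util
  def self.run(argv, out: $stdout)
    command, *rest = argv

    case command
    when 'reset'
      options = { force: false }
      parser = OptionParser.new do |opts|
        opts.on('-f', '--force', 'Skip confirmation') { options[:force] = true }
      end
      parser.parse!(rest)

      unless options[:force]
        out.puts 'refusing to reset without --force'
        return 1
      end

      out.puts 'reset complete'
      0
    else
      out.puts "unknown command: #{command.inspect}"
      64
    end
  end
end
```

An executable wrapper would then be a single line, `exit Util.run(ARGV)`. Running it from the Makefile exercises option parsing the same way production usage does.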
(`--port` not parsed correctly)
Telemetry is data required to understand the current state.
Save data (e.g. metrics and/or logs) to relate the current state to past states.
Data = Business + Technical
"How is my server doing?"
Include metadata to aggregate across paths, request names, etc.
Examples:
http.request
http.response.latency
http.response.{2xx,3xx,4xx,5xx}
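One way to emit the three metrics above is a Rack-style middleware. This is a minimal sketch, not the talk's implementation: `MemoryMetrics` is a hypothetical in-memory stand-in for a real metrics client, and `HttpTelemetry` is an illustrative name.

```ruby
# Hypothetical in-memory stand-in for a real metrics client.
class MemoryMetrics
  attr_reader :counters, :timings

  def initialize
    @counters = Hash.new(0)
    @timings = Hash.new { |hash, key| hash[key] = [] }
  end

  def increment(name, tags = {})
    @counters[[name, tags]] += 1
  end

  def timing(name, milliseconds, tags = {})
    @timings[[name, tags]] << milliseconds
  end
end

# Rack-style middleware recording http.request, http.response.latency,
# and http.response.{2xx,3xx,4xx,5xx}, tagged with the request path.
class HttpTelemetry
  def initialize(app, metrics)
    @app = app
    @metrics = metrics
  end

  def call(env)
    tags = { path: env['PATH_INFO'] }
    @metrics.increment('http.request', tags)

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000

    @metrics.timing('http.response.latency', elapsed_ms, tags)
    @metrics.increment("http.response.#{status.to_i / 100}xx", tags)

    [status, headers, body]
  end
end
```

The path tag is the metadata that allows aggregation across paths. A real deployment would likely tag with a route name instead of the raw path to keep the number of distinct tag values small.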
These are your upstream APIs, other internal services, or data stores
Include metadata to aggregate across dependencies.
Examples:
api{a,b,c}.request
api{a,b,c}.response.latency
api{a,b,c}.response.{success,error,exception,timeout}
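The dependency metrics above can be captured by wrapping each outbound call. The sketch below makes several assumptions: `DependencyTelemetry` is a hypothetical name, the recorder is any object responding to `call(name, value)`, and a falsy block result is treated as an error response.

```ruby
require 'timeout'

# Wraps calls to a single dependency and records request, latency, and
# outcome metrics. Names and the recorder interface are hypothetical.
class DependencyTelemetry
  def initialize(name, recorder)
    @name = name
    @recorder = recorder
  end

  # Yields to the block that performs the call. A truthy result counts as
  # success, a falsy result as error, Timeout::Error as timeout, and any
  # other raised error as exception. Errors are re-raised to the caller.
  def measure
    @recorder.call("#{@name}.request", 1)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    outcome = :exception

    begin
      result = yield
      outcome = result ? :success : :error
      result
    rescue Timeout::Error
      outcome = :timeout
      raise
    ensure
      elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
      @recorder.call("#{@name}.response.latency", elapsed_ms)
      @recorder.call("#{@name}.response.#{outcome}", 1)
    end
  end
end
```

Usage might look like `apia.measure { client.fetch_orders }`, where `client` is whatever object talks to that dependency.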
"Are messages moving through the queue?"
"queue" refers to message queue or a job queue like sidekiq
Include metadata to aggregate across each queue
Examples
queue.{channel}.depth
queue.{channel}.processed
queue.{channel}.failed
(exception, error, or any unexpected failure)
"What's the load on the pool?"
`incoming_order` is better than `POST /orders`
LOG_LEVEL
progname
require 'logger'

logger = Logger.new($stdout).tap do |log|
  log.level = ENV.fetch('LOG_LEVEL', :debug)
end

logger.info('server') { 'handling request' }
logger.debug('order-processor') { 'incoming order' }

# Outputs
#
# I, [2017-01-28T23:02:48.662657 #29158] INFO -- server: handling request
# D, [2017-01-28T23:02:48.662730 #29158] DEBUG -- order-processor: incoming order
#
# Easy grepping for subsystems when dealing with logs
require 'delegate' # Underused and powerful library
require 'logger'

class NamedLogger < DelegateClass(Logger)
  def initialize(logger, progname)
    super logger
    @progname = progname
  end

  def info(msg)
    super(@progname) { msg }
  end
end
logger = Logger.new $stdout

server = NamedLogger.new logger, :server
queue = NamedLogger.new logger, :queue

server.info 'incoming request'
queue.info 'processed message'

# Output
#
# I, [2017-01-28T23:10:40.365454 #29392] INFO -- server: incoming request
# I, [2017-01-28T23:10:40.365516 #29392] INFO -- queue: processed message
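`NamedLogger` tags only `info`. The same idea can be extended to every severity. This is a sketch under assumptions: `SubsystemLogger` is a hypothetical name, and it accepts either a message or a block so that lazy messages still work.

```ruby
require 'delegate'
require 'logger'

# Variation on NamedLogger that tags every severity with the subsystem name.
# SubsystemLogger is a hypothetical name used for this sketch.
class SubsystemLogger < DelegateClass(Logger)
  SEVERITIES = %i[debug info warn error fatal unknown].freeze

  def initialize(logger, progname)
    super(logger)
    @progname = progname.to_s
  end

  SEVERITIES.each do |severity|
    # Accept a plain message or a block. The block is only evaluated if the
    # underlying logger's level allows this severity.
    define_method(severity) do |msg = nil, &block|
      __getobj__.public_send(severity, @progname) { block ? block.call : msg }
    end
  end
end
```

Everything else, such as `level=` or `info?`, is still delegated to the wrapped `Logger`, so several subsystem loggers can share one output and one `LOG_LEVEL`.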
LOG_LEVEL
"Release It!" by Michael Nygard