ruby on rails - Unicorn workers timeout after "zero downtime" deploy with capistrano -


i'm running rails 3.2.21 app , deploy ubuntu 12.04.5 box using capistrano (nginx , unicorn).

i have app set zero-downtime deploy (at least thought), config files looking more or less these.

here's problem: when deploy done , restarts unicorn, when watch unicorn.log see fire new workers, reap old ones... app hangs 2-3 minutes. request app @ point hits timeout window (which set 40 seconds) , returns app's 500 error page.

here first part of output unicorn.log unicorn restarting (i have 5 unicorn workers):

i, [2015-04-21t23:06:57.022492 #14347]  info -- : master process ready i, [2015-04-21t23:06:57.844273 #15378]  info -- : worker=0 ready i, [2015-04-21t23:06:57.944080 #15381]  info -- : worker=1 ready i, [2015-04-21t23:06:58.089655 #15390]  info -- : worker=2 ready i, [2015-04-21t23:06:58.230554 #14541]  info -- : reaped #<process::status: pid 15551 exit 0> worker=4 i, [2015-04-21t23:06:58.231455 #14541]  info -- : reaped #<process::status: pid 3644 exit 0> worker=0 i, [2015-04-21t23:06:58.249110 #15393]  info -- : worker=3 ready i, [2015-04-21t23:06:58.650007 #15396]  info -- : worker=4 ready i, [2015-04-21t23:07:01.246981 #14541]  info -- : reaped #<process::status: pid 32645 exit 0> worker=1 i, [2015-04-21t23:07:01.561786 #14541]  info -- : reaped #<process::status: pid 15534 exit 0> worker=2 i, [2015-04-21t23:07:06.657913 #14541]  info -- : reaped #<process::status: pid 16821 exit 0> worker=3 i, [2015-04-21t23:07:06.658325 #14541]  info -- : master complete 

afterwards, app hangs 2-3 minutes, here what's happening:

e, [2015-04-21t23:07:38.069635 #14347] error -- : worker=0 pid:15378 timeout (41s > 40s), killing e, [2015-04-21t23:07:38.243005 #14347] error -- : reaped #<process::status: pid 15378 sigkill (signal 9)> worker=0 e, [2015-04-21t23:07:39.647717 #14347] error -- : worker=3 pid:15393 timeout (41s > 40s), killing e, [2015-04-21t23:07:39.890543 #14347] error -- : reaped #<process::status: pid 15393 sigkill (signal 9)> worker=3 i, [2015-04-21t23:07:40.727755 #16002]  info -- : worker=0 ready i, [2015-04-21t23:07:43.212395 #16022]  info -- : worker=3 ready e, [2015-04-21t23:08:24.511967 #14347] error -- : worker=3 pid:16022 timeout (41s > 40s), killing e, [2015-04-21t23:08:24.718512 #14347] error -- : reaped #<process::status: pid 16022 sigkill (signal 9)> worker=3 i, [2015-04-21t23:08:28.010429 #16234]  info -- : worker=3 ready 

eventually, after 2 or 3 minutes, app starts being responsive again, more sluggish. can see in new relic (the horizontal line marks deploy, , light blue area indicates ruby):

new relic graph during , after deploy

i have identical staging server, , cannot replicate issue in staging... granted, staging under no load (i'm person trying make page requests).

here config/unicorn.rb file:

root = "/home/deployer/apps/myawesomeapp/current" working_directory root  pid "#{root}/tmp/pids/unicorn.pid" stderr_path "#{root}/log/unicorn.log" stdout_path "#{root}/log/unicorn.log"  shared_path = "/home/deployer/apps/myawesomeapp/shared"  listen "/tmp/unicorn.myawesomeapp.sock" worker_processes 5 timeout 40  preload_app true  before_exec |server|   env['bundle_gemfile'] = "#{root}/gemfile" end  before_fork |server, worker|   if defined?(activerecord::base)     activerecord::base.connection.disconnect!   end    old_pid = "#{root}/tmp/pids/unicorn.pid.oldbin"   if file.exists?(old_pid) && server.pid != old_pid     begin       process.kill("quit", file.read(old_pid).to_i)     rescue errno::enoent, errno::esrch      end   end end  after_fork |server, worker|   if defined?(activerecord::base)     activerecord::base.establish_connection   end end 

and paint complete picture, in capistrano deploy.rb, unicorn restart task looks this:

namespace :deploy   task :restart, roles: :app, except: { no_release: true }     run "kill -s usr2 `cat #{release_path}/tmp/pids/unicorn.pid`"   end end 

any ideas why unicorn workers timeout right after deploy? thought point of zero-downtime keep old ones around until new ones spun , ready serve?

thanks!

update

i did deploy, , time kept eye on production.log see going on there. suspicious thing following lines, mixed in normal requests:

dalli/sasl authenticating 7510de dalli/sasl: 7510de dalli/sasl authenticating 7510de dalli/sasl: 7510de dalli/sasl authenticating 7510de dalli/sasl: 7510de 

update #2

as suggested of answers below, changed before_fork block add sig = (worker.nr + 1) >= server.worker_processes ? :quit : :ttou workers incrementally killed off.

same result, terribly slow deploy, same spike illustrated in graph above. context, out of 5 worker processes, first 4 sent ttou signal, , 5th sent quit. still, not seem have made difference.

i came across similar problem while trying set rails/nginx/unicorn on digital ocean. able zero-downtime deploys work after tweaking few things. here few things try:

  1. reduce number of worker process.
  2. increase memory of server. getting timeouts on 512mb ram droplet. seemed fix issue when increased 1gb.
  3. use "capistrano3-unicorn" gem.
  4. if preload_app true, use restart (usr2). if false, use reload (hup).
  5. ensure "tmp/pids" in set linked_dirs in deploy.rb.
  6. use px aux | grep unicorn make sure old processes being removed.
  7. use kill [pid] manually stop unicorn processes still running.

here's unicorn config reference:

working_directory '/var/www/yourapp/current' pid '/var/www/yourapp/current/tmp/pids/unicorn.pid' stderr_path '/var/www/yourapp/log/unicorn.log' stdout_path '/var/www/yourapp/log/unicorn.log' listen '/tmp/unicorn.yourapp.sock' worker_processes 2 timeout 30 preload_app true  before_fork |server, worker|   old_pid = "/var/www/yourapp/current/tmp/pids/unicorn.pid.oldbin"   if old_pid != server.pid     begin     sig = (worker.nr + 1) >= server.worker_processes ? :quit : :ttou     process.kill(sig, file.read(old_pid).to_i)     rescue errno::enoent, errno::esrch     end   end end 

deploy.rb

lock '3.4.0'  set :application, 'yourapp' set :repo_url, 'git@bitbucket.org:username/yourapp.git' set :deploy_to, '/var/www/yourapp' set :linked_files, fetch(:linked_files, []).push('config/database.yml', 'config/secrets.yml', 'config/application.yml') set :linked_dirs, fetch(:linked_dirs, []).push('log', 'tmp/pids', 'tmp/cache', 'tmp/sockets', 'vendor/bundle', 'public/system') set :format, :pretty set :log_level, :info set :rbenv_ruby, '2.1.3'  namespace :deploy   after :restart, :clear_cache     on roles(:web), in: :groups, limit: 3, wait: 10     end   end end  after 'deploy:publishing', 'deploy:restart' namespace :deploy   task :restart     #invoke 'unicorn:reload'     invoke 'unicorn:restart'   end end 

Comments

Popular posts from this blog

java - Custom OutputStreamAppender not run: LOGBACK: No context given for <MYAPPENDER> -

java - UML - How would you draw a try catch in a sequence diagram? -

c++ - No viable overloaded operator for references a map -