The following is taken from a production system that we run; I hope it is helpful to others who are enjoying the happiness of using systemd.

Precondition

We had two services defined:

service-a.service

[Unit]
Description=Event handler service
After=supervisord.service service-b.service
Requires=service-b.service

[Service]
Type=simple
ExecStart=/bin/service-a
ExecReload=/bin/kill -s HUP $MAINPID

[Install]
WantedBy=multi-user.target

service-b.service

[Unit]
Description=Bootstrap service
Before=supervisord.service
Wants=basic.target
After=basic.target network.target vault-token.target

[Service]
User=root
Type=oneshot
ExecStartPre=/bin/sleep 180
ExecStart=/bin/service-b
RemainAfterExit=true
StandardOutput=journal

[Install]
WantedBy=vault-token.target

The initial design of these two services was:

  1. service-c runs in supervisord.
  2. service-b sleeps 180 seconds (3 minutes) before starting, to wait for service-c to become available.
  3. service-b should run before supervisord, which is incorrect.
  4. service-a should only run after supervisord.service, to have a better chance that service-c is available (though this is not guaranteed).
  5. service-a should also run after service-b.
  6. vault-token.target is a special target that we define to signal a change of the HashiCorp Vault token.

We also have supervisord.service:

[Unit]
Description=supervisord – Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/bin/supervisord -c /etc/supervisord.conf
ExecReload=/usr/local/bin/supervisorctl reload
ExecStop=/usr/local/bin/supervisorctl shutdown
User=root
KillMode=process

[Install]
WantedBy=multi-user.target

Observations

We observed two different behaviours depending on the node:

On some nodes, supervisord starts before service-b, or after service-b has started but without waiting for its ExecStartPre= to finish. In that case service-b is able to talk to consul and does not error out.

On other nodes, supervisord starts only after service-b has finished, which makes service-b crap out.

Systemd-fu

After/Before: These configure ordering dependencies between units. If a unit foo.service contains a setting Before=bar.service and both units are being started, bar.service's start-up is delayed until foo.service has finished starting up. Note the prerequisite: for bar to wait for foo, both units must have start jobs pending at the same time. Also, the result of foo's start-up doesn't matter: bar starts once foo has finished starting, whether foo succeeded or failed. This implies:

  • If foo is started by multi-user.target (at boot) but bar is started by another unit or manually, bar starts immediately regardless of foo's state (inactive or active). The only exception is when foo is still activating.
  • If foo is activating, starting bar from any unit or manually causes bar to wait for foo to finish starting.
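As a minimal sketch (foo.service and bar.service are hypothetical), the ordering can be declared on either side: Before=bar.service in foo is the inverse of After=foo.service in bar, so one of the two lines is enough. What counts as "finished starting up" depends on Type=: for Type=oneshot it means the ExecStartPre= and ExecStart= commands have exited, which is why a long ExecStartPre= sleep holds back every unit ordered after it.

```ini
# foo.service (hypothetical)
[Unit]
Description=foo
# If foo and bar both have start jobs pending, bar waits until foo has finished starting
Before=bar.service

[Service]
# oneshot: foo only counts as started once both commands below have exited (about 60 s)
Type=oneshot
ExecStartPre=/bin/sleep 60
ExecStart=/bin/true

# bar.service (hypothetical)
[Unit]
Description=bar
# Same ordering expressed from bar's side; redundant next to foo's Before=, shown for comparison
After=foo.service

[Service]
Type=oneshot
ExecStart=/bin/true
```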

Requires/Wants: These configure requirement dependencies on other units. If this unit gets activated, the units listed here will be activated as well. With Requires=, if a listed unit fails to start and this unit is also ordered After= it, this unit will not be started. With Wants=, this unit is started regardless of the start result of the wanted unit. Neither setting implies any ordering by itself; without After=/Before=, the units are started in parallel.
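A hypothetical app.service (all unit names and the binary path are made up for illustration) showing the difference between the two, and why Requires= is normally paired with After=:

```ini
# app.service (hypothetical)
[Unit]
Description=Example of requirement vs. ordering dependencies
# Pull db.service in; combined with After=, app.service is not started if db.service fails
Requires=db.service
After=db.service
# Pull metrics.service in too, but start app.service whatever its result;
# with no After=/Before= on it, the two are started in parallel
Wants=metrics.service

[Service]
# Placeholder binary path
ExecStart=/usr/local/bin/app
```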

multi-user.target: a “checkpoint” equivalent to the SysV multi-user runlevel. Most units that start at boot are hooked into this target (usually with WantedBy=multi-user.target), since at that point the system is likely to have all the essential requirements in place to run tasks.

vault-token.target: a target that indicates the file /root/.vault-token has changed. This target is independent of multi-user.target. The call chain is vault-login.service → vault-token.path → vault-token.service → vault-token.target.
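The post does not show these units. As a hypothetical reconstruction of the first and last links of the chain, assuming the token file is watched with a .path unit, they could look like the following; how vault-token.service starts vault-token.target is not described, so it is left out. Units such as service-b.service that declare WantedBy=vault-token.target are pulled in when the target starts, once they have been enabled.

```ini
# vault-token.path (hypothetical reconstruction, not the original unit)
[Unit]
Description=Watch the Vault token file

[Path]
# Trigger the unit below when /root/.vault-token has been written to and closed
PathChanged=/root/.vault-token
# Defaults to the .service with the same name; spelled out for clarity
Unit=vault-token.service

# vault-token.target (hypothetical reconstruction, not the original unit)
[Unit]
Description=Vault token has changed
```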

The reason for the observations

As you may have noticed, supervisord is started by multi-user.target while service-b is started by vault-token.target. Depending on how fast awsauth finishes, service-b may start earlier or later than supervisord.

  • If service-b starts later than supervisord, supervisord starts with no delay.
  • If service-b starts earlier, it sleeps for 3 minutes, and if supervisord is started during that time, it waits until service-b has finished starting, i.e. until after the 3-minute sleep.

How to fix the issue

The Before=supervisord.service ordering in service-b is removed. Since service-b is started by vault-token.target while service-a is started by multi-user.target, their start jobs are not necessarily pending at the same time, so the ordering “service-a runs after service-b” may not take effect. The [Install] section of service-a.service can be removed to make the logic robust.
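Applied to the units from the Precondition section, the fix looks roughly like this. It is a sketch: the post does not say what starts service-a once its [Install] section is gone, so that part is left open.

```ini
# service-b.service after the fix: Before=supervisord.service is gone
[Unit]
Description=Bootstrap service
Wants=basic.target
After=basic.target network.target vault-token.target

[Service]
User=root
Type=oneshot
ExecStartPre=/bin/sleep 180
ExecStart=/bin/service-b
RemainAfterExit=true
StandardOutput=journal

[Install]
WantedBy=vault-token.target

# service-a.service after the fix: no [Install] section,
# so it is no longer pulled in by multi-user.target at boot
[Unit]
Description=Event handler service
After=supervisord.service service-b.service
Requires=service-b.service

[Service]
Type=simple
ExecStart=/bin/service-a
ExecReload=/bin/kill -s HUP $MAINPID
```

If service-a was already enabled, run systemctl disable service-a before removing the section; otherwise the existing symlink in multi-user.target.wants/ keeps pulling it in at boot.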

We also removed supervisord from our system, so that systemd is the only process manager and dependency ordering is easier to get right.
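A sketch of what this enables, assuming service-c is turned into its own unit (the unit name, binary path and drop-in file name below are hypothetical): service-b can then depend on service-c directly instead of on supervisord.service. Because Wants= pulls service-c in whenever service-b is started, the After= ordering applies even though service-b is triggered by vault-token.target.

```ini
# service-c.service (hypothetical: service-c run by systemd instead of supervisord)
[Unit]
Description=service-c
After=network.target

[Service]
# Placeholder path; the real location of service-c is not given in the post
ExecStart=/bin/service-c

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/service-b.service.d/service-c.conf (hypothetical drop-in)
[Unit]
# Pull service-c into service-b's transaction and order service-b after it
Wants=service-c.service
After=service-c.service
```

With the default Type=simple, service-c counts as started as soon as its process has been forked, not when it is ready to serve, so some readiness wait (such as the sleep in service-b) may still be needed unless service-c supports Type=notify.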

Ref:
https://www.freedesktop.org/software/systemd/man/systemd.unit.html

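Minimal test to verify the After/Before behaviour, using two test units bbb and ccc, where ccc is ordered after bbb, bbb takes 30 seconds to start, and ccc has a commented-out Requires= on bbb: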

  1. systemctl start bbb ccc: the log shows ccc is only started after bbb has finished starting.
  2. systemctl stop bbb, then systemctl start ccc: ccc starts without blocking.
  3. systemctl start bbb, then (in another shell) systemctl start ccc: ccc waits the 30 s it takes bbb to finish starting.
  4. Uncomment the Requires= line, run systemctl daemon-reload, systemctl stop bbb ccc, then systemctl start ccc: both services are started, ccc only after bbb's 30 s start-up.
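The post does not include bbb and ccc themselves. A minimal pair that should reproduce the four steps above, assuming bbb simply sleeps for 30 seconds, could look like this:

```ini
# /etc/systemd/system/bbb.service (test unit, assumed definition)
[Unit]
Description=bbb: takes 30 seconds to start

[Service]
# oneshot: bbb counts as started only after ExecStart= exits, i.e. after 30 s
Type=oneshot
ExecStart=/bin/sleep 30

# /etc/systemd/system/ccc.service (test unit, assumed definition)
[Unit]
Description=ccc: ordered after bbb
After=bbb.service
# Step 4: uncomment so that starting ccc also pulls in bbb
#Requires=bbb.service

[Service]
Type=oneshot
ExecStart=/bin/echo ccc started
```

After creating or editing the files, run systemctl daemon-reload, and follow the order of events with journalctl -f -u bbb -u ccc.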