Sometimes, especially in development environments, you want to drop a whole database & then re-establish it. Unfortunately, if there are any current connections to the postgresql db, you can’t. Postgresql doesn’t give you a way to selectively drop client connections — you can stop the whole db server, but that will affect every database in there, not just the one you are working on.
Luckily, postgres will report on who has current connections, through the pg_stat_activity table. Our task is to use this table effectively to grab the process IDs of connections, and then to ask the OS to kill those connections for us. And to do this without granting too much power on the way; this will not “run as root with no password” via sudo, for example.
For those impatient people out there, here’s the script that does the job, the sudoers line needed to grant a reasonable minimum privilege, and how to use it. Then I can get on with explaining it …
Step 0 - The Finished Product
Put this script somewhere useful, like /usr/local/bin/kill-db-connections or $HOME/bin/kill-db-connections :-
#!/bin/sh
psql -U $1 -t -c "select procpid from pg_stat_activity \
where datname='$2' and current_query not like \
'select procpid from pg_stat_activity%';" \
| sudo -u postgres xargs kill
Now decide which user will be executing this - in a development box I have nearby, this is the user jenkins, related to the Jenkins Continuous Integration system (previously called Hudson). So in sudoers I add the following restricted ability :-
jenkins ALL=(postgres) NOPASSWD: /usr/bin/xargs kill
Finally, we invoke this command with the username and database name as command arguments, and any user password as an environment variable …
PGPASSWORD="userpw" kill-db-connection testuser testdb
And that’s all we need. Now read on for the detail of each step …
Step 1 - Identify the connections
The pg_stat_activity table in postgresql stores a lot of detail about the connections to the database and what they are doing. When we look at this table we are interested in the datname column, which shows the database a client is connected to (we are assuming that a busy system has multiple databases under postgresql, which is why we are not just shutting down the whole server to get rid of unwanted connections), the procpid column which gives us the OS process ID of the connection, and the current_query column, which I’ll explain in a moment.
Lets just query those three and see what we get (my database name is ‘dev’ in this example) :-
dev=> select datname,procpid,current_query
dev-> from pg_stat_activity;
datname | procpid | current_query
----------+---------+-------------------------------------------------------------
dev | 4865 | select datname,procpid,current_query from pg_stat_activity;
dev | 2200 |
dev | 4538 |
dev | 2257 |
dev | 2270 |
dev | 3926 |
dev | 5394 |
prod | 4528 |
(8 rows)
Ok, so we can see 8 connections — and one of those is actually our current query! More worryingly, one of them is a connection to a different database, and we should leave this one alone. We need to refine our query a little — select only the ‘dev’ database, and exclude our own query.
dev=> select datname,procpid,current_query
dev-> from pg_stat_activity where datname = 'dev'
dev-> and current_query not like 'select datname,procpid,current_query from pg_stat_activity%';
datname | procpid | current_query
----------+---------+---------------
dev | 2200 |
dev | 4538 |
dev | 2257 |
dev | 2270 |
dev | 3926 |
dev | 5394 |
(6 rows)
That’s better! I excluded our own query by specifying enough of our own actual query using LIKE to avoid any normal db user. I couldn’t specify the whole query, because that would end up including the LIKE clause itself … and that way lies recursive madness.
So now we have the list of connections that we are interested in. We can now forget about the datname and current_query colums on output, because we have confidence in our main query. When we drop these from the main query we need to also modify the LIKE clause to match …
dev=> select procpid from pg_stat_activity
dev-> where datname = 'dev'
dev-> and current_query not like 'select procpid from pg_stat_activity%';
procpid
---------
2200
4538
2257
2270
3926
5394
(6 rows)
Step 2 - Just the bare minimum data please
We need to step outside the postgresql prompt now, in order to process this output from the shell. The first step is to work out what this needs to look like to be a ’single command’ rather than an interactive session.
We can call the psql program with the option ‘-c SQL_COMMAND’, but we also need to specify which user to log in as with ‘-U username’. Lets try this …
$ psql -U devuser \
> -c "select procpid from pg_stat_activity where datname='dev' \
> and current_query not like 'select procpid from pg_stat_activity%';"
Password for user devuser:
Oh, that’s not very helpful. Automating this won’t work if I have to type in a password every time it gets used. We have two approaches — allow this user to log in to postgresql without providing a password, or store the actual password in the script.
The correct approach is to set up a user that is allowed to run this query, and only this query, and to do so without a password. After all, we can get most of this data from the OS as any user anyway. But I’m guessing that if you knew how to do that properly you wouldn’t need to read this page! And I don’t want to start describing how to modify postgresl authentication to allow this, we have enough to cover anyway. So providing the password from the script is the way we’re going. Just make sure that the script file is stored securely, with restricted permissions to stop just anyone from reading it.
So, we can provide the password by populating the PGPASSWORD environment variable when we call the psql command :-
$ PGPASSWORD="userpw" psql -U devuser \
> -c "select procpid from pg_stat_activity where datname='dev' \
> and current_query not like 'select procpid from pg_stat_activity%';"
procpid
---------
2200
4538
2257
2270
3926
5394
(6 rows)
That’s good. Now we need to ask psql to stop printing the column name at the top and the number of matching rows at the bottom, using the ‘-t’ option.
$ PGPASSWORD="userpw" psql -U devuser -t \
> -c "select procpid from pg_stat_activity where datname='dev' \
> and current_query not like 'select procpid from pg_stat_activity%';"
2200
4538
2257
2270
3926
5394
So now we have a simple and clean list of the connections that we want to terminate. Time to move on.
Step 3 - Terminate the connections
Lets have a look at one of those process IDs, and see what we can see.
$ ps -fp 2200
UID PID PPID C STIME TTY TIME CMD
postgres 2200 17444 0 09:22 ? 00:00:00 postgres: dev dev 203.97.121.211(44619) idle
It is owned by the system user ‘postgres’. If we want to kill that process, we need to either have the same user ID, or to be root. Being root is not good, it is too easy to make mistakes and do bad things on a system. We want to be the user ‘postgres’ when we are killing the connections. But at the beginning of all this, we mentioned running this under the Jenkins CI system, and that means that we are probably running as the user ‘jenkins’. We will need to use the sudo command to allow us to switch users for this. And we have already exposed one password to our script, can we avoid doing this again?
Yes, we can. sudo will allow us to run a specified command, as another user, without asking for a password. This is a potential security hole, but if you manage things well it becomes a powerful automation tool.
So, to allow ‘jenkins’ to run the command ‘kill’ as the user ‘postgres’, we add the following line to the sudoers configuration using visudo :-
jenkins ALL=(postgres) NOPASSWD: /usr/bin/kill
So we can now issue kill commands to these process IDs. We can also kill the whole postgresql server too, if we are not careful. How careful are you when writing new Jenkins tasks for your development server? How careful do you need to be?
I’m not going to answer those questions for you. What I am going to do it help you work your way through the list of process IDs from the pg_stat_activity table and get rid of them all quickly!
We could ask the shell to step through each line of the output, and send the kill signal to each one. That would look like this :-
$ for pid in $(PGPASSWORD="userpw" psql -U devuser ... blah blah)
> do
> sudo -u postgres kill $pid
> done
Yes, I really did say “blah blah” in there, there was too much to type and it all looks so very messy. That’s one reason to not use this approach — the code ends up messy, and messy code is bad code.
The other way is to use the xargs command, which is written specifically to make this kind of thing easy. xargs basically handles the whole ‘for;do;done’ loop for you in one go.
Lets test xargs with a simple printf call :-
$ PGPASSWORD="userpw" psql -U devuser -t \
> -c "select procpid from pg_stat_activity where datname='dev' \
> and current_query not like 'select procpid from pg_stat_activity%';" \
> | xargs printf 'pid=%s\n'
pid=2200
pid=4538
pid=2257
pid=2270
pid=3926
pid=5394
We can use xargs to call the kill command for us, too. Now we face another choice, to do with the use of the ’sudo -u postgres’ that we need — do we do this inside the xargs loop, or outside it? Both will work …
I’ve chosen to invoke the sudo first, and to run the xargs command as the postgres user. It is a little arbitrary, but in my way the jenkins user can no longer simply call ’sudo -u postgres kill …’. This might stop an accidental kill of the postgres server perhaps. It isn’t going to be difficult to work around, but it isn’t obvious. I’m not trying to utterly prevent the jenkins user from abusing their rights to kill postgres processes, but I am trying to avoid user mistakes.
So I will call sudo -u postgres xargs kill at the end of our long command line (and of course I have to change the sudoers config we used earlier).
$ PGPASSWORD="userpw" psql -U devuser -t \
> -c "select procpid from pg_stat_activity where datname='dev' \
> and current_query not like 'select procpid from pg_stat_activity%';" \
> | sudo -u postgres xargs kill
sudoers:
jenkins ALL=(postgres) NOPASSWD: /usr/bin/xargs kill
Step 4 - Generalising the solution
So now we need to take our huge command line, and pop it into a script for later use. While we’re doing that, we can clean up some of the fixed names we have used and make the solution more flexible.
First off, we will leave out the users password; this will be needed in scripts called directly by Jenkins, but our single-purpose ‘kill-db-connections’ script doesn’t need to handle it — after all, perhaps you set up postgresql acces permissions to allow a user to run this query with no password (tell me how you did it, please!)
Secondly, we won’t make a fixed assumption about the name of the database itself, or the username to connect as. We’ll grab these from the command-line when you run the script.
Thirdly, we won’t put in error checking and stuff like that. You won’t be using this script from the command-line, you will be integrating it into other scripts that, once tested, won’t be changed. So we can save time and effort here — and that’s another tradeoff between doing things ‘right’ and doing them ‘now’.
So, our script becomes :-
#!/bin/sh
psql -U $1 -t -c "select procpid from pg_stat_activity \
where datname='$2' and current_query not like \
'select procpid from pg_stat_activity%';" \
| sudo -u postgres xargs kill
In order for the sudo to work, we need ’sudoers’ to contain :-
jenkins ALL=(postgres) NOPASSWD: /usr/bin/xargs kill
And when we call this, we need to provide the user- and database-names as command arguments, and the user password as an environment variable :-
PGPASSWORD="userpw" kill-db-connection testuser testdb
Last thoughts
Is there a better way for all this? Well, perhaps if the postgresql server itself supported a command like ‘pg_ctl kill-db-clients datname’ we would be happy. Who wants to implement that, and submit the code to the postgresql project?
A better way to implement this specific solution, however, would be to mandate that the actual ‘kill-db-connection’ script were run as the postgres user. That way we avoid the granting of arbitrary ‘kill’ ability to the jenkins user, and as long as the script were protected by filesystem permissions to prevent ‘jenkins’ from altering it (and we spent more time being very defensive about the command-line argument inputs), we could be more restrictive in sudoers, with something like ‘jenkins ALL=(postgres) NOPASSWD: /usr/local/bin/kill-db-connections‘. I didn’t do that here, because I’m not that paranoid about the development server in question, and didn’t need to spend the extra time to make the script more defensive. However, in a larger environment you may not want to extend too much ‘trust’ to your automation systems.
Another improvement would be around the handling of the user password. I’m keeping this out of the command-line arguments so that it doesn’t show up in a standard process list; however putting it in an environment variable isn’t much better as it can still be extracted by any user — it’s just a little more obscured. The psql command will read data from $HOME/.pgpass so we could put data in there — however when you are running from a system user account like ‘jenkins’ it isn’t obvious just where $HOME actually is, and you may be switching between an arbitrary number of different database user accounts anyway. If you really need to exercise more care over passwords (and you should!) then you need to do a little more research. Talk to a real postgresql administrator, not me!
Tags: postgresql, sudo