Building Bard

What is Bard?

Last year, I had an idea for a useful little tool – a shell that could record the changes I made.

The motivation was having to do occasional sysadmin tweaks on production servers. We always want to have scripts or Ansible playbooks that will reproduce the state of a server, but sometimes some exploratory direct work is needed to figure out what to do. We then have to try to retroactively add something to our scripts to duplicate the change we just made, but now we’re just hoping it’s actually doing the same thing as what we did.

My idea was to make a shell that would record all the changes you make in a format that can easily be re-executed – a shell script or ansible playbook. Just recording commands is easy enough – you could just parse history – but the most important thing I wanted to be able to do is to also record edits to files made in a text editor.

Long story short, I have a proof-of-concept working now (implemented in Prolog, of course), which I’m calling Bard (because it immortalizes the tales of what you did). It’s at the stage where it mostly works, but in limited domains and I’m realizing that really building this out would require a ton of tedious work to both enhance the shell and to parse all the various commands that one could run.

I’m releasing it though, because I think 1) someone might find it useful but more importantly because 2) I wrote it in Prolog and I think it does a few interesting things.

If you think it might be useful, go ahead and check it out.

In the meantime, here are some things that I encountered while building it that some people out there might find interesting.

Parsing Shell

Our discussion of Bard will begin by looking at the parser.

One of my favourite features of Prolog are DCGs, which give us a very convienent way to write pretty decent little parsers, without having to resort to hacky regexes.

The limited, bash-lite syntax our shell will support is pretty easy to write a parser for. One key thing that I do at the top is wrap the various things we’ll be parsing in terms:

word(u([var(V)])) --> var(V).
word(s(W)) --> "'", !, single_quoted(W).
word(d(W)) -->
    "\"", !, double_quoted(W).
word(u(W)) -->
    unquoted_word(W).

It’s a natural thought to try to immediately parse our terms into native data structures – e.g. strings, atoms, and numbers – but by keeping them as terms, it will make it much easier for us to handle them in different ways in subsequent steps.

For example, later on, we’ll want to expand variables in double-quoted strings, but not in single-quoted strings. How will we distinguish the two things? Because we’ve kept them as terms, it will be very simple to have clauses that just match in the head, instead of having to check the types of the arguments or try to do more complicated logic.

To wit, the below code (from commands.pl) handles expanding the different types of strings.

sh_string_val_quoted_(s(Cs), _, S) :-
    format(string(S), "'~s'", [Cs]).
sh_string_val_quoted_(u(Cs), Env, S) :-
   maplist(lookup(Env), Cs, Expanded),
   append(Expanded, Codes),
   string_codes(S0, Codes),
   call_dcg(expand, [S0], [S]).
sh_string_val_quoted_(d(Cs), Env, S) :-
    maplist(lookup(Env), Cs, Expanded),
    append(Expanded, Codes),
    string_codes(S0, Codes),
    call_dcg(expand, [S0], [S1]),
    format(string(S), "\"~w\"", [S1]).

Since we can easily distinguish if something is s ingle-quoted, u n-quoted, or d ouble-quoted, we can use pattern-matching to easily distinguish which case we’re looking at. We also get to take advantage of first-argument indexing to avoid the need to cut, without worrying about leaving choice-points behind.

Another useful thing we can implement with DCGs is lookahead.

The particular thing that makes this necessary is the error redirect, e.g. command 2> foo, which would run command with stderr going to the file foo. Since 2 is also a perfectly reasonable thing to be parsed as part of a command, we need to able to ensure that the above construct is parsed as Command = [`command`], Redirects = [error(foo)], not Command = [`command`, `2`], Redirects = [out(foo)].

We can do this easily, using the “semi-context notation” for DCGs:

check_not_redirect(0'2), [Peek] -->
    [Peek], !,
    { Peek \= 0'> }.
check_not_redirect(_) --> [].

This predicate is used as follows, in one of the clauses for parsing unquoted words:

unquoted_word([C|Cs]) -->
    [C], check_not_redirect(C),
    { \+ memberchk(C, `\n ;&|><`) },
    !, unquoted_word_(Cs).

So, it reads a character, then invokes check_not_redirect//1 with the just-read character.

If that character is anything but a “2”, it will take the second clause and do nothing.

If it is a “2”, it reads the next code. If there is no next code (i.e. we’re at the end of input) then it just goes to the second clause & does nothing.

If there is a next character, it checks that the next character isn’t a “>” character. If it is, then the predicate fails, which because of the cut makes the whole containing read fail, making unquoted_word/1 give up the character it just read, so it can be parsed as part of a redirect instead.

If it wasn’t a “>” character though, we don’t want to have consumed it, because we want it read it normally! This is what the [Peek] in the head does – after the body of the DCG completes, it puts Peek back on the input stream.

Semi-context notation is a little weird (especially when using it with recursive predicates), but it makes DCGs even more powerful than they already are.

Tracking Edits

The most ambitious part of Bard is its ability to detect file edits and generate a diff. It does this without any special indication on the user’s part – you can just open up vi, nano, anything that it knows is an editor, then do whatever arbitrary editor things you want. Open more files, rename, anything; Bard should be able to track it & generate diffs.

It does this by using Linux’s strace API to watch the system calls that the editor makes and keep track of the open and write calls.

This is definitely the most complex part of Bard, which entailed figuring out a number of non-trivial things.

The idea is that the editor is run under strace, then watch the output of strace and use those syscalls to be able to tell whenever the editor is modifying a file. Strace writes its output to a file, so we need to be able to detect when it writes output. We can do this with inotify, another Linux API that we can use to be notified whenever a given file changes.

Writing a Foreign Wrapper

The first, simplest part, was just using the inotify API in Prolog. I found out that there’s an inotify pack, created by Jan (the main author of SWI-Prolog) which Bard now uses, but I hadn’t even looked initially, because SWI’s FFI makes it so easy to wrap C functions.

Turns out we need less than 100 lines of C to have a handy little wrapper:

/* Compiling:
   clang -I$HOME/.swivm/versions/v8.1.28/lib/swipl/include -fpic -c inotify.c
   clang -shared -o inotify.so inotify.o
*/
#include <SWI-Stream.h>
#include <SWI-Prolog.h>

#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* defining our one function that will monitor a particular file for changes */
/* first argument is the file name to monitor */
/* second argument is a pipe that we'll write to make the inotify loop stop */
/* third argument is a prolog callback we'll invoke when there's new data */
static foreign_t pl_open_inotify(term_t file, term_t done_pipe,
                                 term_t callback) {
  /* Get the file name into a C string */
  char *file_name_chars;
  int fd;
  if (!PL_get_chars(file, &file_name_chars, CVT_ATOM | CVT_STRING)) {
    return PL_domain_error("string", file);
  }

  /* get the file descriptor for the pipe argument */
  int stop_fd;
  IOSTREAM *pipe;
  if (!PL_get_stream(done_pipe, &pipe, 0)) {
    return PL_domain_error("not_a_stream", done_pipe);
  }
  stop_fd = Sfileno(pipe);
  if (stop_fd <= 0) {
    return PL_domain_error("pipe_stream", done_pipe);
  }

  /* prepare the callback predicate */
  module_t module = NULL;
  if (!PL_strip_module(callback, &module, callback)) {
    return FALSE;
  }

  /* inotify setup stuff */
  fd = inotify_init1(IN_NONBLOCK);
  if (fd == -1) {
    return PL_resource_error("inotify_init");
  }

  int wd;
  wd = inotify_add_watch(fd, file_name_chars, IN_MODIFY);
  if (wd == -1) {
    return PL_resource_error("inotify_watch");
  }

  /* polling on the `done` pipe and the inotify fd */
  struct pollfd fds[2];
  fds[0].fd = stop_fd;
  fds[0].events = POLLIN;

  fds[1].fd = fd;
  fds[1].events = POLLIN;

  while (1) {
    int poll_num;
    poll_num = poll(fds, 2, -1);
    if (poll_num == -1) {
      if (errno == EINTR) {
        continue;
      }
      return PL_resource_error("polling");
    }

    if (poll_num > 0) {

      if (fds[0].revents & POLLIN) {
        /* done if the `done` pipe got input */
        break;
      }

      if (fds[1].revents & POLLIN) {
        /* read the event to clean out the input */
        struct inotify_event event;
        read(fds[1].fd, &event, sizeof(event));
        /* call our callback */
        fid_t fid = PL_open_foreign_frame();
        predicate_t call1 = PL_predicate("call", 1, NULL);
        PL_call_predicate(module, PL_Q_PASS_EXCEPTION, call1, callback);
        PL_close_foreign_frame(fid);
      }
    }
  }
  inotify_rm_watch(fd, wd);
  PL_succeed;
}

install_t install_inotify() {
  /* export our function here as `monitor_file/3` */
  PL_register_foreign("monitor_file", 3, pl_open_inotify, 0);
}

Using it from Prolog would then look like so:

:- module(inotify_test, [run/1]).

:- use_module(library(debug), [debug/1, debug/3]).
:- use_module(library(unix), [pipe/2, fork/1]).

:- use_foreign_library(inotify).

something_happened :-
    debug(test, "File changed!", []).

run(File) :-
    debug(test),
    pipe(Read, Write),
    fork(Pid),
    ( Pid == child
    -> (close(Write),
        debug(test, "starting monitor", []),
        monitor_file(File, Read, something_happened),
        debug(test, "monitor done", [])
       )
    ; (close(Read),
       sleep(10),
       format("sending done", []),
       format(Write, "done", []))
    ).

This was the first time I had written a foreign module for SWI-Prolog for my own project (although I had written C code for the core Prolog libraries) and I was very pleasantly surprised how straightforward it was. It really increased my confidence in how many places I can use SWI-Prolog, since now I know that I can trivially add some functionality with a C module if I need to.

Running Pipelines

The next bit of complexity that I encountered in the process of building Bard was handling pipelines of commands.

To make the pipeline work properly, it needs to start each process in the pipeline independently, wiring up the stdin & stdout of each appropriately. We also need to connect them all up before it starts, so processes don’t fill up their output buffers before something is there to read from it.

Bard starts the processes that the shell is executing using SWI’s process_create/3. For simple commands, this works fine – we just run the command, setting the in/out/error streams to the appropriate files if there are redirects.

In a pipeline though, we need to connect the output of the first with the input of the second, and so on. It’s conceptually simple, but unfortunately SWI’s process_create/3 copies the API of SCIStus for this predicate, which only gives three options for each of the input & output streams: std, meaning same as the equivalent in/out/error of the Prolog process; null, meaning a null stream; and pipe(Stream), which seems like it’d be what we want, but crucially only accepts an unbound variable for Stream, which it then binds to a new stream.

Luckily, as I’ve written about many times before, SWI-Prolog is a very welcoming project to make enhancements too – it was short work to submit a PR which allows for another option to be passed, stream(Stream), where Stream here is already bound to an existing stream. With this, connecting the pipeline process together was a snap.