Auto-Formatting Prolog Code

My Prolog LSP server has recently gained a new feature – the ability to automatically format code.

This has been quite a fun feature to write. The approach I took was to use the built-in prolog_read_source_term/4 to parse a Prolog source file in a structured way that preserves comments and the source positions of terms.

Just before this, I’d written an LSP code formatter for another programming language with a very simple Lisp-ish syntax. In that case, the syntax was so simple that I wrote a DCG to parse the source file into the structured form that the formatting transformations operate on. Since a code formatter works entirely by changing whitespace, having runs of whitespace represented explicitly makes things much easier, and that’s something a generic parser wouldn’t have done.
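To illustrate why explicit whitespace helps, here is a minimal DCG sketch for a hypothetical Lisp-ish syntax made only of parentheses, symbols and whitespace (no strings or comments). It is not the author’s formatter: the names tokens//1, reformat/2 and the token shapes open, close, ws/1 and sym/1 are illustrative assumptions. Because every character lands in exactly one token, a formatting pass only has to rewrite the ws/1 tokens and print everything back out.

```prolog
% Minimal sketch for a HYPOTHETICAL Lisp-ish syntax: parentheses, symbols
% and whitespace only. All predicate and token names are illustrative.
:- use_module(library(lists)).
:- use_module(library(apply)).

% Every input code ends up in exactly one token, so runs of whitespace
% survive as explicit ws(Codes) tokens instead of being skipped.
tokens([T|Ts]) --> token(T), !, tokens(Ts).
tokens([])     --> [].

token(open)   --> "(".
token(close)  --> ")".
token(ws(Cs)) --> run(space, Cs), { Cs \== [] }.
token(sym(A)) --> run(symbol, Cs), { Cs \== [], atom_codes(A, Cs) }.

% run(+Kind, -Codes)// greedily consumes codes of a single kind.
run(Kind, [C|Cs]) --> [C], { kind(C, Kind) }, !, run(Kind, Cs).
run(_, [])        --> [].

kind(C, Kind) :-
    (   code_type(C, space) -> Kind = space
    ;   memberchk(C, `()`)  -> Kind = paren
    ;   Kind = symbol
    ).

% A sample whitespace-only transformation: a run containing a line break
% becomes a single newline, any other run becomes a single space.
normalise(ws(Cs), ws(`\n`)) :- memberchk(0'\n, Cs), !.
normalise(ws(_), ws(` `)) :- !.
normalise(Tok, Tok).

% Print tokens back out as codes.
token_codes(open, `(`).
token_codes(close, `)`).
token_codes(ws(Cs), Cs).
token_codes(sym(A), Cs) :- atom_codes(A, Cs).

reformat(Text, Formatted) :-
    string_codes(Text, Codes),
    phrase(tokens(Tokens), Codes),
    maplist(normalise, Tokens, Normalised),
    maplist(token_codes, Normalised, Parts),
    append(Parts, Out),
    string_codes(Formatted, Out).
```

For example, reformat("(foo   bar\n\n  baz)", S) gives S = "(foo bar\nbaz)": the symbols and parentheses pass through untouched and only the two ws/1 tokens change.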

While SWI-Prolog does have the aforementioned prolog_read_source_term/4, which gives the contents of a source file in a structured form, including information about comments and the positions of the terms in the file, that form isn’t easy to work with directly. The predicate gives the location information of subterms in a nested form, so a file like this:

foo(X, Y) :-
    Y is X * 2.

would produce position information like this:

term_position(
    0, 27, 10, 12,
    [ term_position(0,9,0,3,[4-5,7-8]),
      term_position(17,27,19,21,[17-18,term_position(22,27,24,25,[22-23,26-27])])
    ])

That is, a bunch of nested terms whose structure mirrors that of the parsed source itself.
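The nested layout above can be reproduced with a few lines of SWI-Prolog. This is a minimal sketch, assuming the source arrives as a string; read_with_positions/4 is a hypothetical helper name, while prolog_read_source_term/4 and the subterm_positions/1 and comments/1 options it forwards to the reader are SWI-Prolog’s own. Comments come back as a list of Position-String pairs, where Position is a stream position rather than a plain character offset.

```prolog
% Minimal sketch: read one clause from a string and return its nested
% subterm layout. read_with_positions/4 is a HYPOTHETICAL helper;
% prolog_read_source_term/4 comes from SWI-Prolog's library(prolog_source).
:- use_module(library(prolog_source)).

read_with_positions(Text, Term, Pos, Comments) :-
    setup_call_cleanup(
        open_string(Text, In),
        prolog_read_source_term(In, Term, _Expanded,
                                [ subterm_positions(Pos),  % nested layout
                                  comments(Comments)       % Pos-String pairs
                                ]),
        close(In)).
```

Calling it on the two-line foo/2 clause above binds Pos to the term_position/5 term shown, and Comments to [] since that clause has none.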

While that’s all well and good for more typical consumers of this library, who probably want to work with that nested representation, for a formatter I’m more interested in the actual textual order of the terms in the source file. Writing my own DCG to parse a Prolog file into the sort of “flat” format I want, though, promised to be a challenge, given the language’s rich syntax.

So, I compromised and wrote a bunch of code that takes terms with position data in the above format and transforms them into a flat list of tokens, marking things like “term begins here”, “open list here” and “simple term here”. A second pass then generates explicit whitespace tokens by comparing the distances between the starts and ends of those flat terms and determining where newlines go. From there, it’s a “simple matter of coding” to manipulate the whitespace and align things in a subjectively nice way.
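The two passes can be sketched roughly as follows. This is a hypothetical reconstruction, not the lsp_server code: the predicate names and the token names term_begin/1, list_begin/1, simple/2, punct/1, spaces/1 and newlines/1 are assumptions, and comments, dicts and quasi-quotations are left out. Pass one walks the nested positions and merges each functor with its arguments by start offset, so an infix operator such as :- or is lands between its operands. Pass two slices the source text between consecutive tokens and turns it into layout tokens, also recovering the parentheses and commas that subterm positions do not record.

```prolog
% HYPOTHETICAL sketch, not the lsp_server implementation. Token names are
% illustrative; comments, dicts and quasi-quotations are not handled.
:- use_module(library(lists)).
:- use_module(library(pairs)).
:- use_module(library(apply)).

%% Pass 1: nested subterm positions -> flat tokens in textual order.
flatten_pos(From-To, [simple(From, To)]) :- !.
flatten_pos(string_position(From, To), [simple(From, To)]) :- !.
flatten_pos(term_position(From, To, FFrom, FTo, ArgsPos), Tokens) :- !,
    % Merge functor/operator and arguments by start offset.
    maplist(child_chunk, ArgsPos, ArgChunks),
    keysort([FFrom-[simple(FFrom, FTo)]|ArgChunks], Sorted),
    pairs_values(Sorted, Parts),
    append(Parts, Inner),
    append([[term_begin(From)], Inner, [term_end(To)]], Tokens).
flatten_pos(list_position(From, To, Elems, Tail), Tokens) :- !,
    maplist(flatten_pos, Elems, Parts),
    append(Parts, ElemTokens),
    (   Tail == none -> TailTokens = [] ; flatten_pos(Tail, TailTokens) ),
    append([[list_begin(From)], ElemTokens, TailTokens, [list_end(To)]], Tokens).
flatten_pos(brace_term_position(From, To, Arg), Tokens) :- !,
    flatten_pos(Arg, ArgTokens),
    append([[brace_begin(From)], ArgTokens, [brace_end(To)]], Tokens).
flatten_pos(parentheses_term_position(From, To, Inner), Tokens) :-
    flatten_pos(Inner, InnerTokens),
    append([[paren_begin(From)], InnerTokens, [paren_end(To)]], Tokens).

% Every position term handled above has its start offset as argument 1.
child_chunk(Pos, Start-Tokens) :-
    arg(1, Pos, Start),
    flatten_pos(Pos, Tokens).

%% Pass 2: inspect the source text between consecutive tokens.
token_span(simple(S, E), S, E) :- !.
token_span(Marker, P, P) :- arg(1, Marker, P).   % zero-width markers

add_layout(_, [], []).
add_layout(Text, [T|Ts], [T|Out]) :-
    (   Ts = [Next|_]
    ->  token_span(T, _, End),
        token_span(Next, Start, _),
        Len is Start - End,
        sub_string(Text, End, Len, _, Gap),
        string_codes(Gap, Codes),
        gap_tokens(Codes, GapTokens),
        append(GapTokens, Rest, Out)
    ;   Out = Rest
    ),
    add_layout(Text, Ts, Rest).

% Split a gap into layout runs and punctuation runs ("(", ",", ")", ...).
gap_tokens([], []).
gap_tokens([C|Cs], [Tok|Toks]) :-
    code_kind(C, Kind),
    take_run(Kind, [C|Cs], Run, Rest),
    run_token(Kind, Run, Tok),
    gap_tokens(Rest, Toks).

code_kind(C, Kind) :-
    (   code_type(C, space) -> Kind = space ; Kind = punct ).

take_run(Kind, [C|Cs], [C|Run], Rest) :-
    code_kind(C, Kind), !,
    take_run(Kind, Cs, Run, Rest).
take_run(_, Rest, [], Rest).

% A layout run with line breaks becomes newlines(N) (old indentation is
% dropped, to be recomputed later); otherwise it becomes spaces(N).
run_token(space, Run, Tok) :-
    include(==(0'\n), Run, Newlines),
    length(Newlines, N),
    (   N > 0 -> Tok = newlines(N) ; length(Run, Len), Tok = spaces(Len) ).
run_token(punct, Run, punct(S)) :-
    string_codes(S, Run).
```

On the foo/2 example, the flat list starts with two term_begin(0) markers (one for the whole clause, one for its head), and the second pass turns the text between :- and Y into newlines(1), discarding the old indentation so that a later step can decide it afresh.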

This alignment is also much harder than it was for the Lisp-like language, since there are so many different constructs that people want treated differently. I think the current state is good enough for a release, but I look forward to continuing to update it and hopefully getting feedback about what people do and don’t like!

The LSP server can be installed by running swipl pack install lsp_server (or ?- pack_install(lsp_server) from the REPL) and configured for your editor using the instructions in the README. Once installed, the formatter can also be run standalone with swipl formatter <file>!