Introduction

nph is an opinionated source code formatter for the Nim language, aiming to take the drudgery of manual formatting out of your coding day.

Following the great tradition of black, prettier, clang-format and other AST-based formatters, it discards existing styling to create a consistent and beautiful codebase.

Priorities

nph aims to format code in such a way that:

  • it remains semantically unchanged, aka correct (!)
    • the AST is checked for equivalence before writing the formatted code to disk - on mismatch, the code is left untouched
  • it remains simple, consistent and pleasant to read
    • most code should look as good as or better than its hand-formatted counterpart
  • diffs are kept at a minimum
    • diff-inducing constructs such as vertical alignment are avoided, for more productive merges
    • formatting the same code again results in no differences
  • it broadly follows the Status Nim style guide and NEP1
    • this is a tool aimed at making collaboration easier, with others and your future self
    • where NEP1 contradicts itself or these priorities, these priorities have precedence
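The first priority can be pictured as a guard around the formatter. The following is a minimal, hypothetical Nim sketch and not nph's implementation. `safeFormat`, `toyAst`, `collapseSpaces` and `dropDiscard` are made-up names, and a list of whitespace-separated tokens stands in for a real Nim AST. The sketch also shows the idempotence property: formatting already formatted code changes nothing.

```nim
import std/strutils

type
  Formatter = proc (source: string): string {.nimcall.}
  AstBuilder = proc (source: string): seq[string] {.nimcall.}

proc safeFormat(source: string; format: Formatter; toAst: AstBuilder): string =
  ## Returns the formatted code only if it is "AST-equivalent" to the input;
  ## on mismatch, the original source is returned untouched.
  let formatted = format(source)
  if toAst(formatted) == toAst(source):
    result = formatted
  else:
    result = source

# Toy stand-in for parsing: the "AST" is just the list of tokens.
proc toyAst(source: string): seq[string] = source.splitWhitespace()

# A well-behaved toy formatter: collapses runs of whitespace.
proc collapseSpaces(source: string): string = source.splitWhitespace().join(" ")

# A buggy toy formatter that drops a token, changing the "AST".
proc dropDiscard(source: string): string = source.replace(" discard", "")

let once = safeFormat("let   x  =   1", collapseSpaces, toyAst)
echo once                                                    # let x = 1
echo(safeFormat(once, collapseSpaces, toyAst) == once)       # true
echo safeFormat("proc f() = discard", dropDiscard, toyAst)   # proc f() = discard
```

Because the buggy `dropDiscard` changes the toy AST, the guard falls back to the original text. This mirrors "on mismatch, the code is left untouched".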

The formatting rules are loosely derived from other formatters that have already gone through the journey of debating what "pleasing to read" might mean, with adaptations for both the features and the quirks of the Nim parser.

If in doubt, formatting that works well for descriptive identifiers and avoids putting too much information on a single line will be preferred.

If something breaks the above guidelines, it's likely a bug.

Installation

Binaries

Download binaries from the releases page.

Install via nimble

nph can also be compiled or installed using nimble v0.14.2+:

nimble -l setup
nimble build

Nim version requirement

nph requires a specific version of Nim during the build process, since it reuses parts of the compiler whose API changes frequently. This may lead to Nim itself being built as part of the installation process.

For bonus points, replace nimpretty with a symlink to nph - similar command line options are supported ;)

Editor integration

  • VSCode (ext install arnetheduck.vscode-nph)

Usage

# Format the given files in-place
nph file0.nim file1.nim

# Format the given files, writing the formatted code to /tmp
nph file0.nim file1.nim --outdir:/tmp

# Format an entire directory
nph src/

# Use --check to verify that a file is already formatted the way nph would format it - useful in CI
nph --check somefile.nim || echo "Not formatted!"

# You can format stuff as part of a pipe using `-` as input:
echo "echo 1" | nph -

Disabling formatting locally

You can mark a code section with #!fmt: off and #!fmt: on to disable formatting locally:

proc      getsFormatted(a, b : int    ) = discard

#!fmt: off
let
  myHandFormattedList
        :
   array[3, int]
 =
    [1, 2, 3]

#!fmt: on
proc hanging(indent: int,
             isUgly = true) = discard

To disable formatting for a whole file, simply put #!fmt: off on top!

Note

Internally, #!fmt: off makes nph treat the section as a big multi-line comment that it copies over to the formatted code. As such, you must be careful with indentation and adjust your code to the indentation that nph will generate!
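As a rough illustration of the bookkeeping involved, the hypothetical `verbatimLines` below marks which lines fall inside a #!fmt: off region, including the marker lines themselves. This is an assumption-laden sketch that only detects the regions. It does not model the conversion to a comment, which is where the indentation caveat comes from.

```nim
import std/strutils

proc verbatimLines(lines: seq[string]): seq[bool] =
  ## Marks each line inside a `#!fmt: off` ... `#!fmt: on` region (markers
  ## included); such lines would be copied over as-is instead of reformatted.
  var inside = false
  for line in lines:
    let trimmed = line.strip()
    if trimmed == "#!fmt: off":
      inside = true
    result.add(inside)
    if trimmed == "#!fmt: on":
      inside = false

let source = @[
  "proc      getsFormatted(a, b : int    ) = discard",
  "#!fmt: off",
  "let   x   =   1",
  "#!fmt: on",
  "proc f() = discard",
]
echo verbatimLines(source)  # @[false, true, true, true, false]
```

Placing #!fmt: off on the first line and never turning formatting back on marks every line, which matches the whole-file case described above.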

The nph style

As outlined in the introduction, nph strives to maintain correctness and consistency across various language constructs with a preference for styles that work well for collaboration.

This section of the book documents some of the style choices and why they were made.

Overview

nph generally approaches formatting by considering several candidate layouts and choosing a reasonable one based on a number of heuristics.

To get an idea of what the format looks like, here's a typical proc definition - everything fits on one line, nice!

proc function(par0: SomeType): bool =
  ...

If we add more arguments, it starts getting long. nph will then try a version where the arguments sit on a line of their own:

proc function(
    par0: SomeType, par1: SomeType
): bool =
  ...

The above idea extends to most formatting: if something is simple, format it in a simple way - if not, use a bit of style to break down what's going on into more easily consumable pieces - here's a function with several information-dense parameters and a pragma:

proc function(
    par0: SomeType,
    par1, par2: SomeOtherType,
    par3 = default(SomeType),
): bool {.inline.} =
  ...

Example styling

The examples are illustrative and not based on exact rendering semantics. In particular, a different line length was used to make the point.

Lists

Lists appear frequently in source code: import modules, parameter lists, array and sequence initializers, function call parameters, etc.

Generally, list rendering is done according to a number of heuristics, striving to balance information density with the use of available screen space.

If the whole list fits on the current line, it is rendered in-place. Short sequences, single-parameter functions, etc. usually fit into this category:

import dir/module

const v = [1, 2, 3]

type T = proc(a, b: int)

If it doesn't fit from the current position, we try fitting it on a single line of its own. This frequently happens with parameter lists and with constants whose names take up space:

import
  dir/[module1, module2, module3, module4]

const mylongvariablename =
  [100000000, 200000000, 300000000]

proc function(
  param0: int, param1: int, param2: int
)

If the list still doesn't fit on a single line, we look at the contents to choose between two styles.

If it contains complex values, we render one value per row. This happens most often for function parameters and other information-dense constructs.

import
  dir/[module1, module2, module3],
  dir2/[
    module4, module5, module6, module7,
    module8, module9,
  ]

let myVariable = [
  functionCall(a, b, c),
  functionCall(a, b, c, d),
]

functionCall(
  functionCall(a, b, c),
  functionCall(a, b, c, d),
)

Extra separator

In the long style, we'll insert an extra separator at the end where permissible - this makes it easier to reorder entries and reduces git conflicts!

For simple values, we use a compact style that fits several items per row:

const values = [
  10000000, 2000000000, 3000000000,
  40000000, 5000000000,
]

functionCall(
  10000000, 2000000000, 3000000000,
  40000000, 5000000000,
)
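The list heuristics above can be condensed into a small decision procedure. This is a hypothetical sketch and not nph's code. `renderList` and its parameters are invented for illustration. `maxWidth = 40` is picked only so that the examples above are reproduced (nph itself uses 88 characters). The caller decides whether the values are simple (see What is simple? below). Finally, the "new line" case is simplified to the bracketed shape of the proc function example.

```nim
import std/strutils

proc renderList(head, opening, closing: string; items: seq[string];
                simple: bool; maxWidth = 40; indent = "  "): string =
  ## Hypothetical sketch of the list heuristics described above.
  let oneLine = items.join(", ")
  # 1. The whole list fits on the current line: render it in place.
  if head.len + opening.len + oneLine.len + closing.len <= maxWidth:
    return head & opening & oneLine & closing
  # 2. The items fit together on a line of their own.
  if indent.len + oneLine.len <= maxWidth:
    return head & opening & "\n" & indent & oneLine & "\n" & closing
  # 3. Long style, with an extra separator after the last item.
  result = head & opening & "\n"
  if not simple:
    # Complex values: one item per row.
    for item in items:
      result.add(indent & item & ",\n")
  else:
    # Simple values: pack as many items per row as fit.
    var row = ""
    for item in items:
      let piece = item & ","
      if row.len > 0 and indent.len + row.len + 1 + piece.len > maxWidth:
        result.add(indent & row & "\n")
        row = ""
      if row.len > 0:
        row.add(" ")
      row.add(piece)
    if row.len > 0:
      result.add(indent & row & "\n")
  result.add(closing)

echo renderList("proc function", "(", ")",
                @["param0: int", "param1: int", "param2: int"], simple = false)
echo renderList("functionCall", "(", ")",
                @["functionCall(a, b, c)", "functionCall(a, b, c, d)"], simple = false)
echo renderList("functionCall", "(", ")",
                @["10000000", "2000000000", "3000000000", "40000000", "5000000000"],
                simple = true)
```

With these inputs, the three calls reproduce the proc function(...) example and the complex and simple functionCall(...) examples above, including the extra trailing separator in the long style.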

What is simple?

  • literals (2, "string" etc)
  • simple identifiers (myvar etc)
  • dot expressions of the above (myObject.field)
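A hypothetical way to express this check over a toy expression tree is sketched below. The Node type is a simplified stand-in for the real Nim AST. Treating nested dot expressions such as a.b.c as simple is an assumption, based on a recursive reading of the last bullet.

```nim
type
  NodeKind = enum
    nkLiteral, nkIdent, nkDot, nkCall
  Node = ref object
    case kind: NodeKind
    of nkLiteral, nkIdent:
      text: string          # e.g. `2`, `"string"`, `myvar`
    of nkDot:
      lhs: Node             # e.g. `myObject` in `myObject.field`
      field: string
    of nkCall:
      callee: string
      args: seq[Node]

proc isSimple(n: Node): bool =
  ## Literals, identifiers and dot expressions of those count as simple.
  case n.kind
  of nkLiteral, nkIdent:
    result = true
  of nkDot:
    result = isSimple(n.lhs)
  of nkCall:
    result = false

let dotExpr = Node(kind: nkDot, lhs: Node(kind: nkIdent, text: "myObject"),
                   field: "field")
let call = Node(kind: nkCall, callee: "functionCall", args: @[dotExpr])
echo isSimple(dotExpr)  # true
echo isSimple(call)     # false
```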

Parameter lists

Parameter lists, such as function parameters and generics, are rendered using the above list style. In the AST, each parameter group is made up of 3 components: one or more names, a type and a default.

If both type and default are missing, we use a ; to disambiguate between multiple names within one group and the start of a new group.

# Usually we can use comma to separate items
proc f(a, b: int, c: float)

# A semicolon is necessary to ensure `T` is interpreted as a type and not part
# of the `v: static int` identifier group
proc g[T; v: static int]

# Semicolons are also significant for type-less parameters - the following two
# templates parse to different ASTs:
template weare(a; b) = discard
template notthesame(a, b) = discard

# Semicolons cannot be used at all after a default value that is an inline procedure:
proc f(
  myParameter = 0,
  callback: SomeCallback = proc() =
    discard
  ,
  nextParameter = 1,
)
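The separator rule fits in a few lines. It is spelled out again under "What's with the semicolons?" in the FAQ. `ParamGroup`, `render` and `renderParams` are hypothetical names used only for this sketch, and types and defaults are kept as plain strings for brevity.

```nim
import std/strutils

type
  ParamGroup = object
    names: seq[string]    # one or more names
    typ: string           # "" when the type is omitted
    defaultValue: string  # "" when there is no default

proc render(g: ParamGroup): string =
  result = g.names.join(", ")
  if g.typ.len > 0:
    result.add(": " & g.typ)
  if g.defaultValue.len > 0:
    result.add(" = " & g.defaultValue)

proc renderParams(groups: seq[ParamGroup]): string =
  for i, g in groups:
    result.add(render(g))
    if i < groups.high:
      # `,` after a group with a type and/or default, `;` otherwise
      let hasTypeOrDefault = g.typ.len > 0 or g.defaultValue.len > 0
      let separator = if hasTypeOrDefault: ", " else: "; "
      result.add(separator)

echo renderParams(@[ParamGroup(names: @["a", "b"], typ: "int"),
                    ParamGroup(names: @["c"], typ: "float")])       # a, b: int, c: float
echo renderParams(@[ParamGroup(names: @["T"]),
                    ParamGroup(names: @["v"], typ: "static int")])  # T; v: static int
echo renderParams(@[ParamGroup(names: @["a"]), ParamGroup(names: @["b"])])  # a; b
```

The outputs match the f, g and weare signatures above.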

Infix operators

nph puts spaces around infix operators such as `and` and `..`.

Although NEP1 suggests not having spaces around .. and ..< in particular, this creates an exception to the normal infix spacing rules.

In spite of this recommendation, lots of code out there keeps spaces around these operators, which makes a decision based on "existing practice" hard.

Adding to the complexity, one would have to remove the spaces only where the infix is not followed by another operator (such as -), so as not to break the AST. This means that we would sometimes have to put spaces around these infixes and sometimes not, leading to irregularity.

Since there's no consensus in existing code at the time of writing, and since the rule is irregular and adds implementation complexity, nph formats .. and ..< with spaces.
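The following plain Nim snippet, with no nph involved, shows the spaced style in use. The comment about `1..-1` reflects our understanding of Nim's lexer, where adjacent operator characters form a single operator token. Treat it as an explanation of the irregularity, not as a specification.

```nim
import std/sequtils

# The spaced style nph emits for `..` and `..<`:
doAssert toSeq(1 .. 3) == @[1, 2, 3]
doAssert toSeq(0 ..< 3) == @[0, 1, 2]

# With spaces, `-1` is a prefix (unary) expression. Written as `1..-1`, the
# operator characters would run together into a single `..-` token instead,
# which is why spaces could only be dropped when no operator follows.
doAssert toSeq(1 .. -1).len == 0
doAssert toSeq(-1 .. 1) == @[-1, 0, 1]
```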

Expressions

Expressions appear in many places, such as after certain keywords (return, yield), as part of control flows (if, while), in assignments etc.

Whenever possible, nph will try to keep the full expression on a single line:

let myvariable = shortexpression(abc)

If this is not possible, the second preference is to move the whole expression to a new line, assuming it fits:

let myvariable =
  someevenlongerexpression(abc, def)

If the expression still doesn't fit, we'll split it across multiple lines:

let myvariable = someevenlongerexpression(
  aaa, bbb, ccc, ddd
)
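The three preferences can be sketched as "pick the first layout that fits". `layoutLet` is a hypothetical helper that only handles a let binding whose value is a single call. `maxWidth = 40` is chosen so that the three examples above come out as shown. A fuller version would also apply the list rules to the arguments when they don't fit on one line.

```nim
import std/strutils

proc layoutLet(name, callee: string; args: seq[string]; maxWidth = 40): string =
  ## Hypothetical sketch: pick the first layout that fits within `maxWidth`.
  let head = "let " & name & " ="
  let call = callee & "(" & args.join(", ") & ")"
  if head.len + 1 + call.len <= maxWidth:
    # 1. Everything on a single line.
    result = head & " " & call
  elif 2 + call.len <= maxWidth:
    # 2. The whole expression moved to its own, indented line.
    result = head & "\n  " & call
  else:
    # 3. The expression split across multiple lines.
    result = head & " " & callee & "(\n  " & args.join(", ") & "\n)"

echo layoutLet("myvariable", "shortexpression", @["abc"])
echo layoutLet("myvariable", "someevenlongerexpression", @["abc", "def"])
echo layoutLet("myvariable", "someevenlongerexpression", @["aaa", "bbb", "ccc", "ddd"])
```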

Certain expressions linked by related keywords that don't fit on a single line will also be moved to a new line - for example, a multi-line if/else nested in a return will be lined up like so:

return
  if condition:
    complex(call)
  else:
    alsocomplex(call)

FAQ

Why use a formatter?

A formatter removes the tedium of manually adding structure to code to make it more readable. Overlong lines, inconsistent indentation, lack of visual structure and other small distractions quickly nibble away at the mental budget available for writing code, while a formatter solves these and many other things at the press of a button.

When you work with others, debates and nitpicking over style go away and collaborative efforts can focus on substance instead.

Finally, the code is likely to look better - manually formatting code takes a lot of effort which ultimately can be spent better elsewhere - as such, poorly formatted code ends up being more common than not.

But I've spent a significant part of my life realigning code and now it's lost!

https://en.wikipedia.org/wiki/Sunk_cost

How do I introduce nph in an existing codebase?

Assuming git is used, format all code using nph, put it in a single commit and add a CI rule to ensure that future commits are all formatted using the same nph version.

Formatting commits can be ignored for the purpose of git blame by adding a file named .git-blame-ignore-revs, containing the hashes of the formatting commits, to the root of the project:

cd myproject

# Format all source code with nph
git ls-files | grep '\.nim$' | xargs -n1 nph

# Create a single commit with all changes
git commit -am "Formatted with nph $(nph --version)"

# Record the commit hash in the blame file
echo "# Formatted with nph $(nph --version)" >> .git-blame-ignore-revs
echo $(git rev-parse HEAD) >> .git-blame-ignore-revs

Then configure git to use it:

git config --global blame.ignoreRevsFile .git-blame-ignore-revs

The same strategy can be used when upgrading nph to a new version that introduces formatting changes.

nph complains about my code!

One of several things could have happened:

  • The code was not valid enough - nph can only parse valid Nim grammar, and while it would be nice to handle incomplete code gracefully, we're not there yet.
  • The parser has a bug and is unable to parse valid Nim code
    • Probably you can move some comments around to make it work!
  • The formatter has a bug and the resulting formatting is invalid
    • Probably you can move some comments around to make it work!
  • The AST equivalence checker complains
    • This often happens in complex expressions, such as do blocks and parentheses used for indentation purposes, where the Nim grammar has ambiguities and parsing complexity. It can usually be worked around by simplifying complex expressions, introducing a template or similar
    • It could also be that the AST checker is too strict - the Nim parser will generate different ASTs depending on whitespace even if there is no semantic difference

Regardless of what happened, nph takes the conservative approach and retains the original formatting!

If you have time, try to find the offending code snippet and submit an issue.

Why the cited formatters in particular?

  • black because of our syntactic similarity with Python and its stability policy
  • prettier for its wisdom in how formatting options are approached and for the closeness to user experience of its developers
  • clang-format for being the formatter that made me stop worrying about formatting
    • its secret sauce was treating formatting as a balancing of priorities, resolved using a lowest-penalty algorithm, rather than as mechanical stringification

What is meant by consistency?

  • Similar constructs are formatted with similar rules
    • Does it look like a list? Format it with list-like rules regardless of whether it's a parameter list, array of values or import list
  • Original styling is generally not preserved - instead, the formatting is based on the semantic structure of the program
  • Spacing emphasizes structure and control flow to help you read the code

nph makes your code consistent without introducing hobgoblins in your mind!

Why are there no options?

The aim of nph is to create a single consistent style that allows you to focus on programming while nph takes care of the formatting, even across different codebases and authors.

Consistency helps reading speed by removing unique and elaborate formatting distractions, allowing you, the experienced programmer, to derive structural information about the codebase at a glance.

The style might feel unfamiliar in the beginning - this is fine and not a reason to panic - a few weeks from now, you'll forget you ever used another one.

Do you accept style suggestions and changes?

Yes! The project is still in its early phase meaning that the style is not yet set in stone.

To submit a proposal, include some existing code, how you'd like it to be formatted and an option-free algorithm detailing how to achieve it and how the outcome relates to the above styling priorities.

When in doubt, look at what other opinionated formatters have done and link to it!

Eventually, the plan is to adopt a stability policy similar to black, meaning that style changes will still be accepted, but introduced only rarely so that you don't have to worry about massive PR-breaking formatting diffs all the time.

Why does the formatting code look an awful lot like the Nim compiler renderer?

Because it is based on it, of course! As a starting point this is fine but the code would benefit greatly from being rewritten with a dedicated formatting AST - and here we are.

Should it be upstreamed?

Maybe parts - feel free to make PRs to the Nim repo from this codebase! That said, the aim of a compiler is to compile while a formatter formats - we are not the same.

What about nimpretty?

nimpretty formats tokens, not the AST. Use whichever you like better, but keep a backup if you don't use nph :)

Why 88 characters?

This is an experiment.

Astute and experienced programmers have noticed two things: longer variable names aren't that bad, and monitors have gotten bigger since the 80-column standard was set.

Going beyond 80 columns allows code that uses descriptive names to look better. How much extra is needed is an open question, but 10% (80 + 8 = 88 characters) seems like a good start for a language like Nim, which defaults to 2-space significant indentation and has a naive module system that encourages globally unique identifiers with longer names.

Automated formatting keeps most code well below this limit, but the extra 10% gives it some leeway. Think of it as covering the cases where a programmer would use their judgement and common sense to override a style guide recommendation.

What about comments?

Comments may appear in many different places that are not represented in the Nim AST. When nph reformats code, it may have to move comments around in order to maintain line lengths and introduce or remove indentation.

nph uses heuristics to place comments into one of several categories which broadly play by similar rules that code does - in particular, indentation is used to determine "ownership" over the comment.

The implementation currently tracks several comment categories:

  • comment statement nodes - comments that appear with regular indent in statement list contexts (such as the body of a proc) are represented as such, i.e. as statement nodes, and are treated similarly to regular code
  • node attachments - comments that are anchored to an AST node depending on their location in the code relative to that node:
    • prefix - anything leading up to a particular AST node - for example less indented or otherwise appearing before the node
    • mid - at midpoints in composite nodes - between the : and the body of an if for example
    • postfix - appearing after the node, meaning on the same line or more indented than the node

When rendering the code, nph will use these categories to guide where the comment text should go, maintaining comment output in such a way that parsing the file again results in equivalent comment placement.
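To make the prefix/postfix distinction concrete, here is a hypothetical simplification that classifies a comment by its line and indentation relative to a node. The enum and `placement` are invented for illustration. Mid comments and comment statement nodes are not modelled, and nph's actual heuristics are likely more involved.

```nim
type
  Placement = enum
    plPrefix    # before the node: attached as a prefix comment
    plPostfix   # same line, or a later line that is more indented
    plUnowned   # later and not more indented: belongs to a following node

proc placement(nodeLine, nodeIndent, commentLine, commentIndent: int): Placement =
  ## Hypothetical simplification of the prefix/postfix rules described above.
  if commentLine < nodeLine:
    result = plPrefix
  elif commentLine == nodeLine or commentIndent > nodeIndent:
    result = plPostfix
  else:
    result = plUnowned

echo placement(nodeLine = 5, nodeIndent = 2, commentLine = 4, commentIndent = 0)   # plPrefix
echo placement(nodeLine = 5, nodeIndent = 2, commentLine = 5, commentIndent = 20)  # plPostfix
echo placement(nodeLine = 5, nodeIndent = 2, commentLine = 6, commentIndent = 4)   # plPostfix
echo placement(nodeLine = 5, nodeIndent = 2, commentLine = 6, commentIndent = 2)   # plUnowned
```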

How are blank lines handled?

Coming up with a fully automatic rendering of blank lines is tricky because they are often used to signal logical groupings of code for which no other mechanism exists to represent them.

Currently, nph will:

  • generally retain blank lines in code, but collapse consecutive blank lines into a single one
  • insert blanks around complex statements

This strategy is expected to evolve over time, including the meaning of "complex".
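The first rule amounts to collapsing runs of blank lines, as in the hypothetical `normaliseBlankLines` below. Inserting blanks around complex statements is not modelled, since what counts as "complex" is still evolving.

```nim
import std/strutils

proc normaliseBlankLines(lines: seq[string]): seq[string] =
  ## Keeps blank lines, but collapses each run of them into a single empty line.
  var previousBlank = false
  for line in lines:
    let blank = line.strip().len == 0
    if blank and previousBlank:
      continue
    let normalised = if blank: "" else: line
    result.add(normalised)
    previousBlank = blank

echo normaliseBlankLines(@["let a = 1", "", "  ", "", "let b = 2"])
# @["let a = 1", "", "let b = 2"]
```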

What features will likely not be added?

  • formatting options - things that change the way the formatting is done for aesthetic reasons. Exceptions here might include options that increase compatibility (for example with older Nim versions)
  • semantic refactoring - the focus is on style only
    • import reordering in particular changes the order in which code executes!

What's with the semicolons?

Nim's grammar unfortunately allows the use of either , or ; in some places with a subtly different AST being produced which sometimes has a semantic impact.

Parameters in particular are parsed using identifier groups, where each group consists of one or more names followed by an optional type and default.

Names are separated by , - if the type and default are missing, a ; is needed to start a new group. A , would instead add the following name to the previous group, changing the AST when the original code used a ; to create a new group.

However, if the group has a default, a ; after it cannot always be parsed: in certain cases (proc implementations in particular) it is swallowed by the parsing of the default value.

As such, nph will normalise usage of , and ; to:

  • Use , after a group that has a type and/or default
  • Use ; otherwise

Regardless, you can usually type either and nph will clean it up in such a way that the AST remains unambiguous, compatible with all possible values and in line with the common expectation that , is used where possible.

Updating this book

The book is built using mdBook and published to gh-pages using a GitHub action.

# Install or update tooling (make sure you add "~/.cargo/bin" to PATH):
cargo install mdbook --version 0.4.36
cargo install mdbook-toc --version 0.14.1
cargo install mdbook-open-on-gh --version 2.4.1
cargo install mdbook-admonish --version 1.14.0

# Edit book and view through local browser
mdbook serve