Introduction
nph is an opinionated source code formatter for the Nim language, aiming to take the drudgery of manual formatting out of your coding day.
Following the great tradition of black, prettier, clang-format and other AST-based formatters, it discards existing styling to create a consistent and beautiful codebase.
Priorities
nph aims to format code in such a way that:
- it remains semantically unchanged, aka correct (!)
  - the AST is checked for equivalence before writing the formatted code to disk - on mismatch, the code is left untouched
- it remains simple, consistent and pleasant to read
  - most code should look as good as or better than its hand-formatted counterpart
- diffs are kept at a minimum
  - diff-inducing constructs such as vertical alignment are avoided, for more productive merges
  - formatting the same code again results in no differences
- it broadly follows the Status Nim style guide and NEP1
  - this is a tool aimed at making collaboration easier, with others and your future self
  - where NEP1 contradicts itself or these priorities, these priorities have precedence
The formatting rules are loosely derived from other formatters that have already gone through the journey of debating what "pleasing to read" might mean, while making adaptations for both features and quirks of the Nim parser.
If in doubt, formatting that works well for descriptive identifiers and avoids putting too much information in a single line will be preferred.
If something breaks the above guidelines, it's likely a bug.
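The equivalence check mentioned in the priorities can be illustrated with a small compile-time sketch. This is not how nph implements it (nph works with its own copy of the compiler's parser); the sketch only assumes that parseStmt and treeRepr from std/macros are available, and sameAst is a hypothetical helper name:

```nim
import std/macros

macro sameAst(a, b: static string): untyped =
  ## Hypothetical helper: parses both snippets at compile time and
  ## compares the textual dumps of the resulting syntax trees.
  newLit(parseStmt(a).treeRepr == parseStmt(b).treeRepr)

# Whitespace-only changes leave the tree untouched ...
doAssert sameAst("let x = [1,2,3]", "let x = [1, 2, 3]")
# ... while `,` versus `;` between type-less parameters does not.
doAssert not sameAst(
  "template t(a, b) = discard", "template t(a; b) = discard"
)
```

A formatter that refuses to write output whenever such a comparison fails can never silently change what the code means, at the cost of occasionally rejecting a harmless change.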
Installation
Binaries
Download binaries from the releases page.
Install via nimble
nph can be compiled or installed using nimble v0.16.4+:
# Install globally
nimble install nph
# Alternatively, clone and build:
git clone https://github.com/arnetheduck/nph.git
cd nph
nimble setup -l
nimble build
nph requires a specific version of nim during the build process since it reuses parts of the compiler whose API frequently changes - this may lead to nim itself being built as part of the installation process!
For bonus points, replace nimpretty with a symlink to nph - similar command line options are supported ;)
Editor integration
- VSCode (ext install NimLang.nimlang) extension via nimlangserver that supports nph out of the box
- NeoVim - Install neoformat in your neovim setup, then enable nph as the Nim formatter with this option in init.vim:
  let g:neoformat_enabled_nim = ['nph']
- Zed Editor - Use this in your editor settings:
  "languages": {
    "Nim": {
      "formatter": {
        "external": {
          "command": "nph",
          "arguments": ["-"]
        }
      }
    }
  }
- vscode-nph (ext install arnetheduck.vscode-nph) for a formatting-only option for the official Nim extension.
Continuous integration
Check out the companion GitHub Action for a convenient CI option!
Usage
# Format the given files in-place
nph file0.nim file1.nim
# Format the given files, writing the formatted code to /tmp
nph file0.nim file1.nim --outdir:/tmp
# Format an entire directory
nph src/
# Use --check to verify that a file is formatted correctly as `nph` would - useful in CI
nph --check somefile.nim || echo "Not formatted!"
# You can format stuff as part of a pipe using `-` as input:
echo "echo 1" | nph -
Disabling formatting locally
You can mark a code section with #!fmt: off and #!fmt: on to disable formatting locally:
proc getsFormatted(a, b : int ) = discard
#!fmt: off
let
  myHandFormattedList
    :
      array[3, int]
    =
      [1, 2, 3]
#!fmt: on
proc hanging(indent: int,
             isUgly = true) = discard
To disable formatting for a whole file, simply put #!fmt: off at the top!
Note: Internally, #!fmt: off makes nph treat the section as a big multi-line comment that it copies over to the formatted code - as such, you must be careful with indent and adjust your code to the indent that nph will generate!
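A sketch of what the note implies for a region nested inside a proc, assuming the markers are honoured at that nesting level: the hand-formatted lines are written at the two-space body indent that the surrounding code would get anyway. The proc name and the matrix are made up for illustration:

```nim
proc rotation90(): array[2, array[2, int]] =
  ## Returns a hand-aligned 2x2 matrix.
  #!fmt: off
  result = [
    [0, -1],
    [1,  0],
  ]
  #!fmt: on
```

Because the region is copied verbatim, writing it at any other indent would leave it misaligned with the formatted code around it.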
The nph style
As outlined in the introduction, nph strives to maintain correctness and consistency across various language constructs with a preference for styles that work well for collaboration.
This section of the book documents some of the style choices and why they were made.
Overview
nph generally approaches formatting by considering several choices of formatting and choosing a reasonable one based on a number of heuristics.
To get an idea what the format looks like, here's a typical proc definition - everything fits on one line, nice!
proc function(par0: SomeType): bool =
  ...
If we add more arguments, it starts getting long - nph will try with a version where the arguments sit on a line of their own:
proc function(
    par0: SomeType, par1: SomeType
): bool =
  ...
The above idea extends to most formatting: if something is simple, format it in a simple way - if not, use a bit of style to break down what's going on into more easily consumable pieces - here's a function with several information-dense parameters and a pragma:
proc function(
    par0: SomeType,
    par1, par2: SomeOtherType,
    par3 = default(SomeType),
): bool {.inline.} =
  ...
The examples are illustrative and not based on exact rendering semantics - in particular, a different line length was used to make the point.
Lists
Lists appear frequently in source code: import modules, parameter lists, array and sequence initializers, function call parameters, etc.
Generally, list rendering is done according to a number of heuristics, striving to balance information density with the use of available screen space.
If the whole list fits on the current line, it is rendered in-place. Short sequences, single-parameter functions etc usually fit into this category:
import dir/module
const v = [1, 2, 3]
type T = proc(a, b: int)
If it doesn't fit from the current position, we try fitting it in one line on a new line - this frequently happens with parameter lists and constants where the name takes up space:
import
  dir/[module1, module2, module3, module4]
const mylongvariablename =
  [100000000, 200000000, 300000000]
proc function(
    param0: int, param1: int, param2: int
)
If the list still doesn't fit on a single line, we look at the contents to choose between two styles.
If it contains complex values, we render one value per row - this happens most often for function parameters and other information-dense constructs.
import
  dir/[module1, module2, module3],
  dir2/[
    module4, module5, module6, module7,
    module8, module9,
  ]
let myVariable = [
  functionCall(a, b, c),
  functionCall(a, b, c, d),
]
functionCall(
  functionCall(a, b, c),
  functionCall(a, b, c, d),
)
In the long style, we'll insert an extra separator at the end where permissible - this makes it easier to reorder entries and reduces git conflicts!
For simple values, we use a compact style that fits several items per row:
const values = [
  10000000, 2000000000, 3000000000,
  40000000, 5000000000,
]
functionCall(
  10000000, 2000000000, 3000000000,
  40000000, 5000000000,
)
Simple values include:
- literals (2, "string" etc)
- simple identifiers (myvar etc)
- dot expressions of the above (myObject.field)
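The decision sequence described in this section can be summarised as a rough sketch. It is not nph's implementation: ListStyle, isSimple and chooseStyle are hypothetical names, brackets and trailing text are ignored when measuring, and "simple" is approximated as "contains no brackets":

```nim
import std/[sequtils, strutils]

type ListStyle = enum
  lsInline # the whole list fits after the current position
  lsOwnLine # the whole list fits on a line of its own
  lsOnePerRow # complex values: one item per row
  lsCompact # simple values: several items per row

proc isSimple(item: string): bool =
  ## Crude stand-in for "literal, identifier or dot expression":
  ## anything without a bracket counts as simple here.
  not item.contains({'(', '[', '{'})

proc chooseStyle(items: seq[string], column, indent, width: int): ListStyle =
  ## `column` is where the list would start on the current line,
  ## `indent` where it would start if moved to a new line.
  let oneLine = items.join(", ").len
  if column + oneLine <= width:
    lsInline
  elif indent + oneLine <= width:
    lsOwnLine
  elif items.allIt(isSimple(it)):
    lsCompact
  else:
    lsOnePerRow
```

For instance, chooseStyle(@["1", "2", "3"], 10, 2, 40) picks lsInline, while a long run of integer literals that fits on neither line ends up as lsCompact.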
Parameter lists
Parameter lists, such as function parameters and generics, are rendered using the above list style. In the AST, each parameter group is made up of 3 components: one or more names, a type and a default.
If both type and default are missing, we disambiguate parsing multiple names and groups using a `;`.
# Usually we can use comma to separate items
proc f(a, b: int, c: float)
# A semicolon is necessary to ensure `T` is interpreted as a type and not part
# of the `v: static int` identifier group
proc g[T; v: static int]
# Semicolons are also significant for type-less parameters - the following two
# templates parse to different ASTs:
template weare(a; b) = discard
template notthesame(a, b) = discard
# Semicolons cannot be used at all for inline procedures:
proc f(
    myParameter = 0,
    callback: SomeCallback = proc() =
      discard
    ,
    nextParameter = 1,
)
Infix operators
nph puts spaces around infix operators such as `and` and `..`.
Although NEP1 suggests not having spaces around `..` and `..<` in particular, this creates an exception to the normal infix spacing rules.
In spite of this recommendation, lots of code out there maintains spaces around the operators, which makes a decision based on "existing practice" hard.
Adding to the complexity, in order to not break the AST, one would have to take care to remove the spaces only in cases where the infix is not followed by another operator (such as `-`) - this means that we sometimes have to put spaces around these infixes and sometimes not, leading to irregularity.
Since there's no consensus in existing code at the time of writing, and the rule is irregular and causes implementation complexity, nph formats `..` and `..<` with spaces.
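A short illustration of the spacing and of the operator-adjacency problem described above. The variable names are made up, and the remark about `..-` rests on the assumption that no such operator is defined for integers in the standard library:

```nim
let values = [10, 20, 30, 40]

# Spaces around `..<`, as nph renders it.
var total = 0
for i in 0 ..< values.len:
  total += values[i]

# With spaces, the minus sign is a prefix of the right-hand operand.
# Written as `first..-1`, the lexer would instead see a single `..-`
# operator token rather than a range followed by a negative number.
let first = -2
var steps = 0
for i in first .. -1:
  inc steps
```

Always emitting the spaces means the formatter never has to inspect what follows the range operator to decide whether removing them is safe.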
Expressions
Expressions appear in many places, such as after certain keywords (return, yield), as part of control flows (if, while), in assignments etc.
Whenever possible, nph will try to keep the full expression on a single line:
let myvariable = shortexpression(abc)
If this is not possible, the second preference is to move the whole expression to a new line, assuming it fits:
let myvariable =
  someevenlongerexpression(abc, def)
If the expression still doesn't fit, we'll split it up on multiple lines:
let myvariable = someevenlongerexpression(
  aaa, bbb, ccc, ddd
)
Certain expressions linked by related keywords that don't fit on a single line will also be moved to a new line - for example, a multi-line if/else nested in a return will be lined up like so:
return
  if condition:
    complex(call)
  else:
    alsocomplex(call)
FAQ
Why use a formatter?
A formatter removes the tedium of manually adding structure to code to make it more readable - overlong lines, inconsistent indentation, lack of visual structure and other small distractions quickly nibble away at the mental budget available for writing code while a formatter solves this and many other things at the press of a button.
When you work with others, debates and nitpicking over style go away and collaborative efforts can focus on substance instead.
Finally, the code is likely to look better - manually formatting code takes a lot of effort which ultimately can be spent better elsewhere - as such, poorly formatted code ends up being more common than not.
But I've spent a significant part of my life realigning code and now it's lost!
https://en.wikipedia.org/wiki/Sunk_cost
How do I introduce nph in an existing codebase?
Assuming git is used, format all code using nph, put it in a single commit and add a CI rule to ensure that future commits are all formatted using the same nph version.
Formatting commits can be ignored for the purpose of git blame by adding a file named .git-blame-ignore-revs, containing the hashes of the formatting commits, to the root of the project:
cd myproject
# Format all source code with nph
git ls-files | grep '\.nim$' | xargs -n1 nph
# Create a single commit with all changes
git commit -am "Formatted with nph $(nph --version)"
# Record the commit hash in the blame file
echo "# Formatted with nph $(nph --version)" >> .git-blame-ignore-revs
git rev-parse HEAD >> .git-blame-ignore-revs
then configure git to use it:
git config --global blame.ignoreRevsFile .git-blame-ignore-revs
The same strategy can be used when upgrading nph to a new version that introduces formatting changes.
nph complains about my code!
One of several things could have happened:
- The code was not valid enough - nph can only parse valid Nim grammar and while it would be nice to handle partially formatted stuff gracefully, we're not there yet.
- The parser has a bug and is unable to parse valid Nim code
  - Probably you can move some comments around to make it work!
- The formatter has a bug and the resulting formatting is invalid
  - Probably you can move some comments around to make it work!
- The AST equivalence checker complains
  - This often happens in complex expressions such as do and parentheses used for indent purposes where the Nim grammar has ambiguities and parsing complexity - it can usually be worked around by simplifying complex expressions, introducing a template or similar
  - It could also be that the AST checker is too strict - the Nim parser will generate different ASTs depending on whitespace even if semantically there is no difference
Regardless of what happened, nph takes the conservative approach and retains the original formatting!
If you have time, try to find the offending code snippet and submit an issue.
Why the cited formatters in particular?
- black because of our syntactic similarity with Python and its stability policy
- prettier for its wisdom in how formatting options are approached and for the closeness to user experience of its developers
- clang-format for being the formatter that made me stop worrying about formatting
  - its secret sauce was treating formatting as a balancing of priorities rather than a mechanical stringification using a lowest-penalty algorithm
What is meant by consistency?
- Similar constructs are formatted with similar rules
  - Does it look like a list? Format it with list-like rules regardless of whether it's a parameter list, array of values or import list
- Original styling is generally not preserved - instead, the formatting is based on the semantic structure of the program
- Spacing emphasizes structure and control flow to help you read the code
nph makes your code consistent without introducing hobgoblins in your mind!
Why are there no options?
The aim of nph is to create a single consistent style that allows you to focus on programming while nph takes care of the formatting, even across different codebases and authors.
Consistency helps reading speed by removing unique and elaborate formatting distractions, allowing you, the experienced programmer, to derive structural information about the codebase at a glance.
The style might feel unfamiliar in the beginning - this is fine and not a reason to panic - a few weeks from now, you'll forget you ever used another one.
Do you accept style suggestions and changes?
Yes! The project is still in its early phase meaning that the style is not yet set in stone.
To submit a proposal, include some existing code, how you'd like it to be formatted and an option-free algorithm detailing how to achieve it and how the outcome relates to the above styling priorities.
When in doubt, look at what other opinionated formatters have done and link to it!
Eventually, the plan is to adopt a stability policy similar to black, meaning that style changes will still be accepted, but introduced only rarely so that you don't have to worry about massive PR-breaking formatting diffs all the time.
Why does the formatting code look an awful lot like the Nim compiler renderer?
Because it is based on it, of course! As a starting point this is fine but the code would benefit greatly from being rewritten with a dedicated formatting AST - and here we are.
Should it be upstreamed?
Maybe parts - feel free to make PRs to the Nim repo from this codebase! That said, the aim of a compiler is to compile while a formatter formats - we are not the same.
What about nimpretty?
nimpretty formats tokens, not the AST. Use whichever you like better, but keep a backup if you don't use nph :)
Why 88 characters?
This is an experiment.
Astute and experienced programmers have noticed two things: longer variable names aren't that bad and monitors have gotten bigger since the 80 standard was set.
Going beyond allows code that uses descriptive names to look better - how much extra is needed here is an open question but 10% seems like a good start for a language like Nim which defaults to 2-space significant indent and a naive module system that encourages globally unique identifiers with longer names.
Automated formatting keeps most code well below this limit but the extra 10% gives it some lenience - think of it as those cases where a programmer would use their judgement and common sense to override a style guide recommendation.
What about comments?
Comments may appear in many different places that are not represented in the Nim AST. When nph reformats code, it may have to move comments around in order to maintain line lengths and introduce or remove indentation.
nph uses heuristics to place comments into one of several categories which broadly play by similar rules that code does - in particular, indentation is used to determine "ownership" over the comment.
The implementation currently tracks several comment categories:
- comment statement nodes - comments that appear with regular indent in statement list contexts (such as the body of a proc) are represented as such, i.e. as statement nodes, and get treated similarly to regular code
- node attachments - comments that are anchored to an AST node depending on their location in the code relative to that node:
  - prefix - anything leading up to a particular AST node - for example less indented or otherwise appearing before the node
  - mid - at midpoints in composite nodes - between the `:` and the body of an if for example
  - postfix - appearing after the node, meaning on the same line or more indented than the node
When rendering the code, nph will use these categories to guide where the comment text should go, maintaining comment output in such a way that parsing the file again results in equivalent comment placement.
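To make the categories more concrete, here is a small annotated snippet. The labels reflect one reading of the list above and have not been checked against the implementation, so treat them as an assumption; classify and thresholds are made-up names:

```nim
proc classify(value: int): string =
  # comment statement: regular indent in a statement list
  if value < 0: # mid: between the `:` and the body of the `if`
    "negative"
  elif value == 0:
    "zero" # postfix: same line as the node it follows
  else:
    "positive"

let thresholds = [
  # prefix: leads up to the first element
  10,
  20, # postfix: same line as the element
]
```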
How are blank lines handled?
Coming up with a fully automatic rendering of blank lines is tricky because they are often used to signal logical groupings of code for which no other mechanism exists to represent them.
nph currently will:
- generally retain blank space in code but normalise it to a single line
- insert blanks around complex statements
This strategy is expected to evolve over time, including the meaning of "complex".
What features will likely not be added?
- formatting options - things that change the way the formatting is done for aesthetic reasons - exceptions here might include options that increase compatibility (for example with older Nim versions)
- semantic refactoring - the focus is on style only
  - import reordering in particular changes the order in which code executes!
What's with the semicolons?
Nim's grammar unfortunately allows the use of either `,` or `;` in some places with a subtly different AST being produced which sometimes has a semantic impact.
Parameters in particular are parsed using identifier groups where each group consists of one or more names followed by an optional type and default.
Names are separated by `,` - if the type and default are missing, a `;` is needed to start a new group - otherwise, a name that was originally placed in a new group using `;` would be added to the previous group.
However, if the group has a default, `;` cannot be parsed because it's swallowed in certain cases (proc implementations in particular) by the default value parsing.
As such, nph will normalise usage of `,` and `;` to:
- Use `,` after a group that has a type and/or default
- Use `;` otherwise
Regardless, you can usually type either and nph will clean it up in such a way that the AST remains unambiguous, compatible with all possible values and in line with the common expectation that `,` is used where possible.
Updating this book
The book is built using mdBook, and published to gh-pages using a github action.
# Install or update tooling (make sure you add "~/.cargo/bin" to PATH):
cargo install mdbook --version 0.4.36
cargo install mdbook-toc --version 0.14.1
cargo install mdbook-open-on-gh --version 2.4.1
cargo install mdbook-admonish --version 1.14.0
# Edit book and view through local browser
mdbook serve