Nix NYC 4/27/26
See you at the Nix NYC meetup in two weeks! We’re looking forward to hosting again.
29 W 30th St, Fl 11, New York, NY 10001
Tip for when you’re struggling to activate a NixOS or nix-darwin config: if you can build it, but the switch fails, you can run the activation script manually. A NixOS or nix-darwin config isn’t special; it’s an “activation script”:
$ sudo nixos-rebuild switch ...
=
$ nixos-rebuild build ...
$ sudo ./result/activate
And with flakes in particular, you get:
$ sudo nixos-rebuild switch --flake .#foo
=
$ nix build .#nixosConfigurations.foo.config.system.build.toplevel
$ sudo ./result/activate
(Or `darwin-rebuild ...` and `.#darwinConfigurations`, respectively)
The latter can be useful to bootstrap a nix-darwin configuration on a system without `darwin-rebuild`, or with a broken darwin config. Or just to peek inside a given OS configuration’s files, without installing it.
In our NixOS tests we often spin up datastores -- DynamoDB, ElasticMQ, Redis, etc. But they are way too eager to report themselves ready, which causes dependent tasks to fail to connect.
We realized we could add a one-liner to the systemd config for these services which waits until the datastore’s port is open. Then, downstream tasks waiting for `default.target` will not start until the datastores are actually ready.
`until nc -z localhost "$1"; do sleep 1; done`
We put it in a script called `wait-for-port`. Usage in our case looks like adding it to the systemd config and leveraging the `postStart` option. E.g.
systemd.services.elasticmq = {
postStart = "wait-for-port ${toString config.services.elasticmq.port}";
path = [ wait-for-port ];
};
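For reference, the wrapper could be packaged along these lines (a sketch, not necessarily how our actual `wait-for-port` derivation looks):

```nix
# Hypothetical packaging of the one-liner; the real wait-for-port may differ.
wait-for-port = pkgs.writeShellApplication {
  name = "wait-for-port";
  # Pick a netcat whose `nc -z` actually works in NixOS tests
  # (netcat-gnu gave us trouble; see below).
  runtimeInputs = [ pkgs.netcat-openbsd ];
  text = ''
    until nc -z localhost "$1"; do sleep 1; done
  '';
};
```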
As an aside, we found that the first result for netcat on NixOS Search, netcat-gnu, works on darwin but did not work in a NixOS test on Linux, in such a way that caused `wait-for-port` to hang forever... It was last updated in 2006 and lives on SourceForge.
While building dune2nix, Shun ran into some lockfile issues in Dune and submitted patches to fix them:
Dune has a Nix flake, so you can try these changes with: `nix run github:ocaml/dune#dune`
We open-sourced dune2nix, a Nix library to turn Dune-based OCaml projects into Nix derivations.
Like uv2nix and package-lock2nix, `dune2nix` parses Dune's lockfiles fully at Nix eval time, which gives us: no codegen, no Import From Derivation (IFD), and no hardcoded hashes.
Released under AGPLv3 (but open to other licenses)
Apparently `xargs -n 1 𝑥` ≠ `xargs -I % 𝑥 %`
$ echo a b c | xargs -n 1 echo a b c
$ echo a b c | xargs -I % echo % a b c
No, you must instead:
$ echo a b c | xargs -n 1 | xargs -I % echo % a b c
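Spelled out with outputs (`X` is just a placeholder argument): `-n 1` chops the input into one-argument invocations, while `-I %` substitutes the whole input line, so you need both to get per-item substitution.

```shell
echo a b c | xargs -n 1 echo X
# X a
# X b
# X c
echo a b c | xargs -I % echo X %
# X a b c
echo a b c | xargs -n 1 | xargs -I % echo % X
# a X
# b X
# c X
```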
Why are you like this, POSIX? T_T
We use CDKTF, but now that it's deprecated, providers have stopped publishing bindings. While we migrate off it, we need to generate provider bindings ourselves in the meantime. The official way of doing this is by running `cdktf get`, but we got nerd-sniped (as always) and wrote a little derivation that generates provider bindings at Nix build time, in a sandboxed environment:
{
cdktf-cli,
writableTmpDirAsHomeHook,
nodejs,
terraform,
stdenv,
writeTextFile,
lib,
# We use https://github.com/nix-community/nixpkgs-terraform-providers-bin
#
# Something like:
#
# inputs.nixpkgs-terraform-providers-bin.legacyPackages.${system}.providers;
terraform-providers,
}:
let
# Providers you want to generate bindings for
providers = with terraform-providers; [
hashicorp.aws
hashicorp.random
hashicorp.null
];
# Language of the bindings
language = "typescript";
# Minimal cdktf.json used for generating the bindings.
cdktfJson = writeTextFile {
name = "cdktf.json";
text = builtins.toJSON {
inherit language;
app = "unused-can-be-anything";
terraformProviders = map (
provider:
# Assuming registry.terraform.io because nixpkgs-terraform-providers-bin has
# an everlasting TODO: https://github.com/nix-community/nixpkgs-terraform-providers-bin/blob/4f8dfea41cd94403a6c768923b3ddcb15fd4c611/default.nix#L26
lib.replaceString "registry.terraform.io/" "" provider.provider-source-address
) providers;
};
};
in
stdenv.mkDerivation {
name = "cdktf-bindings";
nativeBuildInputs = [
cdktf-cli
nodejs
(terraform.withPlugins (_: providers))
# cdktf wants to write in homedir for cache
writableTmpDirAsHomeHook
];
dontUnpack = true;
# Disable telemetry, requires internet access.
CHECKPOINT_DISABLE = 1;
buildPhase = ''
cp ${cdktfJson} cdktf.json
cdktf get
'';
installPhase = ''
mkdir -p $out
cp -r .gen/* $out/
'';
}
Writing this was fun, but maintaining it would not be fun. Given that CDKTF is officially deprecated, we have chosen to just directly vendor the bindings for now, while we migrate off of CDKTF entirely. That being said, we thought this was a cool little snippet and rather than bin it, we wanted to send it out into the ether on its own journey. Maybe he can find someone out there who can properly appreciate him :)
Sayonara, CDKTF.
To populate your binary cache with e.g. the last 24 hours’ worth of derivations from your machine’s /nix/store you can use:
$ nix path-info --all --json \
| jq -r 'with_entries(select(.value.registrationTime > (now - 60 * 60 * 24))) | keys | .[]' \
| xargs -r nix copy --to ....
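To illustrate what the jq filter does, here it is against a hand-written sample of `nix path-info --json` output (paths and timestamps made up; the first is from 2000, the second far in the future):

```shell
echo '{
  "/nix/store/aaa-old":   {"registrationTime": 946684800},
  "/nix/store/bbb-fresh": {"registrationTime": 9999999999}
}' | jq -r 'with_entries(select(.value.registrationTime > (now - 60 * 60 * 24))) | keys | .[]'
# /nix/store/bbb-fresh
```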
Or for Cachix you can use:
...
| xargs -r cachix push my-cache-name
Related: neither of these tools’ native concurrency or chunking primitives seems to be quite as reliable as plain old multi-process parallelism using xargs. In the end, this always wins:
...
| xargs -P 20 -n 1000 -r ...
Kind of sad. :(
Our brrr workers use BLPOP to consume jobs from Redis. After enabling redis-py's built-in retry (which covers ConnectionError and TimeoutError), alerts kept firing during failovers:
ResponseError: UNBLOCKED force unblock from blocking operation, instance state changed
Redis sends this when it boots a blocked client during failover. You can't just add `ResponseError` to `retry_on_error` -- it's the base class for OOM, READONLY, NOPERM and others, most of which indicate a persistent problem. And redis-py doesn't expose a structured error code for types it doesn't recognize. For `UNBLOCKED`, you just get a generic `ResponseError` with the raw message string.
So we subclassed `Retry` to parse the RESP error type ourselves and only retry on `UNBLOCKED`:
def _redis_response_error_type(exc: ResponseError) -> str:
message = str(exc).strip()
if not message:
return ""
return message.split(None, 1)[0].upper()
class CustomRetry(Retry):
async def call_with_retry(self, do, fail):
# same retry loop as Retry, but with an extra branch:
while True:
try:
return await do()
except ResponseError as error:
if _redis_response_error_type(error) == "UNBLOCKED":
... # backoff and retry
raise # OOM, READONLY, etc. -- don't retry
except self._supported_errors as error:
... # backoff and retry (ConnectionError, TimeoutError)
Replace `Retry` with `CustomRetry` and done. Alerts resolved!
We’re fans of csvtk, a CLI toolkit to manipulate CSV/TSV files and pipelines in scripts. It makes for some elegant combinations with jq and awscli2 when building cleanup scripts etc.
It wouldn’t be a CLI if it didn’t have some odd gotchas. Today:
$ printf '2026-03-04T17:00:00-04:00\tfoo\n1998-01-01T00:00:00+00:00\tbar\n' > data.tsv
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$name < "goo"'
date	name
2026-03-04T17:00:00-04:00	foo
1998-01-01T00:00:00+00:00	bar
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$name < "doo"'
date	name
1998-01-01T00:00:00+00:00	bar
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$name < "aaa"'
date	name
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$name > "aaa"'
date	name
2026-03-04T17:00:00-04:00	foo
1998-01-01T00:00:00+00:00	bar
So far, so good. But:
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$date > "aaa"'
[WARN] row 1: Value '1.772658e+09' cannot be used with the comparator '>', it is not a number
[WARN] row 2: Value '8.836128e+08' cannot be used with the comparator '>', it is not a number
date	name
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$date > "2026"'
[WARN] row 1: Value '1.772658e+09' cannot be used with the comparator '>', it is not a number
[WARN] row 2: Value '8.836128e+08' cannot be used with the comparator '>', it is not a number
date	name
What‽
Gemini has no idea. Thankfully, we have Shun, who figured out that:
Date constants (single quotes, using any permutation of RFC3339, ISO8601, ruby date, or unix date; date parsing is automatically tried with any string constant)
- https://github.com/Knetic/govaluate
Sure enough, if you use a “fuller” date:
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$date > "2026-01-01"'
date	name
2026-03-04T17:00:00-04:00	foo
$ <data.tsv csvtk add-header -tn date,name | csvtk filter2 -stf '$date < "2026-01-01T00:00:00+00:00"'
date	name
1998-01-01T00:00:00+00:00	bar
Thanks, Shun and Shen ☺
Our CDN now transparently binds all access tokens to the IP of the client. CloudFront Functions make this relatively pain-free and foolproof.
When the origin server gives a web browser a login token, it mints a JWT and puts it in a `Set-Cookie` header. This token is effectively equivalent to a username + password + 2FA combo for the duration of the session. We’ve set up two CloudFront functions: one to add a `clientAddress` to every outgoing JWT (and resign), and one to validate it on any incoming token. The origin server is none the wiser, but if any token ever leaks, it can only be used if you can convince CloudFront that you come from the same IP as the original user.
Relevant excerpt from the “clientAddress enricher”:
const cookie = response.cookies["access_token"];
if (!cookie) {
return response;
}
const decoded = _jwt_decode(cookie.value, secret);
const payload = decoded.payload;
payload["clientAddress"] = event.viewer.ip;
const toSign = decoded.header + "." + Buffer.from(JSON.stringify(payload)).toString("base64url");
response.cookies["access_token"].value = toSign + "." + _sign(toSign, secret, SIGNING_METHOD);
return response;
... and the “jwt validator”:
if (payload.clientAddress && payload.clientAddress != event.viewer.ip) {
throw new Error("viewer ip does not match token clientAddress");
}
An important reason this works for us: our users don’t use mobile. We only serve people on desktops with (relatively) static IPs. This technique won’t work for an arbitrary B2C website.
We maintain a brrr SDK in TypeScript and Python. Both provide implementations of the same backing data structures, and those classes carry the same docstrings. To keep them from going out of sync, Shun created a tool called `docsync`. It scans for docstrings tagged with <docsync>SomeKey</docsync> using tree-sitter, and checks that they are identical across both languages. E.g.:
/**
* A full brrr request payload.
*
* This is a low-level brrr primitive.
*
* The memo key must be generated by the instantiator of this class, and it
* must be deterministic: the "same" args and kwargs must always encode to the
* same memo key.
*
* Using the same memo key, we store the task and its argv here so we can
* retrieve them in workers.
*
* <docsync>Call</docsync>
*/
export interface Call {
...
and:
@dataclass
class Call:
"""A full brrr request payload.
This is a low-level brrr primitive.
The memo key must be generated by the instantiator of this class, and it
must be deterministic: the "same" args and kwargs must always encode to the
same memo key.
Using the same memo key, we store the task and its argv here so we can
retrieve them in workers.
<docsync>Call</docsync>
"""
We hooked it up to `nix flake check` so it’s automatically checked in CI.
It’s in brrr @ 137527a but we’ll probably move it out to its own repo at some point.
We’ll be hosting the next Nix NYC meetup, 3/18/26. See you there!
Yesterday, Ben noticed this blog’s contents weren’t refreshing, even if you explicitly clicked refresh; seeing changes required a hard refresh. Let’s look at the headers:
$ curl -D /dev/stderr -s -o /dev/null https://電.anterior.app/auth/login.html
HTTP/2 200
content-type: text/html
content-length: 11233
date: Fri, 27 Feb 2026 20:42:51 GMT
cache-control: max-age=86400
accept-ranges: bytes
last-modified: Thu, 01 Jan 1970 00:00:01 GMT
vary: accept-encoding
x-cache: Miss from cloudfront
via: 1.1 a086f9674a01c7542c440ffacd39476a.cloudfront.net (CloudFront)
x-amz-cf-pop: JFK52-P9
x-amz-cf-id: 7_XCBzHLxLFTjlJuOa1cG0WLhZv_yQ_pZfYopz23SUWy0KJGkgn4IQ==
x-frame-options: DENY
content-security-policy: connect-src 'self' https://anterior-master-platform.s3.us-east-2.amazonaws.com/artifacts/ https://anterior-master-platform.s3.us-east-2.amazonaws.com/uploads/; default-src 'none'; font-src 'self'; form-action 'self' https://anterior-master-platform.s3.us-east-2.amazonaws.com/uploads/; img-src 'self'; manifest-src 'self'; media-src 'self'; script-src-elem 'self'; style-src-elem 'self'; upgrade-insecure-requests ; worker-src 'self';
x-content-type-options: nosniff
strict-transport-security: max-age=31536000; includeSubDomains; preload
What’s that Last-Modified header? That’s the time to which all files are set when stored in the /nix/store:
$ nix eval --raw --expr 'builtins.toFile "foo" "hello\n"' | xargs -r date -u -Iseconds -r
1970-01-01T00:00:01+00:00
Unfortunately, even when you click refresh, a browser will send the If-Modified-Since header, and the server will say: nope, nothing changed since you last loaded this page; 304 Not Modified. And the browser won’t get the new content.
So the solution would seem to be: stop static-web-server from sending the Last-Modified header when that’s the value? A grep through their source code finds this:
// If the file's modified time is the UNIX epoch, then it's likely not valid and should
// not be included in the Last-Modified header to avoid cache revalidation issues.
let modified = meta
.modified()
.ok()
.filter(|&t| t != std::time::UNIX_EPOCH)
.map(LastModified::from);
They already thought of it. So why isn’t it working for us? Taking a closer look at that timestamp from the nix store: apparently it’s *1 second* after the epoch. Not exactly the epoch. Sure enough, the Nix source code confirms:
const time_t mtimeStore = 1; /* 1 second into the epoch */
Nooo. What’s easier, patching Nix, or patching static-web-server? Let’s try our hand at editing some Rust through sed through Nix, in an overlay on our monorepo’s nixpkgs instance:
overlays = [
(self: super: {
...
static-web-server = super.static-web-server.overrideAttrs {
prePatch = ''
${self.gnused}/bin/sed \
-i \
-e 's/\(\.filter.*t\) != .*UNIX_EPOCH/\1 > (std::time::UNIX_EPOCH + std::time::Duration::from_secs(1))/' \
src/response.rs
'';
# Some tests which implicitly relied on the above behavior now
# break. Force an mtime update to fix.
postUnpack = ''
find . -exec touch -m {} +
'';
};
})
];
Rebuild the web server and run it locally to test:
$ curl -D /dev/stderr -s -o /dev/null http://localhost:12345/auth/login.html
HTTP/1.1 200 OK
content-length: 11233
content-type: text/html
accept-ranges: bytes
vary: accept-encoding
cache-control: max-age=86400
date: Fri, 27 Feb 2026 20:57:34 GMT
Change a CSS rule, do a regular refresh, and: it works :)
AWS Struggle of the day: graceful exit of ECS tasks handling long running async jobs.
The clearest signal that ECS wants you to terminate is a SIGTERM, eventually followed by a SIGKILL. The maximum grace period ECS grants you is 2 minutes. 2 minutes is too short for our long running async tasks. :(
It seems we are not alone. For such cases, ECS introduced task termination protection: tasks can self-identify as protected, escaping downscaling until they’re done. This definitely solves the problem for fleets with less than 1✕ sustained job / worker load, notably auto-scaling fleets without parallel handling of jobs by workers. But if your workers support handling concurrent jobs, it’s unlikely they’ll ever be completely out of any work. And until they get a signal, they don’t know whether or not they’re “old”. :((
We settled on workers just scheduling themselves to gracefully exit every hour, so even in times of sustained load there will be task rescheduling events which will give ECS the opportunity to upgrade the tasks. But it’s convoluted, and it’s a hack on top of another hack. Wouldn’t it be nicer if you could just set a delay of 2 hours between SIGTERM and SIGKILL, instead of 2 minutes?
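The worker-side logic is roughly this shape (an illustrative sketch, not our production code; the names and the one-hour lifetime are placeholders):

```python
import signal
import time

class GracefulWorker:
    """Stop taking new jobs on SIGTERM, and also self-retire after
    max_lifetime seconds, so ECS regularly gets task rescheduling
    opportunities even under sustained load."""

    def __init__(self, max_lifetime: float = 3600.0):
        self.deadline = time.monotonic() + max_lifetime
        self.draining = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Finish in-flight jobs, but take no new ones.
        self.draining = True

    def should_take_new_job(self) -> bool:
        return not self.draining and time.monotonic() < self.deadline
```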
Our new favorite nix command is `nix flake archive`: copy all flake inputs to your store, and/or to a binary cache. Goes very nicely with `nix copy` to ensure private substituters always have all your flake inputs cached.
To pipe this into `nix copy` (or Cachix’s `cachix push`), use:
nix flake archive --json \
  | jq '.. | .path? | strings' \
  | xargs nix copy --to ... # or: cachix push my-cachix-bin
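The jq expression recursively visits every value in the archive’s JSON and keeps any string found under a `path` key. On a hand-written sample (paths made up; `-r` added for raw output):

```shell
echo '{"path": "/nix/store/root", "inputs": {"nixpkgs": {"path": "/nix/store/dep", "inputs": {}}}}' \
  | jq -r '.. | .path? | strings'
# /nix/store/root
# /nix/store/dep
```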
The implementation is surprisingly simple.
Does anyone know how you’re supposed to just build a flake app (not program) without running it? Best we could come up with is:
nix eval --raw --impure --expr \
'let
f = builtins.getFlake "git+file://${toString ./.}";
prg = f.apps.${builtins.currentSystem}.foobar.program;
in
builtins.head (builtins.attrNames (builtins.getContext prg))' \
| xargs -r nix-store -r
Surely there has to be a better way...
We open sourced our codegen flake module for declaring auto generated files in your flake.
Usage is as simple as:
$ nix run .#codegen
and:
$ nix flake check
We installed an edge function in CloudFront to validate that any JWTs were signed by a known JWT key. Copied almost verbatim from the CloudFront docs.
We explicitly whitelisted certain subdirectories from this check, `/auth/*` among others, to allow unauthenticated users to log in. That’s why we host this page on `/auth/login.html` ☺
The benefit: extremely small surface area for the code which does JWT validation. It severely limits the impact of a large class of potential bugs in the origin.
When you publish a flake, a sane baseline sanity check is usually: do my exposed packages at least build? The checkBuildAll flake module does that:
inputs.anterior-tools.url = "github:anteriorcore/tools";
...
flake-parts.lib.mkFlake { inherit inputs; } {
imports = [
inputs.anterior-tools.flakeModules.checkBuildAll
...
Now, `nix flake check` builds everything exposed through your flake’s `packages`.
From our nix tools repo.
We’ll be at the NY Nix Meetup this Wednesday. Looking forward to it!
We also open sourced brrr: a library-only, high-performance, bring-your-own-infra workflow scheduler. Crucial feature: no central orchestrator → no single point of failure.
TypeScript and Python implementations provided. Nix powered demo in the repo. Under active development.
Shun submitted patches to include elasticmq and dynamodb-local in services-flake. They both got merged, so you can now easily use them in process-compose:
services.dynamodb-local.mydynamodb.enable = true;
services.elasticmq.myelasticmq.enable = true;
We open sourced package-lock2nix, a tool to build NPM projects with a package-lock.json directly in Nix. Full package-lock.json parsing is done at eval time, meaning no separate `*2nix` command stage to run. Just `nix build` your project directly, and manage the package-lock.json file itself with regular build tools like npm.
Released under AGPLv3 (but open to other licenses)
Launched the anterior dev log. We’re hosting it under /auth/login.html because that’s the only path that our edge functions allow through unauthenticated.
Chose a non-ASCII app name to test the system’s handling of Unicode.