Christian Decker made a side comment on my PR about a potential further optimization for Postgres. I spent an hour on it, trying to get it to fit in our db API. My wife was out with friends last night, so I dived into the implementation in earnest after the kids were in bed. Man, this is a horrible API which forces you to understand the design and history of PostgreSQL. Until v14, the async API only supported one request at a time. (Surprise!!) You must explicitly turn on pipelining. Then you get one or more responses per request, with a NULL between them. This is because you can have multiple SQL statements per string, or other modes which split responses like this (?). Once you turn this on, you can no longer use the non-pipelining APIs. You can't use PQsendQuery in pipeline mode: you need the lower-level PQsendQueryParams with NULL params. This is documented as literally "you can't use this in pipeline mode" without explanation (the docs otherwise imply it's just a convenient wrapper). This is why you should always check your error returns! And your code will still block forever: you need to explicitly flush the pipeline, otherwise it's cached locally. There are multiple different APIs to do this, and I'm not sure which to use yet. Also, if you get an error in a SQL query, you need to drain the pipeline, then turn pipeline mode off and on again. Finally, the documentation warns that you are in danger of deadlock unless you turn on non-blocking mode. This makes some sense: the server won't read more commands if you're not reading responses, but I would prefer it to buffer somewhere. This whole API seems to be implemented by and for people with deep familiarity with PostgreSQL internals. Hope the latency gain for CLN is worth it!
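For the curious, here's roughly what that dance looks like with libpq. This is a minimal sketch based on my current reading of the docs, not our actual db plugin code: error handling is mostly elided, and real code would also enable non-blocking mode and drive the flushing from its event loop, per the deadlock warning.

```c
/* Hedged sketch of libpq pipeline mode (PostgreSQL 14+); not CLN code. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	PGconn *conn = PQconnectdb("dbname=test");
	if (PQstatus(conn) != CONNECTION_OK)
		return 1;

	if (!PQenterPipelineMode(conn))
		return 1;

	/* PQsendQuery is off-limits here: use PQsendQueryParams, even with
	 * no parameters. */
	PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
	PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);

	/* Nothing actually goes out until you flush: PQpipelineSync() both
	 * flushes and marks a sync point (PQflush()/PQsendFlushRequest()
	 * are the other options). */
	PQpipelineSync(conn);

	/* Each query yields one or more PGresults, then a NULL separator;
	 * the sync point arrives as PGRES_PIPELINE_SYNC. */
	for (;;) {
		PGresult *res = PQgetResult(conn);
		if (!res)
			continue;	/* separator between queries */
		ExecStatusType st = PQresultStatus(res);
		PQclear(res);
		if (st == PGRES_PIPELINE_SYNC)
			break;
		/* On a SQL error you'd see PGRES_FATAL_ERROR, then
		 * PGRES_PIPELINE_ABORTED for everything up to the sync,
		 * after which you get to turn pipeline mode off and on. */
	}

	PQexitPipelineMode(conn);
	PQfinish(conn);
	return 0;
}
```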
I think I got nerd sniped into implementing online compaction for CLN's gossip store. Mainly from people agitating that we should use a DB for gossip messages. This will speed startup and reduce memory usage. And it's only going to take me a couple of days' work, depending on how many side-cleanups I do. It can be hard to tell a flawed implementation of a good idea from a bad idea. But trust me, this is gonna be great!
The last CLN release was terrible for really large nodes: the bookkeeper migrations were exponentially slow on Postgres, and then the first call to bookkeeper would grind the node and even OOM it. Some quick fixes went in, and there will probably be another point release with those. Thanks to those who donated their giant accounting dbs! But the exciting thing is all the optimization and cleanup work that came from me profiling how we handle this large data. Not just reducing the time (those singly-linked lists being abused!), which I've been spending the last weeks on, but last night I started measuring *latency* while we're under stress. 1. Anyone can spray us with requests: the initial problem was caused by bookkeeper querying a few hundred thousand invoices. 2. A plugin can spew almost infinite logs. 3. xpay intercepts every command in case xpay-handle-pay is set, so it can intercept "pay" commands. Fixes: 1. Yield after 100 JSON commands in a row (rough sketch below): we had logic for autoclean, but that was time-based (250ms), not count-based. We now do both. 2. Fix our I/O loop to rotate through fds and deferred tasks, so yielding is fair. 3. Yield on plugin reply RPC too, which definitely helps. 4. Hooks can now specify an array of filter strings: for the command hook, these are the commands you're interested in. As a result, my hand-written sleep1/lightning-cli help test spikes at 44ms, not 110 seconds! My next steps are to use synthetic data (say, 10M records to allow for future growth) and formalize latency measurements in our CI. Next release is looking amazing!
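For illustration only, here's the shape of the yield-after-N-commands idea. This is a toy, not CLN's actual io loop; the queue, counts and handler are made up.

```c
/* Toy sketch: drain at most BATCH queued commands per pass, then return
 * to the main loop so other fds and deferred tasks get a fair turn. */
#include <stdio.h>
#include <stdbool.h>

#define BATCH 100

static int pending = 350;		/* pretend 350 commands are queued */

static bool handle_one_command(void)
{
	if (pending == 0)
		return false;
	pending--;
	return true;
}

/* Called once per trip around the main loop. */
static void process_commands(void)
{
	for (int i = 0; i < BATCH; i++) {
		if (!handle_one_command())
			return;
	}
	/* Still more queued?  Return anyway: the main loop rotates through
	 * fds and deferred work before calling us again. */
}

int main(void)
{
	int passes = 0;
	while (pending > 0) {
		process_commands();
		passes++;
		/* ... a real loop would poll() fds and run timers here ... */
	}
	printf("drained in %d passes\n", passes);	/* 4 passes of <=100 */
	return 0;
}
```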
I think finding a bug where printf("%*.s") was used instead of printf("%.*s") was the point at which I realized that some C issues cannot be mitigated with better tooling...
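For anyone who hasn't hit this one: "%.*s" takes the precision (max characters printed) from an argument, while "%*.s" takes the *width* from an argument and has an empty precision of zero, so it prints nothing but padding. A tiny demo:

```c
#include <stdio.h>

int main(void)
{
	const char *name = "lightning";
	int len = 5;

	/* Intended: precision from an argument, print at most 5 chars. */
	printf("[%.*s]\n", len, name);	/* prints [light] */

	/* The bug: width 5, empty precision (= 0), so the string itself
	 * never appears, just 5 spaces of padding. */
	printf("[%*.s]\n", len, name);	/* prints [     ] */

	return 0;
}
```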
I had way too much fun optimizing CLN due to a Boltz report recently. Like most engineers, I love optimizing, but bitter experience means I only allow myself to do so when I have an actual problem to solve. My favourites: 1. Adding 1M entries to a sqlite3 table with two indexes: 81 seconds. Adding the entries *then* creating the indices: 17 seconds (sketch below). 2. Optimizing O(n²) cleanup code by realizing it could be deleted entirely. I admit I continued to optimize long after their performance problem was fixed, but you gotta have *some* fun!
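If you want to try the first one yourself, here's a rough sketch of the pattern with the sqlite3 C API. The table and column names are made up for illustration, not the actual CLN schema, and error checking is skipped.

```c
/* Hedged sketch: bulk-insert first, build indexes afterwards. */
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
	sqlite3 *db;
	sqlite3_stmt *stmt;

	if (sqlite3_open("bulk.db", &db) != SQLITE_OK)
		return 1;

	sqlite3_exec(db, "CREATE TABLE items (id INTEGER, a INTEGER, b INTEGER);",
		     NULL, NULL, NULL);

	/* One big transaction, no indexes yet: each insert is just an
	 * append, not an append plus two B-tree updates. */
	sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
	sqlite3_prepare_v2(db, "INSERT INTO items VALUES (?1, ?2, ?3);",
			   -1, &stmt, NULL);
	for (int i = 0; i < 1000000; i++) {
		sqlite3_bind_int(stmt, 1, i);
		sqlite3_bind_int(stmt, 2, i * 2);
		sqlite3_bind_int(stmt, 3, i * 3);
		sqlite3_step(stmt);
		sqlite3_reset(stmt);
	}
	sqlite3_finalize(stmt);
	sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);

	/* Now build each index once, as a single sort over the data. */
	sqlite3_exec(db, "CREATE INDEX items_a ON items(a);", NULL, NULL, NULL);
	sqlite3_exec(db, "CREATE INDEX items_b ON items(b);", NULL, NULL, NULL);

	sqlite3_close(db);
	return 0;
}
```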
I hate price talk, but if you're going to do it, please understand that "market cap" is a very rough *ceiling* on current value. It's neither the amount of money which has gone in, nor the amount of money which can come out. So the order of magnitude is useful to compare against other assets. But abusing it in terms of profits and losses is a category error, and I assume done mainly because it's so easy to measure. Grump over.
That's in 10 minutes!
In two days' time Julian and I will be doing an open Jitsi meeting to discuss the work on Script Restoration. Come and ask questions! 1pm Berlin time:
Doing another open call on Script Restoration BIPs in about 11.5 hours, 1pm Berlin. Bring your questions!