
The Gap Between PIR and Nostr: Open Problems in Private Relay Queries

Can PIR hide Nostr queries from relays? Compound filters and subscriptions don't map to existing schemes. Here are the open problems.

The Query Surveillance Problem

Nostr separates identity from servers. Your keypair is yours. You can publish to any relay, read from any relay, and move between them freely. This is a real improvement over platforms where your identity lives in a corporate database.

But there is a gap in this design, and it raises a question: can you query a relay without the relay learning what you queried? Every time your client opens a subscription, the relay sees exactly what you asked for. A filter requesting notes from Alice, Bob, and Carol tells the relay you follow Alice, Bob, and Carol. A filter requesting events that tag your pubkey tells the relay which account is yours. The relay builds a complete picture of your social graph and reading behavior in real time.

This enables two threats that encryption cannot solve. First, reader censorship: a relay can silently omit events matching certain criteria, and unless you cross-check against other relays you have no way to detect the omission. Second, behavioral surveillance: even if you use a VPN and rotate IP addresses, your query patterns fingerprint you uniquely.

The query surveillance problem is distinct from the publicness of follow lists. NIP-02 contact lists are intentionally public because they enable web of trust verification. The problem is that real-time query observation lets relays correlate your reading behavior, build attention profiles, and selectively filter your view of the network.

The PIR Promise

Private Information Retrieval, developed in cryptographic research since 1995, directly addresses this class of problem. A PIR protocol lets a client retrieve an item from a database without the server learning which item was retrieved. The server processes the query, returns a result, and learns nothing about what the client wanted.
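
To make the guarantee concrete, here is a minimal sketch of the classic two-server XOR construction. It assumes two servers that hold identical copies of the database and do not collude, which is a different setting from single-server schemes such as SimplePIR. The function names and the toy records are hypothetical, chosen only for illustration.

```python
import secrets

def make_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Split a request for `index` into two bit vectors, one per server."""
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
    q2 = q1.copy()
    q2[index] ^= 1                                 # differs only at the wanted index
    return q1, q2

def answer(db: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together every record the query selects."""
    acc = bytes(len(db[0]))                        # all-zero buffer, one record wide
    for record, bit in zip(db, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc

def reconstruct(a1: bytes, a2: bytes) -> bytes:
    """Client side: records selected by both servers cancel, leaving db[index]."""
    return bytes(x ^ y for x, y in zip(a1, a2))

# Toy database of equal-length records (an assumption of this sketch).
db = [b"note-000", b"note-001", b"note-002", b"note-003"]
q1, q2 = make_queries(len(db), 2)
record = reconstruct(answer(db, q1), answer(db, q2))
print(record)
```

Each server sees a uniformly random bit vector on its own, so neither learns the index unless they compare notes. Note that each server still touches every record to answer, which is where PIR's compute cost comes from.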

Recent PIR schemes have become remarkably practical. SimplePIR reports about 10 gigabytes per second of server throughput per core on commodity hardware. HintlessPIR eliminates the need for clients to download database-dependent state. Keyword PIR allows queries by key rather than by row number. The theoretical foundations exist, and raw throughput is no longer the main barrier.

If you could apply PIR to Nostr relay queries, the relay would process your filter and return matching events without learning what you filtered for. Your follow list would remain private at query time. Targeted reader censorship would become far harder, because the relay could not tell which events a given reader asked for. It could still drop an event for everyone, but not selectively for you. The gap in Nostr's privacy model would close.

Why It Doesn't Map

The problem is that Nostr filters are not simple key lookups. A NIP-01 filter combines multiple conditions: a set of authors (OR'd together), a set of kinds (OR'd together), a time range, tag constraints, and a limit. These conditions are AND'd within a filter, and multiple filters in one REQ are OR'd together.
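
The matching rule just described can be sketched in a few lines. This is an illustration of the NIP-01 semantics as summarized above, not code from any relay implementation. The function names are hypothetical, and the short strings stand in for real 32-byte hex ids and pubkeys.

```python
def matches_filter(event: dict, flt: dict) -> bool:
    """Every condition present in one filter must hold (AND)."""
    if "ids" in flt and event["id"] not in flt["ids"]:
        return False
    if "authors" in flt and event["pubkey"] not in flt["authors"]:
        return False
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    if "since" in flt and event["created_at"] < flt["since"]:
        return False
    if "until" in flt and event["created_at"] > flt["until"]:
        return False
    # Tag conditions such as "#p": at least one tag of that name
    # must carry one of the listed values (OR within the list).
    for key, wanted in flt.items():
        if key.startswith("#"):
            name = key[1:]
            values = {t[1] for t in event["tags"] if len(t) >= 2 and t[0] == name}
            if values.isdisjoint(wanted):
                return False
    return True

def matches_req(event: dict, filters: list[dict]) -> bool:
    """An event matches a REQ if it matches any one of its filters (OR)."""
    return any(matches_filter(event, f) for f in filters)

event = {"id": "e1", "pubkey": "alice", "kind": 1,
         "created_at": 1_700_000_000, "tags": [["p", "carol"]]}
filters = [
    {"authors": ["alice", "bob"], "kinds": [1, 6, 7]},
    {"#p": ["carol"], "since": 1_700_000_100},
]
print(matches_req(event, filters))
```

The sample event satisfies the first filter, so the REQ matches it even though the second filter's since bound excludes it. The limit field is deliberately absent here: it does not decide whether an event matches, only how many stored matches come back.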

Classic PIR retrieves one item at a known position. Keyword PIR extends this to retrieve one item by key. Batch PIR retrieves multiple items at known positions. Range PIR retrieves items within a key range. None of these primitives directly supports compound predicates with variable result cardinality.

Consider a filter like: authors in {Alice, Bob}, kinds in {1, 6, 7}, since yesterday, limit 50. This asks for the fifty most recent notes, reposts, and reactions from two authors. The result set could contain zero events or hundreds of candidates before the limit applies. The relay must evaluate a conjunction of conditions across an inverted index, sort by timestamp, and truncate. There is no existing PIR scheme that hides this query while computing this result.
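
In the clear, that pipeline is short. The sketch below uses a hypothetical run_query helper and synthetic events to show the three steps (filter, sort, truncate) and to surface the candidate count, which is the quantity a fixed-size PIR response cannot express.

```python
def run_query(events, authors, kinds, since, limit):
    """Plaintext relay-side evaluation: conjunction, sort by time, truncate."""
    candidates = [
        e for e in events
        if e["pubkey"] in authors and e["kind"] in kinds and e["created_at"] >= since
    ]
    candidates.sort(key=lambda e: e["created_at"], reverse=True)  # newest first
    return candidates[:limit], len(candidates)

# Synthetic data: three authors and four kinds cycling over 240 events.
names = ["alice", "bob", "carol"]
kind_cycle = [1, 6, 7, 0]
events = [
    {"pubkey": names[i % 3], "kind": kind_cycle[i % 4], "created_at": 1000 + i}
    for i in range(240)
]

returned, total = run_query(events, {"alice", "bob"}, {1, 6, 7}, since=1100, limit=50)
print(len(returned), "returned out of", total, "candidates")
```

Every step depends on the query: which rows are candidates, how many there are, and which fifty survive. A private version would have to hide all three at once.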

Open Research Questions

The gap between PIR's capabilities and Nostr's requirements suggests a research agenda. These are not implementation details. They are open problems in applied cryptography.

Compound Predicate PIR. Existing PIR handles single-key lookup. Nostr needs conjunction and disjunction across multiple fields. Can you compose PIR queries for separate indexes and intersect client-side? What leakage does this create? If a client issues PIR queries against an author index and a kind index separately, the relay learns that some author and some kind were queried, even without learning which ones. Is there a PIR-compatible index structure that evaluates conjunctions natively?
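
A rough sketch of the compose-and-intersect idea follows. The private_lookup function is a stand-in for a keyword-PIR call and performs no cryptography. It only marks where such a call would go, under the assumption that the relay learns that a lookup on a given index happened but not the key. All names here are hypothetical.

```python
from collections import defaultdict

def build_indexes(events):
    """Relay side: one posting list per author and one per kind."""
    by_author, by_kind = defaultdict(set), defaultdict(set)
    for e in events:
        by_author[e["pubkey"]].add(e["id"])
        by_kind[e["kind"]].add(e["id"])
    return by_author, by_kind

def private_lookup(index, key):
    """Placeholder for keyword PIR: returns the posting list for `key`."""
    return index.get(key, set())

def compound_query(by_author, by_kind, authors, kinds):
    """Client side: union within each field, intersect across fields."""
    author_ids = set().union(*(private_lookup(by_author, a) for a in authors))
    kind_ids = set().union(*(private_lookup(by_kind, k) for k in kinds))
    fetched = len(author_ids) + len(kind_ids)  # ids downloaded before intersecting
    return author_ids & kind_ids, fetched

events = [
    {"id": "e1", "pubkey": "alice", "kind": 1},
    {"id": "e2", "pubkey": "alice", "kind": 7},
    {"id": "e3", "pubkey": "bob", "kind": 1},
    {"id": "e4", "pubkey": "carol", "kind": 1},
    {"id": "e5", "pubkey": "carol", "kind": 1},
    {"id": "e6", "pubkey": "dave", "kind": 6},
]
by_author, by_kind = build_indexes(events)
result, fetched = compound_query(by_author, by_kind, ["alice"], [1])
print(sorted(result), fetched)
```

Two costs show up even in this toy. The relay can count lookups per index, so it sees how many authors and kinds the filter named. And the client downloads the entire posting list for kind 1 just to keep one id, which on a real relay would be most of the database.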

Variable Cardinality Results. PIR returns fixed-size responses. Nostr filters match variable numbers of events. If you pad all responses to a maximum size, you leak the upper bound and waste bandwidth. If you return variable sizes, you leak exact cardinality, which may fingerprint the queried accounts. Can you hide result cardinality without padding to worst-case size?
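
One commonly discussed middle ground is to pad to a small set of size classes rather than to the worst case. The sketch below uses powers of two with a hypothetical floor of 8. It illustrates the trade-off and does not answer the question: the size class still leaks, just more coarsely.

```python
def padded_size(n: int, floor: int = 8) -> int:
    """Round a result count up to the next power-of-two size class."""
    size = floor
    while size < n:
        size *= 2
    return size

for n in (0, 3, 9, 50, 100, 500):
    p = padded_size(n)
    print(f"{n:>3} results -> response of {p:>3} slots ({p - n} dummy)")
```

With this rule the relay observes one of a handful of size classes instead of an exact count, and the overhead stays under 2x once the count exceeds the floor. Whether a handful of classes is coarse enough to stop fingerprinting is exactly the open part.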

Range Queries. The since and until parameters define time ranges. PIR retrieves specific positions. Existing range PIR requires knowing the result count in advance. Can you partition by time buckets and PIR-query buckets? At what granularity? Does querying a specific hour bucket leak information about the accounts you follow?
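
The bucketing idea can be sketched as a mapping from a since/until range to the bucket ids a client would have to fetch. The bucket width and the function name are assumptions made for illustration.

```python
def buckets_for_range(since: int, until: int, width: int = 3600) -> list[int]:
    """Ids of the fixed-width time buckets that overlap [since, until]."""
    return list(range(since // width, until // width + 1))

since = 1_700_000_000
until = since + 24 * 3600  # a 24-hour window

hourly = buckets_for_range(since, until, width=3600)
daily = buckets_for_range(since, until, width=86_400)
print(len(hourly), "hourly buckets,", len(daily), "daily buckets")
```

Finer buckets mean less over-fetching per bucket but more private queries per range, and the number of buckets requested reveals the length of the window. Coarser buckets hide the window better but force the client to download and discard more events.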

Inverted Index Queries. Tag filters like #p ask for events that mention a pubkey. This is an inverted lookup: not "events by author X" but "events tagging X." Do you need separate PIR databases for forward and inverted indexes? How many index structures must a relay maintain?

Limit and Pagination. A limit parameter requests the top k results by timestamp. PIR cannot compute "top k" without knowing positions. Can you structure indexes so that recent events occupy known positions? How do you paginate? If you query pages sequentially, does the pattern leak that you are paging through one account's history?
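
One hypothetical layout that gives recent events known positions is a fixed-width table with K slots per author, where row a*K + r holds that author's r-th most recent event. This sketch assumes a public directory that maps each author to a slot number. Neither the layout nor the names come from an existing scheme.

```python
K = 4  # hypothetical number of "recent" slots reserved per author

def build_recent_table(events, authors):
    """Row a*K + r holds the r-th most recent event id of authors[a], else None."""
    table = [None] * (len(authors) * K)
    for a, author in enumerate(authors):
        own = sorted(
            (e for e in events if e["pubkey"] == author),
            key=lambda e: e["created_at"],
            reverse=True,
        )
        for r, e in enumerate(own[:K]):
            table[a * K + r] = e["id"]
    return table

def positions_for_top_k(author_slot: int, k: int) -> list[int]:
    """Row numbers a client would request for one author's k newest events."""
    return [author_slot * K + r for r in range(min(k, K))]

events = [{"id": f"a{t}", "pubkey": "alice", "created_at": t} for t in range(1, 7)]
events += [{"id": f"b{t}", "pubkey": "bob", "created_at": t} for t in range(1, 3)]
table = build_recent_table(events, ["alice", "bob"])
print([table[p] for p in positions_for_top_k(0, 2)])
```

The positions no longer depend on the data, so index PIR applies. The costs are empty slots for quiet authors, a hard cap of K per author, and a table that must be rewritten whenever anyone posts. A merged top k across several authors still requires fetching k rows per author and sorting client-side.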

Subscription vs. Request-Response. Nostr REQ creates a persistent subscription. New events stream to the client as they arrive. PIR is inherently request-response. Is polling acceptable? At what frequency? Does polling cadence leak activity patterns? Can you PIR-query a "new events since timestamp T" structure? Live-query PIR does not appear to exist in the literature. Is this a fundamental limitation or unexplored territory?

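Multi-Relay Coordination. NIP-65's outbox model means each author publishes to a few specific relays. To read from multiple authors, you query multiple relays. Multi-server PIR assumes non-colluding servers that hold replicas of the same database and that you can pick freely. The outbox model constrains this: the authors you follow determine which relays you must query, so the set of relays you contact is itself a signal. If the relays you query collude, they can reconstruct your follow list. Can you add noise queries to relays you don't need?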
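
A naive version of the noise-query idea is to mix the relays you need with a few decoys drawn from the relays you know about. The helper below is hypothetical, and the relay URLs are placeholders.

```python
import random

def relays_to_query(needed: set[str], known: set[str], decoys: int,
                    rng: random.Random) -> list[str]:
    """Return the needed relays plus up to `decoys` others, in shuffled order."""
    pool = sorted(known - needed)                  # candidates for cover queries
    cover = rng.sample(pool, min(decoys, len(pool)))
    chosen = sorted(needed) + cover
    rng.shuffle(chosen)                            # do not reveal which are real
    return chosen

needed = {"wss://relay-a.example", "wss://relay-b.example"}
known = needed | {f"wss://relay-{c}.example" for c in "cdefgh"}
plan = relays_to_query(needed, known, decoys=2, rng=random.Random(7))
print(plan)
```

This hides the needed set within a single round, but it may not hold up over time. If decoys are re-drawn on every session while the needed relays stay fixed, an observer who sees many sessions can intersect them. Keeping decoys stable costs bandwidth on every poll, so the open question remains open.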

Client State. Hint-based PIR requires clients to store hints per database. With outbox routing, you might query fifty relays. At 121 megabytes per hint (SimplePIR's reported figure for a 1 gigabyte database), that's about six gigabytes of client state. HintlessPIR eliminates hints but increases response size. What's acceptable on mobile? Can hints be shared or compressed?
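
The arithmetic behind that estimate, as a small calculator. It assumes one hint per relay and the same 121 megabyte hint size for every relay, which is a simplification since hint size depends on the database.

```python
HINT_MB = 121  # per-hint size quoted in the text

def client_state_gb(relays: int, hint_mb: float = HINT_MB) -> float:
    """Total hint storage in gigabytes, assuming one equal-sized hint per relay."""
    return relays * hint_mb / 1000

for relays in (5, 20, 50):
    print(f"{relays:>2} relays -> {client_state_gb(relays):.2f} GB of hints")
```

Fifty relays come to 6.05 gigabytes under these assumptions. Even five relays cost about 0.6 gigabytes, which is already heavy for a mobile client.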

Index Freshness. Nostr relays receive events continuously. PIR databases require preprocessing. SimplePIR's hint depends on database contents. When the database changes, hints become stale. HintlessPIR handles updates gracefully, but the underlying index still needs rebuilding. Can you maintain hot and cold partitions? How do you handle queries during index rebuilds?

Economic Viability. PIR imposes compute costs. SimplePIR processes 10 gigabytes per second per core, but every query scans the whole database, so a popular relay serving thousands of concurrent users may not keep up. Who pays for the additional compute? Can users pay per-query without payment metadata leaking their identity? Do economics push toward centralization?
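
A back-of-the-envelope model makes the compute cost tangible. It assumes each query is one linear scan of the whole database at 10 gigabytes per second per core, with no batching or amortization across queries. The database sizes and target rates below are illustrative, not measurements.

```python
import math

def queries_per_core_second(db_gb: float, throughput_gb_s: float = 10.0) -> float:
    """Queries one core can answer per second if each query scans db_gb."""
    return throughput_gb_s / db_gb

def cores_needed(db_gb: float, target_qps: float, throughput_gb_s: float = 10.0) -> int:
    """Cores required to sustain target_qps under the same assumption."""
    return math.ceil(target_qps * db_gb / throughput_gb_s)

for db_gb in (1, 10, 100):
    print(f"{db_gb:>3} GB database: "
          f"{queries_per_core_second(db_gb):g} queries/s per core, "
          f"{cores_needed(db_gb, 1000)} cores for 1000 queries/s")
```

Under this model a 10 gigabyte relay answers about one query per core-second, so polling subscriptions from a thousand users once per second would need on the order of a thousand cores. Batching and smaller per-index databases could change the picture, but the scaling is linear in database size either way.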

The Honest Assessment

This is not a blog post announcing a solution. It is a post identifying a problem that does not yet have a solution.

The cryptographic primitives for private database queries exist and are practical. The problem is that Nostr's query model is significantly more complex than what these primitives currently support. Compound predicates, variable cardinality, streaming subscriptions, and constrained multi-server topologies each represent open problems. Their conjunction may require fundamentally new constructions.

There are researchers working on related problems. Keyword PIR, batch PIR, and range PIR are active areas. Incremental PIR addresses dynamic databases. But the specific combination of requirements that Nostr presents does not appear in the literature. Someone needs to write the papers that bridge this gap.

If you are a cryptographer looking for applied problems with real-world impact, the Nostr query privacy problem is waiting. The stakes are concrete: censorship resistance for a communication protocol used by real people. The constraints are well-defined. The gap between current capabilities and requirements is clear.

What's missing is the research.
