Web of Trust: Where is the Trust Signal?

Where do web of trust score calculations come from? Some say they should be based on follows. Others say no, we should replace follows with explicit attestations of trust. The article synthesizes these viewpoints to propose that web of trust must incorporate ANY AND EVERY PIECE OF DATA that is available and relevant to the question at hand. For this to work, we propose a process called INTERPRETATION that converts raw data of any format into an internally standardized "ratings format" that can be recognized by our web of trust algorithm of choice.

For nostr, one of the most important questions regarding decentralized Web of Trust (WoT) is this: Where does the trust signal come from?

In other words: How do I know whether Alice is trustworthy? What does trustworthy even mean? Should it be based on follows, or zaps, or interactions, or .... or what?

This question kicks off a train of thought that may seem long and winding, but which leads inevitably to the notion of interpretation. Without interpretation, WoT will forever disappoint. But once we incorporate interpretation into nostr WoT, we will start to see WoT live up to its promise.

We invite the community to consider the thought process in this article from start to finish, discuss it, critique it, add to it, improve upon it, and ultimately, implement it.

Let's Use Follows

Most of us start out thinking of NIP-02 follows as the best signal for trust in nostr. Why? Because follows are ubiquitous. Almost everyone uses them, so the data is plentiful and readily available. Sure, a follow is only a proxy indicator of trust, and an imperfect one at that, but it's better than nothing!

In point of fact, from the early days of nostr, most WoT scores have been based on follows. Some developers add NIP-51 mutes to the mix, but follows have historically done most of the heavy lifting.

But wait! Follows do not equal trust!!

At some point we come to the realization that just because I follow someone, this does not mean I trust that user! Maybe I follow an npub because that user is entertaining. Or maybe I follow a user whom I specifically do NOT trust because I want to keep track of this particular idiot's shenanigans! And even if I do trust that user to bring me entertaining content, that doesn't mean I trust that user to babysit the kids. The point is that a follow does not equal trust ... whatever "trust" even means ... which brings us to the next step in the thought process:

Trust is Contextual

I might trust Bob to do my taxes. That doesn't mean I trust his movie reviews, or trust him to build my next nostr app, or take his word that he is a credentialed physician. Different types of trust require different contexts.

So where does that leave us? We need a way to indicate trust that is not a follow, and we need different trust metrics for different trust contexts. We need a NIP where Alice can attest that she trusts Bob in one context but not necessarily in another. The NIP will be a crusade against ambiguity: she should be empowered to specify the trust context as precisely as possible, to quantify the degree of trust numerically, and, while we're at it, to quantify her confidence in that assessment, based upon how much information her assessment relies upon.

Sounds like a simple enough NIP. Right? But wait. Is trust quantification going to be on a 0 to 5 star scale? Or will it range from 0 to 100? Or just binary yes/no? How many contexts are there, and what are they? The questions start to pile up. Who answers them???

Explicit Attestations of Trust

Let's say we plow ahead, undaunted, with this idea. In place of proxy indicators of trust, like a follow, we want users to say precisely what they mean and mean precisely what they say. No ambiguity, nothing left open to interpretation. We want a NIP for explicit, contextual trust attestations.

Someone has to spell out the details of these attestations, and that someone will have to be the NIP author (or so our thinking goes). So let's say we spell out a list of contexts, we specify that trust will be on a 0 to 100 point scale, and we add a confidence field ranging from 0 to 100 percent. "Alice trusts Bob to build her next nostr app" becomes, according to this hypothetical NIP: "Alice attests that Bob's skill in building nostr apps is 99 out of 100, and she states this with a confidence of 90 (because she's worked with him many times before)."
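
For illustration only, here is a rough TypeScript sketch of what such an attestation event might look like. The kind number, tag names, and layout are placeholders invented for this example, not an actual or proposed NIP.

    // A purely hypothetical explicit trust attestation, expressed as a
    // nostr-style event template. The kind number and tag names are
    // placeholders, not part of any real NIP.
    const trustAttestation = {
      kind: -1,                                // placeholder; a real NIP would assign a kind
      pubkey: "<Alice's pubkey>",
      created_at: Math.floor(Date.now() / 1000),
      content: "",
      tags: [
        ["p", "<Bob's pubkey>"],               // the person being rated
        ["context", "building nostr apps"],    // the trust context
        ["rating", "99", "0", "100"],          // score, minimum, maximum
        ["confidence", "90"],                  // percent confidence in the rating
      ],
    };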

We are now faced with the problem that this is not a particularly great user experience. It would seem that the more detail the user is expected to provide, the less likely the user is to author these attestations in the first place. Attestations that follow this NIP may be ideally suited as input into an algorithm, but that doesn't mean that users will actually issue them! -- a point made during a WoT panel discussion at Nostriga 2024. But regardless, we can certainly make NIPs like this ... and guess what? Many people have! Are they widely used? No, not really. They certainly never got as much usage as kind 3 follows. Still, suppose our explicit attestations of trust somehow manage to garner wide usage. What are we going to do with the data? How do we deal with the bots and bad actors who use our NIP to spam favorable ratings to their friends and 0-point ratings to their competitors or enemies?

This turns out to be tougher than we thought! Do we continue to use follows data, which is plentiful but ambiguous as to what it means? Or do we try to switch to explicit contextual trust attestations, which eliminate ambiguity in favor of detail but deliver such a bad UX that no one will use them? Neither one of these extremes seems to be the answer. But wait a sec ... why do we have to choose?

Why not both?

How about this: Let's use proxy indicators of trust, such as follows. And let's ALSO use explicit, unambiguous, contextual, quantifiable trust attestations. Neither extreme is enough by itself to build a WoT that does what it needs to do. So let's use both extremes. Plus everything in between. Let's use it all!

We come to the realization that web of trust should rely upon any and every piece of data that is available and relevant to the question at hand.

OK, so how does that work? How is it possible to use "any and every piece of data that is available and relevant to the question at hand?"

Enter the notion of interpretation.

Interpretation

The idea of interpretation is simple: start with whatever raw data is available and relevant to whatever trust metric we wish to calculate (follows, mutes, reports, reactions, reposts, labels, zaps, etc.) and convert it into a common format that is maximally convenient and useful for our web of trust algorithm. What common format are we talking about? It's the explicit trust attestation discussed above, stripped of ambiguity because the context, the numerical rating, and the confidence are spelled out in full detail.

The idea is that if users are disinclined to provide attestations following this idealized Rating Format, we take the notes that they have authored and use them to make our best guess as to what they would have said if they, hypothetically, had expressed their thoughts in this format.

To this end, the GrapeRank algorithm consumes interpreted data in an idealized Rating Format with these five fields (sketched in code after the list):

  • Rating Context: a string, e.g. "5 star Product Quality", which tells us what to expect in the fields below

  • Rater: a string (a pubkey)

  • Ratee: a string (often a pubkey, but could be an event id, a product name, etc.)

  • Rating: Typically a scalar, e.g. 0 to 5 stars, but could be a boolean or some other variable type. The Rating Context dictates the rating type and its allowed range.

  • Confidence: a number between 0 and 1 (or 0 and 100%)
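
Written as a TypeScript type, purely as an illustrative sketch (the field names below are illustrative, not a published spec), a single rating might look like this:

    // Illustrative sketch of the idealized Rating Format described above.
    // Field names and types are assumptions for this example, not a spec.
    interface Rating {
      context: string;            // e.g. "5 star Product Quality"
      rater: string;              // a pubkey
      ratee: string;              // often a pubkey, but could be an event id, a product name, etc.
      rating: number | boolean;   // a scalar or boolean, as dictated by the Rating Context
      confidence: number;         // between 0 and 1 (equivalently, 0 to 100%)
    }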

The end result of the Interpretation step is a Rating with the five fields described above. To do the actual GrapeRank calculations, we write a "calculation engine" that takes ratings in this format as input and produces weighted average scores. As long as a ratings dataset follows this format, the GrapeRank algorithm knows exactly what to do with it, no matter the context.
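
As a rough sketch of what such an engine's interface might look like (this computes only a confidence-weighted average per ratee for illustration, not the actual GrapeRank algorithm), using the Rating type from the previous sketch:

    // Minimal sketch of a calculation engine that consumes Ratings.
    // For illustration it computes a confidence-weighted average per ratee;
    // the real GrapeRank algorithm does considerably more than this.
    function weightedScores(ratings: Rating[]): Map<string, number> {
      const sums = new Map<string, { weighted: number; weight: number }>();
      for (const r of ratings) {
        if (typeof r.rating !== "number") continue;  // skip non-scalar ratings in this sketch
        const acc = sums.get(r.ratee) ?? { weighted: 0, weight: 0 };
        acc.weighted += r.rating * r.confidence;
        acc.weight += r.confidence;
        sums.set(r.ratee, acc);
      }
      const scores = new Map<string, number>();
      for (const [ratee, acc] of sums) {
        if (acc.weight > 0) scores.set(ratee, acc.weighted / acc.weight);
      }
      return scores;
    }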

So the idea of interpretation is that we will take whatever data source is available and relevant to the question at hand, write a bespoke script that transforms that data into the idealized GrapeRank Rating Format, and input that into a calculation engine that runs the GrapeRank algorithm. With each new data source, we need a new interpretation script, but there is no need to rewrite the GrapeRank calculation engine for each new trust metric.

As an example, consider this Brainstorm prototype. This service calculates a metric, the GrapeRank Trust Metric, designed to address the question: which nostr npubs can be entrusted with our time and attention? The prototype makes use of follows, mutes, and NIP-56 reports as the raw data that is both available and relevant to the question at hand. Follows, mutes and reports do not follow the Rating Format presented above, but we ask: if Alice had hypothetically expressed her follow, mute or report in this format, what would she have said? We interpret a follow as a rating of 1, meaning "worthy of attention," with mutes and reports being a rating of 0, meaning "not worthy of attention". What about the confidence field? Nostr users tend to follow relatively indiscriminately. A mute is generally a little more carefully considered; and if Alice reports someone, she probably means it. Therefore the default settings are to interpret a follow as having a very low confidence, only 4 percent, with a report having a much higher confidence of 50 percent, and a mute a confidence somewhere in between. These are the parameters we use to interpret follows, mutes and reports into the Rating Format so they can be input into the GrapeRank algorithm.
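
As an illustration of what such an interpretation script might look like in TypeScript (the 4 percent and 50 percent values are the defaults described above; the 25 percent mute confidence is an assumed "somewhere in between" value, and the type names are invented for this sketch):

    // Sketch of an interpretation script for follows, mutes and NIP-56 reports.
    // A follow becomes a rating of 1 ("worthy of attention"); mutes and reports
    // become a rating of 0. Confidences mirror the defaults described above;
    // the 0.25 for mutes is an assumed in-between value.
    type RawSignal = {
      kind: "follow" | "mute" | "report";
      rater: string;  // pubkey of the author of the follow/mute/report
      ratee: string;  // pubkey being followed, muted, or reported
    };

    const INTERPRETATION = {
      follow: { rating: 1, confidence: 0.04 },  // plentiful but weak signal
      mute:   { rating: 0, confidence: 0.25 },  // assumed in-between confidence
      report: { rating: 0, confidence: 0.5 },   // strong negative signal
    } as const;

    function interpret(signal: RawSignal): Rating {
      const { rating, confidence } = INTERPRETATION[signal.kind];
      return {
        context: "worthy of attention",
        rater: signal.rater,
        ratee: signal.ratee,
        rating,
        confidence,
      };
    }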

Conclusion

We started with the question: where does the trust signal come from? That question takes us on a journey of intellectual exploration. Our thought process starts with the notion that the trust signal should come from proxy indicators of trust such as follows, switches to the notion that the trust signal should come exclusively from explicit trust attestations, and finally arrives at the notion that the trust signal should come from any and every piece of data that is available and relevant to the question at hand. We realize that this is impossible without the process of interpretation, in which a script, one that is customized to each new data source we wish to ingest, converts the raw data into a common Rating Format recognized by our algorithm of choice (such as GrapeRank). We provide an example Rating Format, the five fields mentioned above, and illustrate the role it plays in the Brainstorm prototype service.

This thought process may seem long and complicated, but the general path of its logic is inescapable. So what do we need to do next?

Build it. Start with a question we want to ask -- who are the credentialed physicians in my geographic location? -- identify the raw nostr data that is relevant and available to this question (and maybe we need to build a NIP to generate relevant data if none is currently available), write an interpretation script, feed the interpreted ratings into an algorithm such as GrapeRank, and communicate the results to nostr clients in a manner that generates real utility.
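
For a sense of how little glue is required once the pieces exist, here is a hypothetical end-to-end sketch. The fetching and publishing functions are stand-ins for whatever relay queries and client integration a real build would use, and the weighted-average engine from the earlier sketch stands in for GrapeRank.

    // Hypothetical pipeline wiring together the sketches above. The two
    // function parameters are placeholders for real relay queries and
    // client-facing output; weightedScores stands in for GrapeRank here.
    async function runTrustMetric(
      fetchRawSignals: () => Promise<RawSignal[]>,
      publishScores: (scores: Map<string, number>) => Promise<void>,
    ): Promise<void> {
      const raw = await fetchRawSignals();        // 1. gather the relevant raw data
      const ratings = raw.map(interpret);         // 2. interpret it into the Rating Format
      const scores = weightedScores(ratings);     // 3. run the algorithm
      await publishScores(scores);                // 4. surface the results to nostr clients
    }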

If you want to help build this, if you want to be a part of the nostr Web of Trust revolution, join the NosFabric web of trust hackathon where these and other related ideas are becoming reality!
