# Architecture Overview

This document describes how the Agency system works, why it is structured the way it is,
and how to reason about it. It is updated with each meaningful change to the codebase.

---
## What this system does

Agency is a participation signal system for open source communities.
It was built with OSArch as the first community, but is designed from the start to be
adopted, forked, and adapted by any community that wants the same thing:
a visible, legible signal of who is contributing and where.

It observes where community members are active (forum, code, wiki, chat, funding) and
produces a ranked participation signal. Each community controls its own weights, data
sources, and interpretation of the output.

It is not a governance engine. It does not make decisions. It makes existing human activity
legible so that people can act with more confidence, knowing their contributions are seen.

The core idea: **legitimacy comes from participation, not structure.**

> OSArch is the reference implementation. If you are from another community reading this,
> everything here is designed to be forked. Start with `config.yaml`.

---
## Data flow

```
data/*.json        →   src/scoring/score.py   →   src/outputs/table.py
(participation data)    (weighted formula)          (ranked table)
                                ↑
                           config.yaml
                       (adjustable weights)
```

1. A JSON file holds raw participation counts per user per platform
2. `config.yaml` defines how much each platform counts toward the score
3. `score.py` applies the weights and returns a numeric signal per user
4. `aggregate.py` collects all users, sorts by score
5. `table.py` renders the result as a human-readable ranked list

---
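The five steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not the repo's actual code: the JSON shape and all weight keys except `funding_activity: 0.1` (which appears elsewhere in this document) are hypothetical.

```python
import json

# Stand-ins for config.yaml and data/sample.json. Keys other than
# funding_activity are illustrative assumptions.
CONFIG = {"weights": {"forum_activity": 1.0, "code_activity": 1.5, "funding_activity": 0.1}}

SAMPLE = json.loads("""
{
  "alice": {"forum_activity": 12, "code_activity": 4, "funding_activity": 2},
  "bob":   {"forum_activity": 3,  "code_activity": 9, "funding_activity": 0}
}
""")

def score(counts: dict, weights: dict) -> float:
    """Weighted sum of one user's per-platform counts (score.py's role)."""
    return sum(weights.get(platform, 0.0) * n for platform, n in counts.items())

def aggregate(data: dict, weights: dict) -> dict:
    """Score every user and return them ranked high to low (aggregate.py's role)."""
    scored = {user: score(counts, weights) for user, counts in data.items()}
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))

# table.py's role, reduced to a plain print loop
for rank, (user, s) in enumerate(aggregate(SAMPLE, CONFIG["weights"]).items(), start=1):
    print(f"{rank}. {user}: {s:.1f}")
```

Everything a reader needs to audit a score is in those two small functions, which is the point of the design.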
## Key design principles

### Correctability over precision
Weights are in `config.yaml`, not hardcoded. Anyone can fork the repo, change the weights,
and run a different interpretation. Disagreement is a feature, not a bug.
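As one concrete illustration, here is a hypothetical `config.yaml` fragment. Only `funding_activity: 0.1` is documented elsewhere in this file; the other keys and values are placeholders a forking community would replace:

```yaml
# Hypothetical weights: every key except funding_activity is illustrative
weights:
  forum_activity: 1.0
  code_activity: 1.5
  wiki_activity: 0.8
  chat_activity: 0.5
  funding_activity: 0.1
```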
### Fake data first, real integrations later
The system starts with hand-crafted sample data in `data/sample.json`. Real API integrations
come one at a time, after the model is stable. This prevents over-engineering before the
signal itself is validated.

### Git as the distributed database
All participation data lives in version-controlled JSON files. The entire history of the
system — data, weights, and scoring logic — is visible in git. Forking the repo forks the
system. This is also how another community adopts it: fork, replace the data, adjust the
weights, run it for their context.

### Public data only
Collectors may only use publicly accessible data — no API keys, authentication, or
platform permission required. If the data isn't public, it isn't in scope. This keeps
the system independent, immediately forkable, and auditable: anyone can verify what
is being collected by visiting the same public URLs the collectors use.
See [ADR 003](decisions/003-public-data-only.md) for the full reasoning.

### No black boxes
Every scoring decision is visible in plain text. A newcomer (human or AI) should be able
to read `config.yaml` and `score.py` and understand exactly how a score is produced.

### OSArch is the reference implementation, not the intended audience
All documentation — ADRs, architecture notes, templates — is written for any community
adopting this system, not specifically for OSArch. OSArch examples are framed explicitly
as examples. A contributor from a community that has never heard of OSArch should be
able to read any document in this repo and act on it. See [docs/STYLE.md](STYLE.md) for
the full documentation conventions.

---
## Directory structure

```
agency/
  main.py            entry point, CLI
  config.yaml        scoring weights (community-adjustable)
  requirements.txt   Python dependencies
  data/
    sample.json      mock participation data (starting point)
  src/
    collectors/      future: one file per data source (forum, github, wiki, etc.)
    scoring/
      score.py       weighted scoring formula for a single user
      aggregate.py   applies score() across all users, returns ranked dict
    outputs/
      table.py       renders ranked scores as a CLI table
  docs/
    ARCHITECTURE.md  this file
    decisions/       one file per significant decision (ADR format)
```

---
## Future directions (not yet built)

### Distributed database / tamper-evident records
The long-term goal is a data layer where no single party can retroactively alter participation
history. Blockchain is one path to this. The challenge with public chains (Ethereum, etc.) is
gas costs per write and the wallet/token barrier, which would exclude most OSArch contributors.
More accessible alternatives to evaluate when the time comes:

- **IPFS + content-addressed JSON** — immutable, distributed, no fees, no wallets required
- **Hypercore / Dat protocol** — append-only logs with cryptographic integrity, peer-to-peer
- **Signed append-only log** — GPG-signed JSON commits; tamper-evident without any chain
- **Private/consortium blockchain** — full blockchain properties without public gas costs,
  but reintroduces a trust question about who runs the nodes

The git-tracked JSON approach used today already provides a weak form of this: history is
visible and forks are public. The upgrade path is additive, not a rewrite.

See [ADR 001](decisions/001-python-and-json.md) for why git-tracked JSON was chosen to start.
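To make the "signed append-only log" option concrete, here is a minimal sketch of the hash-chaining idea: each record embeds the hash of the previous record, so editing history invalidates every later link. This is illustrative only; a real implementation would add GPG signatures on each entry, which is external tooling and omitted here.

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    # Canonical JSON (sorted keys) so the hash is stable across key orderings
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the hash of the previous entry."""
    prev = entry_hash(log[-1]) if log else "0" * 64
    log.append({"prev": prev, "record": record})

def verify(log: list) -> bool:
    """Confirm every entry still points at the true hash of its predecessor."""
    return all(log[i]["prev"] == entry_hash(log[i - 1]) for i in range(1, len(log)))

log: list = []
append(log, {"user": "alice", "platform": "forum", "count": 12})
append(log, {"user": "bob", "platform": "code", "count": 9})
print(verify(log))  # True

# Tampering with an earlier record breaks the chain from that point on
log[0]["record"]["count"] = 999
print(verify(log))  # False
```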
### Funding signals: merged or separate?
Currently all funding activity is collapsed into a single low-weighted signal (`funding_activity: 0.1`).
An open question for when real funding collectors are built:

- Should all funding platforms (Open Collective, GitHub Sponsors, Patreon, etc.) roll into
  one combined `funding_activity` score — money is money regardless of platform?
- Or should they be distinct signals, because the *transparency* of the funding act matters
  as much as the act itself? Open Collective is fully public; other platforms are not.
  A public, traceable funding contribution may be meaningfully different from a private one.

This is unresolved. The decision should be made when the first funding collector is built,
not before.

### Should funding have representation beyond a score weight?
Funding currently contributes to agency scores at a low weight (`0.1`). An open question
is whether funders — particularly sustained, transparent funders — should have a distinct
form of representation beyond that score contribution.

The case for: sustained funding is a form of commitment. Excluding it from representation
entirely may signal that financial support is unwelcome, which could affect the project's
long-term sustainability.

The case against: the moment funding buys representation, a wealthy actor who never
participates could outweigh someone who has contributed for years. That is the exact
capture problem this system is designed to avoid.

A possible middle path: weight transparent, sustained funding more generously than
one-off donations within the existing score — giving it more voice without creating a
separate governance track. But whether that is sufficient, or whether funders deserve
distinct representation of some kind, is an open question.

This should be a community decision before any funding collector is built.
### Community weight governance
Platform weights are determined by community vote, not by maintainers or direct
`config.yaml` edits. The mechanism — founding vote, annual adjustment via averaged
proposals, and structural change process — is defined in
[ADR 008](decisions/008-platform-weight-governance.md). The goal is to keep
meta-governance from being captured by whoever scores highest.
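The "averaged proposals" step can be sketched as follows. The platform names and proposed values are illustrative assumptions, and ADR 008 (not reproduced here) governs the actual mechanism:

```python
# Each eligible member submits a full weight distribution; the adopted
# weights are the per-platform average across all submissions.
proposals = [
    {"forum_activity": 1.0, "code_activity": 2.0, "funding_activity": 0.2},
    {"forum_activity": 1.5, "code_activity": 1.0, "funding_activity": 0.0},
    {"forum_activity": 0.5, "code_activity": 1.5, "funding_activity": 0.1},
]

def average_proposals(proposals: list) -> dict:
    platforms = proposals[0].keys()
    return {p: sum(prop[p] for prop in proposals) / len(proposals) for p in platforms}

print(average_proposals(proposals))
# forum_activity averages to 1.0, code_activity to 1.5, funding_activity to ~0.1
```

Averaging means no single proposal dictates the outcome, while every eligible voice moves the result.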
### How should the community decide which sites to collect from?
The list of platforms the system collects from is not a technical decision — it defines
what "participation" means. Adding a platform amplifies activity there; removing one
diminishes it. This is a form of power that should be community-governed.

This is an open question. One possible approach:

*Criteria-first, then process.* Before debating any specific platform, the community
agrees on a set of inclusion criteria. A candidate site would need to meet all of them:

- Data is publicly accessible (required — see ADR 003)
- Platform is actively used by a meaningful portion of the community
- Activity on the platform represents genuine effort, not easily gamed
- Platform is stable enough to depend on
- Platform is relevant to the community's actual work, not peripheral activity

With clear criteria, the process becomes more mechanical: open a proposal using
[PROPOSAL_TEMPLATE.md](sites/PROPOSAL_TEMPLATE.md) (a file in `docs/sites/proposed/`),
hold a defined discussion period, then vote. Voting would require a minimum agency score
threshold — proving you are an active participant before having a say in what participation
means. Above that threshold, every vote counts equally regardless of rank. Approved sites
move to `docs/sites/active/`. Modifications to existing sites (scope changes, weight
adjustments) use [CHANGE_TEMPLATE.md](sites/CHANGE_TEMPLATE.md). Removed sites move to
`docs/sites/retired/` using the same template, with a reason recorded.

Retired sites stay in the record permanently. If a platform was removed because it was
being gamed or because the community migrated away, that history matters for future decisions.

This is one possible path forward. The right process should be decided by the community
before the first real collector is built and the list becomes consequential.
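The threshold rule described above is simple to state in code. A minimal sketch, where the threshold value, scores, and ballots are all illustrative assumptions:

```python
# Only members at or above a minimum agency score may vote on site
# proposals; above the threshold every vote counts equally.
MIN_SCORE = 5.0  # hypothetical threshold, to be set by the community

scores = {"alice": 18.2, "bob": 16.5, "carol": 2.1}
votes = {"alice": "yes", "bob": "no", "carol": "yes"}

eligible = {user for user, s in scores.items() if s >= MIN_SCORE}
counted = {user: v for user, v in votes.items() if user in eligible}

yes = sum(1 for v in counted.values() if v == "yes")
no = len(counted) - yes
print(f"yes={yes} no={no}")  # carol is below threshold, so the tally is 1-1
```

Note that rank does not matter past eligibility: alice's higher score gives her no extra voting power over bob.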
---
## Current limitations (known, intentional)

- Scores are not normalized — raw weighted sums, not percentages
- No deduplication across platforms (same person with different usernames counts separately)
- Data is hand-entered, not yet pulled from live APIs
- No time windowing — all activity is treated as equally recent

These are not oversights. They are deliberate starting points. See [docs/decisions/](decisions/)
for the reasoning behind each.