Sciop at SfN 2025 and our Federation Plans
The Once and Future SciOp
We were just at the Society for Neuroscience 2025 meeting presenting to some of our neuroscientific colleagues. The poster and abstract are here, and give a decent overview of the ongoing work in the sciop extended universe: https://aharoni-lab.com/sfn2025/sciop
Work on Sciop Proper has slowed a bit as it is largely stabilized1 in its pre-federation state. There are two major features, one of which is already completed, that we are working on before turning fully to federation:
- Reporting: completed in #471 - report items for a variety of concerns like validity, malicious content, rule violations, etc.
- Commenting: TODO - to be able to coordinate scrapes and discuss uploads on the site itself.
The other pieces we have been working on surrounding sciop include:
- torrent-models - nice data models for working with torrents within python, as well as filling the gap left by a lot of other python torrent creation/editing software that doesn’t support BitTorrent v2.
- noob - a graph processing library that we’ll be using to implement the chains of side effects with activitypub (more on that below).
- FEP-1580 - a spec proposal for how to move objects around on activitypub. this is important for regular microblog activitypub as well as for sciop, as we will want to be able to relocate datasets and uploads between instances relatively freely (to the degree they are attached to instances, more below).
- FEP-d8c8 - a `Torrent` type for activitypub, which represents torrents as first-class activitypub objects and allows them to be extended within-protocol, allowing us to embed metadata that makes torrents self-describing, self-contained, and self-updating as we federate them between instances :)
There is still a bit of work to do to get these pieces up and running, particularly with noob, but we are nearly there, and will begin the work of federating sciop early next year.
Federation Plans
We plan to federate sciop using a federation overlay package that does for python2 what other projects like activitypub-express and fedify have done for javascript.
In particular, we plan on writing:
- a package with a set of core activitypub pydantic models
- extensions of those models to be used with sqlalchemy
- an ORM-like package that allows pydantic models to be used with oxigraph
- a set of pure function routines that implement federation and its side effects as a graph of operations
- a set of server framework wrappers that expose those routines to fastapi and litestar
As described in our federation roadmap, we plan on implementing a different kind of activitypub federation than usual: a distributed federation with FEP-c390 as a model that decouples objects and actors from instances.
Some detail that probably strays too far into the technical weeds on these …
Actually Modeling The Graph
One of the underrated achievements of ActivityPub is how it bridges the Linked Data world with the rest of the web tech world3. Collections in particular stand out, as dealing with collections of objects with traditional RDF tools is a literal nightmare.
However, the challenges of trying to jam a graph-shaped thing into a relational-shaped hole mean that for the most part we can’t really realize the full potential of what activitypub can do. The promise of building activitypub out of linked data is that we can federate any kind of object; however, you can’t really store any kind of object in a relational database in any kind of coherent or performant way. This means that even minor changes in the spec require an enormous amount of labor touching all the little finicky bits of every tradweb AP server, rather than having those changes take place at the schema level, keeping model and view isolated from one another.
Part of the problem is that the alternative of just using RDF and graph databases is not exactly technically or cognitively viable either. How do you federate an unbounded graph of triples? ActivityPub gives a specific unit of graph in Objects, which is great - people think about things as discrete entities, as objects, as records; not as a big graph soup where i need to deal with the infinity of all possible properties that could be declared for something. Type annotations are great, but how can i type annotate a big blob of triples? Technically, postgres is great! wonderful! we love it! but there aren’t really comparable graph databases - they are all ancient, unmaintained, unwieldy behemoths that have the nasty habit of knocking over a server when processing even a moderately complex SPARQL query.
What to do?
We want to take a page out of bluesky’s playbook, where they can run very large services by using many single-tenant sqlite databases rather than one humongous sharded database4. This fits with a distributed federation model (below), where an instance is just a “delegate” that can receive and act on behalf of an account, but the account is not intrinsically tied to the instance.
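The single-tenant pattern is easy to sketch with stdlib sqlite3. Everything here - the `account_db` helper, the `objects` table, the file layout - is a hypothetical illustration of the idea, not sciop’s actual schema:

```python
import sqlite3
from pathlib import Path

# Toy sketch of the single-tenant pattern: one small sqlite database
# per account instead of one big shared database. Schema and helper
# names are hypothetical, not sciop's actual layout.

def account_db(root: Path, account: str) -> sqlite3.Connection:
    """Open (creating if needed) the database belonging to one account."""
    db = sqlite3.connect(root / f"{account}.sqlite")
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects (uri TEXT PRIMARY KEY, body TEXT)"
    )
    return db

root = Path("accounts")
root.mkdir(exist_ok=True)

# each account's data lives in its own file, so an account can be
# moved to another instance by copying a single database file
with account_db(root, "me") as db:
    db.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?)",
        ("https://sciop.net/accounts/me", '{"name": "Jonny"}'),
    )
```

The nice property is operational: backup, migration, and deletion of an account become file operations rather than surgical queries against a shared database.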
To do that, we want to make the programming experience nice by making an ORM-like adapter between pydantic models and oxigraph. The python type annotation system makes a very nice analogy to triples, where an object instance - type - value maps onto a subject - predicate - object triple. From that, we want to make something that looks like this:
this should be considered PSEUDOCODE as an EXAMPLE

```python
from typing import Annotated as A

from pydantigraph import GraphModel, Namespace, ObjectURI, ClassURI
from pyoxigraph import Store

Schema = Namespace("http://schema.org/")
Sciop = Namespace("http://sciop.net/")

class Person(GraphModel):
    class_uri: ClassURI = Schema.Person
    object_uri: ObjectURI
    name: A[str, Schema.name]

# create
with Store("my-data.ox") as store:
    me = Person(
        object_uri=Sciop / "accounts" / "me",
        name="Jonny",
    )
    store.add(me)

# query
with Store("my-data.ox") as store:
    me = store.select(Person).where(Person.name == "Jonny")

# dump to rdf
me.to_ttl()
```
roughly…

```turtle
@prefix schema: <http://schema.org/> .

<http://sciop.net/accounts/me> a schema:Person ;
    schema:name "Jonny" .
```
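The instance-field-value to subject-predicate-object mapping that such an adapter relies on can be sketched with nothing but stdlib typing machinery. The `to_triples` function and the dataclass below are illustrative stand-ins for the real package, not its API:

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

# Illustrative sketch of the annotation-to-triple analogy using only
# stdlib dataclasses: the Annotated metadata carries the predicate URI.

SCHEMA = "http://schema.org/"

@dataclass
class Person:
    object_uri: str
    name: Annotated[str, SCHEMA + "name"]

def to_triples(obj) -> list[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples from Annotated fields."""
    hints = get_type_hints(type(obj), include_extras=True)
    triples = [(obj.object_uri, "rdf:type", SCHEMA + type(obj).__name__)]
    for field, hint in hints.items():
        meta = getattr(hint, "__metadata__", None)
        if meta:  # the predicate URI is stored in the annotation metadata
            triples.append((obj.object_uri, meta[0], getattr(obj, field)))
    return triples

me = Person(object_uri="http://sciop.net/accounts/me", name="Jonny")
print(to_triples(me))
```

Going the other direction, from an arbitrary blob of triples back into typed models, is where the real work lives, but the forward mapping is this mechanical.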
Assuming additional methods to query other properties, dynamic class constructors, and so on, this should let us keep the sensible modeling of the relational/object-oriented world while being able to accommodate arbitrary data in the graph.
For SciOp, we want to implement the graph data as an “underlay” to the data of the sciop site - the sciop instance will keep copies of each account’s data in a separate pyoxigraph database and create digested views against that data in its relational database. This lets us make a gradual transition to a graph database, as well as gauge the performance consequences of this experiment, so we don’t rewrite the whole backend only to find out that it’s much worse. Our goal is to make sure that most of what we do with sciop is not vertically integrated and usable only with sciop, but that we are making as much reusable technology as possible for the benefit of the broader fediverse.
Noob + Side Effects
Within activitypub, a given activity will imply a number of side effects, both “officially” in the protocol and “unofficially” in how a practical implementation has to work. So e.g. when receiving a Create[Note] activity, one might have to go and fetch the replies for the post, then go and fetch the profiles for the accounts that posted them, create notifications, and so on.
Noob will allow us to model these graphs of side effects in a declarative way that allows them to be run flexibly, scaling from very small instances with only a few CPUs to very large instances that can distribute those processes over compute clusters.
For a (simplified) example, we could model a side effect where we fetch a post and download its attachments as a noob pipeline that fetches the post, fetches its attachments, and returns the post:
```yaml
noob_id: ex-fetch-post
input:
  status_uri: str
nodes:
  fetch_post:
    type: example.fetch_post
    depends:
      - uri: input.status_uri
  attachments:
    type: example.fetch_attachments
    depends:
      - post: fetch_post.post
  return:
    type: return
    depends:
      - post: fetch_post.post
```
This always happens whenever we fetch a post, and we might fetch posts in lots of different cases. Noob then lets us compose pipelines so that, e.g. if we were to fetch a post as part of some other side effect chain, we could do so by reusing the pipeline:
```yaml
noob_id: ex-expand-timeline
input:
  collection_uri: str
nodes:
  iterate_collection:
    type: example.iterate_collection
    depends:
      - uri: input.collection_uri
  object_map:
    type: map
    depends:
      - iterate_collection.object_uri
  fetch_post:
    type: tube
    params:
      - id: ex-fetch-post
  something_else:
    type: idk.something.else
    depends:
      - post: fetch_post.post
```
where we map a collection of object uris over the ex-fetch-post pipeline, fetch each of them using that pipeline, and then use its results in the parent pipeline.
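To make the execution model concrete, here is a toy interpreter for this kind of dependency graph - plain dicts and lambdas standing in for noob’s actual node types and scheduler, with hypothetical `example.*` behavior:

```python
# Toy executor for a declarative dependency graph in the spirit of the
# pipelines above. Node functions return dicts of named outputs, which
# are stored as "node.output" so downstream nodes can depend on them.

def run_pipeline(nodes: dict, inputs: dict) -> dict:
    """Run each node once all of its dependencies have produced results."""
    results = dict(inputs)  # e.g. {"input.status_uri": ...}
    pending = dict(nodes)
    while pending:
        ready = [
            (name, node) for name, node in pending.items()
            if all(src in results for src in node.get("depends", {}).values())
        ]
        if not ready:
            raise RuntimeError("cycle or unsatisfied dependency")
        for name, node in ready:
            kwargs = {p: results[s] for p, s in node["depends"].items()}
            for key, val in node["fn"](**kwargs).items():
                results[f"{name}.{key}"] = val
            del pending[name]
    return results

# hypothetical stand-ins for the example.* node types above
nodes = {
    "fetch_post": {
        "fn": lambda uri: {"post": {"uri": uri, "attachments": ["a.png"]}},
        "depends": {"uri": "input.status_uri"},
    },
    "attachments": {
        "fn": lambda post: {"files": post["attachments"]},
        "depends": {"post": "fetch_post.post"},
    },
    "return": {
        "fn": lambda post: {"value": post},
        "depends": {"post": "fetch_post.post"},
    },
}

out = run_pipeline(nodes, {"input.status_uri": "https://example.com/posts/1"})
print(out["return.value"]["uri"])  # https://example.com/posts/1
```

Because the graph is data rather than control flow, the same declaration can be run inline, on a worker pool, or across a cluster - which is the scaling property we are after.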
Modeling activitypub side effects as a declarative graph like this should help us keep the code very general, but we are also hoping that this paves the way to being able to federate an activity’s behavior as well as the activity itself. This is useful for a few reasons:
- One is that it makes activities self-describing - if we initially don’t know how to handle an object, we can consult its “recommended” or “default” processing chain, after reviewing the code of course.5
- It is a straightforward path for giving objects capabilities: How does an activitypub instance know how to “Like” a post? Currently by convention, but an object could provide a `Like` capability that refers to a pipeline of actions, and objects could provide any number of capabilities that are relevant for them.
- We eventually want to bridge federated data with federated compute - but that’s a longer term goal that’s a bit out of scope for this blog post…
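The capability-dispatch idea can be sketched in a few lines. Everything here is hypothetical - the `capabilities` field, the registry, and the pipeline ids are illustrations of the idea, not part of any spec:

```python
# Hypothetical sketch of object-provided capabilities: an object can
# name the pipeline that implements an action, and the server falls
# back to its conventional handler when none is declared.

DEFAULT_PIPELINES = {"Like": "builtin-like"}  # conventional handlers

def pipeline_for(obj: dict, action: str) -> str:
    """Pick the pipeline id that implements `action` for this object."""
    declared = obj.get("capabilities", {})
    return declared.get(action, DEFAULT_PIPELINES[action])

note = {"type": "Note", "capabilities": {"Like": "ex-custom-like"}}
plain = {"type": "Note"}

print(pipeline_for(note, "Like"))   # the object's own declared pipeline
print(pipeline_for(plain, "Like"))  # conventional fallback
```

The point is that the set of actions an instance supports stops being hardcoded into its handlers and becomes data that federates along with the objects.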
Distributed Federation
I won’t repeat the relevant specs or our roadmap here, but briefly - the model of federation we will be building towards is one that decouples actor identity and objects from specific instances.
Much like how bittorrent trackers are just lightweight databases of references to the data referred to by the torrent, a sciop instance will be a lightweight database of copies and references to activitypub objects - the “always-on” server that can store and forward events when your local peer might be offline, but if you want to pick up and move your data to another instance, or host it on multiple instances simultaneously, you’re more than welcome to.
The basic outline that’s emerging is that a given identity starts as a keypair, but that identity can create one-way “derived” identities and “delegate” certain permissions for those identities to a set of instances.
So, I’m little old me, and I create an ActorDelegation that says “sciop.net is allowed to do x y and z on my behalf.” The Delegation object contains a signature that lets it be validated against the public key that serves as my identity, and objects created by sciop.net are signed by this delegated identity.
This creates a chain from me -> DelegatedIdentity -> sciop that can be validated by recipients, and in the case that I want to revoke those permissions from a potentially hostile instance, allows me to publish a DelegateRevocation that severs the connection from me to DelegatedIdentity. The instance never gets to hold my “main” private key, so I retain control over the root identity, but it can act on my behalf for the specified activities as if it did have the key. Lots left to work out here re: key management, but that’s the draft so far.
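The chain and revocation logic can be modeled structurally, with the cryptography stubbed out. In reality each link carries a signature validated against the root public key; here we only model the linkage and the revocation check, and all the names are illustrative:

```python
from dataclasses import dataclass

# Structural sketch of the delegation chain: the signatures that make
# each link verifiable are omitted, and class/field names are
# illustrative, not the draft spec's vocabulary.

@dataclass(frozen=True)
class ActorDelegation:
    root: str        # the root identity (a public key, in reality)
    delegate: str    # the instance acting on the root's behalf
    scopes: tuple    # which activities the instance may perform

revocations: set[tuple[str, str]] = set()

def may_act(d: ActorDelegation, instance: str, action: str) -> bool:
    """An instance may act iff delegated, in scope, and not revoked."""
    return (
        d.delegate == instance
        and action in d.scopes
        and (d.root, d.delegate) not in revocations
    )

d = ActorDelegation("me", "sciop.net", ("Create", "Announce"))
print(may_act(d, "sciop.net", "Create"))   # True
revocations.add(("me", "sciop.net"))       # publish a DelegateRevocation
print(may_act(d, "sciop.net", "Create"))   # False: chain is severed
```

The real design has to answer when and where recipients learn about revocations, which is part of the key-management work still left to do.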
We’ll also be working on a protocol extension for ActivityPub collections that makes them function like git-style merkle trees for efficient updating and change propagation, but it’s still too early to say much about that.
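The basic merkle-tree trick is worth a sketch even if the eventual protocol looks different: two replicas of a collection can compare a single root hash to detect divergence without exchanging every item. This is a plain binary merkle root, not the extension itself:

```python
import hashlib

# Minimal binary merkle root over a list of collection items: leaves
# are hashed, then hashed pairwise up to a single root. Any change in
# any item changes the root.

def merkle_root(items: list[bytes]) -> bytes:
    """Hash leaves pairwise up to a single root digest."""
    level = [hashlib.sha256(i).digest() for i in items]
    while len(level) > 1:
        if len(level) % 2:        # duplicate the last node on odd levels
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

a = [b"item1", b"item2", b"item3"]
b = a + [b"item4"]
print(merkle_root(a) == merkle_root(b))  # False: the collections differ
```

Descending the tree and comparing subtree hashes then narrows down *which* items changed, which is what makes git-style synchronization efficient.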
A few of the main desiderata we have here are:
- Identity and all its data (uploads, comments, etc.) can be moved and restored in case an instance suddenly goes down
- Identity can be present on many instances simultaneously
- Identities remain durable and relevant for webs of trust (as opposed to most ‘censorship resilient’ protocols with fascist undertones, where identity is cheap and abuse is rampant), but can still be deniable for external observers - balancing moderability with autonomy.
- “ownership” of objects can be shared via “group-like” delegated identities and forked with clear change in ownership
Summary
In summary, we’ve gotten started on, and will continue to work on, a model of federation where:
- We can use a wide variety of objects with minimal implementation labor
- We can extend our programs to accommodate new objects by federating action chains and capabilities
- Identity and data are delocalized from instances, but use instances as flexible network actors that have a level of responsibility more than passthrough relays, but less than traditional AP instances.
We call this “federated p2p,” or “hybrid federated/p2p systems”: an always-on federated backbone that coordinates swarms of p2p actors.
Towards the mission of making archives resilient to the pressures of those with vested interests in rewriting history and keeping people confused in a sea of filtered and always-changing information.
You are of course welcome to take part in any of this work! start chatting on the issue boards of any of these projects, @ me on fedi, or email the contact address at the bottom of https://sciop.net <3
Appendix: Even More Sneak Peeks
In the even longer future we are working on merging these projects with work elsewhere on wikilike and document-like systems. A currently-WIP project in our lab (that I’m unsure how much detail I can write about yet) is working to federate packs of pages between semantic mediawiki instances. On SMW instances, “wiki pages” consist not only of traditional wiki-markup human-readable text, but also of data schemas, forms, and templates for creating and displaying structured data.
We are attempting to design sciop’s systems to eventually accommodate wikilike editing, with shared ownership of objects, edit histories, forking histories, etc.
Mediawiki is useful as a matured wiki framework, but it can be largely seen as a frontend for editing the underlying federable data. We imagine datasets and uploads being accompanied by the additional information needed to contextualize them, which might come from multiple distinct sources: put one way, a wiki overlay to the datasets in a sciop instance; put the other way, a wiki that can support a bunch of bulk data beneath it.
A bridge between MediaWiki and ActivityPub could look as simple as federating a bunch of Article objects around, but with SMW we should be able to create and structure arbitrary object types. A small group like a lab or an archive could run a wiki and a sciop instance, organize their internal work on the wiki, and then bridge work that should be shared more broadly out to a network of activitypub instances and p2p peers hosting the bulk data: making “open data” an actually-attainable goal without needing to pay AWS a brazillionty dollars to host enormous centralized archives.
With federable processing pipelines and action chains, we imagine being able to compute over the data shared via torrent, create derived objects, populate “dashboards” on small-scale sciop/mediawiki instances, and bridge the domains of bulk data, communication, computation, and documentation.
1. except for one dastardly bug with the scheduler ↩
2. because it’s a great language and we love it, that’s why. ↩
3. See this excellent post from Christine on the matter. ↩
4. though see their more recent documentation on distributed databases using nosql clusters and view servers: https://atproto.com/articles/atproto-for-distsys-engineers - we don’t need to replicate much of this, since we aren’t planning on being a “big world” federation model where that kind of scale becomes relevant ↩
5. Nobody is suggesting to just run random code from the internet! these pipelines should be subject to the same inspection and moderation as the content itself! ↩
