<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://blog.sciop.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.sciop.net/" rel="alternate" type="text/html" /><updated>2025-12-08T23:25:38+00:00</updated><id>https://blog.sciop.net/feed.xml</id><title type="html">SciOp The Blog</title><subtitle>SciOp: The Blog for SciOp: The Website</subtitle><entry><title type="html">Sciop at SfN 2025 and our Federation Plans</title><link href="https://blog.sciop.net/2025-12-08/sfn-and-federation" rel="alternate" type="text/html" title="Sciop at SfN 2025 and our Federation Plans" /><published>2025-12-08T22:00:00+00:00</published><updated>2025-12-08T22:00:00+00:00</updated><id>https://blog.sciop.net/2025-12-08/sfn-and-federation</id><content type="html" xml:base="https://blog.sciop.net/2025-12-08/sfn-and-federation"><![CDATA[<p>We were just at the Society for Neuroscience 2025 meeting presenting to some of our neuroscientific colleagues,
and the poster and abstract, which give a decent overview of the ongoing work in the sciop extended universe, are here: <a href="https://aharoni-lab.com/sfn2025/sciop">https://aharoni-lab.com/sfn2025/sciop</a></p>

<p><a href="https://aharoni-lab.com/sfn2025/build/sfn2025-sciop-bd17624589550638786a97489024a318.pdf"><img src="https://aharoni-lab.com/sfn2025/build/sfn2025-sciop-01-c161cfb4460fe69f5a90001009fd68d6.png" alt="Sciop Poster, click the link and the PDF has screen-readable text" /></a></p>

<p>Work on Sciop Proper has slowed a bit now that it has largely stabilized<sup id="fnref:scheduler"><a href="#fn:scheduler" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> in its pre-federation state. There are two major features we are working on before turning fully to federation, one of which is already complete:</p>

<ul>
  <li>Reporting: completed in <a href="https://codeberg.org/Safeguarding/sciop/pulls/471">#471</a> - report items for a variety of concerns like validity, malicious content, rule violations, etc.</li>
  <li>Commenting: TODO - so that scrapes can be coordinated and uploads discussed on the site itself.</li>
</ul>

<p>Other pieces we have been working on around sciop include:</p>

<ul>
  <li><a href="https://github.com/p2p-ld/torrent-models">torrent-models</a> - nice data models for working with torrents in python, which also fill the gap left by the many python torrent creation/editing tools that don’t support BitTorrent v2.</li>
  <li><a href="https://github.com/miniscope/noob">noob</a> - a graph processing library that we’ll be using to implement the chains of side effects with activitypub (more on that below).</li>
  <li><a href="https://fediverse.codeberg.page/fep/fep/1580/">FEP-1580</a> - a spec proposal for how to move objects around on activitypub. This is important for regular microblog activitypub as well as for sciop, as we will want to be able to relocate datasets and uploads between instances relatively freely (to the degree they are attached to instances at all, more below).</li>
  <li><a href="https://fediverse.codeberg.page/fep/fep/d8c8/">FEP-d8c8</a> - A <code class="language-plaintext highlighter-rouge">Torrent</code> type for activitypub, which represents torrents as first-class activitypub objects and allows them to be extended within-protocol, allowing us to embed metadata that makes torrents self-describing, self-contained, and self-updating as we federate them between instances :)</li>
</ul>

<p>There is still a bit of work to do to get these pieces up and running, particularly with noob, but we are nearly there, and will begin the work of federating sciop early next year.</p>

<h2 id="federation-plans">Federation Plans</h2>

<p>We plan to federate sciop using a federation overlay package that does for python<sup id="fnref:whypython"><a href="#fn:whypython" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> what other projects like <code class="language-plaintext highlighter-rouge">activitypub-express</code> and <a href="https://fedify.dev/">fedify</a> have done for javascript.</p>

<p>In particular, we plan on writing:</p>
<ul>
  <li>a package with a set of core activitypub pydantic models</li>
  <li>extensions of those models to be used with <code class="language-plaintext highlighter-rouge">sqlalchemy</code></li>
  <li>an ORM-like package that allows pydantic models to be used with oxigraph</li>
  <li>a set of pure function routines that implement federation and its side effects as a graph of operations</li>
  <li>a set of server framework wrappers that expose those routines to fastapi and litestar</li>
</ul>
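<p>As a rough illustration of the first item, here is a minimal sketch of core model classes that serialize to ActivityStreams JSON-LD. It uses stdlib <code class="language-plaintext highlighter-rouge">dataclasses</code> as a dependency-free stand-in for pydantic, and the class and method names (<code class="language-plaintext highlighter-rouge">APObject</code>, <code class="language-plaintext highlighter-rouge">to_jsonld</code>) are hypothetical, not the planned package’s API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from typing import Optional

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def _serialize(value):
    # nested objects are serialized recursively, without repeating @context
    if isinstance(value, APObject):
        return value.to_dict()
    return value


@dataclass
class APObject:
    id: str
    type: str = "Object"

    def to_dict(self):
        # omit unset (None) properties rather than emitting nulls
        return {k: _serialize(v) for k, v in vars(self).items() if v is not None}

    def to_jsonld(self):
        # only the top-level document carries the JSON-LD context
        return {"@context": AS_CONTEXT, **self.to_dict()}


@dataclass
class Note(APObject):
    type: str = "Note"
    content: Optional[str] = None
    attributedTo: Optional[str] = None


@dataclass
class Create(APObject):
    type: str = "Create"
    actor: Optional[str] = None
    object: Optional[APObject] = None


note = Note(id="https://example.com/notes/1", content="hello",
            attributedTo="https://example.com/users/me")
activity = Create(id="https://example.com/activities/1",
                  actor="https://example.com/users/me", object=note)
print(activity.to_jsonld())
</code></pre></div></div>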

<p>As described in our <a href="https://sciop.net/docs/intro/roadmap/#federation">federation roadmap</a>, we plan on implementing a different kind of activitypub federation than usual: a distributed federation with <a href="https://codeberg.org/fediverse/fep/src/branch/main/fep/c390/fep-c390.md">FEP-c390</a> as a model that decouples objects and actors from instances.</p>

<p>Some detail that probably strays too far into the technical weeds on these …</p>

<h3 id="actually-modeling-the-graph">Actually Modeling The Graph</h3>

<p>One of the underrated achievements of ActivityPub is how it bridges the Linked Data world with the rest of the web tech world<sup id="fnref:standards-division"><a href="#fn:standards-division" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>.
Collections in particular are an underappreciated success, since dealing with collections of objects with traditional RDF tools is <a href="https://piracy.solutions/docs/comparison/ld/rdf.html">a literal nightmare</a>.</p>
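<p>Part of what makes AS2 collections pleasant is that every collection has the same traversal shape: a <code class="language-plaintext highlighter-rouge">Collection</code> or <code class="language-plaintext highlighter-rouge">OrderedCollection</code> either inlines its <code class="language-plaintext highlighter-rouge">items</code>/<code class="language-plaintext highlighter-rouge">orderedItems</code> or points to a <code class="language-plaintext highlighter-rouge">first</code> page, and each page links to the <code class="language-plaintext highlighter-rouge">next</code>. A minimal sketch of walking one, assuming a hypothetical <code class="language-plaintext highlighter-rouge">fetch</code> callable that dereferences a URI to a parsed JSON document (in practice, an HTTP GET with an ActivityStreams <code class="language-plaintext highlighter-rouge">Accept</code> header):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def _resolve(ref, fetch):
    # AS2 links may be embedded objects or bare URIs that must be dereferenced
    if ref is None or isinstance(ref, dict):
        return ref
    return fetch(ref)


def _page_items(page):
    # ordered collections use orderedItems, unordered ones use items
    return page.get("orderedItems", page.get("items", []))


def iter_collection(collection, fetch, max_pages=100):
    """Yield every item of a (possibly paged) AS2 Collection."""
    coll = _resolve(collection, fetch)
    yield from _page_items(coll)  # small collections may inline their items
    page = _resolve(coll.get("first"), fetch)
    remaining = max_pages  # cap page count to guard against cycles or huge collections
    while page is not None and remaining:
        yield from _page_items(page)
        remaining -= 1
        page = _resolve(page.get("next"), fetch) if remaining else None
</code></pre></div></div>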

<p>However, the challenge of jamming a graph-shaped thing into a relational-shaped hole means that, for the most part, we can’t realize the full potential of what activitypub can do.
The promise of building activitypub out of linked data is that we can federate <em>any</em> kind of object, but you can’t really store <em>any</em> kind of object in a relational database in a coherent or performant way.
This means that even minor changes in the spec require an enormous amount of labor touching all the little finicky bits of every tradweb AP server, rather than having those changes take place at the schema level, keeping model and view isolated from one another.</p>

<p>Part of the problem is that the alternative, just using RDF and graph databases, is not exactly technically or cognitively viable either. How do you federate an unbounded graph of triples? ActivityPub gives a specific <em>unit</em> of graph in <code class="language-plaintext highlighter-rouge">Objects</code>, which is great - people think about things as discrete entities, as objects, as records; not as a big graph soup where I need to deal with the infinity of all possible properties that could be declared for something. Type annotations are great, but how can I type-annotate a big blob of triples? Technically, postgres is great! wonderful! we love it! but there aren’t really comparable graph databases - they are all ancient, unmaintained, unwieldy behemoths that have the nasty habit of knocking over a server when processing even a moderately complex SPARQL query.</p>

<p>What to do?</p>

<p>We want to take a page out of bluesky’s playbook, where they run large services by using <a href="https://github.com/bluesky-social/atproto/pull/1705">many single-tenant sqlite databases</a> rather than one humongous sharded database<sup id="fnref:bsky-distributed"><a href="#fn:bsky-distributed" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>. This fits with a distributed federation model (below), where an instance is just a “delegate” that can receive and act on behalf of an account, but the account is not intrinsically tied to the instance.</p>
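<p>The per-account pattern itself is simple to sketch. The example below uses stdlib <code class="language-plaintext highlighter-rouge">sqlite3</code> purely for illustration (for the graph data, sciop plans to use oxigraph, as described next), and the directory layout, hashed filenames, and <code class="language-plaintext highlighter-rouge">objects</code> table are assumptions, not bluesky’s or sciop’s actual schema. Because each account lives in one file in this sketch, moving an account to another instance amounts to copying that file.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import sqlite3
from pathlib import Path


class AccountStores:
    """Route each account to its own single-tenant SQLite file (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, account_uri):
        # hash the actor URI so arbitrary ids become safe, fixed-length filenames
        digest = hashlib.sha256(account_uri.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.sqlite"

    def connect(self, account_uri):
        # every account database gets the same small schema on first use
        conn = sqlite3.connect(self.path_for(account_uri))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        return conn
</code></pre></div></div>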

<p>To do that, we want to make the programming experience nice by making an ORM-like adapter between pydantic models and <a href="https://github.com/oxigraph/oxigraph">oxigraph</a>. The python type annotation system makes a very nice analogy to triples, where an object instance - annotated field - value maps onto a subject - predicate - object triple. From that we want to make something that looks like this:</p>

<p><em>this should be considered PSEUDOCODE as an EXAMPLE</em></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">Annotated</span> <span class="k">as</span> <span class="n">A</span>
<span class="kn">from</span> <span class="n">pydantigraph</span> <span class="kn">import</span> <span class="n">GraphModel</span><span class="p">,</span> <span class="n">Namespace</span><span class="p">,</span> <span class="n">ObjectURI</span><span class="p">,</span> <span class="n">ClassURI</span>
<span class="kn">from</span> <span class="n">pyoxigraph</span> <span class="kn">import</span> <span class="n">Store</span>

<span class="n">Schema</span> <span class="o">=</span> <span class="nc">Namespace</span><span class="p">(</span><span class="sh">"</span><span class="s">http://schema.org/</span><span class="sh">"</span><span class="p">)</span>
<span class="n">Sciop</span> <span class="o">=</span> <span class="nc">Namespace</span><span class="p">(</span><span class="sh">"</span><span class="s">http://sciop.net/</span><span class="sh">"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">GraphModel</span><span class="p">):</span>
    <span class="n">class_uri</span><span class="p">:</span> <span class="n">ClassURI</span> <span class="o">=</span> <span class="n">Schema</span><span class="p">.</span><span class="n">Person</span>

    <span class="n">object_uri</span><span class="p">:</span> <span class="n">ObjectURI</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">A</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Schema</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>

<span class="c1"># create
</span><span class="k">with</span> <span class="nc">Store</span><span class="p">(</span><span class="sh">'</span><span class="s">my-data.ox</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">store</span><span class="p">:</span>
    <span class="n">me</span> <span class="o">=</span> <span class="nc">Person</span><span class="p">(</span>
      <span class="n">object_uri</span> <span class="o">=</span> <span class="n">Sciop</span> <span class="o">/</span> <span class="sh">"</span><span class="s">accounts</span><span class="sh">"</span> <span class="o">/</span> <span class="sh">"</span><span class="s">me</span><span class="sh">"</span><span class="p">,</span>
      <span class="n">name</span> <span class="o">=</span> <span class="sh">"</span><span class="s">Jonny</span><span class="sh">"</span>
    <span class="p">)</span>
    <span class="n">store</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">me</span><span class="p">)</span>

<span class="c1"># query
</span><span class="k">with</span> <span class="nc">Store</span><span class="p">(</span><span class="sh">'</span><span class="s">my-data.ox</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">store</span><span class="p">:</span>
    <span class="n">me</span> <span class="o">=</span> <span class="n">store</span><span class="p">.</span><span class="nf">select</span><span class="p">(</span><span class="n">Person</span><span class="p">).</span><span class="nf">where</span><span class="p">(</span><span class="n">Person</span><span class="p">.</span><span class="n">name</span> <span class="o">==</span> <span class="sh">"</span><span class="s">Jonny</span><span class="sh">"</span><span class="p">).</span><span class="nf">first</span><span class="p">()</span>

<span class="c1"># dump to rdf
</span><span class="n">me</span><span class="p">.</span><span class="nf">to_ttl</span><span class="p">()</span>
</code></pre></div></div>

<p>roughly…</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">@prefix</span><span class="w"> </span><span class="nn">schema:</span><span class="w"> </span><span class="nl">&lt;http://schema.org/&gt;</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;http://sciop.net/accounts/me&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">schema:</span><span class="n">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">schema:</span><span class="n">name</span><span class="w"> </span><span class="s">"Jonny"</span><span class="w"> </span><span class="p">.</span><span class="w">
</span></code></pre></div></div>

<p>Assuming additional methods to query additional properties, dynamic class constructors, and so on, this should let us keep the sensible modeling of relational/object-oriented world while being able to accommodate arbitrary data in the graph.</p>
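<p>To make the field-to-predicate analogy concrete without any of the planned machinery, here is a stdlib-only sketch that reads <code class="language-plaintext highlighter-rouge">Annotated</code> metadata off a dataclass and emits (subject, predicate, object) tuples matching the turtle above. <code class="language-plaintext highlighter-rouge">Predicate</code> and <code class="language-plaintext highlighter-rouge">to_triples</code> are hypothetical names, and a real adapter would hand pyoxigraph terms to the store rather than plain strings.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Predicate(str):
    """Marks an Annotated field's metadata as the RDF predicate IRI for that field."""


@dataclass
class Person:
    object_uri: str                                   # becomes the triple subject
    name: Annotated[str, Predicate(SCHEMA + "name")]  # becomes a schema:name triple
    class_uri = SCHEMA + "Person"                     # unannotated: class attribute, not a field


def to_triples(obj):
    """Turn an instance into (subject, predicate, object) tuples."""
    subject = obj.object_uri
    triples = [(subject, RDF_TYPE, type(obj).class_uri)]
    hints = get_type_hints(type(obj), include_extras=True)
    for field_name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue  # plain fields like object_uri carry no predicate
        for meta in get_args(hint)[1:]:
            if isinstance(meta, Predicate):
                triples.append((subject, str(meta), getattr(obj, field_name)))
    return triples


me = Person(object_uri="http://sciop.net/accounts/me", name="Jonny")
for triple in to_triples(me):
    print(triple)
</code></pre></div></div>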

<p>For SciOp, we want to implement the graph data as an “underlay” to the data of the sciop site,
where the sciop instance will keep a copy of each account’s data in a separate pyoxigraph database and create digested views of that data in its relational database.
This lets us make a gradual transition to a graph database, and lets us gauge the performance consequences of this experiment so we don’t rewrite the whole backend only to find out that it’s much worse.
Our goal is to avoid making what we build for sciop vertically integrated and usable only with sciop, and instead to make as much reusable technology as possible for the benefit of the broader fediverse.</p>

<h3 id="noob--side-effects">Noob + Side Effects</h3>

<p>Within activitypub, a given activity implies a number of side effects, both “officially” in the protocol and “unofficially” in how a practical implementation has to work. For example, when receiving a <code class="language-plaintext highlighter-rouge">Create[Note]</code> activity, one might have to fetch the replies to the post, fetch the profile of the account that posted it, create notifications, and so on.</p>

<p><img src="/assets/img/ap_graph.svg" alt="An example activitypub side effect graph showing how a follow activity results in a number of implied actions: fetching an account, fetching pinned posts, backfilling a timeline, fetching link previews, verifying backlinks, and so on." /></p>

<p>Noob will allow us to model these graphs of side effects in a declarative way that allows them to be run flexibly, scaling from very small instances with only a few CPUs to very large instances that can distribute those processes over compute clusters.</p>

<p>For a (simplified) example, we could model a side effect where we fetch a post and download its attachments as a small noob pipeline that fetches the post, fetches its attachments, and returns the post:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">noob_id</span><span class="pi">:</span> <span class="s">ex-fetch-post</span>

<span class="na">input</span><span class="pi">:</span>
  <span class="na">status_uri</span><span class="pi">:</span> <span class="s">str</span>

<span class="na">nodes</span><span class="pi">:</span>
  <span class="na">fetch_post</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">example.fetch_post</span>
    <span class="na">depends</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uri</span><span class="pi">:</span> <span class="s">input.status_uri</span>
  <span class="na">attachments</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">example.fetch_attachments</span>
    <span class="na">depends</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">post</span><span class="pi">:</span> <span class="s">fetch_post.post</span>
  <span class="na">return</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">return</span>
    <span class="na">depends</span><span class="pi">:</span> 
      <span class="pi">-</span> <span class="na">post</span><span class="pi">:</span> <span class="s">fetch_post.post</span>

</code></pre></div></div>
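<p>noob’s own runner isn’t shown here, but as a rough illustration of what the declarative form buys: the <code class="language-plaintext highlighter-rouge">depends</code> entries alone are enough to work out execution order. This sketch transcribes them by hand and orders them with the stdlib <code class="language-plaintext highlighter-rouge">graphlib.TopologicalSorter</code>; <code class="language-plaintext highlighter-rouge">run_pipeline</code> and the stub node functions are hypothetical, and a real scheduler could dispatch independent nodes (here, <code class="language-plaintext highlighter-rouge">attachments</code> and <code class="language-plaintext highlighter-rouge">return</code>) concurrently instead of one at a time.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from graphlib import TopologicalSorter

# The `depends` wiring of ex-fetch-post, transcribed by hand:
# node -> {argument: "source_node.output"}
DEPENDS = {
    "fetch_post": {"uri": "input.status_uri"},
    "attachments": {"post": "fetch_post.post"},
    "return": {"post": "fetch_post.post"},
}


def run_pipeline(depends, nodes, inputs):
    # derive the DAG (node -> set of upstream nodes) from the references
    graph = {
        name: {ref.split(".", 1)[0] for ref in args.values()}
        for name, args in depends.items()
    }
    outputs = {"input": inputs}
    for name in TopologicalSorter(graph).static_order():
        if name == "input":
            continue  # pipeline inputs were supplied by the caller
        kwargs = {}
        for arg, ref in depends[name].items():
            source, signal = ref.split(".", 1)
            kwargs[arg] = outputs[source][signal]
        outputs[name] = nodes[name](**kwargs)
    return outputs["return"]


# Stub node implementations (hypothetical; no network access)
def fetch_post(uri):
    return {"post": {"id": uri, "attachment": [uri + "/media/1"]}}


def fetch_attachments(post):
    return {"files": list(post["attachment"])}


NODES = {
    "fetch_post": fetch_post,
    "attachments": fetch_attachments,
    "return": lambda post: post,
}

print(run_pipeline(DEPENDS, NODES, {"status_uri": "https://example.com/notes/1"}))
</code></pre></div></div>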

<p>This needs to happen whenever we fetch a post, and we might fetch posts in lots of different situations.
Noob then lets us compose pipelines so that, e.g. if we were to fetch a post as part of some other side effect chain, we could do so by reusing the pipeline:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">noob_id</span><span class="pi">:</span> <span class="s">ex-expand-timeline</span>

<span class="na">input</span><span class="pi">:</span>
  <span class="na">collection_uri</span><span class="pi">:</span> <span class="s">str</span>

<span class="na">nodes</span><span class="pi">:</span>
  <span class="na">iterate_collection</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">example.iterate_collection</span>
    <span class="na">depends</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uri</span><span class="pi">:</span> <span class="s">input.collection_uri</span>
  <span class="na">object_map</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">map</span>
    <span class="na">depends</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">iterate_collection.object_uri</span>
  <span class="na">fetch_post</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">tube</span>
    <span class="na">params</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">ex-fetch-post</span>
  <span class="na">something_else</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">idk.something.else</span>
    <span class="na">depends</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">post</span><span class="pi">:</span> <span class="s">fetch_post.post</span>
</code></pre></div></div>

<p>where we <code class="language-plaintext highlighter-rouge">map</code> a collection of object uris into the <code class="language-plaintext highlighter-rouge">ex-fetch-post</code> pipeline, fetch each of them with that pipeline, and then use the results in the parent pipeline.</p>
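<p>The same composition can be sketched in plain python, assuming a hypothetical registry that maps pipeline ids to callables: the <code class="language-plaintext highlighter-rouge">tube</code> step becomes a lookup by id, and the <code class="language-plaintext highlighter-rouge">map</code> step runs the child pipeline once per item. The thread pool is an illustrative stand-in for whatever executor a given instance uses, and the collection iteration is replaced with a plain list of URIs.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from concurrent.futures import ThreadPoolExecutor

PIPELINES = {}  # pipeline id -> callable, so pipelines can be reused by name


def pipeline(pipeline_id):
    def register(fn):
        PIPELINES[pipeline_id] = fn
        return fn
    return register


@pipeline("ex-fetch-post")
def fetch_post(status_uri):
    # stand-in for the real fetch + attachments pipeline; no network access
    return {"id": status_uri, "type": "Note"}


def something_else(post):
    # hypothetical downstream node in the parent pipeline
    return {"seen": post["id"]}


@pipeline("ex-expand-timeline")
def expand_timeline(object_uris, max_workers=4):
    child = PIPELINES["ex-fetch-post"]  # the "tube": reuse a pipeline by its id
    # the "map": run the child pipeline once per item; pool.map keeps input order,
    # and a larger deployment could swap the thread pool for a cluster executor
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        posts = list(pool.map(child, object_uris))
    return [something_else(post) for post in posts]


print(PIPELINES["ex-expand-timeline"](["https://example.com/notes/1", "https://example.com/notes/2"]))
</code></pre></div></div>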

<p>Modeling activitypub side effects as a declarative graph like this should help us keep the code very general, but we are <em>also</em> hoping that this paves the way to being able to <em>federate an activity’s behavior</em> as well as the activity itself. This is useful for a few reasons:</p>

<ul>
  <li>One is that it makes activities <em>self-describing</em> - if we initially don’t know how to handle an object, we can consult its “recommended” or “default” processing chain, after reviewing the code of course.<sup id="fnref:no-autocompute"><a href="#fn:no-autocompute" class="footnote" rel="footnote" role="doc-noteref">5</a></sup></li>
  <li>It is a straightforward path for giving objects <em>capabilities</em>: How does an activitypub instance know how to “Like” a post? Currently by convention, but an object could provide a <code class="language-plaintext highlighter-rouge">Like</code> capability that refers to a pipeline of actions, and each object could provide any number of capabilities relevant to it.</li>
  <li>We eventually want to bridge federated <em>data</em> with federated <em>compute</em> - but that’s a longer term goal that’s a bit out of scope for this blog post…</li>
</ul>
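<p>A minimal sketch of the capability lookup, assuming a hypothetical <code class="language-plaintext highlighter-rouge">capabilities</code> property that maps activity types to pipeline ids (no such property exists in ActivityStreams today). In line with footnote 5, it refuses any pipeline that is not on a locally reviewed allowlist, and returns <code class="language-plaintext highlighter-rouge">None</code> so the caller can fall back to conventional handling.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def resolve_capability(obj, activity_type, reviewed):
    """Return the pipeline id an object advertises for an activity type.

    Returns None when the object advertises nothing, so the caller can fall
    back to conventional handling; refuses pipelines nobody has reviewed.
    """
    pipeline_id = obj.get("capabilities", {}).get(activity_type)
    if pipeline_id is None:
        return None
    if pipeline_id not in reviewed:
        raise PermissionError(f"pipeline {pipeline_id!r} has not been reviewed")
    return pipeline_id


note = {
    "type": "Note",
    "id": "https://example.com/notes/1",
    "capabilities": {"Like": "ex-like-note"},  # hypothetical property
}
print(resolve_capability(note, "Like", reviewed={"ex-like-note"}))
</code></pre></div></div>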

<h3 id="distributed-federation">Distributed Federation</h3>

<p>I won’t <a href="https://codeberg.org/fediverse/fep/src/branch/main/fep/c390/fep-c390.md">repeat</a> the <a href="https://codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.md">relevant</a> <a href="https://codeberg.org/fediverse/fep/src/branch/main/fep/ef61/fep-ef61.md">specs</a> or our <a href="https://sciop.net/docs/intro/roadmap/#federation">roadmap</a> here, but briefly - the model of federation we will be building towards is one that decouples actor identity and objects from specific instances.</p>

<p>Much like how bittorrent trackers are just lightweight databases of <em>references</em> to the data referred to by the torrent, a sciop instance will be a lightweight database of copies and references to activitypub objects - the “always-on” server that can store and forward events when your local peer might be offline, but if you want to pick up and move your data to another instance, or host it on multiple instances simultaneously, you’re more than welcome to.</p>

<p>The basic outline that’s emerging is that a given identity starts as a keypair, but that identity can create one-way “derived” identities and “delegate” certain permissions for those identities to a set of instances.
So, I’m little old <code class="language-plaintext highlighter-rouge">me</code>, and I create an <code class="language-plaintext highlighter-rouge">ActorDelegation</code> that says “<code class="language-plaintext highlighter-rouge">sciop.net</code> is allowed to do x, y, and z on my behalf.” The Delegation object contains a signature that lets it be validated against the public key that serves as my identity, and objects created by <code class="language-plaintext highlighter-rouge">sciop.net</code> are signed by this delegated identity.</p>

<p>This creates a chain from <code class="language-plaintext highlighter-rouge">me</code> -&gt; <code class="language-plaintext highlighter-rouge">DelegatedIdentity</code> -&gt; <code class="language-plaintext highlighter-rouge">sciop</code> that can be validated by recipients,
and in the case that I want to revoke those permissions from a potentially hostile instance, allows me to publish a <code class="language-plaintext highlighter-rouge">DelegateRevocation</code> that severs the connection from <code class="language-plaintext highlighter-rouge">me</code> to <code class="language-plaintext highlighter-rouge">DelegatedIdentity</code>. The instance doesn’t get to hold onto my “main” private key so I retain control over the root identity, but it can act on my behalf as if it did have the key for those specified activities. Lots left to work out here re: key management, but that’s the draft so far.</p>
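<p>The validation chain can be sketched independently of any particular signature scheme. In this sketch <code class="language-plaintext highlighter-rouge">verify</code> is injected (in practice it might be an Ed25519 check from a crypto library), and the <code class="language-plaintext highlighter-rouge">Delegation</code> fields, payload encoding, and revocation set are assumptions for illustration, not the draft’s wire format:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from typing import Callable, FrozenSet

# (public_key, message, signature) -> bool; in practice e.g. an Ed25519 verify
Verify = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class Delegation:
    root_key: str                # the public key that *is* the identity ("me")
    delegate_key: str            # derived key the instance signs activities with
    instance: str                # e.g. "sciop.net"
    permissions: FrozenSet[str]  # activity types the instance may perform
    signature: bytes = b""       # root key's signature over payload()

    def payload(self):
        # illustrative canonical bytes the root key signs; excludes the signature
        perms = ",".join(sorted(self.permissions))
        return f"{self.root_key}|{self.delegate_key}|{self.instance}|{perms}".encode()


def authorized(activity_type, signer_key, delegation, revoked, verify):
    """Check the chain root_key -> delegate_key -> activity.

    Assumes the activity's own signature was already checked against signer_key,
    and that `revoked` holds delegate keys whose revocations were already verified.
    """
    if signer_key != delegation.delegate_key:
        return False  # signed by someone other than this delegate
    if delegation.delegate_key in revoked:
        return False  # the root identity has severed this delegation
    if activity_type not in delegation.permissions:
        return False  # outside the "x, y, and z" the delegation allows
    return verify(delegation.root_key, delegation.payload(), delegation.signature)
</code></pre></div></div>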

<p>We’ll also be working on a protocol extension for ActivityPub collections that makes them work like git-style merkle trees for efficient updating and change propagation, but it’s still too early to say much about that.</p>

<p>A few of the main desiderata we have here are:</p>

<ul>
  <li>Identity and all its data (uploads, comments, etc.) can be moved and restored in case an instance suddenly goes down</li>
  <li>Identity can be present on many instances simultaneously</li>
  <li>Identities remain <em>durable</em> and <em>relevant</em> for webs of trust (as opposed to most ‘censorship resilient’ protocols with fascist undertones, where identity is cheap and abuse is rampant), but can still be <em>deniable</em> for external observers - balancing moderability with autonomy.</li>
  <li>“ownership” of objects can be <em>shared</em> via “group-like” delegated identities and <em>forked</em> with clear change in ownership</li>
</ul>

<h2 id="summary">Summary</h2>

<p>In summary, we’ve started on, and will keep working toward, a model of federation where:</p>
<ul>
  <li>We can use a wide variety of objects with minimal implementation labor</li>
  <li>We can extend our programs to accommodate new objects by federating action chains and capabilities</li>
  <li>Identity and data are delocalized from instances, but use instances as flexible network actors that have a level of responsibility more than passthrough relays, but less than traditional AP instances.</li>
</ul>

<p>We call this “federated p2p”: hybrid federated/p2p systems that blend an always-on federated backbone with the swarms of p2p actors it coordinates.</p>

<p>All of this is towards the mission of making archives resilient to the pressures of those with vested interests in rewriting history and keeping people confused in a sea of filtered and always-changing information.</p>

<p>You are of course welcome to take part in any of this work! Start chatting on the issue boards of any of these projects, @ me on fedi, or email the contact address at the bottom of <a href="https://sciop.net">https://sciop.net</a> &lt;3</p>

<h2 id="appendix-even-more-sneak-peeks">Appendix: Even More Sneak Peeks</h2>

<p>In the even longer term, we are working on merging these projects with work elsewhere on wikilike and document-like systems.
A work-in-progress project in our lab (I’m not yet sure how much detail I can share about it) aims to
federate packs of pages between <a href="https://www.semantic-mediawiki.org/">semantic mediawiki</a> instances.
On SMW instances, “wiki pages” consist of both traditional wiki-markup human-readable text
and data schemas, forms, and templates for creating and displaying structured data.</p>

<p>We are attempting to design sciop’s systems to eventually accommodate wikilike editing,
with shared ownership of objects, edit histories, forking histories, etc.</p>

<p>Mediawiki is useful as a mature wiki framework, but it can largely be seen as a frontend for editing the underlying federable data. We imagine datasets and uploads being accompanied by more information needed to contextualize them, which might come from multiple distinct sources: put one way, a wiki overlay to the datasets in a sciop instance; put the other way, a wiki that can support a bunch of bulk data beneath it.</p>

<p>A bridge between MediaWiki and ActivityPub could look as simple as federating a bunch of <a href="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-article"><code class="language-plaintext highlighter-rouge">Article</code></a> objects around, but with SMW we should be able to create and structure arbitrary object types. So a small group like a lab or an archive can run a wiki and a sciop instance, organize their internal work on the wiki, and then bridge work that should be shared more broadly out to a network of activitypub instances and p2p peers for hosting the bulk data: making “open data” an actually-attainable goal without needing to pay AWS a brazillionty dollars to host enormous centralized archives.</p>

<p>With federable processing pipelines and action chains, we imagine being able to compute over the data shared via torrent, create derived objects, populate “dashboards” on small-scale sciop/mediawiki instances, and <a href="https://jon-e.net/infrastructure/#trackers-clients--wikis">bridge the domains of bulk data, communication, computation, and documentation</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:scheduler">
      <p>except for one dastardly bug with <a href="https://codeberg.org/Safeguarding/sciop/issues/475">the scheduler</a> <a href="#fnref:scheduler" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:whypython">
      <p>because it’s a great language and we love it, that’s why. <a href="#fnref:whypython" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:standards-division">
      <p>See this excellent <a href="https://dustycloud.org/blog/on-standards-divisions-collaboration/">post from Christine on the matter</a>. <a href="#fnref:standards-division" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:bsky-distributed">
      <p>Though see their more recent documentation on distributed databases using nosql clusters and view servers; we don’t need to replicate much of this since we aren’t planning on being a “big world” federation model where that kind of scale becomes relevant - <a href="https://atproto.com/articles/atproto-for-distsys-engineers">https://atproto.com/articles/atproto-for-distsys-engineers</a> <a href="#fnref:bsky-distributed" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:no-autocompute">
      <p>Nobody is suggesting we just run random code from the internet! These pipelines should be subject to the same inspection and moderation as the content itself! <a href="#fnref:no-autocompute" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Jonny</name><uri>https://neuromatch.social/@jonny</uri></author><category term="intro" /><summary type="html"><![CDATA[Towards Decentralized ActivityPub, federated compute, and indestructible archives!]]></summary></entry><entry><title type="html">This Post Goes on the Homepage</title><link href="https://blog.sciop.net/2025-09-04/post-to-homepage" rel="alternate" type="text/html" title="This Post Goes on the Homepage" /><published>2025-09-04T06:43:00+00:00</published><updated>2025-09-04T06:43:00+00:00</updated><id>https://blog.sciop.net/2025-09-04/post-to-homepage</id><content type="html" xml:base="https://blog.sciop.net/2025-09-04/post-to-homepage"><![CDATA[<p>Many have said “what is going on over there” and “why is the last update from march 2025 when the earth is older than that now.”</p>

<p>In addition to the sciop blog existing, now it is embedded on the frontpage of sciop dot net,
a routine feature for a website that has most of its core components relatively squared away.</p>

<p>The Pull Request, first initiated by <a href="https://codeberg.org/transorsmth">transorsmth</a> some 3 months ago: <a href="https://codeberg.org/Safeguarding/sciop/pulls/404">https://codeberg.org/Safeguarding/sciop/pulls/404</a></p>

<p>As with all new development on sciop, it is intended to be configurable by other instances aside from sciop dot net,
when such things exist,
and can accept multiple atom feeds for e.g. different topics, covering different projects, with different authors, and so on.</p>
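<p>sciop’s actual implementation lives in <code class="language-plaintext highlighter-rouge">services/atom.py</code> (linked below); as a standalone illustration of the idea, reading and merging several Atom feeds needs only the standard library. The function names here are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_entries(feed_xml):
    """Extract title, link, and published date from each entry of an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # prefer rel="alternate"; otherwise fall back to the first link
        # (in Atom, a link with no rel attribute is an alternate link)
        link = entry.find(ATOM + "link[@rel='alternate']")
        if link is None:
            link = entry.find(ATOM + "link")
        entries.append({
            "title": entry.findtext(ATOM + "title"),
            "link": link.get("href") if link is not None else None,
            "published": entry.findtext(ATOM + "published"),
        })
    return entries


def merge_feeds(feed_docs):
    """Combine several feeds, newest first."""
    entries = [e for doc in feed_docs for e in parse_entries(doc)]
    # plain string sort: only correct if all timestamps share one ISO 8601 format and offset
    return sorted(entries, key=lambda e: e["published"] or "", reverse=True)
</code></pre></div></div>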

<p>So without further ado, a screenshot of the other two posts that are also on this website:</p>

<p><img src="/assets/img/whats-new-bloggycat-whooooaoaoaoaaaaooaa.png" alt="A screenshot of a windows 98-style window showing the metadata for the previous two posts on the site!" /></p>

<h2 id="a-hint-of-future-plugins">A Hint Of Future Plugins</h2>

<p>Since this is a very simple feature, and we don’t have much in the way of developer docs at the moment,
it seems a good enough example to show the pattern of how most things are implemented in sciop.</p>

<p>It consists of:</p>
<ul>
  <li>A config model - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/config/services.py#L200"><code class="language-plaintext highlighter-rouge">UpdateFeedsConfig</code></a></li>
  <li>Some database models - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/models/atom.py"><code class="language-plaintext highlighter-rouge">models/atom.py</code></a></li>
  <li>A background service to update feeds - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/services/atom.py"><code class="language-plaintext highlighter-rouge">services/atom.py</code></a></li>
  <li>A job wrapper that links the config to the scheduler - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/jobs.py#L13"><code class="language-plaintext highlighter-rouge">update_atom_feeds</code></a></li>
  <li>An API endpoint to generate an HTML partial - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/frontend/partials.py#L37"><code class="language-plaintext highlighter-rouge">/partials/whatsnew</code></a>, and</li>
  <li>Some templates - <a href="https://codeberg.org/Safeguarding/sciop/src/commit/99ad45e4372b8f5b16a139905b7776ee62f0ff57/src/sciop/templates/macros/whats-new.html"><code class="language-plaintext highlighter-rouge">templates/macros/whats-new.html</code></a></li>
</ul>

<p>Pretty much “website things.”</p>

<p>These components (and maybe a handful more) are destined to become the basis of a <a href="https://codeberg.org/Safeguarding/sciop/issues/296">plugin system</a>, where most current functionality is moved into plugins, and most new functionality is implemented as plugins.
There is still a bit of work to be done to get us there,
mostly in refactoring the existing templates to support plugins modifying and adding components to them<sup id="fnref:edittemplates"><a href="#fn:edittemplates" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>,
and a bit of metaprogramming to wrap API endpoints and handle model migrations,
but we have already been writing sciop as separable services with this in mind,
so once that is done, that will be the pattern going forward.
As we become a “fediverse app,” the last thing we want to be is “a monolithic fediverse app,”
so we want to start with a very pluggable design rather than trying to bake flexibility into the system later.</p>

<p>For embedding within institutions with rich histories of work patterns and homemade infrastructure,
this kind of flexibility and ease of writing custom behavior is essential,
and the same is true for supporting different p2p communities.
One of the first plugins we plan to write after the plugin system lands will integrate external tracker software,
where rather than relying on other public trackers,
a sciop instance may want to provide its own, embed the tracker link in uploads,
generate custom URLs for private torrents to track upload stats,
and do all the other fun things one might want from a tracker.
We also want to <a href="https://sciop.net/docs/intro/roadmap/#federation">experiment with all the non- or sparsely-implemented FEPs floating around</a>,
and will be writing our federation layer as a generic fastapi<sup id="fnref:orlitestar"><a href="#fn:orlitestar" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> overlay
that can be useful outside sciop.</p>

<h2 id="example">Example</h2>

<p>To configure feeds, stick something like this in your <code class="language-plaintext highlighter-rouge">sciop.yaml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">services</span><span class="pi">:</span>
  <span class="na">update_feeds</span><span class="pi">:</span>
    <span class="na">enabled</span><span class="pi">:</span> <span class="kc">true</span>
    <span class="na">interval</span><span class="pi">:</span> <span class="m">10</span>
    <span class="na">feeds</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">sciop</span>
        <span class="na">url</span><span class="pi">:</span> <span class="s">https://blog.sciop.net/feed.xml</span>
</code></pre></div></div>

<p>with a URL pointing to some atom feed.
That’s all that someone using the software would need to do!</p>

<p>That corresponds to boilerplate pydantic models:</p>

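<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">pydantic</span> <span class="kn">import</span> <span class="n">AnyHttpUrl</span><span class="p">,</span> <span class="n">BaseModel</span>
<span class="c1"># (JobConfig, used below, is sciop's own base class for scheduled-job configs)</span>


<span class="k">class</span> <span class="nc">UpdateFeed</span><span class="p">(</span><span class="n">BaseModel</span><span class="p">):</span>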
    <span class="sh">"""</span><span class="s">
    A single feed to use for </span><span class="sh">"</span><span class="s">whats new</span><span class="sh">"</span><span class="s"> updates
    </span><span class="sh">"""</span>

    <span class="n">url</span><span class="p">:</span> <span class="n">AnyHttpUrl</span>
    <span class="sh">"""</span><span class="s">URL of an Atom feed</span><span class="sh">"""</span>
    <span class="n">name</span><span class="p">:</span> <span class="nb">str</span>
    <span class="sh">"""</span><span class="s">
    Short, taglike name to use when displaying posts from this feed.
    </span><span class="sh">"""</span>


<span class="k">class</span> <span class="nc">UpdateFeedsConfig</span><span class="p">(</span><span class="n">JobConfig</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    A set of feeds and service config for updating them
    </span><span class="sh">"""</span>

    <span class="n">interval</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">30</span>
    <span class="sh">"""</span><span class="s">
    Frequency (in minutes) to check feed for updates.
    
    If the feed has not been updated in that time, no changes are made to local objects.
    </span><span class="sh">"""</span>
    <span class="n">feeds</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">UpdateFeed</span><span class="p">]</span> <span class="o">|</span> <span class="bp">None</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="sh">"""</span><span class="s">
    Potentially multiple feeds to pull updates from.
    If `feeds` is not provided, service will be disabled.
    </span><span class="sh">"""</span>
    <span class="n">max_n_posts</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">10</span>
    <span class="sh">"""</span><span class="s">
    Only keep the last n posts from each configured feed.
    </span><span class="sh">"""</span>
</code></pre></div></div>

<p>Internally, that config is used by a function wrapper that schedules the service:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@scheduler.interval</span><span class="p">(</span>
    <span class="n">minutes</span><span class="o">=</span><span class="nf">get_config</span><span class="p">().</span><span class="n">services</span><span class="p">.</span><span class="n">update_feeds</span><span class="p">.</span><span class="n">interval</span><span class="p">,</span>
    <span class="n">enabled</span><span class="o">=</span><span class="nf">bool</span><span class="p">(</span>
        <span class="nf">get_config</span><span class="p">().</span><span class="n">services</span><span class="p">.</span><span class="n">update_feeds</span><span class="p">.</span><span class="n">enabled</span> <span class="ow">and</span> 
        <span class="nf">get_config</span><span class="p">().</span><span class="n">services</span><span class="p">.</span><span class="n">update_feeds</span><span class="p">.</span><span class="n">feeds</span>
    <span class="p">),</span>
<span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">update_atom_feeds</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Put documentation in the real thing, but this is merely a simulacrum
    </span><span class="sh">"""</span>
    <span class="k">await</span> <span class="n">services</span><span class="p">.</span><span class="nf">update_feeds</span><span class="p">()</span>
</code></pre></div></div>

<p>and then the scheduler will launch a task every <code class="language-plaintext highlighter-rouge">interval</code> minutes to update the feeds.</p>
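<p>sciop’s scheduler does the real work here, but to make the “run every <code class="language-plaintext highlighter-rouge">interval</code> minutes” idea concrete, here is a toy, stdlib-only asyncio sketch. <code class="language-plaintext highlighter-rouge">schedule_interval</code> is a hypothetical name, not sciop’s API, and it skips everything a real scheduler has to handle (error recovery, overlapping runs, persistence):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio


def schedule_interval(job, minutes: float, enabled: bool = True):
    """Run the coroutine function `job` every `minutes` minutes as a background task.

    Returns the asyncio.Task so the caller can cancel it on shutdown, or None when
    the job is disabled (e.g. because no feeds are configured). Call this from
    inside a running event loop.
    """
    if not enabled:
        return None

    async def run_forever():
        while True:
            # in this sketch, an exception raised by job() ends the loop
            await job()
            await asyncio.sleep(minutes * 60)

    return asyncio.create_task(run_forever())
</code></pre></div></div>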

<p>Simple as pie. The rest are details.</p>
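<p>One of those details is the service itself, in <code class="language-plaintext highlighter-rouge">services/atom.py</code>. As a rough, stdlib-only sketch of the core of “update a feed” (not sciop’s implementation, which also has to keep the database models in sync), parsing an Atom document and keeping only the newest <code class="language-plaintext highlighter-rouge">max_n_posts</code> entries might look like this; <code class="language-plaintext highlighter-rouge">FeedPost</code> and <code class="language-plaintext highlighter-rouge">latest_posts</code> are hypothetical names:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime

# prefix mapping so ElementTree paths can say "atom:entry" instead of the full namespace
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


@dataclass
class FeedPost:
    feed: str  # the short, taglike feed name from the config
    title: str
    url: str
    updated: datetime


def latest_posts(xml_text: str, feed_name: str, max_n_posts: int = 10):
    """Parse an Atom document and return its newest `max_n_posts` entries, newest first."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # prefer the rel="alternate" link (the human-readable page), else the first link
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        if link is None:
            link = entry.find("atom:link", ATOM_NS)
        # Atom requires an updated element on every entry
        updated_text = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        posts.append(
            FeedPost(
                feed=feed_name,
                title=entry.findtext("atom:title", default="", namespaces=ATOM_NS),
                url=link.get("href", "") if link is not None else "",
                # fromisoformat only accepts a trailing "Z" on Python 3.11+
                updated=datetime.fromisoformat(updated_text.replace("Z", "+00:00")),
            )
        )
    posts.sort(key=lambda post: post.updated, reverse=True)
    return posts[:max_n_posts]
</code></pre></div></div>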

<h2 id="appendix---build-and-deploy-a-static-site-with-codebergforgejo-using-webhooks">Appendix - Build and Deploy a Static Site with Codeberg/Forgejo using Webhooks</h2>

<p>If you are unsure how one might go about making an atom feed:</p>

<p>We write <a href="https://codeberg.org/Safeguarding/sciop-blog">this very blog</a> with jekyll,
generate the feed with <code class="language-plaintext highlighter-rouge">jekyll-feed</code>,
and use a <a href="https://github.com/adnanh/webhook">webhook</a> to trigger a rebuild <a href="https://forgejo.org/docs/latest/user/webhooks/">on push</a> on one of our machines.
Total time from push to deployment is about 8 seconds.</p>

<p>I wasn’t able to find another blog post describing this exact process,
and the forgejo webhook documentation is a little sparse,
so in case it saves anyone time:</p>

<ul>
  <li>generate a secret with <code class="language-plaintext highlighter-rouge">openssl rand -hex 32</code></li>
  <li>configure a <a href="https://github.com/adnanh/webhook">webhook</a> like this:</li>
</ul>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"build"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"execute-command"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"/path/to/your/build/command.sh"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"command-working-directory"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/wherever/your/blog/repo/is"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"response-message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Deployed!"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"trigger-rule"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"and"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"match"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"payload-hmac-sha256"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"secret"</span><span class="p">:</span><span class="w"> </span><span class="s2">"{YOUR_SECRET}"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"parameter"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
              </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"header"</span><span class="p">,</span><span class="w">
              </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"X-Forgejo-Signature"</span><span class="w">
            </span><span class="p">}</span><span class="w">
          </span><span class="p">}</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"match"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"value"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"refs/heads/main"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"parameter"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
              </span><span class="nl">"source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"payload"</span><span class="p">,</span><span class="w">
              </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ref"</span><span class="w">
            </span><span class="p">}</span><span class="w">
          </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<ul>
  <li>for a build script <code class="language-plaintext highlighter-rouge">/path/to/your/build/command.sh</code> like this<sup id="fnref:jekyll"><a href="#fn:jekyll" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>:</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

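<span class="c"># stop at the first failing command, so a failed pull or install doesn't publish a stale build</span>
<span class="nb">set</span> <span class="nt">-euo</span> pipefail

git pull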

/home/webhook/.rbenv/shims/bundle <span class="nb">install</span>
/home/webhook/.rbenv/shims/bundle <span class="nb">exec </span>jekyll build <span class="nt">-d</span> /some/web/directory
</code></pre></div></div>

<ul>
  <li>configure nginx to proxy <code class="language-plaintext highlighter-rouge">/hooks/</code> to the webhook service (which listens on port 9000 by default)<sup id="fnref:certbot"><a href="#fn:certbot" class="footnote" rel="footnote" role="doc-noteref">4</a></sup></li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>server {
  listen 443 ssl;
  server_name your.url.here;

  root /some/web/directory;

  location / {
    try_files $uri $uri/ $uri.html =404;
  }

  location /hooks/ {
    proxy_pass http://localhost:9000/hooks/;
  }
  
  # other stuff for ssl and logging and ratelimiting and etc.
}
</code></pre></div></div>

<ul>
  <li>Configure a webhook for your repository,
    <ul>
      <li>POSTing JSON</li>
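      <li>to your hook’s URL (the webhook service serves each hook at <code class="language-plaintext highlighter-rouge">/hooks/{id}</code>, so here <code class="language-plaintext highlighter-rouge">https://your.url.here/hooks/build</code>)</li>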
      <li>with the secret generated above,</li>
      <li>on push events</li>
      <li>with a branch filter for <code class="language-plaintext highlighter-rouge">main</code>.</li>
    </ul>
  </li>
</ul>

<p>And ka-blammo:
push to deploy the blog, and sciop will catch up the next time it runs its update.</p>

<p>Now you’re talking to online from your personal computer.</p>
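<p>If you want to sanity-check what the webhook’s trigger rule accepts, here is a rough Python equivalent of its two matches. It is only an illustration (the webhook service does the actual checking), the function names are made up, and it assumes what the <code class="language-plaintext highlighter-rouge">payload-hmac-sha256</code> rule assumes: that Forgejo sends the hex-encoded HMAC-SHA256 of the raw request body in the <code class="language-plaintext highlighter-rouge">X-Forgejo-Signature</code> header.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import hmac


def forgejo_signature_ok(secret: str, body: bytes, signature_header: str):
    """True if X-Forgejo-Signature matches the hex HMAC-SHA256 of the raw request body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest doesn't short-circuit on the first mismatch, which would leak timing info
    return hmac.compare_digest(expected.encode(), signature_header.strip().lower().encode())


def is_push_to_main(payload: dict):
    """The rule's second match: only pushes to the main branch should trigger a build."""
    return payload.get("ref") == "refs/heads/main"
</code></pre></div></div>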

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:edittemplates">
      <p>One can already override any template by configuring <a href="https://sciop.net/docs/python/config/#sciop.config.PathConfig.template_override"><code class="language-plaintext highlighter-rouge">paths.template_override</code></a> with a directory of template overrides. E.g. to override <code class="language-plaintext highlighter-rouge">src/sciop/templates/pages/datasets.html</code>, one would write their own template and put it in <code class="language-plaintext highlighter-rouge">{template_override}/pages/datasets.html</code>. <a href="#fnref:edittemplates" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:orlitestar">
      <p>Or, we may jump ship to <a href="https://litestar.dev/">litestar</a>, since contributing to FastAPI has proven to be infuriating, and the non-bot PR merge rate can attest to that. <a href="#fnref:orlitestar" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:jekyll">
      <p>This assumes rbenv and ruby are already installed. If you aren’t using jekyll, then obvi do whatever your thing is to build your site. <a href="#fnref:jekyll" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:certbot">
      <p>Using <a href="https://certbot.eff.org/">certbot</a> to configure ssl <a href="#fnref:certbot" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Jonny</name><uri>https://neuromatch.social/@jonny</uri></author><category term="intro" /><summary type="html"><![CDATA[We can embed atom feeds on the homepage now, and this is the proof of that.]]></summary></entry><entry><title type="html">The Modern Webseed</title><link href="https://blog.sciop.net/2025-08-29/webseeds" rel="alternate" type="text/html" title="The Modern Webseed" /><published>2025-08-29T23:00:00+00:00</published><updated>2025-08-29T23:00:00+00:00</updated><id>https://blog.sciop.net/2025-08-29/webseeds</id><content type="html" xml:base="https://blog.sciop.net/2025-08-29/webseeds"><![CDATA[<ol id="markdown-toc">
  <li><a href="#how-it-works" id="markdown-toc-how-it-works">How It Works</a></li>
  <li><a href="#adding-a-webseed" id="markdown-toc-adding-a-webseed">Adding A Webseed</a></li>
  <li><a href="#why-this-is-cool" id="markdown-toc-why-this-is-cool">Why This Is Cool</a>    <ol>
      <li><a href="#institutional-pipes" id="markdown-toc-institutional-pipes">Institutional Pipes</a></li>
      <li><a href="#bridging-archives" id="markdown-toc-bridging-archives">Bridging Archives</a></li>
      <li><a href="#division-of-labor---scrapers--seeds" id="markdown-toc-division-of-labor---scrapers--seeds">Division of Labor - Scrapers &amp; Seeds</a></li>
    </ol>
  </li>
  <li><a href="#todo" id="markdown-toc-todo">TODO</a></li>
  <li><a href="#appendix-on-malicious-use" id="markdown-toc-appendix-on-malicious-use">Appendix on Malicious Use</a></li>
</ol>

<p>BitTorrent is great, but what if you, graceful and mighty but in adverse network conditions, cannot run a torrent client?</p>

<p>Sciop has just the feature for you! 
As of today<sup id="fnref:nottoday"><a href="#fn:nottoday" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> you can add webseeds straight from the website!
This means that we can bridge datasets archived in any number of traditional archives,
efficiently use existing resources,
and bring a new category of institutionalized peers into the swarm.</p>

<p>This work builds on one of our efforts to revitalize the broader bittorrent tooling ecosystem:
having integrated <a href="https://github.com/p2p-ld/torrent-models"><code class="language-plaintext highlighter-rouge">torrent-models</code></a><sup id="fnref:tm"><a href="#fn:tm" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>,
we can now safely<sup id="fnref:bencode"><a href="#fn:bencode" class="footnote" rel="footnote" role="doc-noteref">3</a></sup> and transparently make server-side edits to uploaded torrents.<sup id="fnref:moresoon"><a href="#fn:moresoon" class="footnote" rel="footnote" role="doc-noteref">4</a></sup></p>

<p>Stick around after the ad break to learn how webseeds could enrich your life and make you more complete as a person.</p>

<p><a href="/pages/jims-craigs-discount-real-science.html"><img src="/assets/img/discount_science_ad.jpeg" alt="An advertisement for jims craigs discount real climate data 100% science" /></a></p>

<h2 id="how-it-works">How It Works</h2>

<p>Webseeds are very simple and require no special behavior on the part of the HTTP server aside from supporting <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Range_requests">range requests</a>.
Where normally the bittorrent client would <a href="https://sciop.net/docs/intro/bittorrent/">request pieces from other peers</a>,
the client instead requests ranges from the http server.
Webseeds can be used along with normal peer swarms, and multiple webseeds can be used at once.</p>
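<p>Because pieces are cut from all of a torrent’s files laid end to end, one piece can straddle a file boundary, so fetching it from a webseed can take more than one range request. A minimal sketch of that bookkeeping (the function names are hypothetical, not taken from any particular client):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def piece_to_ranges(piece_index: int, piece_length: int, file_lengths: list):
    """Map one piece to (file_index, first_byte, last_byte) tuples, inclusive like HTTP ranges."""
    piece_start = piece_index * piece_length
    piece_end = piece_start + piece_length  # exclusive; the final piece is simply cut short
    ranges = []
    file_start = 0
    for file_index, length in enumerate(file_lengths):
        file_end = file_start + length
        # overlap of [piece_start, piece_end) with [file_start, file_end), if any
        lo = max(piece_start, file_start)
        hi = min(piece_end, file_end)
        overlap = max(0, hi - lo)
        if overlap:
            ranges.append((file_index, lo - file_start, hi - file_start - 1))
        file_start = file_end
    return ranges


def range_header(first_byte: int, last_byte: int):
    """Value for the HTTP Range request header covering first_byte..last_byte inclusive."""
    return f"bytes={first_byte}-{last_byte}"
</code></pre></div></div>

<p>For example, with 16-byte pieces and files of 10, 20, and 5 bytes, piece 1 maps to bytes 6-19 of the second file and bytes 0-1 of the third: two requests, with <code class="language-plaintext highlighter-rouge">Range: bytes=6-19</code> and <code class="language-plaintext highlighter-rouge">Range: bytes=0-1</code>.</p>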

<p>Webseed URLs are constructed like this for multi-file torrents:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{webseed_url}/{torrent_name}/{path}
</code></pre></div></div>

<p>and like this for single files:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{webseed_url}/{torrent_name}
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">torrent_name</code> is the value of the <code class="language-plaintext highlighter-rouge">name</code> field in the torrent’s infodict
(not the filename).
The <code class="language-plaintext highlighter-rouge">name</code> field is the name that’s shown in your torrent client and the folder that is created when you download a torrent.<sup id="fnref:addtosite"><a href="#fn:addtosite" class="footnote" rel="footnote" role="doc-noteref">5</a></sup></p>

<p>So for a torrent named <code class="language-plaintext highlighter-rouge">tentacoli</code> with a file named <a href="https://archive.org/details/tentacoli-15-tentacles-versione-2/tentacoli-07-too+risky+a+day+for+a+regatta.mp3"><code class="language-plaintext highlighter-rouge">track-07.mp3</code></a>,
if an HTTP server were hosting that file at <code class="language-plaintext highlighter-rouge">https://example.com/desktop/tentacoli/track-07.mp3</code>
then the webseed url would be <code class="language-plaintext highlighter-rouge">https://example.com/desktop/</code>.</p>
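<p>In code, those two URL patterns amount to a little string joining. A sketch, with two assumptions of ours: URL-unsafe characters in names are percent-encoded, and the webseed URL is joined with exactly one slash whether or not it ends in one. (BEP 19 also lets a single-file webseed URL without a trailing slash point directly at the file, which this sketch ignores.) <code class="language-plaintext highlighter-rouge">webseed_file_url</code> is a hypothetical name:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from urllib.parse import quote


def webseed_file_url(webseed_url: str, torrent_name: str, path=None):
    """URL a client would request for one file of a torrent from a webseed.

    `path` is the file's list of path components inside a multi-file torrent;
    leave it as None for a single-file torrent.
    """
    parts = [torrent_name] + list(path or [])
    # percent-encode each component (spaces, unicode, ...) and join with "/"
    return webseed_url.rstrip("/") + "/" + "/".join(quote(part) for part in parts)
</code></pre></div></div>

<p>For the tentacoli example, <code class="language-plaintext highlighter-rouge">webseed_file_url("https://example.com/desktop/", "tentacoli", ["track-07.mp3"])</code> returns <code class="language-plaintext highlighter-rouge">https://example.com/desktop/tentacoli/track-07.mp3</code>.</p>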

<h2 id="adding-a-webseed">Adding A Webseed</h2>

<p>If you host the data from a torrent somewhere, or know of an alternate source for it,
you can add it as a webseed from the web interface.<sup id="fnref:andapi"><a href="#fn:andapi" class="footnote" rel="footnote" role="doc-noteref">6</a></sup></p>

<p>When you are logged in to sciop, on an upload page, you will see a button to add a webseed.</p>

<p><img src="/assets/img/webseeds-0.webp" alt="A table of webseeds, showing one existing webseed, grayed out because it's invalid, and a button prompting one to add a webseed" /></p>

<p>If you click it, you can then type in a URL:</p>

<p><img src="/assets/img/webseeds-1.webp" alt="A textbox appears! in it we have typed https://example.com/desktop/" /></p>

<p>Once you submit it, it will be added to the validation queue:</p>

<p><img src="/assets/img/webseeds-2.webp" alt="Our webseed url now shows in the table with the status of &quot;queued&quot;" /></p>

<p>The server will now attempt to validate that the URL works as a webseed by requesting some subset of the pieces
and checking them against the piece hashes in the torrent.<sup id="fnref:webseeddocs"><a href="#fn:webseeddocs" class="footnote" rel="footnote" role="doc-noteref">7</a></sup></p>
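<p>The idea behind that check fits in a few lines. A sketch, assuming a v1 torrent (whose piece hashes are 20-byte SHA-1 digests) and a caller-supplied, hypothetical <code class="language-plaintext highlighter-rouge">fetch_piece(i)</code> that returns piece <code class="language-plaintext highlighter-rouge">i</code>’s bytes from the webseed; how many pieces sciop actually samples, and how it fetches them, isn’t described here:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import random


def validate_sample(fetch_piece, piece_hashes: list, n_samples: int, rng=None):
    """Spot-check a webseed against a torrent's piece hashes.

    Picks up to `n_samples` distinct pieces at random, fetches each one, and
    fails as soon as a piece can't be fetched or its SHA-1 doesn't match.
    """
    rng = rng or random.Random()
    n_checks = min(n_samples, len(piece_hashes))
    for i in rng.sample(range(len(piece_hashes)), n_checks):
        try:
            data = fetch_piece(i)
        except OSError:  # unreachable or erroring webseeds don't validate
            return False
        if hashlib.sha1(data).digest() != piece_hashes[i]:
            return False
    return True
</code></pre></div></div>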

<p>If the webseed validates and the account that added it has the <code class="language-plaintext highlighter-rouge">upload</code> permission scope,
it will be added to the torrent!
If the account does not have the <code class="language-plaintext highlighter-rouge">upload</code> scope,
the webseed will enter the moderation queue and only be added after manual review by a moderator.</p>
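<p>Stripped of the UI, those rules come down to a small decision. A sketch with made-up outcome names (sciop’s real states, like the “queued” status in the screenshot, are its own):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from enum import Enum


class WebseedOutcome(Enum):
    INVALID = "invalid"  # failed validation: not added to the torrent
    NEEDS_REVIEW = "needs_review"  # valid, but waits in the moderation queue
    ADDED = "added"  # valid, and added to the torrent right away


def webseed_outcome(validated: bool, adder_scopes: set):
    """What happens to a proposed webseed, given who proposed it."""
    if not validated:
        return WebseedOutcome.INVALID
    if "upload" in adder_scopes:
        return WebseedOutcome.ADDED
    return WebseedOutcome.NEEDS_REVIEW
</code></pre></div></div>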

<p>The account that uploaded the torrent retains control of the torrent,
so they are able to remove webseeds if they object to them for some reason,
and webseeds from accounts without <code class="language-plaintext highlighter-rouge">upload</code> permissions are not
displayed publicly until they are approved, so they can’t be used as a vandalism vector.</p>

<p>This follows the general <a href="http://meatballwiki.org/wiki/SoftSecurity">soft security</a> moderation pattern of sciop -
rather than gatekeeping at the level of account creation,
any account can propose a change to be made, 
but only known accounts automatically have those changes take effect.
This allows us to keep the site trustworthy while allowing everyone to participate.</p>

<h2 id="why-this-is-cool">Why This Is Cool</h2>

<p>What’s the big deal, it’s just adding a url to a list in a torrent?
Like objects in mirrors, webseeds in blogposts are cooler than they appear.</p>

<p>We are not aware of any other trackers that allow post-hoc addition of validated webseeds
in a participatory way by people that aren’t the original uploader,
but please let us know if we missed something.</p>

<h3 id="institutional-pipes">Institutional Pipes</h3>

<p>As much as we would love for everyone to experience the joy of a torrent swarm,
many would-be collaborators have been unable to contribute because their resources are housed in some setting
where bittorrent traffic is blocked, or it would otherwise be impossible for them to run a torrent client.
This has been a common refrain from our colleagues at academic institutions or archives.</p>

<p>We are not protocol purists, and don’t use bittorrent for bittorrent’s sake,
but instead use it to <em>bring all available resources to bear</em>
from cute little raspis seeding from a flash drive to enormous Swedish Seedboxes.
What are HTTP servers and S3 buckets but very big peers?</p>

<p>Adding webseeds via the web UI provides a new way for those in constrained circumstances to join the swarm
by getting the data<sup id="fnref:seedblock"><a href="#fn:seedblock" class="footnote" rel="footnote" role="doc-noteref">8</a></sup> and rehosting it on a traditional HTTP server.
We also hope to incentivize joining the swarm by providing a little pro-social gamification -
having your URL listed as a webseed on sciop lets others know that you care about the preservation of cultural memory.</p>

<h3 id="bridging-archives">Bridging Archives</h3>

<p>Archives have a problem: there is more than one of them.
This is a problem for preserving threatened data, 
where different archival groups have been patching together dataset storage on zenodo, google drive, globus, and so on;
as well as for more conventional archiving,
where researchers have to search or upload their work to multiple archives that are mutually incompatible.</p>

<p>Aggregation and indexing overlays address this problem to some degree by collecting links
to the potentially multiple places a dataset might be hosted, but they don’t <em>make use</em> of those multiple hosts
in a way that lets that redundancy improve availability, resilience, and performance.
Even if a dataset is hosted in multiple places, I still have to download it from only one of them,
so the fastest and largest archives end up shouldering all the costs,
and smaller archives don’t have a great “path to relevance” - why would I store my data on the slow, unpopular archive?</p>

<p>Treating a torrent as a verifiable shorthand for a dataset and then allowing anyone to add a webseed
means that now it <em>is</em> possible to make use of <em>all</em> the available resources for a given dataset -
if I have previously archived something to zenodo, or on my special S3 bucket server<sup id="fnref:s3isjusthttp"><a href="#fn:s3isjusthttp" class="footnote" rel="footnote" role="doc-noteref">9</a></sup>,
and I see someone uploaded it to sciop, I can add in my copy as an additional source.
When multiple webseeds are present, bandwidth can be shared between each of them (and the rest of the swarm),
decreasing the burden on each individual host.</p>
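<p>As a deliberately naive illustration of that load sharing (real clients schedule pieces far more cleverly, e.g. by availability and measured throughput), spreading pieces round-robin over every source shows how the per-host burden shrinks. The function names and host labels are made up:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import Counter


def assign_round_robin(n_pieces: int, sources: list):
    """Map each piece index to the source it will be fetched from, cycling through sources."""
    return {piece: sources[piece % len(sources)] for piece in range(n_pieces)}


def bytes_per_source(assignment: dict, piece_length: int):
    """Bytes each source serves for one full download (assumes equal-sized pieces)."""
    return {source: n * piece_length for source, n in Counter(assignment.values()).items()}
</code></pre></div></div>

<p>With 10 pieces of 100 bytes and three sources, no single host serves more than 400 of the 1,000 bytes.</p>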

<p>This allows sciop to take indexing overlays one step further - 
rather than just aggregating the <em>metadata</em> from multiple hosts,
we can aggregate the <em>hosts themselves.</em></p>

<h3 id="division-of-labor---scrapers--seeds">Division of Labor - Scrapers &amp; Seeds</h3>

<p>Sciop as a project is about coordinating people with varying resources, expertise, and commitment
towards the same goal - it should be possible for someone to wander in off the street<sup id="fnref:orfedi"><a href="#fn:orfedi" class="footnote" rel="footnote" role="doc-noteref">10</a></sup>
and contribute, whether that means spending 5 minutes improving the docs or 5 months scraping alongside us.</p>

<p>We have observed a natural division of labor emerge, where people often fall into one or a few basic roles based on expertise or interest.
One of those divisions has been between scrapers and seeders -
where some people love to do the work of scraping and downloading, but don’t have the resources to actually store and upload it;
and others don’t enjoy scraping but have plenty of spare storage and bandwidth.</p>

<p>The pattern of</p>

<ul>
  <li>low-resource scrapers downloading, hashing, and discarding data</li>
  <li>uploading a torrent with a webseed, and</li>
  <li>other seeders bootstrapping the swarm off the webseed</li>
</ul>

<p>has proven to be <strong><em>ridiculously effective.</em></strong></p>
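<p>The “download, hash, discard” step is what lets low-resource scrapers take part: piece hashes can be computed from a stream without ever keeping the whole dataset. A minimal sketch for v1 (SHA-1) piece hashes, where <code class="language-plaintext highlighter-rouge">stream_piece_hashes</code> is a hypothetical helper; for a multi-file torrent the stream is every file concatenated in torrent order, and building the rest of the .torrent (file list, info dict, bencoding) is left out:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib


def stream_piece_hashes(chunks, piece_length: int):
    """Yield BitTorrent v1 piece hashes (20-byte SHA-1 digests) from an iterable of byte chunks.

    Only a partial piece plus the current chunk are buffered, so each chunk can
    be discarded as soon as it has been hashed.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # hash every complete piece now sitting in the buffer, then drop those bytes
        n_full = len(buffer) // piece_length
        for k in range(n_full):
            yield hashlib.sha1(buffer[k * piece_length:(k + 1) * piece_length]).digest()
        del buffer[: n_full * piece_length]
    if buffer:
        # the last piece of a torrent is usually shorter than piece_length
        yield hashlib.sha1(buffer).digest()
</code></pre></div></div>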

<p>This provides a way for people without seeding resources to effectively call down a swarm of seeders onto an at-risk dataset,
who then automatically<sup id="fnref:viarss"><a href="#fn:viarss" class="footnote" rel="footnote" role="doc-noteref">11</a></sup> and collaboratively back it up.
This process is <em>much</em> more respectful to the hosting servers than everyone scraping their own copy would be,
as in the ideal case the webseed only needs to upload the full dataset twice,<sup id="fnref:uploadtwice"><a href="#fn:uploadtwice" class="footnote" rel="footnote" role="doc-noteref">12</a></sup>
and then all future downloads have bandwidth supplemented by the swarm.
Scraping is additionally deduplicated by our quest system and scraping tools that turn web archiving into an all-play activity,
and will be the subject of a future blog post :).</p>

<p>Being able to add webseeds after the fact extends this pattern: torrents can adapt if the dataset moves,
and it opens up new opportunities for dividing labor,
where scouts who know of copies of the data held elsewhere, e.g. by other archival groups,
can do the curation work of adding those to the swarm.</p>

<h2 id="todo">TODO</h2>

<p>There are a number of obvious expansion points we’ll be pursuing:</p>

<ul>
  <li>The dual use of the <code class="language-plaintext highlighter-rouge">name</code> field, both for display and as part of the webseed URL, is a big pain in the ass.
This is the reason that, e.g., when downloading the TIF archive of the <a href="https://sciop.net/datasets/si-nmaahc">NMAAHC</a>,
one ends up with a million torrents with the same name.<sup id="fnref:nmaahc"><a href="#fn:nmaahc" class="footnote" rel="footnote" role="doc-noteref">13</a></sup>
This also poses a problem when the relevant files are not all stored under one subdirectory,
or have urls for files with any other naming structure but that of the torrent.
We want to create a plugin for sciop that allows you to use a <em>sciop URL</em> as a webseed-by-proxy
that then redirects to the appropriate URL on the webseed server,
and in the future we’ll be working on our own client that escapes some of the stagnation of current clients and decouples the <code class="language-plaintext highlighter-rouge">name</code> field.</li>
  <li>When a webseed is added on the server, it is not added to clients that have already downloaded the torrent.
We’ll be writing a general <code class="language-plaintext highlighter-rouge">sync</code> function for sciop-cli that resolves this and other torrent mutability
problems at the client level, updating existing torrents with new metadata, 
replacing torrents that have been superseded by a repack, and so on.</li>
  <li><strong>Validating</strong> files in a torrent against a webseed url is <em>almost the same operation</em> as <strong>creating</strong> torrents from a url.
We want to support that for, e.g., using spare server resources to create torrents when scraping resources are thin,
as well as for “importing” datasets from other archival systems.
This also opens up interesting possibilities for hybrid http/bittorrent-backed web archives for, say, 
<a href="https://replayweb.page/">replaywebpage</a> from <a href="https://webrecorder.net/">webrecorder</a>,
but more information on that is TBD.</li>
  <li>We’ll also be adding stats to account pages to encourage adding webseeds, it’s good to be able to brag about doing good things!</li>
</ul>

<p>If you’d like to get involved with these or any other sciop projects,
you are more than welcome to hop on the <a href="https://codeberg.org/Safeguarding/sciop/issues">issues</a>,
or contact me or anyone else in <a href="https://fedihum.org/@SafeguardingResearch">SRC</a> to ask to be added to our contributor chat &lt;3</p>

<h2 id="appendix-on-malicious-use">Appendix on Malicious Use</h2>

<p>What about privacy? Isn’t making people ping a server a whole attack vector?</p>

<p>Adding arbitrary webseeds is no more of an information leak than being able to add arbitrary peers to a swarm.
Anyone who wanted to surveil the swarm could do so far more easily by simply joining it or scraping trackers.</p>

<p>Sciop validates a random subset of pieces from any added webseed before adding it to a torrent.
Since the selected pieces are random, it would be hard to pass validation while hosting only a tiny subset of the data,
and doing so would gain nothing anyway.
If a webseed were to try to serve copies of the data spiked with malware,
torrent clients would reject the altered pieces because they don’t match the piece hashes, and would ban the webseed IP.</p>

<p>However, if there is some security or privacy risk that we failed to consider,
please <a href="https://codeberg.org/Safeguarding/sciop/issues">submit an issue</a>
or contact us privately at <code class="language-plaintext highlighter-rouge">contact (at) safeguar (dot) de</code> if raising a public issue could compromise sciop’s security.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:nottoday">
      <p>Actually not today, for a week or so, but it was broken and we were letting shrimp have first dibs. <a href="#fnref:nottoday" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:tm">
      <p>“interacting with torrent files but not a total nightmare.” <a href="#fnref:tm" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:bencode">
      <p>It’s possible to just directly edit bencoded objects,
but we don’t make a habit of inviting hell into our minds by passing around anonymous dicts. <a href="#fnref:bencode" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:moresoon">
      <p>More to come, including embedding a first-party tracker, description metadata in the usual places,
and json-ld in unusual places. <a href="#fnref:moresoon" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:addtosite">
      <p>Although while I am writing this, I am realizing we also need to display it on sciop,
so by the time you read this that will likely already be done. <a href="#fnref:addtosite" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:andapi">
      <p>And via the <a href="https://sciop.net/docs/api/#/api/create_webseed_api_v1_uploads__infohash__webseeds_post">API</a>. <a href="#fnref:andapi" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:webseeddocs">
      <p>See the docs for more information on the <a href="https://sciop.net/docs/python/services/#sciop.services.webseeds">service</a>
and the <a href="https://sciop.net/docs/python/config/#webseed-validation">configuration</a>. <a href="#fnref:webseeddocs" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:seedblock">
      <p>In our experience, it is typically the <em>incoming</em> connections that are blocked, rather than outgoing,
and it is possible to download a torrent with a client or a tool like aria2 even when seeding is impossible. <a href="#fnref:seedblock" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:s3isjusthttp">
      <p>S3 is just HTTP!!! All data hosted in buckets is now in play.
E.g. if your AWS S3 bucket was <code class="language-plaintext highlighter-rouge">smithsonian-open-access</code>, 
the HTTP url is just <code class="language-plaintext highlighter-rouge">https://smithsonian-open-access.s3.amazonaws.com/</code>
and all the paths, or prefixes, whatever they call them, are the same. <a href="#fnref:s3isjusthttp" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:orfedi">
      <p>or the fediverse, as is more often the case. <a href="#fnref:orfedi" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:viarss">
      <p>E.g. if they are subscribed to the relevant torrent RSS feed. <a href="#fnref:viarss" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:uploadtwice">
      <p>Once for the initial hashing, once for initial seeding to the swarm. <a href="#fnref:uploadtwice" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:nmaahc">
      <p><code class="language-plaintext highlighter-rouge">~ user experience ~</code> <img src="/assets/img/nmaahc-name.webp" alt="A list of torrents, all of which are named nmaahc" /> <a href="#fnref:nmaahc" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Jonny</name><uri>https://neuromatch.social/@jonny</uri></author><category term="intro" /><summary type="html"><![CDATA[We have added the ability to add and validate webseeds to torrents from the website, so even if you can't run a torrent client, you can join the swarm and keep endangered information alive.]]></summary></entry><entry><title type="html">Welcome to SciOp The Blog!</title><link href="https://blog.sciop.net/2025-08-26/welcome-to-sciop" rel="alternate" type="text/html" title="Welcome to SciOp The Blog!" /><published>2025-08-26T00:23:39+00:00</published><updated>2025-08-26T00:23:39+00:00</updated><id>https://blog.sciop.net/2025-08-26/welcome-to-sciop</id><content type="html" xml:base="https://blog.sciop.net/2025-08-26/welcome-to-sciop"><![CDATA[<p>What up everyone, this is the sciop blog.
On this website we’ll post updates about the other website, <a href="https://sciop.net">https://sciop.net</a></p>

<p>This post isn’t really about anything except the blog existing,
so you should probably go to a different one.</p>]]></content><author><name>Jonny</name><uri>https://neuromatch.social/@jonny</uri></author><category term="intro" /><summary type="html"><![CDATA[This is the start of the sciop blog!]]></summary></entry></feed>