Trust·April 29, 2026·10 min read

How We Clear Rights on Every Clip of Training Data

Rights are among the most common failure points in human training data. Motionstack treats consent, provenance, and chain-of-title as part of the deliverable, so that every clip we ship is one a legal and procurement team can clear.

By Motionstack

Cleared on every clip

Most training-data defects announce themselves: a label is wrong, a sensor drifts, a kitchen reads as implausibly staged. Rights defects behave differently. They remain invisible until the moment they become expensive. A dataset can be complete, train a strong policy, and pass every technical check, and still prove unusable the day a procurement team asks a single question: where did the people in this footage agree to appear in it?

This is a structural failure mode of human training data, not an edge case. Scraped media carries no consent, because no one filmed their kitchen to train a robotics model. Crowd-uploaded clips arrive with unverifiable provenance: the identity of the person who shot them, the others in frame, and what any of them were told are all unknown. Either path produces footage that performs in a research notebook and then stalls the moment it enters a commercial pipeline and a legal review.

Motionstack treats rights as a first-class part of the deliverable rather than paperwork appended at the end. Every clip we ship arrives with its consent and provenance packaged alongside the media. This post sets out how we clear those rights, why we clear them before a camera records, and why doing so is what allows a buyer to put our data into a production training pipeline without inheriting another party's legal exposure.

Why scraped and crowd-sourced data fails the rights test

Consider the two common routes to human video, because both break at the same point. Scraping the open web delivers volume and little else. The people in that footage did not agree to anything, least of all to having their movements, faces, and homes used to train a commercial model. Regulation has made the consequences explicit: the GDPR requires a lawful basis to process personal data about identifiable individuals and places biometric data in a special category, and the EU AI Act adds further obligations around biometric data and training-data governance. Scraped footage establishes no such basis and leaves no record a regulator could examine.

Crowd uploads look better on the surface but clear no more easily. A contributor sends a clip of a household task, but no chain-of-title sits behind it. Whether they filmed it themselves, whether the person in frame is the uploader, and whether the second figure in the doorway was ever asked are all unresolved. Absent those answers, each clip becomes a question the buyer's counsel must adjudicate, and the prudent answer is no.

The cost of getting this wrong is rarely a takedown notice. More often it is a procurement review that locates the gap, freezes the dataset, and returns the model team to the start after they have already trained on it. The purpose of acquiring data rather than collecting it is to transfer that risk away, not to import it.

A clip without a consent record is not cheaper data; it is an unbilled liability carried until it surfaces.

Motionstack

Our process inverts the usual order. Conventional pipelines collect footage and then attempt to clear it. We clear it first. Before a contributor captures anything, they sign a release. Nothing reaches us, and nothing reaches a buyer, until that release is executed. This is the difference between consent as a formality and consent as the gate every clip must pass.

The release is drafted to be broad enough to be genuinely useful for AI and machine-learning training and specific enough to be candid about that use. It grants perpetual, sublicensable, and transferable rights, with explicit terms covering the matters that govern embodied AI:

  • AI and ML training. The contributor agrees, in plain language, that their footage and motion may be used to train, evaluate, and improve machine-learning models, including commercial ones.
  • Biometric and likeness consent. Movement, gait, hands, and face are biometric signals. The release names them and obtains explicit consent to capture and use them rather than obscuring the point.
  • Third-party consent. Anyone else who appears identifiably in frame must consent as well. A partner, a housemate, a visiting friend: each is covered, or the clip does not ship.
  • Sublicense and transfer. Buyers frequently need to license the data onward and, in some agreements, transfer it outright. The release grants both up front, so the terms are not renegotiated per contributor.

One rule makes this operative: if an identifiable person has not consented, the clip does not ship. It is not blurred and shipped, nor shipped pending review. It is held until consent exists, or it is removed. That rule is what permits us to represent to a buyer that the faces in the dataset are consenting ones.
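To make the rule concrete, here is a minimal Python sketch of the ship gate, assuming a hypothetical per-clip record that lists each identifiable person in frame and whether they have signed a release. The class and field names are illustrative, not Motionstack's internal schema. What it shows is that the gate has only two outcomes: ship, or hold.

```python
from dataclasses import dataclass, field


@dataclass
class PersonInFrame:
    # Hypothetical entry: one identifiable person visible in the clip.
    person_id: str
    consented: bool


@dataclass
class Clip:
    # Hypothetical clip record; field names are illustrative only.
    clip_id: str
    contributor_release_signed: bool
    people_in_frame: list[PersonInFrame] = field(default_factory=list)


def may_ship(clip: Clip) -> tuple[bool, list[str]]:
    """Return (ok, missing): ok is True only if nobody's consent is missing.

    There is no "blur and ship" or "ship pending review" branch: any
    missing consent means the clip is held (and later removed if consent
    never arrives).
    """
    missing = [p.person_id for p in clip.people_in_frame if not p.consented]
    if not clip.contributor_release_signed:
        missing.insert(0, "contributor")
    return (not missing, missing)


held = Clip("clip-002", True, [PersonInFrame("housemate", False)])
print(may_ship(held))  # (False, ['housemate'])
```

Returning the list of missing consents, rather than a bare yes or no, tells the team exactly whose release is needed before a held clip can move.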

Consent is signed before capture, not chased afterward. A clip without a release behind it does not enter the pipeline.

Private, consented spaces only

Where we capture matters as much as who consents. We work in private, consented spaces: homes, private gardens, home studios. In these environments consent can be cleared completely, because everyone present is either the contributor or someone who has agreed to take part.

We do not capture in public places. A street, a cafe, or a train platform is full of people who never agreed to be filmed and cannot reasonably be asked. Consent for a crowd is unclearable, so footage shot there carries an unresolvable rights defect regardless of its quality. Restricting capture to private, consented spaces is not only a privacy measure; it keeps the chain-of-title clean enough to withstand scrutiny. It also matches what manipulation policies require: the diverse, real-world homes where robots will actually operate, captured on a consistent rig.

Provenance and chain-of-title, packaged per clip

Consent is insufficient if it cannot be demonstrated later. Every dataset therefore ships with its provenance inside it: a consent folder holding the signed releases and the chain-of-title, alongside the datasheet, so you can establish who, where, when, and under what terms without leaving the package.

The per-clip manifest is the spine of the chain-of-title. It records what the clip is, where and when it was captured, which contributor produced it, which release version they signed, and which third parties in frame consented. Because it travels with the media rather than residing in a separate spreadsheet, the provenance does not drift from the footage as the dataset is copied, merged, and handed between teams.

This packaging is what allows a procurement review to proceed quickly rather than painfully. When a buyer's legal team asks on what basis a given person appears in the data, the answer is in the folder, not in an email thread someone must reconstruct. Clean provenance is the difference between a dataset that survives diligence and one that triggers it.

Consent and chain-of-title ship inside the dataset itself, as first-class content rather than side files.

What the manifest carries

In concrete terms, the per-clip manifest records the fields a reviewer asks about, so the chain-of-title remains legible without us present:

  • Clip and contributor identifiers, and the capture date and location type.
  • The release version signed, with a timestamp, so terms map to the correct clip.
  • Identifiable third parties in frame and their consent status.
  • Any sensitive-context flags, with the safeguard applied or the exclusion taken.

  • 1 folder per clip: media, manifest, and consent records together.
  • 0 clips shipped with an unconsented identifiable person in frame.
  • Private spaces only: homes and home studios, never public places.
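As an illustration, the manifest fields listed above could be modeled as a small record like the one below. This is a minimal sketch under assumptions: the field names, the set of permitted location types, and the JSON layout are hypothetical, not the format Motionstack ships. It also encodes two rules from this post as checks: capture happens only in private spaces, and the release is signed no later than the capture date.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date, datetime

# Hypothetical vocabulary of permitted capture locations (private spaces only).
PRIVATE_LOCATION_TYPES = {"home", "private_garden", "home_studio"}


@dataclass
class ThirdParty:
    person_id: str
    consented: bool


@dataclass
class SensitiveFlag:
    flag: str    # illustrative, e.g. "medical_context"
    action: str  # the safeguard applied, or "excluded"


@dataclass
class ClipManifest:
    clip_id: str
    contributor_id: str
    capture_date: date
    location_type: str
    release_version: str
    release_signed_at: datetime
    third_parties: list[ThirdParty] = field(default_factory=list)
    sensitive_flags: list[SensitiveFlag] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Private, consented spaces only: reject anything else at creation time.
        if self.location_type not in PRIVATE_LOCATION_TYPES:
            raise ValueError(f"{self.location_type!r} is not a permitted private space")
        # Consent before capture: the release date may not follow the capture date.
        if self.release_signed_at.date() > self.capture_date:
            raise ValueError("release was signed after capture")

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; dates become ISO-8601 strings.
        return json.dumps(asdict(self), default=lambda o: o.isoformat(), indent=2)
```

Writing the record to a plain JSON file is one way to keep it in the clip's folder, next to the media, so it moves wherever the footage moves.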

Withdrawal as part of the design

Consent that cannot be withdrawn is not consent. Contributors can withdraw, and the system is built to honour withdrawal rather than treat it as an exception. When someone withdraws, we identify the affected clips, remove them from the dataset, and notify the buyers who received them so they can take the same step downstream.

This is practical because of the per-clip packaging. When provenance travels with the media, a withdrawal is a precise lookup rather than a forensic search across an undifferentiated archive. The same structure that accelerates a legal review is what makes a withdrawal enforceable months later. Honouring withdrawal also keeps us aligned with the data-subject rights that frameworks such as the GDPR establish, which is a substantial part of what a commercial buyer is paying for.
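With provenance packaged per clip, a withdrawal reduces to an index lookup. The sketch below assumes hypothetical inputs: manifests keyed by clip ID that name the contributor and any consenting third parties, and a delivery log mapping each clip to the buyers who received it. Neither structure is Motionstack's actual format; the function names are illustrative.

```python
from collections import defaultdict


def build_person_index(manifests: dict[str, dict]) -> dict[str, set[str]]:
    """Map every person (contributor or consenting third party) to their clips."""
    index: dict[str, set[str]] = defaultdict(set)
    for clip_id, manifest in manifests.items():
        index[manifest["contributor_id"]].add(clip_id)
        for person_id in manifest.get("third_party_ids", []):
            index[person_id].add(clip_id)
    return dict(index)


def plan_withdrawal(
    person_id: str,
    index: dict[str, set[str]],
    deliveries: dict[str, list[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Return the clips to remove and, per buyer, the delivered clips to flag."""
    affected = sorted(index.get(person_id, set()))
    notify: dict[str, list[str]] = defaultdict(list)
    for clip_id in affected:
        for buyer_id in deliveries.get(clip_id, []):
            notify[buyer_id].append(clip_id)
    return affected, dict(notify)


# Hypothetical example: a housemate (p-04) who appears in two clips withdraws.
manifests = {
    "clip-001": {"contributor_id": "c-17", "third_party_ids": ["p-04"]},
    "clip-002": {"contributor_id": "c-17"},
    "clip-003": {"contributor_id": "c-22", "third_party_ids": ["p-04"]},
}
deliveries = {"clip-001": ["buyer-a"], "clip-003": ["buyer-a", "buyer-b"]}
index = build_person_index(manifests)
print(plan_withdrawal("p-04", index, deliveries))
# (['clip-001', 'clip-003'], {'buyer-a': ['clip-001', 'clip-003'], 'buyer-b': ['clip-003']})
```

Indexing third parties alongside contributors matters here: a person who only appears in someone else's clip can withdraw too, and the same lookup finds every clip they are in.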

Secure handling and sensitive contexts

Rights do not end at the release. How the data is handled afterward is part of keeping the commitment made to contributors. We operate on a least-access basis: people and systems see only the data they need, and footage is deletable on request, which is what makes withdrawal more than a gesture.

Some contexts demand more than the baseline. Anything involving children, medical situations, or other sensitive settings receives additional safeguards or is excluded outright. Where the risk to a person outweighs the value of the clip, the clip is not made. That standard costs us coverage in places, and we hold it regardless, because a dataset is only as trustworthy as its least careful clip.

Holding that standard is a research decision as much as an ethical one. A pipeline that quietly compromises on sensitive contexts produces footage that buyers later have to strip out, an outcome worse than never having collected it. Excluding it at the outset keeps the dataset clean end to end.

Why this de-risks a commercial training pipeline

Drawn together, the value is straightforward. A model team buying human data is not only buying hours of footage; it is buying the right to train on it, ship the result, and defend that decision later. Consent before capture establishes a lawful basis. Per-clip provenance supplies a chain-of-title they can produce. Withdrawal handling and secure deletion provide an answer when a contributor changes their mind. Sensitive-context exclusion keeps the harder surprises out from the start.

The result is a dataset that clears a procurement or legal review rather than stalling in one. The concern is not theoretical. The research community has worked through the same problem: egocentric-video efforts such as Ego4D and large-scale robot-data pooling such as Open X-Embodiment place real weight on consent and licensing, precisely because data without a clean rights story cannot anchor a product. We apply the same standard to every clip, not only the headline ones.

If you are scoping a humanoid or embodied-AI dataset and rights are a gate you must clear, that requirement is what we built the company around. Tell us the spec or book a call, and we will show you the consent and provenance that ship with the data before you commit to a single clip. You can also read more about how we handle trust and how the full pipeline comes together.

Get the real-world data your robot needs.

Tell us the task, the person, and the place. We field it from a network of 800k contributors and deliver it to spec, cleared for commercial training, in about four weeks.