A Complete Shotgun Schema

Reading the "private" schema for all the details.

My site-local Shotgun cache is coming together quite nicely; you can see the progress on GitHub, as well as the ongoing results of my reverse-engineering efforts.

As work progresses, there is a greater and greater need for a through understanding of Shotgun's database schema so that I can replicate its behavior. One example that I'm starting to tackle: when you query an entity or multi_entity field, the entities will return the (required) type and id, but also a name, however, if you query for that name field you will find that it doesn't actually exist:

>>> # Lets grab the pipeline step of any task:
>>> sg.find_one('Task', [], ['step'])
{'step': {'type': 'Step', 'id': 4, 'name': 'Matchmove'}, 'type': 'Task', 'id': 2}
>>> # Notice the name...           ^^^^^^ here.
>>> # Let's grab that name directly:
>>> sg.find_one('Step', [('id', 'is', 4)], ['name', 'code'])
{'code': 'Matchmove', 'type': 'Step', 'id': 4}
>>> # No 'name' to be found. Huh.

Another example is the description of back-references of many multi_entity fields. E.g. if you set Task.entity to a Shot, that task will end up in Shot.tasks as well.

I have reached out to Shotgun support, and they have confirmed to me that the schema returned by the public API's schema_read (and related) methods would need to be expanded to return the information I need.

There must be another way to get this information, because the Shotgun website works, and it must have it. So lets go digging there...

The "Private" Schema

If you view the source of any Shotgun page (after authentication), you will see a collection of interesting <script> tags in the <head>:

<script src="/page/js_globals"></script>
<script src="/javascripts/ext_5e14f74dad38933008ea607ddeed863c.js"></script>
<script src="/javascripts/base_c42fd9a9fa5653fb541fc60e22005824.js"></script>
<script src="/page/schema?page_id=1354"></script>
<script src="/javascripts/widgets_a2b85e973eef52e64aaee66b64fd1593.js"></script>

/page/js_globals is interesting, as it contains all of the state for your Shotgun instance, including settings, ActionMenuItems, details about you and your preferences, etc.. The three in the /javascripts directory implement the core functionality of the site, and are a great read if you feel like monkey-patching the site to extend ActionMenuItems. But none of these are what we are here for.

/page/schema contains (among other things) the schema for your Shotgun instance, in blistering detail.

Of particular interest to the two problems outlined above, some fields are marked with identifier_column: true which I believe indicates the field used for the implicit name, e.g. Task.content:

"content": {
    "id": 264,
    "entity_type": "Task",
    "name": "content",
    "display_name": "Task Name",
    "data_type": "text",
    "identifier_column": true
}, ...

... entity fields explicitly label their back-references, e.g. Task.entity:

"entity": {
    "id": 255,
    "entity_type": "Task",
    "name": "entity",
    "display_name": "Link",
    "data_type": "entity",
    "inverse_association": [
} ...

... and multi_entity fields describe their association objects, e.g. Asset.shots:

"shots": {
    "id": 222,
    "entity_type": "Asset",
    "name": "shots",
    "display_name": "Shots",
    "data_type": "multi_entity",
    "through_join_field": "asset_shot_connections",
    "through_join_entity_type": "AssetShotConnection",
}, ...

Reading the Private Schema

I've known about this file and its contents for a while, and have studied it to understand the design of the schema, but I have never before used its specific contents to drive anything. I did not use it because I thought retrieving it would either be a manual process, or I would need to use the credentials of a user (instead of API keys) to scrape the website. Given the opportunity for the schema to change at any moment, these options always seemed too fragile.

The python_api3 has always had a _get_session_token method for retrieving a session cookie. For some time this seemed like it be replaced at any moment as it was documented as:

def _get_session_token(self):
   """Hack to authenticate in order to download protected content
   like Attachments

However, in a relatively recent commit, this method became part of the public API, and it is now a key part of the new two-factor authentication. At this point, we can read the Javascript file with only API keys:

import requests
import shotgun_api3

shotgun = shotgun_api3(*YOUR_CREDENTIALS)
session = requests.Session()
session.cookies['_session_id'] = shotgun.get_session_token()
print session.get('https://example.shotgunstudio.com/page/schema').text

I'll leave parsing it as an exercise to the reader.

But, is this safe?

The people at Shotgun are very much not the type to perform spontaneous massive refactors of key parts of their framework. So, unless they specifically block access of that page to API-based sessions, I don't see it going away.

My feeling is that any large changes to the availability or format of this schema would coincide with large changes to the API or storage engine, in which case anything that you are writing that is sensitive enough to require such intimate knowledge is going to break anyways.

Ergo, I'm going to go ahead and start using this information to build sgcache. I'll update this post with any information that Shotgun gives me about it.

Posted . Categories: .