Google App Engine Datastore presents interesting challenges for someone used to relational databases. If you are trying to make sense of somebody else's system with a SQL backend, you look into its databases and tables and query them to understand the data model, which is often the key to understanding the whole system. Since Datastore is schemaless, this reverse-engineering approach doesn't work. You can see the entities and their properties, but it is much harder to get a sense of their relationships.
Here's what I mean by that. Say I want to convert this database to Datastore. There are multiple ways to design the entity models, but one of them might look like this:
```python
from google.appengine.ext import ndb


class Repo(ndb.Model):
    # Assume that repo name will be set in key
    pass


class Commit(ndb.Model):
    # Datastore will autogenerate id, although sha could be used
    # as key as well
    sha = ndb.StringProperty()
    committer = ndb.StringProperty()
    message = ndb.StringProperty()


class File(ndb.Model):
    file_name = ndb.StringProperty()
    url = ndb.StringProperty()
```
Now we can populate and query Datastore:
```python
# Runs inside a unittest.TestCase method; assumes ndb is imported and
# the models above live in a `models` module.

# Insert a repo with a couple of commits
my_repo = models.Repo(id='my_repo')
my_repo.put()
models.Commit(
    sha='7c087b5',
    committer='Washington Irving',
    message='Initial commit',
    parent=my_repo.key
).put()
models.Commit(
    sha='77c087b5',
    committer='Joe Doe',
    message='Added readme',
    parent=my_repo.key
).put()

# Find all commits for this repo
ancestor_key = ndb.Key('Repo', 'my_repo')
commits = models.Commit.query(ancestor=ancestor_key).fetch()
self.assertEqual(2, len(commits))

# Find the commit with a given sha and its parent (repo)
commits = models.Commit.query().filter(
    models.Commit.sha == '77c087b5').fetch()
self.assertEqual(1, len(commits))
# fetch() returns a list, so read the key of its first element
self.assertEqual('my_repo', commits[0].key.parent().id())
```
This is how those two commits will look:
```
[Commit(key=Key('Repo', 'my_repo', 'Commit', 1), committer=u'Washington Irving', message=u'Initial commit', sha=u'7c087b5'),
 Commit(key=Key('Repo', 'my_repo', 'Commit', 2), committer=u'Joe Doe', message=u'Added readme', sha=u'77c087b5')]
```
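The keys in that output are paths: each one lists (kind, id) pairs from the root entity down, and `key.parent()` drops the last pair. An ancestor query then matches keys whose path starts with the ancestor's path. The sketch below models this with plain Python tuples so it runs without the App Engine SDK; `make_key`, `parent`, `kind` and `has_ancestor` are hypothetical helpers for illustration, not the ndb API.

```python
# Hypothetical model of a Datastore key as a path of (kind, id) pairs,
# root first. These helpers are for illustration and are NOT the ndb.Key API.
from typing import Optional, Tuple, Union

Id = Union[str, int]
KeyPath = Tuple[Tuple[str, Id], ...]


def make_key(*flat: Id) -> KeyPath:
    """Build a path from alternating kind/id values, like Key('Repo', 'my_repo', 'Commit', 1)."""
    if len(flat) % 2:
        raise ValueError("expected alternating kind/id values")
    return tuple((str(flat[i]), flat[i + 1]) for i in range(0, len(flat), 2))


def parent(key: KeyPath) -> Optional[KeyPath]:
    """Drop the last (kind, id) pair; a root key has no parent."""
    return key[:-1] or None


def kind(key: KeyPath) -> str:
    """The kind of an entity is the kind in the last pair of its path."""
    return key[-1][0]


def has_ancestor(key: KeyPath, ancestor: KeyPath) -> bool:
    """True if `ancestor` is a prefix of `key` (a key counts as its own ancestor)."""
    return key[:len(ancestor)] == ancestor


commit_key = make_key('Repo', 'my_repo', 'Commit', 1)
print(parent(commit_key))                                      # (('Repo', 'my_repo'),)
print(has_ancestor(commit_key, make_key('Repo', 'my_repo')))   # True
```

Note that the kinds in the path are only labels: nothing in this representation says which kind is allowed to sit above which.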
What's interesting here is that the relationship between Repo and Commit is established when entities are created, not in the model definitions, so you would never work out their relationship just by looking at the models: there are no primary or foreign keys there. This means I could come up with a completely ridiculous example where Commit is the parent and Repo is the child, and Datastore doesn't care:
```python
# Insert a commit with a couple of repos
my_commit = models.Commit(sha='7c087b5')
my_commit.put()
models.Repo(id='my_repo', parent=my_commit.key).put()
models.Repo(id='my_repo_2', parent=my_commit.key).put()

# query().get() returns a single entity (fetch() would return a list)
ancestor_key = models.Commit.query().get().key
repos = models.Repo.query(ancestor=ancestor_key).fetch()
self.assertEqual(2, len(repos))
```
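Nothing in that code tells Datastore that a Repo may not live under a Commit. The toy in-memory store below imitates how keys are built from a parent key and shows that both hierarchies can sit side by side; it is a hypothetical stand-in written only for illustration, and `InMemoryStore`, `put` and `ancestor_query` are not App Engine APIs.

```python
# A toy, in-memory imitation of how Datastore builds keys from a parent key.
# Hypothetical illustration only: InMemoryStore, put() and ancestor_query()
# are not App Engine APIs. Keys are tuples of (kind, id) pairs, root first.
import itertools
from typing import Dict, List, Optional, Tuple, Union

Id = Union[str, int]
KeyPath = Tuple[Tuple[str, Id], ...]


class InMemoryStore:
    def __init__(self) -> None:
        self._entities: Dict[KeyPath, dict] = {}
        self._ids = itertools.count(1)  # stands in for auto-allocated numeric ids

    def put(self, kind: str, props: dict, entity_id: Optional[Id] = None,
            parent: Optional[KeyPath] = None) -> KeyPath:
        # No check on which kinds may be parents of which: any combination is accepted.
        if entity_id is None:
            entity_id = next(self._ids)
        key = (parent or ()) + ((kind, entity_id),)
        self._entities[key] = dict(props)
        return key

    def ancestor_query(self, kind: str, ancestor: KeyPath) -> List[KeyPath]:
        # Keys of the requested kind whose path starts with the ancestor's path.
        return [key for key in self._entities
                if key[-1][0] == kind and key[:len(ancestor)] == ancestor]


store = InMemoryStore()

# Repo as parent of Commits, as in the first example.
repo = store.put('Repo', {}, entity_id='my_repo')
store.put('Commit', {'sha': '7c087b5'}, parent=repo)
store.put('Commit', {'sha': '77c087b5'}, parent=repo)

# Commit as parent of Repos, as in the inverted example; both coexist.
commit = store.put('Commit', {'sha': '7c087b5'})
store.put('Repo', {}, entity_id='my_repo', parent=commit)
store.put('Repo', {}, entity_id='my_repo_2', parent=commit)

print(len(store.ancestor_query('Commit', repo)))   # 2
print(len(store.ancestor_query('Repo', commit)))   # 2
```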
What also seems bizarre is that entities of both types will peacefully coexist: a Commit entity with a Repo parent and a Commit entity with a Repo child. Go figure how this is really supposed to work. Of course, there are other ways to design such a model. The Repo entity could hold a list of Commits (a ListProperty, or a property with repeated=True in ndb), and/or the Commit entity could store the Repo id (a sort of foreign key). Such a design would carry "schema" meaning, but it only fits certain situations, since in the list-based variant child entities end up embedded in their parent. It all depends on how the entities will be queried.
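When reverse engineering such a system, the key paths of stored entities are about the only place where these relationships surface. A rough sketch, assuming the keys have already been exported as tuples of (kind, id) pairs (the `parent_child_kinds` helper is hypothetical), is to count which kind appears directly under which:

```python
# Hypothetical helper for reverse engineering: count which kind appears
# directly under which, based only on the key paths of stored entities.
from collections import Counter
from typing import Iterable, Tuple, Union

Id = Union[str, int]
KeyPath = Tuple[Tuple[str, Id], ...]


def parent_child_kinds(keys: Iterable[KeyPath]) -> Counter:
    """Count (parent kind, child kind) pairs; root keys have no parent and are skipped."""
    pairs: Counter = Counter()
    for key in keys:
        if len(key) >= 2:
            pairs[(key[-2][0], key[-1][0])] += 1
    return pairs


# Keys from both examples: Repo -> Commit and Commit -> Repo.
keys = [
    (('Repo', 'my_repo'),),
    (('Repo', 'my_repo'), ('Commit', 1)),
    (('Repo', 'my_repo'), ('Commit', 2)),
    (('Commit', 3),),
    (('Commit', 3), ('Repo', 'my_repo')),
    (('Commit', 3), ('Repo', 'my_repo_2')),
]
counts = parent_child_kinds(keys)
print(counts[('Repo', 'Commit')], counts[('Commit', 'Repo')])  # 2 2
```

Run over keys like the ones in the two examples above, it reports Repo-over-Commit and Commit-over-Repo with equal weight: the stored data alone can't tell you which design was intended.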
See code in my playground project.