Introduction

Provides Django model utilities for encouraging direct data access instead of unnecessary object overhead. Implemented through compatible method and operator extensions[1] to QuerySets and Managers.

The primary motivation is the experiential observation that the active record pattern - specifically Model.save - is the root of all evil. The secondary goal is to provide a more intuitive data layer, similar to PyData projects such as pandas.

Usage: instantiate the custom manager in your models.

Updates

The Bad:

book = Book.objects.get(pk=pk)
book.rating = 5.0
book.save()

This example is ubiquitous and even encouraged in many Django circles. It's also an epic fail:

  • Runs an unnecessary select query, as no fields need to be read.
  • Updates all fields instead of just the one needed.
  • Therefore also suffers from race conditions.
  • And is relatively verbose, without addressing errors yet.
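The race condition in the list above can be reproduced without Django. The following is a minimal sketch using the standard library's sqlite3 module; the table layout and the "other writer" (simulated here on the same connection) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL, quantity INTEGER)")
conn.execute("INSERT INTO book VALUES (1, 3.0, 10)")

# Read-modify-write: load the whole row first, as get() followed by save() would.
row = conn.execute("SELECT id, rating, quantity FROM book WHERE id = 1").fetchone()

# Meanwhile another writer (simulated) changes a different column.
conn.execute("UPDATE book SET quantity = 9 WHERE id = 1")

# Writing back every field restores the stale quantity that was read earlier.
conn.execute("UPDATE book SET rating = ?, quantity = ? WHERE id = ?", (5.0, row[2], row[0]))
full_save = conn.execute("SELECT rating, quantity FROM book WHERE id = 1").fetchone()

# Reset, then repeat with a single-column update and no prior select.
conn.execute("UPDATE book SET rating = 3.0, quantity = 10 WHERE id = 1")
conn.execute("UPDATE book SET quantity = 9 WHERE id = 1")
conn.execute("UPDATE book SET rating = ? WHERE id = ?", (5.0, 1))
column_update = conn.execute("SELECT rating, quantity FROM book WHERE id = 1").fetchone()

print(full_save)      # the concurrent quantity change is lost
print(column_update)  # the concurrent quantity change survives
```

Only the column named in the statement is touched, so writers of other columns cannot be overwritten.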

The solution is relatively well-known, and endorsed by Django's own docs, but remains under-utilized.

The Ugly:

Book.objects.filter(pk=pk).update(rating=5.0)

So why not provide syntactic support for the better approach? The Manager supports filtering by primary key, since that's so common. The QuerySet supports column updates.

The Good:

Book.objects[pk]['rating'] = 5.0

But one might posit...

  • "Isn't the encapsulation save provides worth it in principle?"
  • "Doesn't the new update_fields option fix this in practice?"
  • "What if the object is cached or has custom logic in the save method?"

No, no, and good luck with that.[2] Consider a more realistic example which addresses these concerns.

The Bad:

try:
    book = Book.objects.get(pk=pk)
except Book.DoesNotExist:
    changed = False
else:
    changed = book.publisher != publisher
    if changed:
        book.publisher = publisher
        book.pubdate = today
        book.save(update_fields=['publisher', 'pubdate'])

This solves the most severe problem, though with more verbosity and still an unnecessary read.[3] Note that handling pubdate in the save implementation would only spare the caller one line of code. But the real problem is how to handle custom logic when update_fields isn't specified. There's no one obvious correct behavior, which is why projects like django-model-utils have to track the changes on the object itself.[4]

A better approach would be an update_publisher method which does all and only what is required. So what would such an implementation be? A straightforward update won't work, yet only a minor tweak is needed.

The Ugly:

changed = Book.objects.filter(pk=pk).exclude(publisher=publisher) \
    .update(publisher=publisher, pubdate=today)

Now the update is only executed if necessary. And this can be generalized with a little inspiration from {get,update}_or_create.
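At the SQL level, the conditional update amounts to one statement whose affected-row count doubles as the changed flag. A minimal sketch with sqlite3 follows; the schema and the update_publisher helper are hypothetical, and a nullable publisher column would need extra handling because `<>` never matches NULL.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, publisher TEXT, pubdate TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Old Press', '2000-01-01')")


def update_publisher(pk, publisher, today):
    """Return True only if the row exists and its publisher actually changed."""
    cursor = conn.execute(
        # The extra condition plays the role of .exclude(publisher=publisher).
        "UPDATE book SET publisher = ?, pubdate = ? WHERE id = ? AND publisher <> ?",
        (publisher, today, pk, publisher),
    )
    return cursor.rowcount > 0


today = date(2020, 1, 1).isoformat()
first = update_publisher(1, "New Press", today)    # publisher differs: row updated
second = update_publisher(1, "New Press", today)   # already current: nothing to do
missing = update_publisher(2, "New Press", today)  # no such row: nothing to do
```

No select is issued, and a missing row needs no exception handling: it simply reports no change.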

The Good:

changed = Book.objects[pk].change({'pubdate': today}, publisher=publisher)

Selects

Direct column access has some of the clunkiest syntax: values_list(..., flat=True). QuerySets override __getitem__, as well as comparison operators for simple filters. Both are common syntax in panel data layers.

The Bad:

{book.pk: book.name for book in qs}

(book.name for book in qs.filter(name__isnull=False))

if qs.filter(author=author):

The Ugly:

dict(qs.values_list('pk', 'name'))

qs.exclude(name=None).values_list('name', flat=True)

if qs.filter(author=author).exists():

The Good:

dict(qs['pk', 'name'])

qs['name'] != None

if author in qs['author']:
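To illustrate how `__getitem__` and comparison operators can carry this syntax, here is a toy sketch over a list of dicts. The Rows class is hypothetical and is not the library's implementation; a real QuerySet would translate the same operations into queries instead of filtering in Python.

```python
class Rows:
    """Hypothetical stand-in for a QuerySet: rows are dicts, columns are selected by key."""

    def __init__(self, rows, fields=()):
        self.rows = list(rows)
        self.fields = tuple(fields)

    def __getitem__(self, key):
        # A string or a tuple of strings selects columns, like values_list.
        fields = (key,) if isinstance(key, str) else tuple(key)
        return Rows(self.rows, fields)

    def __ne__(self, value):
        # Comparison returns a filtered selection rather than a boolean.
        (field,) = self.fields
        return Rows((row for row in self.rows if row[field] != value), self.fields)

    def __iter__(self):
        for row in self.rows:
            values = tuple(row[field] for field in self.fields)
            # A single column yields flat values, like flat=True.
            yield values[0] if len(values) == 1 else values

    def __contains__(self, value):
        # Membership test on a single column, like filter(...).exists().
        (field,) = self.fields
        return any(row[field] == value for row in self.rows)


qs = Rows([
    {"pk": 1, "name": "Dune", "author": "Herbert"},
    {"pk": 2, "name": None, "author": "Asimov"},
])
names_by_pk = dict(qs["pk", "name"])
named = list(qs["name"] != None)
has_author = "Asimov" in qs["author"]
```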

Aggregation

Once accustomed to working with data values, a richer set of aggregations becomes possible. Again the method names mirror projects like pandas whenever applicable.

The Bad:

collections.Counter(book.author for book in qs)

sum(book.rating for book in qs) / len(qs)

counts = collections.Counter()
for book in qs:
    counts[book.author] += book.quantity

The Ugly:

dict(qs.values_list('author').annotate(models.Count('author')))

qs.aggregate(models.Avg('rating'))['rating__avg']

dict(qs.values_list('author').annotate(models.Sum('quantity')))

The Good:

dict(qs['author'].value_counts())

qs['rating'].mean()

dict(qs['quantity'].groupby('author').sum())
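The point of the latter two forms is that the database does the aggregation and only the results cross the wire. As a rough sketch of the kind of SQL involved, using sqlite3 with an assumed schema and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (author TEXT, rating REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [("A", 4.0, 2), ("A", 5.0, 3), ("B", 3.0, 5)],
)

# Count of rows per author: one grouped query instead of iterating objects.
value_counts = dict(conn.execute("SELECT author, COUNT(author) FROM book GROUP BY author"))

# Average rating computed by the database.
(mean_rating,) = conn.execute("SELECT AVG(rating) FROM book").fetchone()

# Sum of one column grouped by another.
quantities = dict(conn.execute("SELECT author, SUM(quantity) FROM book GROUP BY author"))
```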

Expressions

F expressions are similarly extended to easily create Q, Func, and OrderBy objects. Note they can be used directly even without a custom manager.

The Bad:

(book for book in qs if book.author.startswith('A') or book.author.startswith('B'))

(book.title[:10] for book in qs)

for book in qs:
    book.rating += 1
    book.save()

The Ugly:

qs.filter(Q(author__startswith='A') | Q(author__startswith='B'))

qs.values_list(functions.Substr('title', 1, 10), flat=True)

qs.update(rating=models.F('rating') + 1)

The Good:

qs[F.any(map(F.author.startswith, 'AB'))]

qs[F.title[:10]]

qs['rating'] += 1
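The operator-based style rests on ordinary Python operator overloading. The toy sketch below uses a hypothetical Field class that produces Django-style keyword lookups as plain dicts; it is an illustration of the technique only, whereas the library itself extends F to build Q, Func, and OrderBy objects.

```python
class Field:
    """Hypothetical field reference that turns operators into Django-style lookups."""

    def __init__(self, name=""):
        self._name = name

    def __getattr__(self, name):
        # Only called for missing attributes, so real methods are unaffected.
        if name.startswith("_"):
            raise AttributeError(name)
        return Field(name)

    def _lookup(self, suffix, value):
        return {f"{self._name}__{suffix}": value}

    def __lt__(self, value):
        return self._lookup("lt", value)

    def __ge__(self, value):
        return self._lookup("gte", value)

    def startswith(self, value):
        return self._lookup("startswith", value)


F = Field()
low = F.quantity < 10
high = F.quantity >= 10
prefixes = list(map(F.author.startswith, "AB"))
```

Each resulting dict has the shape of keyword arguments accepted by `filter`, e.g. `qs.filter(**low)`.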

Conditionals

Annotations and updates support Case and When expressions, written as dictionaries that map conditions to values. See also bulk_changed and bulk_change for efficient bulk operations on primary keys.

The Bad:

collections.Counter('low' if book.quantity < 10 else 'high' for book in qs).items()

for author, quantity in items:
    for book in qs.filter(author=author):
        book.quantity = quantity
        book.save()

The Ugly:

qs.values_list(models.Case(
    models.When(quantity__lt=10, then=models.Value('low')),
    models.When(quantity__gte=10, then=models.Value('high')),
    output_field=models.CharField(),
)).annotate(count=models.Count('*'))

cases = (models.When(author=author, then=models.Value(quantity)) for author, quantity in items)
qs.update(quantity=models.Case(*cases, default='quantity'))

The Good:

qs[{F.quantity < 10: 'low', F.quantity >= 10: 'high'}].value_counts()

qs['quantity'] = {F.author == author: quantity for author, quantity in items}
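For comparison with the nested loops above, a conditional annotation or update is a single statement in SQL. A minimal sqlite3 sketch, with an assumed schema and sample `items`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO book (author, quantity) VALUES (?, ?)",
    [("A", 5), ("B", 12), ("C", 30)],
)

# Conditional annotation: label each row, then count rows per label.
labels = dict(conn.execute(
    "SELECT CASE WHEN quantity < 10 THEN 'low' ELSE 'high' END AS label, COUNT(*) "
    "FROM book GROUP BY label"
))

# Conditional update: one WHEN clause per (author, quantity) pair, in one statement.
items = [("A", 10), ("B", 20)]
whens = " ".join("WHEN author = ? THEN ?" for _ in items)
params = [value for item in items for value in item]
# Rows matching no condition keep their current quantity.
conn.execute(f"UPDATE book SET quantity = CASE {whens} ELSE quantity END", params)
quantities = dict(conn.execute("SELECT author, quantity FROM book"))
```

The number of queries stays constant no matter how many pairs `items` contains.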

  1. The only incompatible changes are edge cases which aren't documented behavior, such as queryset comparison. 

  2. In the vast majority of instances of that idiom, the object is immediately discarded and no custom logic is necessary. Furthermore the dogma of a model knowing how to serialize itself doesn't inherently imply a single all-purpose instance method. Specialized classmethods or manager methods would be just as encapsulated. 

  3. Premature optimization? While debatable with respect to general object overhead, nothing good can come from running superfluous database queries. 

  4. Supporting update_fields with custom logic also results in complex conditionals, ironic given that OO methodology ostensibly favors separate methods over large switch statements.