Jun 26, 2009

Pitfalls of lazy evaluation in Django's ORM

It's well documented that query sets in Django's ORM are lazily evaluated. The database is not accessed until you actually reference query set results. For example:
    my_set = MyModel.objects.all()  # No database access
    print(my_set[0])  # DB accessed and model instance created
The thing to be aware of here is that indexing into a query set that has not been fully evaluated yet causes a new database access and a new model instance every time:
    my_set = MyModel.objects.all()  # No database access
    print(my_set[0])  # DB access and model instantiation...
    print(my_set[0])  # ... and the same again!
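To make the cost visible without a Django project, here is a minimal sketch using only the standard library's sqlite3 module. `LazyRows`, the `my_model` table and the `queries_run` counter are hypothetical names invented for this illustration; the class is a toy stand-in, not Django's implementation. It assumes, as Django does for an unevaluated query set, that an index access is translated into its own LIMIT/OFFSET query.

```python
import sqlite3


class LazyRows:
    """Toy stand-in for an unevaluated query set (hypothetical, not Django code)."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self.queries_run = 0  # how many times the database was hit

    def __getitem__(self, index):
        # Nothing is cached here, so every index access runs its own query.
        self.queries_run += 1
        row = self.conn.execute(
            self.sql + " LIMIT 1 OFFSET ?", (index,)
        ).fetchone()
        if row is None:
            raise IndexError(index)
        return row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO my_model (name) VALUES (?)", [("a",), ("b",)])

my_set = LazyRows(conn, "SELECT id, name FROM my_model ORDER BY id")  # no query yet
first = my_set[0]  # one query
again = my_set[0]  # a second, identical query
print(my_set.queries_run)
```

The counter ends up at two even though the same row was requested both times, which is the pattern described above.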
It is not uncommon to retrieve a set of results via a query set and then perform some operations on them, for example in a loop. It is tempting to view query sets just as if they were arrays - lists of in-memory objects - and use them in a similar manner. In-memory arrays tend to be quite efficient, query sets not so much: if you index or slice an unevaluated query set repeatedly, or build the same query set anew on each pass of a loop, you will end up retrieving the same objects from the database multiple times as well. Iterating over one and the same query set object is the exception: the first full iteration fills an internal result cache, which later iterations reuse.

If you need to access the same elements a few times - or pass over the results more than once without relying on how the query set caches them - it is therefore safer, and often more efficient, to copy the results into a list up front. That way, the query and the creation of model instances are done only once:
    my_set = list(MyModel.objects.all())  # DB accessed once, for all rows
    print(my_set[0])  # No DB access!
In effect, the list you created in the last code snippet becomes a cache for your query results. You could also have just assigned a single result-set element to a variable for the same effect.

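Fortunately, when a query set is passed to a template and looped over there, this caching is taken care of automatically for us: iterating over a query set fills its internal result cache, so further loops over the same query set object do not go back to the database. Making copies of query set results is therefore most important when more complex operations, such as repeated index access, are necessary on result sets within the view function.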
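The caching behaviour can be modelled with a small toy class. This is a sketch under a stated assumption: that a full iteration fills a result cache which later iterations and index accesses reuse, as Django's documentation describes for query sets. `CachingRows`, the `my_model` table and the `queries_run` counter are hypothetical names for this illustration, built on the standard library's sqlite3 module rather than on Django itself.

```python
import sqlite3


class CachingRows:
    """Toy lazy result set with a result cache (hypothetical, not Django code)."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self.queries_run = 0  # how many times the database was hit
        self._cache = None  # filled by the first full iteration

    def __iter__(self):
        if self._cache is None:
            # First iteration: fetch every row once and remember the result.
            self.queries_run += 1
            self._cache = self.conn.execute(self.sql).fetchall()
        return iter(self._cache)

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]  # served from memory, no query
        # Not evaluated yet: each index access costs its own query.
        self.queries_run += 1
        row = self.conn.execute(
            self.sql + " LIMIT 1 OFFSET ?", (index,)
        ).fetchone()
        if row is None:
            raise IndexError(index)
        return row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO my_model (name) VALUES (?)", [("a",), ("b",)])

rows = CachingRows(conn, "SELECT id, name FROM my_model ORDER BY id")
first_pass = [name for _id, name in rows]   # runs the only query
second_pass = [name for _id, name in rows]  # reuses the cache
first = rows[0]                             # also served from the cache
print(rows.queries_run)
```

In this model, two loops and one index access cost a single query, because the first loop evaluated the whole result set. Index access before any iteration would still cost one query per access.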

