Brendel Consulting logo

Brendel Consulting

composing beautiful software solutions

Jun 1, 2009

How to do "count" with "group by" in Django

Faced with the 'age old' problem of having to do a group by query in Django, I finally came across a solution.In short, you use the lower-level query interface directly:
    q = MyModel.objects.all()
q.query.group_by = ['field_name']
When you then iterate over 'q', you get the results grouped by the value of whatever was in 'field_name'. Great! So far, so good.

Well, a word of caution at this point: Using the low-level query API is not really something Django wants you to do, apparently. In fact, try this with sqlite and it works. Try it with PostgreSQL and it does not (all sorts of error messages). So, your mileage may vary, depending on which database you are using. Ok, let's assume that your database is fine with this...

The only challenge for me now was that I had to do a count for this.
If you form your query using the usual methods of Django's ORM, you will be disappointed. For example, this here will not work:
    q = MyModel.objects.all()
q.query.group_by = ['field_name']
q.count()
It appears as if this returns only the count of the first group, not a list of counts, as you would expect.

The solution is to wade a bit deeper into the low-level query API. We can instruct the query to add a count-column. In fact, this results in merely a single column being returned, just like COUNT(*). It goes like this:
    q = MyModel.objects.all()
q.query.group_by = ['field_name']
q.query.add_count_column()
Since this returns counts, rather than complete objects, we now need to get the individual group counts as a list of values. We add one more line:
    q.values_list('id')
The values_list() function gives you not instantiated objects but instead the tuples of values for each object. The tuples contain only the fields you specify by name in the call to values_list(). Except, we have manually added the count column with the add_count_column() function. That count column is always returned first. So, what you get as result is something like this:
    [ (3,1), (19,8), ... ]
The first value of each tuple is the count, the second value is the value of the 'id' field of the last occurrence of the grouped model in the table. If you specify something other than 'id', you get the same thing: First the count and then the value of that other field.

But that's not what we want, right? We want a list of counts. We could manually extract the first element in each tuple, but Django offers us a shortcut:
    q.values_list('id', flat=True)
Setting flat=True tells the values_list() function to just return the first element of each tuple in a plain list (not a list of tuples). And since the first element now is the count column, we finally get what we want: A list of the counts for each group.

Note that we could have specified any other fields in the call to values_list(), not just 'id'. Because we specified flat those fields are ignored. It seems, though, that at least one field needs to be specified here, even though it won't be considered as part of the output.


You should follow me on twitter here.

Labels: , , ,

 
 

1 Comments:

  • Now, with Django 1.1, it is possible to use:

    MyModel.objects.values('field_name').annotate(count=Count('id'))

    which results in SQL clause:

    SELECT "myapp_mymodel"."field_name", COUNT("myapp_mymodel"."id") AS id__count FROM "myapp_mymodel" GROUP BY "myapp_mymodel"."field_name"

    By Blogger Unknown, At October 29, 2009 at 10:42 PM  

Post a Comment

Subscribe to Post Comments [Atom]



<< Home