
Brendel Consulting

composing beautiful software solutions

Apr 18, 2012

Random primary keys for Django models

The automatically generated primary keys for Django models are usually just sequentially increasing integers. If you expose those IDs to your users at any point, for example in URLs that refer to particular objects, then your users can often easily guess how many users your site has, or how many objects of a certain kind your database holds.

You could follow the approach of never exposing those primary keys to the user, and instead always use an additional, randomly generated key whenever a user needs to refer to an object in your database. However, you then need to implement that random key creation, maintain two fields, and so on. It would be nice if we didn't have to worry about any of this and could just use the primary keys of our models to refer to them, internally as well as externally.

For this purpose, I have created a new base class, which you can use instead of models.Model when you create your own Django models. This base class is called RandomPrimaryIdModel. With this base class, your primary IDs will look random, similar to what you know from URL shorteners.


Here's an example. Let's say you have defined a Django model like this (in the example, the only change you have to make to your normal model definition is to replace the models.Model base class with RandomPrimaryIdModel):


    from random_primary import RandomPrimaryIdModel

    class MyModel(RandomPrimaryIdModel):
        # Define the rest of your Django model as usual
        ...


    for i in xrange(3):
        m = MyModel(... parameters ...)
        m.save()
        print m.id


As output you might get something like this:


      Q68mfU
      zjvsx3
      VNuL0Lp


You can tune the key length as well as the characters that are used to construct the key. The docstring of the class is pretty extensive, so please have a look.
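To make the idea concrete, the core of such a base class boils down to drawing a fixed-length key from a character set and retrying until the key is unused. Here is a minimal, framework-free sketch of just the key generation; the names KEY_LENGTH, CHARSET and make_random_key() are invented for this illustration and are not necessarily what the class in the repository uses:

```python
import random
import string

# Assumed defaults for this sketch; the actual base class lets you
# tune both the key length and the character set.
KEY_LENGTH = 6
CHARSET    = string.ascii_letters + string.digits

def make_random_key(length=KEY_LENGTH, charset=CHARSET):
    """Return a random fixed-length key, similar to 'Q68mfU'."""
    return ''.join(random.choice(charset) for _ in range(length))

# In the model's save(), one would generate keys like this until an
# unused one is found, then store it as the primary key before calling
# the parent class's save().
```

Since collisions become more likely as the table fills up, a longer key (or a larger character set) buys you more headroom.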

The code for the new model base class is free for anyone to use and can be found in this GitHub repository.

Hopefully, this can be useful to you. I'd welcome any feedback or comment.





Aug 3, 2010

Easy to use, flexible HTTP access method in Python

Python comes with 'batteries included', as they like to say. This means that libraries for most tasks you need to tackle are already part of a standard Python install. For issuing HTTP requests, the most commonly mentioned libraries are urllib and urllib2. They offer the very convenient urlopen() function, which makes it a snap to quickly retrieve online resources.

One thing that always bothered me, however, was the inflexibility of those methods. Want to use PUT instead of POST? Out of luck. Need authentication? Deal with pesky URLopener classes. Want timeouts for your requests? Use global socket options or use Python 2.6. Want to specify some custom headers? Deal with more classes. And so on.

All I want is a simple function like urlopen(), which can do all of these things for me and which also works on Python 2.5 (I'm often using Jython, which is still at 2.5). Of course, there is always the low-level httplib, on top of which the two urllibs actually build. And while using that library requires a few more steps, you can use it to write your own powerful, simple version of urlopen().

So, for RESTx - an open source project to simply and easily create RESTful web services - I wrapped all the functionality I wanted into a convenient, compact function that uses httplib directly. Currently, it only supports HTTP basic authentication, but hopefully that can be extended at some point in the future.

Here it is then for your enjoyment. Let me know if you think this is useful. Oh, and while you are here, please follow me on Twitter.

Update: Several people pointed out httplib2 to me, which offers a lot of the convenience I was looking for. However, it is not available in Jython, or Python 2.5 in general.

import base64
import httplib
import socket
import urllib
import urlparse

def http_access(method, url, data=None, headers=None, timeout=None, credentials=None):
    """
    Access an HTTP resource with GET, POST or other methods.

    @param method:      The method for the HTTP request: GET, POST, etc.
    @type method:       string

    @param url:         The URL to access.
    @type url:          string

    @param data:        If present it specifies the data for a POST or PUT request.
    @type data:         Data to be sent or None.

    @param headers:     A dictionary of additional HTTP request headers or None.
    @type headers:      dict

    @param timeout:     Timeout for the request in seconds, or None. Specified as
                        floating point value, so you can set sub-second timeouts.
    @type timeout:      float

    @param credentials: A username/password tuple to support basic HTTP authentication.
    @type credentials:  tuple

    @return:            Code and response data tuple.
    @rtype:             tuple

    """
    (scheme, host_port, path, params, query, fragment) = urlparse.urlparse(url)
    allpath = url[url.index(host_port)+len(host_port):]
    host, port = urllib.splitport(host_port)

    if not headers:
        headers = dict()

    if credentials:
        accountname, password = credentials
        headers["Authorization"] = "Basic " + \
            base64.encodestring('%s:%s' % (accountname, password))[:-1]

    if scheme == 'https':
        conn = httplib.HTTPSConnection(host, port)
    else:
        conn = httplib.HTTPConnection(host, port)

    conn.request(method, allpath, data, headers)
    # The socket only exists after request() has connected, so the timeout
    # has to be set here, just before reading the response.
    conn.sock.settimeout(timeout)

    try:
        resp = conn.getresponse()
        code = resp.status
        data = resp.read()
    except socket.timeout:
        return 408, httplib.responses[408]

    return code, data


if __name__ == '__main__':
    # Usage example: Getting data in JSON format from a fictitious server,
    # with a timeout of 0.5 seconds.
    status, data = http_access("GET", "http://localhost:8001", data=None,
                               headers={ "Accept" : "application/json" }, timeout=0.5)

    print "@@@ HTTP response code: ", status
    print "@@@ Received data:      ", data



Jul 1, 2010

A new, simpler way to create RESTful resources

Lately, I have been working on a new project called RESTx: a new, simpler way to do data integration and data publishing. It is fully open source and licensed under the GPLv3, fully RESTful, and super simple to install and use. The download is small - just over 200k - and installation is deliberately quick and straightforward; a single command sets it all up for you. So I would like to invite you to have a look and check it out. I'd love to get your feedback and opinion on it.

What is it about?
RESTx gives developers the ability to quickly write custom data access and integration logic as components. Writing those components is very simple; the API is compact to the point where you could probably explain it in just 5 minutes. At the same time it gives you all the freedom you'd get in a custom program or script. You can even choose the language for writing components (Java and Python are currently supported, more to come).

Any component parameters are exposed, so that users can post new parameter sets to the component in order to create a new RESTful resource or RESTful web service in just seconds, simply by filling out a form in a browser. The RESTful resources are then accessed via a simple URL, suitable for access by end users in a browser, for sharing, as building blocks for mashups, and so on. The ability to create RESTful resources without any coding allows users to quickly build their own data sources without having to wait for IT to provide them.
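For a rough picture of what such a parameter set looks like on the wire, here is a sketch of the JSON document a client might POST to a component's URI. The field and parameter names below are invented for this illustration; the actual RESTx API may differ:

```python
import json

# Hypothetical parameter set for a new resource; the field names here
# are made up for this sketch, not taken from the actual RESTx API.
resource_def = {
    "resource_creation_params": { "suggested_name": "my-data-feed" },
    "params":                   { "some_component_param": "some value" },
}

# This JSON document would be POSTed to the component's URI on the
# RESTx server, which then creates the new resource and returns its URL.
payload = json.dumps(resource_def)
```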

There are a few key ideas behind RESTx, and they mostly center around being nice and simple:

Be nice to developers
Why fiddle with 300 lines of obscure XML if you can express the same idea in just 3 lines of concise, easy to read code? This is one of the core concepts for developers: Convention over configuration and sane defaults. If you need custom integration or data access logic, it will be much quicker for you to create a new component and express this concisely in a language you know, rather than work your way through yet another XML dialect.

For component development, RESTx currently supports Java and Python, with more languages to be added soon. Writing custom data integration and access code is absolutely simple: the entire API can easily be explained in 5 minutes. If you think proper frameworks are too heavy and that you're faster with a quick ad-hoc hack... don't! RESTx is simpler and quicker than ad-hoc hacking, but gives you all the advantages of a proper platform.

Be nice to users
Allow users to create their own RESTful data resource, without coding, by sending a new set of parameters to components on the RESTx server. The result is an easy-to-share and easy-to-use URL that completely hides the component and its parameters.

Users can discover all the available components (for which they can provide parameters to create a resource) and all existing resources (which they can use), simply by following links. Components and resources explain themselves: Each of them has human readable documentation strings and all their parameters and provided services can be discovered.

Be nice to search engines, web browsers and client applications
Everything on the server (components and resources) can be discovered just by following links. You get human AND machine readable descriptions, which is great for discovery by search engines.

RESTx automatically represents data in a way that matches the client request. For example, if you access a resource via a web browser, you get the information in HTML. If you access it with another client application, you can request to see the very same data in JSON, for example. In fact, all interactions with the server take place via a simple RESTful API, which allows the usage as well as creation of resources via simple, JSON encoded requests.


Please enjoy! Hopefully, you will find this project useful. I am looking forward to your feedback.


Mar 30, 2010

Mixed Jython/Python development and Google AppEngine oddities

Recently, I worked on a Python project, which I developed as a stand-alone WSGI application. I also experimented with running this app in Jython. In case you don't know: Jython is a neat Python implementation which runs in the Java VM. The advantage of this is that you have access to all the usual Java infrastructure, frameworks and libraries, while being able to rapidly develop your program in the Python language. I ended up designing my application so that it could run both as a Python and as a Jython project.

Shortly afterwards, I moved my WSGI application to Google's AppEngine, which is a hosted environment for Python and Java apps. AppEngine can run pure WSGI applications quite easily, so this shouldn't have been too much of a problem. Using Google's local development server (dev_appserver.py), I was able to run my application on my desktop in the emulated Google environment without a problem.

I then used the provided command line utility (appcfg.py) to move my app to Google's actual servers... and nothing worked. I received a "500 Internal Server Error". Looking at the logs I found that my application was not able to import any of my own modules. While standard library items were imported without an issue, any module I had provided myself resulted in an import error.

What had happened? It took me a long time to figure this out, so I thought I'd share it with you: When running the Python application via Jython, a bunch of .class files were created, which is normal. However, when the application was uploaded via appcfg.py, those class files were uploaded as well (files ending in *.pyc and a few other extensions are ignored, but apparently anything else is considered to be part of your application).

So then, for reasons that aren't quite clear, the presence of .class files in your Python application seems to be really confusing for Google AppEngine.

The lesson? If you are developing an application that can run locally in Jython as well as on Google's AppEngine, make sure to clear out your *.class files before uploading. It can save you hours of grief.
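A small helper for that cleanup step, in plain Python so it runs the same from a CPython or Jython checkout. This is just a convenience sketch; the project path in the example comment is of course yours to fill in:

```python
import os

def remove_class_files(root):
    """Recursively delete all *.class files below the given directory,
    returning the paths that were removed."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".class"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed

# Example: run remove_class_files("path/to/your/app") before appcfg.py
```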


Oh, and by the way, you should follow me on twitter here.




Jul 4, 2009

Setting the initial value for Django's ModelChoiceField

Recently, I worked with a Django form that utilized the ModelChoiceField. That is a convenient field that is normally rendered as an HTML select tag, which usually appears as a drop-down menu on a web page.

A ModelChoiceField is specified as:
    class MyForm(forms.Form):
        my_field = forms.ModelChoiceField(queryset = MyModel.objects.all())

As you can see, this field is used to easily specify a drop-down for all items in a table (or whatever the queryset specifies). Each model is represented as the output of its __unicode__() function. If you evaluate the form after it was posted, the value for the field is going to be an instance of the actual model whose unicode representation was selected.

The problems for me started when I tried to set an initial value for the field: In an 'edit' form, I wanted all fields to reflect whatever had been saved for a particular model instance, of course. As I said above, if you evaluate the form after it has been submitted you get an actual model instance. Naturally, you would think that initial values would be set in a symmetric manner, by specifying a model instance:
    form = MyForm(initial = { 'my_field' : some_model_instance })

Sadly, this doesn't work. And try as I might, I couldn't find an answer to this on the Internet either (more on that in a moment). So, after looking at the Django code, it finally dawned on me that you need to specify the ID of the model instance as the initial value:
    form = MyForm(initial = { 'my_field' : some_model_instance.id })

That works now. It's a bit unfortunate that the retrieval value (a model instance) and the initial value (the ID of a model instance) are of different types. It's an inconsistency in the Django API, I think. But in the end, I probably should have at least tried that one a bit sooner.

The surprising thing is that I couldn't find any discussion of this anywhere on the Internet. I should mention here that I am using Yahoo as my default search engine. Shortly after I finally found the solution, it occurred to me to try Google. And wouldn't you know it? Right there, third hit from the top, I had the answer.

Why did Yahoo not give me this result? Well, the answer was discussed on Google Groups. Is Yahoo not indexing those? Or is Google not letting them index it?

Either way, that small issue between Yahoo and Google cost me a few hours of frustration. So, I'm posting the solution here on a non-Google page, so that Yahoo users may also find the answer to that problem in the future.




Jun 1, 2009

How to do "count" with "group by" in Django

Faced with the age-old problem of having to do a group-by query in Django, I finally came across a solution. In short, you use the lower-level query interface directly:

    q = MyModel.objects.all()
    q.query.group_by = ['field_name']
When you then iterate over 'q', you get the results grouped by the value of whatever was in 'field_name'. Great! So far, so good.

Well, a word of caution at this point: Using the low-level query API is not really something Django wants you to do, apparently. In fact, try this with sqlite and it works. Try it with PostgreSQL and it does not (all sorts of error messages). So, your mileage may vary, depending on which database you are using. Ok, let's assume that your database is fine with this...

The only challenge for me now was that I had to do a count for this.
If you form your query using the usual methods of Django's ORM, you will be disappointed. For example, this here will not work:
    q = MyModel.objects.all()
    q.query.group_by = ['field_name']
    q.count()
It appears as if this returns only the count of the first group, not a list of counts, as you would expect.

The solution is to wade a bit deeper into the low-level query API. We can instruct the query to add a count-column. In fact, this results in merely a single column being returned, just like COUNT(*). It goes like this:
    q = MyModel.objects.all()
    q.query.group_by = ['field_name']
    q.query.add_count_column()
Since this returns counts, rather than complete objects, we now need to get the individual group counts as a list of values. We add one more line:
    q.values_list('id')
The values_list() function gives you not instantiated objects, but instead a tuple of values for each object. The tuples contain only the fields you specify by name in the call to values_list(). Except that we have manually added the count column with the add_count_column() function, and that count column is always returned first. So, what you get as a result is something like this:
    [ (3,1), (19,8), ... ]
The first value of each tuple is the count, the second value is the value of the 'id' field of the last occurrence of the grouped model in the table. If you specify something other than 'id', you get the same thing: First the count and then the value of that other field.

But that's not what we want, right? We want a list of counts. We could manually extract the first element in each tuple, but Django offers us a shortcut:
    q.values_list('id', flat=True)
Setting flat=True tells the values_list() function to just return the first element of each tuple in a plain list (not a list of tuples). And since the first element now is the count column, we finally get what we want: A list of the counts for each group.

Note that we could have specified any other field in the call to values_list(), not just 'id'. Because we specified flat=True, those fields are ignored anyway. It seems, though, that at least one field needs to be specified here, even though it won't be part of the output.
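To make it concrete what the grouped query actually computes, here is the same grouping and counting done in plain Python over a list of rows. This is purely an illustration of the result shape; the real work of course happens inside the database:

```python
from collections import Counter

# Each dict stands in for one row of the model's table;
# we group by the 'field_name' column.
rows = [
    {"id": 1, "field_name": "a"},
    {"id": 2, "field_name": "b"},
    {"id": 3, "field_name": "a"},
    {"id": 4, "field_name": "a"},
]

# Count how many rows fall into each group, analogous to
# GROUP BY field_name with a COUNT(*) column.
counts = Counter(row["field_name"] for row in rows)

# The flat list of per-group counts, as values_list(..., flat=True)
# would hand them back.
group_counts = list(counts.values())
```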




May 30, 2009

Retrieving full objects with custom SQL in Django

While working with Django, I recently had to retrieve some model objects via custom SQL. Looking for examples on how to do this, I noticed that most tutorials describe how to retrieve specific values via SQL, but not entire model objects. The solution is actually very simple, but since I couldn't find it described anywhere, I thought I'd share it here.

The usual approach goes something like this:
    from django.db import connection, models

    class CustomManager(models.Manager):
        ...
        query = "SELECT id FROM mymodel WHERE ..."
        cursor = connection.cursor()
        cursor.execute(query)
        return [i[0] for i in cursor.fetchall()]
This then returns the IDs of the objects that were selected. If you want to return actual objects, you can modify the last line into this:
    return self.filter(id__in = [i[0] for i in cursor.fetchall()])
However, the problem with this approach is that we are executing two queries now: The custom SQL query, followed by the self.filter() query.

Here then is a very simple way to get complete objects, with just a single, custom SQL query:
    class CustomManager(models.Manager):
        ...
        query = "SELECT * FROM mymodel WHERE ..."
        cursor = connection.cursor()
        cursor.execute(query)
        return [MyModel(*i) for i in cursor.fetchall()]
Rather than making a list of IDs, which is then used to query for the actual objects, we make a list of the complete objects. We can do that, because we changed the custom SQL: Instead of selecting just the ID column, we are now selecting all columns (SELECT * ...).

With custom SQL queries we get a tuple of column values for each row. Fortunately, the order of columns in each row reflects the order of arguments to the model's __init__() function. As a result, we can use each row's values as positional arguments for __init__().
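The positional-argument trick is easiest to see without Django in the way. In this framework-free sketch, the class and the rows are invented purely for illustration of how `MyModel(*i)` unpacks one row into the constructor:

```python
class MyModel(object):
    # Stand-in for a Django model: __init__ takes the columns
    # in the same order as they appear in the table.
    def __init__(self, id, name, score):
        self.id    = id
        self.name  = name
        self.score = score

# Rows as a DB cursor's fetchall() would return them:
# one tuple per row, columns in table order.
rows = [(1, "alice", 10), (2, "bob", 20)]

# Unpack each row tuple as positional arguments to __init__().
objects = [MyModel(*row) for row in rows]
```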



