The basics of creating a tumblelog with Django

On my new homepage, a combined list of my tweets, bookmarks, and user comments on my site appears underneath my latest blog entries. I use a fun bit of Django code to pull these various items together, sorted by publication date, and I’d like to share how this bit of tumblelog-like functionality works. The basic concept is this:

Every time an object of type A, B, or C is created, create an object of type D that does nothing but point to an A, B, or C and keep track of its publication date.

In my case, A, B, and C are Bookmark (from Del.icio.us), Status (from Twitter), and FreeComment (from Django). D is an object I call a StreamItem. The basic Django model for this is below:

from django.db import models
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.template.loader import render_to_string

class StreamItem(models.Model):
	# Which model the item points to, and which row of that model
	content_type = models.ForeignKey(ContentType)
	object_id = models.PositiveIntegerField()
	# Copied from the target object so every item sorts on the same field
	pub_date = models.DateTimeField()
	
	content_object = generic.GenericForeignKey('content_type', 'object_id')

	def get_rendered_html(self):
		# Pick a template named after the content type of the target object
		template_name = 'blog/includes/stream_item_%s.html' % (self.content_type.name)
		return render_to_string(template_name, { 'object': self.content_object })

Django’s ContentTypes framework allows us to choose what other model we are pointing to, and what the ID of the specific instance of that model is. The GenericForeignKey allows us to retrieve that object just like we would with a foreign key to a known model. We store the pub_date in the StreamItem as well, since the objects we point to may use different field names and we want a consistent field by which to sort. The get_rendered_html method simply renders the retrieved object with a template named after that object’s content type.
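To make the pointer idea concrete without a database, here is a minimal, framework-free sketch. It is not Django code: the REGISTRY dict, the register helper, and the simplified Bookmark and Status classes are hypothetical stand-ins that only mimic what the ContentTypes framework and GenericForeignKey do.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical stand-in for database tables: (type name, id) -> object
REGISTRY = {}

@dataclass
class Bookmark:
    id: int
    url: str
    time: datetime  # bookmarks keep their date in "time"

@dataclass
class Status:
    id: int
    text: str
    pub_date: datetime  # statuses keep their date in "pub_date"

def register(obj):
    """Store an object so a StreamItem can find it again later."""
    REGISTRY[(type(obj).__name__.lower(), obj.id)] = obj
    return obj

@dataclass
class StreamItem:
    content_type: str   # which kind of object this item points to
    object_id: int      # which instance of that kind
    pub_date: datetime  # copied here so every item sorts on one field

    @property
    def content_object(self):
        # Plays the role of the GenericForeignKey lookup
        return REGISTRY[(self.content_type, self.object_id)]

    def template_name(self):
        # Same naming scheme as get_rendered_html uses
        return 'blog/includes/stream_item_%s.html' % self.content_type

bookmark = register(Bookmark(1, 'http://example.com/', datetime(2008, 6, 23, 10, 0)))
status = register(Status(1, 'Hello', datetime(2008, 6, 23, 9, 0)))

item = StreamItem('bookmark', bookmark.id, bookmark.time)
other = StreamItem('status', status.id, status.pub_date)
```

Both items expose the same pub_date field even though the underlying objects disagree on its name, which is the whole point of the extra model.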

Now that the model is defined, a StreamItem has to be created whenever a Bookmark, Status, or FreeComment is created. Conveniently, Django sends a ‘signal’ known as post_save after saving any object, along with an argument ‘created’ that is True if the object was just saved for the first time. (Signals are sent at other times as well; see the signals documentation.) The idea is this:

Create a function that saves a new StreamItem. Invoke this function whenever the post_save signal is sent from a Bookmark, Status, or FreeComment object. The basic function looks like this:

from django.db.models import signals
from django.contrib.contenttypes.models import ContentType
from django.dispatch import dispatcher
from appname.models import Bookmark, Status
from comments.models import FreeComment

def create_stream_item(sender, instance, signal, *args, **kwargs):
	# Only act when the object was just created, not on later saves
	if not kwargs.get('created'):
		return

	# Get the instance's content type
	ctype = ContentType.objects.get_for_model(instance)

	# Special cases for models that store their date in a different field
	if ctype.name == 'free comment':
		# Only public comments get a StreamItem. This prevents comments
		# in moderation or thought to be spam from appearing
		if not instance.is_public:
			return
		pub_date = instance.submit_date
	elif ctype.name == 'bookmark':
		pub_date = instance.time
	else:
		pub_date = instance.pub_date

	StreamItem.objects.get_or_create(content_type=ctype, object_id=instance.id, pub_date=pub_date)

# Connect create_stream_item to the post_save signal of each of these models
for model in [Status, Bookmark, FreeComment]:
	dispatcher.connect(create_stream_item, signal=signals.post_save, sender=model)

First, check to see if the object was just created for the first time. If so, indicate in which field that object’s model stores its publication date. Perform any other checks to make sure you want to create the StreamItem. Then create the StreamItem.
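For readers who have not used signals before, the flow can be sketched in plain Python. This is an illustration only, not Django's dispatcher: connect, send_post_save, DATE_FIELDS, and the stream list are hypothetical names, and the model classes are reduced to the fields that matter here.

```python
from datetime import datetime

_receivers = {}  # sender class -> list of callbacks

def connect(receiver, sender):
    """Register a callback to run whenever `sender` is saved."""
    _receivers.setdefault(sender, []).append(receiver)

def send_post_save(instance, created):
    """Notify every callback registered for this instance's class."""
    for receiver in _receivers.get(type(instance), []):
        receiver(sender=type(instance), instance=instance, created=created)

class Status:
    def __init__(self, id, pub_date):
        self.id = id
        self.pub_date = pub_date

class Bookmark:
    def __init__(self, id, time):
        self.id = id
        self.time = time

# Models whose publication date is not stored in "pub_date"
DATE_FIELDS = {Bookmark: 'time'}

stream = []  # stands in for the StreamItem table

def create_stream_item(sender, instance, created, **kwargs):
    if not created:
        return  # a later save of an existing object: nothing to add
    field = DATE_FIELDS.get(sender, 'pub_date')
    stream.append((sender.__name__, instance.id, getattr(instance, field)))

for model in (Status, Bookmark):
    connect(create_stream_item, sender=model)

status = Status(1, datetime(2008, 6, 23, 9, 0))
send_post_save(status, created=True)    # first save: adds a stream entry
send_post_save(status, created=False)   # later edit: ignored
send_post_save(Bookmark(7, datetime(2008, 6, 23, 10, 0)), created=True)
```

Keeping the field names in a lookup table rather than an if/elif chain is just one way to arrange the special cases; either works for a handful of models.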

Now, a list of all these objects can be accessed by any view by using StreamItem.objects.all().
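In practice a view usually wants only the newest few items, which in Django would be something like StreamItem.objects.all().order_by('-pub_date')[:10]. The sketch below shows the same ordering and slicing on plain Python data; the tuples and the newest helper are hypothetical stand-ins for StreamItem rows and a queryset.

```python
from datetime import datetime

# Hypothetical (label, pub_date) pairs standing in for StreamItem rows
items = [
    ('status', datetime(2008, 6, 23, 9, 0)),
    ('bookmark', datetime(2008, 6, 24, 8, 30)),
    ('free comment', datetime(2008, 6, 22, 17, 45)),
]

def newest(stream_items, limit=10):
    """Return up to `limit` items, most recent first."""
    return sorted(stream_items, key=lambda item: item[1], reverse=True)[:limit]

latest = newest(items, limit=2)
```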

Djangosnippets can help you pull in feeds from Del.icio.us, Twitter, or any other service you’d like to use, as can the Python libraries pydelicious and Python Twitter.

This is my first attempt at a code walk-through, so please leave a comment if anything is not explained clearly. I don’t doubt there are more efficient ways to accomplish this same goal, so feel free to share those, too.

There is now a second part that explains how these items actually show up on the homepage.


Discussion

Jacob's "jellyroll" tumblelog code does the same thing. And, really, I think a model representing a published item, with a generic foreign key to the actual item, is probably the best way to go, for a few reasons:

  1. It vastly simplifies the mechanics of processing and displaying the list of items, because the "item" model can normalize bits of metadata that otherwise vary from model to model (e.g., "which field is the publication date for this type of content?" questions).
  2. It makes the initial query for the list of items extremely simple, because you can do that in one query against one table.
  3. Although fetching the actual objects in the stream involves a lot of DB activity (it's an "N+1 queries" situation), you at least get a little help from the way the ContentType model caches certain types of lookups under the hood.
June 24th 2008, 4:50 a.m. by James Bennett

So, if I understand correctly, a function like the one above, which creates the index of your del.icio.us / twitter / comments, would be saved inside of your tumble app in a file named management.py (I got this from http://www.b-list.org/weblog/2006/sep/10/django-tips-laying-out-application/). But how do I tell Django that management.py exists?

Thanks for the great write up!

June 24th 2008, 7:39 a.m. by Joshua Blount

I went with a slightly different approach to solve the same problem. Instead of a GenericForeignKey in your StreamItem model, I used a Timestamp (your StreamItem) model that other models have a ForeignKey relationship with.

Using introspection, I have a method in the Timestamp model for adding a Timestamp to any other model:

def add_stamp(self, object_model, stamp_to_set=None, overwrite=False):
    """
    Add a timestamp to the passed model
    * Object must be saved before accessing
    """
    # default to the current time when called, not when the method is defined
    if stamp_to_set is None:
        stamp_to_set = datetime.now()

    # get the class name of the passed object
    pattern = re.compile(r'\.(?P<class>\w+)\'>')
    class_name = pattern.search("%s" % object_model.__class__).group('class').lower()

    # add the stamp
    stamp, created = Timestamp.objects.get_or_create(date_added=stamp_to_set)
    set_method = getattr(stamp, "%s_set" % class_name)
    id_attr = getattr(object_model, "id")
    does_not_exist = getattr(object_model, "DoesNotExist")

    try:
        set_method.get(id__exact=id_attr)
        if overwrite:
            # add the timestamp
            set_method.add(object_model)
            stamp.save()
    except does_not_exist:
        pass

This way I have no hard coded links from my Timestamp model to any other model. Adding new models to this setup is as easy as: 1) providing a ForeignKey to Timestamp 2) overriding the save() method to:

def save(self):
    # get an id before adding a many-to-many relationship
    super(WhateverModel, self).save()

    # create a timestamp if one doesn't exist
    Timestamp().add_stamp(self)

Querying objects to create a timeline is still very simple. Just pull Timestamp.objects.all() with whatever criteria you're interested in.

I took this approach because at the time (~5 months ago) it seemed the most lightweight with respect to maintenance. I'll fully admit I wasn't aware of the GenericForeignKey approach at all. I'm not sure which approach would be more robust but the one I took has worked reliably for me.

June 24th 2008, 8:48 a.m. by Mark

I went with a flatter denormalized model with Feedclowd because it appears to me that the GenericForeignKey relation is heavy on the database. I haven't verified this is true, but I'd be surprised if select_related works with it.

Picture this: you are looping over 20 StreamItem objects, and whenever you need to get the content_object to display it, you have to do a query. On small sites, that kind of behavior is forgivable but it doesn't scale well. So you're forced to band-aid the problem with memcache.

Feedclowd, because it's based on RSS/Atom, basically has a model that resembles a RSS entry and that's it. There's no relation to other data.

June 24th 2008, 9:36 a.m. by Eric Moritz

Joshua, all of the above code lives in models.py for an application called "stream." The Bookmark and Status objects it references live in models.py for an app simply called, "blog."

June 24th 2008, 10:16 a.m. by Ryan Berg

I'm interested in how you go about rendering the different pieces of data in reverse chronological order (i.e. the front page). Is the template just a huge ifequals to differentiate between different pieces of a tumblelog? I'm thinking you could maybe pass a variable to {% include %} but I'm not sure that's possible.

June 24th 2008, 11:10 a.m. by Jökull

Jokull, I left that out of the tutorial for fear of it making the model confusing. I've added it back into the code above with an explanation. Basically, on the StreamItem model, I have a method that passes the object to a template named after that object's content type. On the homepage, I pass the template StreamItem.objects.all().order_by('-pub_date')[:10] to get the 10 newest stream items. The template then loops through each stream item and outputs stream_item.get_rendered_html.

June 24th 2008, 11:25 a.m. by Ryan Berg

Wouldn't this be the sort of thing you can use model inheritance for?

You could have some sort of StreamObject, let's call it a Floatable, and all objects that you want in your stream, like a Duck or a Twig, could inherit from Floatable. Unless of course you want it completely meta, i.e. the freedom to make new things Floatable without changing the database, which is kind of a nice aim, but I do worry about database performance.

June 25th 2008, 6:51 p.m. by Andrew Ingram

Andrew, that's certainly an interesting idea. My code originates from last fall, so model inheritance wasn't exactly available at the time if I remember correctly.

In an ideal world I can see that being a good solution. But what about objects that you don't have control of on your own to add that inheritance? Like a comment from contrib.comments, or something from a third party library you'd rather not modify? Would there still be a way to register those objects with your Floatable system?

June 25th 2008, 11:24 p.m. by Ryan Berg

I don't think you'd be able to do it without monkey patching which I'm not too keen on, so that's an obvious drawback. It depends what level of flexibility you need I guess.

June 26th 2008, 4:06 a.m. by Andrew Ingram

I don't think you'd need to resort to monkey patching. The docs suggest that all you need to do for multi table model inheritance is extend the base class from the child class:

class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=80)

class Restaurant(Place):
    serves_hot_dogs = models.BooleanField()
    serves_pizza = models.BooleanField()
June 26th 2008, 2:11 p.m. by James Wheare

The impression I get in this situation is that our StreamItem, or whatever you want to call it, would have to be the base class. Right?

If so, how can I then get the FreeComment class in django.contrib.comments.models to inherit from StreamItem? It's set to take models.Model.

June 26th 2008, 2:23 p.m. by Ryan Berg

Good stuff. Now if only there was a way to get the request.user sent from the signal to the callback.

June 26th 2008, 6:33 p.m. by Grant

Ah right, I see, yeah, that makes a lot more sense. In that case, multiple inheritance looks like it would do the trick:

class CommentStreamItem(StreamItem, FreeComment):
    ...

Be aware of the standard name resolution caveats (http://docs.python.org/tut/node11.html#SECTION0011510000000000000000) though.

June 26th 2008, 6:34 p.m. by James Wheare

Sorry to bring up an old entry. I was wondering where you put the second code chunk? I wouldn't think you would put it in its own view.

June 28th 2008, 7:10 p.m. by Jason Broyles

Sorry I didn't make that clear in the entry. In my setup, both snippets are in models.py.

June 29th 2008, 3:23 p.m. by Ryan Berg

Just wanted to let you know of an alternate way to accomplish this stuff, using friendfeed. I don't have anything to do with it -- it just struck me as an elegant solution worth pointing out.

Thanks for this lovely and informative blog! :)

July 1st 2008, 1:46 p.m. by Idan Gazit
