28

I have the following models:

class City(models.Model):
    ...

class Census(models.Model):
    city = models.ForeignKey(City)
    date = models.DateTimeField()
    value = models.BigIntegerField()

Now I'd like to annotate a City-queryset with the value of the latest Census. How do I achieve that?

I have tried:

City.objects.annotate(population=Max('census__date'))
# --> annotates date and not value

City.objects.annotate(population=Max('census__value'))
# --> annotates highest value, not latest

City.objects.annotate(population=
    Case(
        When(
            census__date=Max('census__date'),
            then='census__value')
        )
    )

# --> annotates 'None'

City.objects.annotate(population=
    Case(
        When(
            census__date=Max('census__date'),
            then='census__value')
        ), output_field=BigIntegerField()
    )

# --> takes forever (not sure what happens at the end, after some minutes I stopped waiting)

Any help greatly appreciated!

Se Norm
  • 1,515
  • 4
  • 23
  • 37
  • 1) Performance matters or is just an offline sporadic report ? 2) Should be database brand agnostic? – dani herrera Jul 08 '15 at 15:09
  • 1) Yes, performance matters, some performance compromise for beauty is acceptable though. 2) Yes, should at least work with MySQL and PostgreSQL. – Se Norm Jul 08 '15 at 21:29
  • Newer Django versions (>1.8) solution here: https://stackoverflow.com/questions/43775102/annotate-a-queryset-with-the-latest-value-from-a-filtered-join#43883397 – Risadinha Oct 04 '17 at 10:56

4 Answers4

11

I've also been having an issue where I need a max value object of a related set, but I require the entire object. I have not been able to figure out a solution using annotation and Case. Until I do, I use this prefetching solution. If each city does not have a large amount of census objects, or if your application is not performance bound, this may work for you.

inner_qs = Census.objects.order_by('-date')
cities = City.objects.prefetch_related(Prefetch("census_set", queryset=inner_qs, to_attr="census_list"))

class City(models.Model):
    @property
    def latest_census(self):
        if hasattr(self, 'census_list') and len(self.census_list) > 0:
            return self.census_list[0]
        return None

If this does not work for you, consider some of the suggestions found here: http://blog.roseman.org.uk/2010/08/14/getting-related-item-aggregate/

Mark Galloway
  • 3,460
  • 14
  • 23
  • Thanks Mark, that looks like a good workaround. Unfortunately I do have a lot of census objects per city, but maybe I can limit the Prefetch qs like this: inner_qs = Census.objects.order_by('-date')[:1] to still make it efficient. I can't test it right now, but will try it as soon as possible. – Se Norm Jul 07 '15 at 11:37
  • 1
    Unfortunately, limiting like this doesn't work (fails with "Cannot filter a query once a slice has been taken.") - I will start a bounty and see if someone comes up with a better solution. – Se Norm Jul 07 '15 at 23:29
5

At this moment, they are not a django query expression to annotate a not aggregated field from a related 1:N model based on a sql having expression.

You can accomplish it with several workarounds like split query and work with data in memory ( itertools groupby f.e. ) or through raw queries. But this will not match your requirements performance and database agnostic.

I explain here what I will do if this was my app. For developers is hard to have redundancy in database. In your scenario, last census by city is a calculated field ... but, in this case, take in consideration to materialize last_census:

The dirty work ...

class City(models.Model):
    last_census = models.ForeignKey('Census', null=True, 
                                     blank=True, editable=False)
    ...

For easy maintenance, you can overrite save and delete methods on Census to keep last_census up to date.

class Census(models.Model):
    ...

    #overriding save
    def save(self, *args, **kwargs):
        super(Census, self).save(*args, **kwargs)
        #updating last_census on referenced city
        max_census = Census.objects.filter( city = self.city ).latest('date')
        self.city.last_census = max_census.city if max_census else None
        self.city.save()

    #overriding delete
    def delete(self, *args, **kwargs):
        #updating last_census on referenced city
        max_census = ( Census.objects
                      .filter( city = self.city )
                      .exclude( id = self.id )
                      .latest('date') )
        self.city.last_census = max_census.city if max_census else None
        self.city.save()
        super(Census, self).delete(*args, **kwargs)

Notice: if you are more comfortable, you can code it with signals ( pre_delete, post_save, ... ) instead overriding methods.

The best ...

Your query now:

City.objects.select_related('last_census__value').all()
dani herrera
  • 39,746
  • 4
  • 87
  • 153
  • Is there any way to avoid race conditions with this solution? – Mark Galloway Jul 09 '15 at 09:32
  • What you means? Explain. Be free to illustrate with a sample. – dani herrera Jul 09 '15 at 09:33
  • My application will have many users adding and removing 'Census' objects at the same time. I need to make sure this redundant field is accurate, as the users will notice if it is not. – Mark Galloway Jul 09 '15 at 09:46
  • You should learn about database transactions, [isolation levels](https://en.wikipedia.org/wiki/Isolation_(database_systems)) and [django transaction control management](https://docs.djangoproject.com/en/1.8/topics/db/transactions/). – dani herrera Jul 09 '15 at 09:53
2

Something like this may work for you:

I have this to show last reservation for restaurants

Reservation.objects.filter(
        restaurant__city_id__gt=0
        ).distinct().annotate(city_count=Count(
        'restaurant_id')
        ).order_by('-reservationdate').filter(city_count__gte=1)[:20]

in your case it may be something like:

city = Census.objects.filter(
        city_id__gt=0
        ).distinct().annotate(city_count=Count(
        'city_id')
        ).order_by('-date').filter(city_count__gte=1)[:20]

and your html

{% for city in city %}
{{ city.city.name }} {{ city.date }}<br>
{% endfor %}
WayBehind
  • 1,445
  • 2
  • 34
  • 48
  • I like the idea, but it has to be a City QuerySet. – Se Norm Jul 08 '15 at 21:31
  • Thats the trick, you dont actually query the `city` at least not in the `Django ORM` eyes. You query the `Census` and display the `City` name in the results with `{{ census.city.name }}`. The `city` is actually inside your `census` tables as `city_id`. So you query that and display the city name as census.city.name – WayBehind Jul 08 '15 at 21:48
  • Sorry, I wasn't clear. Yes, I understood that, but I actually need it to be a City-QuerySet. – Se Norm Jul 08 '15 at 21:58
  • What results are you actually looking for? Just asking because `Django ORM` does the `join` for you and you can display all and any data from the `City` table through the `census` QuerySet. There is no difference ... – WayBehind Jul 08 '15 at 22:07
2

It's late and I'm getting hungry so I see won't this all the way through*, but the following code will return the latest census value in a valueset for a given city. A queryset might be possible, but again I am hungry!

*I wasn't sure if you needed to retrieve all of the latest values for all cities or a specific city. The latter is pretty easy, the former is a bit harder. You could easily put this as a method on your model and call it on every city in a loop inside of a template/view.

Hope this helps!

from app.models import *
from django.db.models import F

    City.objects.annotate(
        values=F("census__value"),
        date=F("census__date"))\
        .values('values', 'name').filter(name="Atlanta")\
        .latest('date')
ryechus
  • 529
  • 4
  • 10