Taming the beast that is the Django ORM - An introduction
The material this blog post was originally developed from was a bunch of slides used for a skill share presentation I gave at my workplace @ coreplan.io.
I have 3+ years of experience with Django, with it being the main framework that underpins the backend of CorePlanās main SaaS product. It is a mature, batteries included framework that has been around for a while now. One particular powerful yet dangerous feature of Django is the ORM. This is a Django specific ORM which cannot be separated from the rest of the framework. The other major python ORM is SQLAlchemy which can be used with other python web frameworks, but is an independent tool.
Below are some of the things that I have learned about the Django ORM, how it compares to raw SQL and gotchas that you should be aware of when using it.
What is an ORM (Object Relational Mapper)?
- Abstraction over SQL to interact with databases
Code -> SQL
Hole.objects.all()
ā¬ļø
SELECT * FROM drilling_hole;
Why use an ORM? - Pros
- Abstraction over SQL, no need to write raw SQL (plus and minus)
- Portability - Can change out database engines easily !?
- Probably not true, often will rely on db specific features e.g. postgres jsonb, triggers, etc
- Direct mapping from db to models
- Automatic schema generation
- Migrations are automatically generated
- Security
- abstracts away enough that sql injection is less likely
Why use an ORM? - Cons
- Abstraction over SQLā¦
- Hides the underlying SQL
- Can be difficult to debug
- Lazy loading can cause N+1 queries without the developer realising
- Harder to onboard new developers if they havenāt used Django before
- Performance
- Generated sql be slower than crafted SQL
Fundamentals
- Models = Tables
# drilling/models.py
from django.db import models
class Hole(models.Model):
name = models.TextField()
CREATE TABLE drilling_hole (
id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
Migrations
python manage.py makemigrations # generate migration files
python manage.py migrate # apply migrations
python manage.py makemigrations drilling --empty # generate empty file for data migration for the `drilling` app
https://docs.djangoproject.com/en/dev/topics/migrations/
Querying
ActiveRecord
pattern - ala Ruby on Rails style- QuerySets (
Hole.objects.all()
)- lazy
- chainable
- cached when iterated over multiple times !?
- I would not recommend relying on this because it hard to comprehend when it is cached and when it is not when you are reading code
holes_qs = Hole.objects.filter(name="cheese") # not evaluated yet
holes_qs = holes_qs.filter(depth__gt=100) # still not evaluated
list(holes_qs) # evaluated
list(holes_qs) # cached
holes_qs[2] # not cached
holes_qs.first() # not cached
holes_qs.get(id=1) # not cached
WHERE
WHERE
clause āfilter()
holes_qs = Hole.objects.filter(name="cheese")
ā¬ļø
SELECT *
FROM drilling_hole;
WHERE drilling_hole.name = 'cheese';
WHERE across tables?
- But how do you do a left/inner join? With the ORM it isnāt done declaratively, but implicitly
class Hole(models.Model):
name = models.TextField()
pad = models.ForeignKey(Pad, on_delete=models.CASCADE)
class Pad(models.Model):
name = models.TextField()
holes_qs = Hole.objects.filter(pad__name="cheese board")
ā¬ļø
SELECT *
FROM drilling_hole;
INNER JOIN drilling_pad ON drilling_hole.pad_id = drilling_pad.id
WHERE drilling_pad.name = 'cheese board';
WHERE other conditionals
filter(name="cheese")
->filter(name__exact="cheese")
->WHERE name = 'cheese'
filter(name__iexact="cheese")
->WHERE name ILIKE 'cheese'
filter(name__contains="cheese")
->WHERE name LIKE '%cheese%'
filter(name__icontains="cheese")
->WHERE name ILIKE '%cheese%'
filter(name__in=["cheese", "board"])
->WHERE name IN ('cheese', 'board')
filter(name__gt=100)
->WHERE name > 100
etcfilter(name__isnull=True)
->WHERE name IS NULL
- At least for postgres shouldnāt
name = None
, null != null
- At least for postgres shouldnāt
AS
annotate
āAS
holes_qs = Hole.objects.annotate(this_thang=F("pad__name"))
hole = holes_qs.first()
print(hole.this_thang)
ā¬ļø
SELECT
* ,
drilling_pad.name AS this_thang
FROM drilling_hole;
INNER JOIN "drilling_pad" ON ("drilling_hole"."pad_id" = "drilling_pad"."id")
Subqueries
class Project(models.Model):
name = models.TextField()
class Pad(models.Model):
name = models.TextField()
class Hole(models.Model):
name = models.TextField()
pad = models.ForeignKey(Pad, on_delete=models.CASCADE)
project = models.ForeignKey(Project, on_delete=models.CASCADE)
# find pads that are on project_id=1
hole_subquery = Hole.objects.filter(project_id=1).values("pk")
pad_qs = Pad.objects.filter(hole__in=Subquery(hole_subquery))
ā¬ļø
SELECT "drilling_pad"."id",
"drilling_pad"."name"
FROM "drilling_pad"
INNER JOIN "drilling_hole" ON ("drilling_pad"."id" = "drilling_hole"."pad_id")
WHERE "drilling_hole"."id" IN (
SELECT U0."id"
FROM "drilling_hole" U0
WHERE U0."project_id" = 1
)
Correlated Subqueries
Correlated subqueries are where the inner query depends on outer query
class Pad(models.Model):
name = models.TextField()
class Hole(models.Model):
name = models.TextField()
pad = models.ForeignKey(Pad, on_delete=models.CASCADE)
# include the hole id of any hole that has a foreign key to the pad
hole_subquery = Hole.objects.filter(pad_id=OuterRef("pk")).values("pk")
pad_qs = Pad.objects.annotate(hole_id=Subquery(hole_subquery))
ā¬ļø
SELECT "drilling_pad"."id",
"drilling_pad"."name",
(
SELECT U0."id"
FROM "drilling_hole" U0
WHERE U0."pad_id" = ("drilling_pad"."id")
) AS "hole_id"
FROM "drilling_pad"
Performance improvements
- Reduce N+1
- You typically want to reduce N+1 queries because they have communication overhead
select_related
prefetch_related
- You also might choose to use
annotate()
instead ofselect_related
because select related pulls all the data for the associated table when you might only need one column. That associated might have a jsonb column which contains a lot of unnecessary data that you donāt need.
select_related
holes = Hole.objects.all()
for hole in holes:
print(hole.pad.name) # N+1 queries
holes = Hole.objects.select_related("pad")
for hole in holes:
print(hole.pad.name) # no extra query
Many-to-many ā prefetch_related
You would use prefetch related when you are not pulling a direct foreign key such a many-to-many relationship like below.
class Faculty(models.Model):
name = models.TextField()
class Course(models.Model):
name = models.TextField()
faculty = models.ForeignKey(Faculty, on_delete=models.CASCADE)
class Student(models.Model):
name = models.TextField()
courses = models.ManyToManyField(Course, through="Enrolment")
class Enrolment(models.Model):
course = models.ForeignKey(Course, on_delete=models.CASCADE)
student = models.ForeignKey(Student, on_delete=models.CASCADE)
grade = models.FloatField()
students = Student.objects.prefetch_related("courses")
for student in students:
for course in student.courses.all():
print(course.name) # no extra query
print(course.faculty.name) # extra query
students = Student.objects.prefetch_related(
Prefetch(
"courses",
queryset=Course.objects.select_related("faculty")
)
)
for student in students:
for course in student.courses.all():
print(course.name) # no extra query
print(course.faculty.name) # no extra query
to_attr
to_attr
can be used to make āfilteredā relationships available on the instance.
class Enrolment(models.Model):
course = models.ForeignKey(Course, on_delete=models.CASCADE)
student = models.ForeignKey(Student, on_delete=models.CASCADE)
grade = models.FloatField()
students = Student.objects.prefetch_related(
Prefetch(
"course",
queryset=Course.objects.filter(grade__gt=80.0).select_related("faculty"), to_attr="hd_courses"
)
)
for student in students:
for course in student.hd_courses.all():
...
Multiple instances when filtering across many-to-many
One gotcha is selecting across a many-to-many relationship can return multiple of the same instances.
Student.objects.filter(courses__faculty__name="Science") # inner join returns duplicated rows
<QuerySet [<Student: Student object (1)>, <Student: Student object (1)>]>
Student.objects.filter(courses__faculty__name="Science").distinct()
<QuerySet [<Student: Student object (1)>]>
SELECT
"testing_student"."id",
"testing_student"."name"
FROM
"testing_student"
INNER JOIN
"testing_enrolment"
ON
("testing_student"."id" = "testing_enrolment"."student_id")
INNER JOIN
"testing_course"
ON
("testing_enrolment"."course_id" = "testing_course"."id")
INNER JOIN
"testing_faculty"
ON
("testing_course"."faculty_id" = "testing_faculty"."id")
WHERE
"testing_faculty"."name" = 'Science'
Gotchas and other Funky stuff
- Model instances when retrieved will try to populate all columns
get()
orfirst()
for hole in Hole.objects.all()
- This can make migrations hard, as older workers will be requesting columns that might have been removed or renamed which will cause errors
- There are ways to do down-timeless migrations but are bit funky and multi step
- Recommendation is to avoid deleting or renaming columns
class Hole(models.Model):
name = models.TextField()
pad = models.ForeignKey(Pad, on_delete=models.CASCADE)
class Pad(models.Model):
name = models.TextField()
holes_qs = Hole.objects.annotate(this_thang=F("pad__name")).get()
ā¬ļø
SELECT
drilling_hole.name, -- pulls all columns
drilling_hole.pad_id,
drilling_pad.name AS this_thang
FROM drilling_hole;
WHERE drilling_pad.name = 'cheese board';
LIMIT 1;
Values
- So how do you to only retrieve certain columns?
class Hole(models.Model):
name = models.TextField()
pad = models.ForeignKey(Pad, on_delete=models.CASCADE)
holes_qs = Hole.objects.values("name")
for hole in holes_qs:
print(type(hole)) # dict
# not `Hole` object, hence no class functions, no lazy loading e.g. can't access `hole.pad.name`
ā¬ļø
SELECT
drilling_hole.name, -- only pulls name and maps it to a python dictionary object
FROM drilling_hole;
- Less data sent down the wire, but no lazy loading and no class functions as the data is a python dictionary
Other options
only()
anddefer()
- Will retrieve model instances, but wonāt retrieve all fields
- Values not declared when accessed on the model are lazy loaded
- Would not recommend to be used regularly, very high chance of N+1
holes_qs = Hole.objects.only("pad_id")
for hole in holes_qs:
print(hole.pad_id) # no extra query
print(hole.name) # name will be lazy loaded, N+1 queries
How do you know what SQL is being generated?
print(queryset.query)
- Django Debug Toolbar
- Kolo
Updating rows
There are three typical ways to update a row in the database.
class Hole(models.Model):
name = models.TextField()
instance = Hole.objects.create(name="cheese")
# save()
instance.name = "board"
instance.save()
# update()
Model.objects.filter(name="board").update(name="board2")
# bulk_update()
instance.name = "board3"
instances_to_update = [instance]
Model.objects.bulk_update(instances_to_update, ["name"])
Problems with updates
update()
andbulk_update()
do not triggersave()
method on the model- built in django signals (publish/subscribe pattern), there are post_save and pre_save signals which can be triggered when calling
save()
update()
andbulk_update()
do not trigger those signalsā¦
Other semi-related things (non deterministic ordering)
- Pagination / order_by
- Not a Django ORM thing, but a Django ORM hides the implementation detail, which may lead to unexpected result
- Page pagination is default in DRF list views and implemented with
LIMIT
andOFFSET
in SQL
?page_size=10&page=3
SELECT *
FROM drilling_hole
LIMIT 10
OFFSET 20;
Anything wrong with this query?
- There is no deterministic guarantee that the same 10 rows will be returned each time.
- A plain
SELECT
in postgres (may be different in different dbs) provides no guarantee of order, unlessORDER BY
is specified - It often appears to return in insertion/
id
order, but that is not guaranteed in postgres - Model Meta
ordering
may set a default order, but sometimes tht is ignored - For list views you should to provide a deterministic order_by
order_by(name)
is not enough if name is not uniqueorder_by(name, id)
is required, because id is unique
- This can been the the cause of some flaky tests issues where lists are returned seemingly in insertion order and asserted to return in id order
Thanks for reading! I hope this has been useful to you. There are definitely more particularities and gotchas to be aware of when using the Django ORM and Django in general but I think these are the most common ones.