If you read developer threads about scheduling jobs, top-k queries, or building fast priority queues, you will see heaps come up a lot. Python ships with a lightweight heap implementation that is perfect for many everyday tasks like finding the smallest items, merging sorted streams, or implementing Dijkstra’s algorithm in a few lines.
What are heaps in Python and how do you use them? I see the heapq module mentioned for priority queues, top-k lists, and graph algorithms. How do I push and pop items, handle max-heap behavior, break ties safely, and apply heaps to real-world problems?
A heap is a tree-like data structure that maintains a partial order: in a min-heap the smallest element is always at the root. Python’s standard library provides a binary min-heap in the heapq module, implemented on top of a plain list for speed and simplicity.
import heapq
# Build a heap incrementally
h = []
heapq.heappush(h, 5)
heapq.heappush(h, 1)
heapq.heappush(h, 3)
print(heapq.heappop(h)) # 1 (smallest)
print(h) # remaining heap structure
# Turn an existing list into a heap in place
data = [9, 3, 7, 1, 6]
heapq.heapify(data) # O(n)
print(heapq.heappop(data)) # 1
# Get n smallest or largest without fully sorting
nums = [8, 2, 7, 4, 9, 1, 5]
print(heapq.nsmallest(3, nums)) # [1, 2, 4]
print(heapq.nlargest(2, nums)) # [9, 8]Code language: PHP (php)
heappushandheappoprun inO(log n)heapifyruns in O(n)nsmallestandnlargestare optimized for smalln
heapq is a min-heap. To simulate a max-heap, store negated priorities or store tuples where the first element orders as you want.
import heapq
# Max-heap via negation
maxh = []
heapq.heappush(maxh, (-10, "low latency"))
heapq.heappush(maxh, (-5, "throughput"))
priority, task = heapq.heappop(maxh)
print(-priority, task) # 10 low latencyCode language: PHP (php)
When multiple items share the same priority, include a monotonic counter so the heap has a deterministic ordering and does not try to compare the tasks themselves.
import heapq
from itertools import count
pq = []
counter = count()
def push(pq, priority, item):
# For a max-heap, negate priority
entry = (priority, next(counter), item)
heapq.heappush(pq, entry)
def pop(pq):
priority, _, item = heapq.heappop(pq)
return priority, item
push(pq, 2, "resize")
push(pq, 1, "ingest")
push(pq, 2, "analyze")
print(pop(pq)) # (1, 'ingest')
print(pop(pq)) # (2, 'resize') then (2, 'analyze')Code language: PHP (php)
Maintain a fixed-size min-heap of size k. Push new items and pop when size exceeds k. The heap then holds the largest k items.
import heapq
def top_k(iterable, k):
h = []
for x in iterable:
if len(h) < k:
heapq.heappush(h, x)
else:
# If x is bigger than the smallest of the top k, replace it
if x > h[0]:
heapq.heapreplace(h, x)
return sorted(h, reverse=True)
print(top_k([5,1,9,3,7,8,2], 3)) # [9, 8, 7]Code language: PHP (php)
heapq.merge lazily merges multiple sorted iterables without loading everything into memory.
import heapq
a = [1, 4, 7]
b = [2, 5, 6]
c = [0, 3, 8]
for x in heapq.merge(a, b, c):
print(x)Code language: JavaScript (javascript)
import heapq
def dijkstra(graph, start):
dist = {start: 0}
pq = [(0, start)]
while pq:
d, u = heapq.heappop(pq)
if d != dist.get(u, float("inf")):
continue
for v, w in graph[u]:
nd = d + w
if nd < dist.get(v, float("inf")):
dist[v] = nd
heapq.heappush(pq, (nd, v))
return distCode language: JavaScript (javascript)
- Do not modify items in place after pushing. Remove and reinsert or push a new entry and mark the old one as invalid if needed.
- When items are non-comparable, add a counter to avoid TypeError on ties.
- For max-heap behavior, remember to negate or invert your priority.
In media pipelines you might prioritize work like ingest, analyze, transform, and deliver. A heap-backed priority queue lets you process urgent jobs first while keeping throughput high. For example, schedule time-sensitive transforms before batch work and integrate with Python image processing steps like Python-based analysis or file handling tips from saving images in Python.
If your pipeline includes format conversion or pre-optimization during backlog spikes, you can route tasks that trigger a tool like PNG compression at a lower priority than user-facing requests.
heapqimplements a fast min-heap withO(log n) push and pop,O(n) heapify.- Use tuples (priority, counter, item) to build stable priority queues.
- Simulate a max-heap by negating priorities or inverting comparisons.
- Apply heaps to top-k selection, merging sorted streams, and shortest path algorithms.
- In media workflows, heaps help prioritize processing tasks alongside Python image work and Cloudinary tooling.
Ready to streamline your media pipeline and deliver optimized assets at scale? Sign up for a free Cloudinary account and start transforming, optimizing, and delivering your content today.