I have been using PostgreSQL for years now, and I keep learning things the hard way — usually at 2 AM when a production query that used to take 50 milliseconds suddenly takes 30 seconds. Every performance lesson I know came from a real situation, and I want to share the ones that made the biggest difference.
These are not theoretical tips from a textbook. These are things that actually saved me in production.
1. EXPLAIN ANALYZE Is Your Best Friend
Before optimizing anything, you need to know what is actually happening. Never guess. Measure.
EXPLAIN ANALYZE
SELECT p.id, p.title, p.slug, p.excerpt, p.created_at
FROM posts p
WHERE p.category_id = 3 AND p.published = true
ORDER BY p.created_at DESC
LIMIT 20;
The output tells you everything: which indexes PostgreSQL chose (or chose not) to use, how many rows it estimated vs actually scanned, the time spent on each operation, and whether it did a sequential scan (reading every row) or an index scan (jumping to the right rows).
Key things to look for:
- Seq Scan on a large table — usually means a missing index
- Rows estimated vs actual — large differences mean outdated statistics
- Nested Loop with high row counts — possible N+1 or missing join index
- Sort with high cost — might need an index that includes the sort column
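When the estimated and actual row counts in that output diverge badly, the planner's statistics are stale; refreshing them is usually a one-liner (table name taken from the example above):

```sql
-- Recompute planner statistics for one table;
-- a bare ANALYZE (no table name) does the whole database.
ANALYZE posts;
```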
I run EXPLAIN ANALYZE on every query that takes more than 100ms. Make it a habit.
2. Index What You Actually Query
This sounds obvious, but I have seen production databases with millions of rows and no indexes beyond the primary key. Here is the thing: PostgreSQL does not automatically create indexes on foreign keys. If you have a posts table with a category_id column and you filter by category, you need to create that index yourself.
-- Basic index for category lookups
CREATE INDEX idx_posts_category_id ON posts(category_id);
-- Composite index for a common query pattern
CREATE INDEX idx_posts_category_published
ON posts(category_id, created_at DESC)
WHERE published = true;
That second index is a partial index — it only includes rows where published = true. Since most queries filter by published status, the index is smaller and faster because it skips all the draft posts.
Composite index column order matters. Put the columns with equality conditions first (category_id = 3) and range/sort columns last (created_at DESC). PostgreSQL can use the index for the equality filter and then return results already sorted.
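As a sketch, this is the shape of query that the composite partial index above serves end to end: equality on category_id first, the sort column second, and a published filter matching the index's WHERE clause. Running it under EXPLAIN should show an index scan with no separate sort step:

```sql
EXPLAIN
SELECT id, title
FROM posts
WHERE category_id = 3 AND published = true
ORDER BY created_at DESC
LIMIT 20;
```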
When NOT to Index

Indexes are not free. Skip them for:
- Very small tables (under 1000 rows) — a sequential scan is often faster
- Columns with very low cardinality (a boolean column with a 50/50 distribution)
- Tables with heavy write load — each index slows down INSERT/UPDATE/DELETE
- Queries that are rarely run — the index maintenance cost may not be worth it
3. Stop Using SELECT *
I know SELECT * is convenient during development, but it has real costs in production:
-- This fetches EVERYTHING, including that TEXT column
-- with 10KB of HTML content you do not need for a listing page
SELECT * FROM posts
WHERE published = true
ORDER BY created_at DESC
LIMIT 20;
-- This fetches only what the listing page actually needs
SELECT id, title, slug, excerpt, image_url, created_at, reading_time_minutes
FROM posts
WHERE published = true
ORDER BY created_at DESC
LIMIT 20;
The difference can be dramatic. If your content column averages 10KB and you are fetching 20 rows, that is 200KB of data you are transferring for no reason. For a listing page that only shows titles and excerpts, you are wasting 95% of the bandwidth.
In JPA/Hibernate, use projections or DTOs instead of fetching entire entities:
// Bad: fetches all columns into full Post entities
List<Post> posts = postRepository.findByPublishedTrue();
// Good: fetches only needed columns
@Query("SELECT new com.myapp.dto.PostSummary(p.id, p.title, p.slug, p.excerpt) " +
       "FROM Post p WHERE p.published = true")
List<PostSummary> findPublishedSummaries();
4. The N+1 Query Problem
This is the single most common performance killer in applications using ORMs, and it is insidious because the code looks perfectly reasonable:
// Looks innocent, right?
List<Post> posts = postRepository.findAll(); // 1 query
for (Post post : posts) {
String categoryName = post.getCategory().getName(); // N queries!
String authorName = post.getAuthor().getUsername(); // N more queries!
}
With lazy loading (the default in JPA), accessing post.getCategory() triggers a separate database query for each post. If you have 50 posts, that is 1 + 50 + 50 = 101 database round trips instead of 1.
The fix is to tell JPA to load the related entities upfront:
// Option 1: @EntityGraph
@EntityGraph(attributePaths = {"category", "author", "tags"})
List<Post> findByPublishedTrue();
// Option 2: JOIN FETCH in JPQL
@Query("SELECT p FROM Post p " +
       "JOIN FETCH p.category " +
       "JOIN FETCH p.author " +
       "WHERE p.published = true")
List<Post> findPublishedWithDetails();
One query instead of 101. This single fix has been the biggest performance improvement in almost every project I have worked on.
How to detect N+1: Enable SQL logging during development:
spring:
jpa:
show-sql: true
properties:
hibernate:
format_sql: true
Watch the console. If you see the same query pattern repeated dozens of times, you have an N+1 problem.
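Beyond eyeballing the console, Hibernate can count the queries for you. Enabling its statistics (a standard Hibernate property, shown here in the same Spring Boot YAML style as above) logs a per-session summary including the number of JDBC statements executed, which makes an N+1 jump out as a suspiciously large number:

```yaml
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true
```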
5. Connection Pooling Configuration
Every database connection has overhead — TCP handshake, SSL negotiation (if enabled), authentication, session setup, memory allocation on both sides. Without pooling, your application creates a new connection for every database operation and destroys it afterward.
HikariCP (the default in Spring Boot) manages a pool of reusable connections:
spring:
datasource:
hikari:
maximum-pool-size: 10
minimum-idle: 5
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000
Sizing the pool correctly matters more than you might think. Too small, and requests wait for connections and time out. Too large, and you overwhelm the database with concurrent connections (PostgreSQL's default max_connections is 100).
A good starting point for a single application server: maximum-pool-size = (2 * number_of_CPU_cores) + number_of_disk_spindles, counting an SSD as one spindle. For a 4-core server with an SSD, that is 2 * 4 + 1 = 9, so about 9-10 connections.
10 connections can handle a surprising amount of traffic if your queries are fast. The bottleneck is usually query performance, not connection count.
6. Use Batch Operations
Inserting or updating rows one at a time means one database round trip per row. For 1000 rows, that is 1000 round trips, each with its own network latency, transaction overhead, and index maintenance.
-- Slow: 1000 separate INSERT statements
INSERT INTO tags (name, slug) VALUES ('java', 'java');
INSERT INTO tags (name, slug) VALUES ('python', 'python');
-- ... 998 more
-- Fast: single multi-row INSERT
INSERT INTO tags (name, slug) VALUES
  ('java', 'java'),
  ('python', 'python'),
  ('docker', 'docker');
  -- ... more rows in the same statement
In JPA, enable batching:
spring:
jpa:
properties:
hibernate:
jdbc:
batch_size: 50
order_inserts: true
order_updates: true
This tells Hibernate to group INSERT and UPDATE statements into batches of 50, dramatically reducing round trips.
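One PostgreSQL-specific catch: even with Hibernate batching enabled, the JDBC driver still sends the batched statements one by one unless you also turn on its rewrite flag, which folds a batch into a single multi-row INSERT on the wire. The URL and database name here are placeholders:

```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true
```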
7. VACUUM and Dead Tuples
PostgreSQL uses a mechanism called MVCC (Multi-Version Concurrency Control) for transaction isolation. When you UPDATE or DELETE a row, the old version is not immediately removed — it is marked as a "dead tuple." This allows other concurrent transactions to see the old data if they started before your change.
Over time, dead tuples accumulate and cause problems:
- The table grows larger on disk (table bloat)
- Index scans become slower because they encounter dead tuples
- Sequential scans read more pages than necessary
Autovacuum is PostgreSQL's built-in cleanup process. It runs automatically and removes dead tuples. But sometimes it falls behind, especially on tables with heavy write activity.
Check your table health:
SELECT
relname AS table_name,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
If you see tables with a high dead tuple percentage and no recent autovacuum, you may need to tune the autovacuum settings or run a manual VACUUM.
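A manual pass, when a table shows up in that list, might look like this (table name illustrative). VERBOSE reports what was reclaimed, and ANALYZE refreshes the planner statistics in the same run:

```sql
VACUUM (VERBOSE, ANALYZE) posts;
```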
8. Full-Text Search Instead of LIKE
If you are using LIKE '%keyword%' for search, you are doing a full table scan on every search query:
-- Slow: cannot use a plain B-tree index, scans every row
SELECT * FROM posts
WHERE title ILIKE '%docker%' OR content ILIKE '%docker%';
-- Fast: uses GIN index on tsvector column
SELECT * FROM posts
WHERE search_vector @@ plainto_tsquery('simple', 'docker');
PostgreSQL has built-in full-text search that is surprisingly capable. You create a tsvector column, add a GIN index, and set up a trigger to keep it updated. Searches go from scanning every row to using the index — orders of magnitude faster on large tables.
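The trigger-based setup described above works on any version; as a sketch, on PostgreSQL 12+ a stored generated column achieves the same thing with no trigger to maintain. The column name and 'simple' config match the query example; the rest is illustrative:

```sql
-- The vector is recomputed automatically on every INSERT/UPDATE,
-- so it can never go stale.
ALTER TABLE posts
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('simple', coalesce(title, '') || ' ' || coalesce(content, ''))
  ) STORED;

CREATE INDEX idx_posts_search ON posts USING GIN (search_vector);
```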
For most applications, PostgreSQL's built-in search is good enough. You do not need Elasticsearch unless you have very specific requirements like fuzzy matching, typo tolerance, or faceted search across millions of documents.
9. Use Proper Data Types
Data types affect both storage size and query performance:
-- Bad: storing dates as strings
CREATE TABLE events (
    event_date VARCHAR(10)  -- '2026-04-15'
);
-- Good: proper date type
CREATE TABLE events (
    event_date DATE
);
Why it matters:
- DATE is 4 bytes. VARCHAR(10) for a date is 10+ bytes.
- Date comparisons on VARCHAR require string comparison, which is slower and can give wrong results
- You cannot use date functions on strings without casting
- Range queries on proper dates use indexes efficiently
Other common mistakes: using TEXT for everything (when VARCHAR with a limit would catch data errors), using BIGINT for everything (when INTEGER is sufficient and half the size), and storing JSON as TEXT instead of using JSONB.
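You can verify the size difference directly: pg_column_size reports the bytes a value occupies. DATE is a fixed 4 bytes, while the string version carries all ten characters plus a length header:

```sql
SELECT pg_column_size(DATE '2026-04-15')     AS date_bytes,
       pg_column_size('2026-04-15'::varchar) AS varchar_bytes;
```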
10. Monitor Long-Running Queries
Sometimes a query that normally takes 10ms suddenly takes 30 seconds because of lock contention, bad statistics, or a changed query plan. Set up monitoring:
-- Find currently running long queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds'
AND state != 'idle'
ORDER BY duration DESC;
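If one of the queries that turns up needs to be stopped, you can do it by pid from that same result set. pg_cancel_backend interrupts just the running query, while pg_terminate_backend kills the whole session:

```sql
SELECT pg_cancel_backend(12345);  -- pid value is illustrative
```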
Consider setting log_min_duration_statement in postgresql.conf:
log_min_duration_statement = 500   # log queries taking more than 500ms
This gives you a log of all slow queries without adding overhead to fast ones.
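The setting can also be changed without editing postgresql.conf or restarting: ALTER SYSTEM persists it, and a config reload applies it:

```sql
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();
```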
The 80/20 Rule of Database Performance
Most performance issues I have encountered come from three sources:
1. Missing indexes on columns used in WHERE and JOIN clauses
2. N+1 queries from lazy-loaded ORM relationships
3. Fetching too much data (SELECT * when you need 3 columns)
Fix these three things and you have solved 80 percent of PostgreSQL performance problems in a typical web application.
The remaining 20 percent — table partitioning, materialized views, connection pooler tuning, read replicas, query plan analysis — is where things get interesting. But you rarely need those until you have real scale (millions of rows, thousands of concurrent users).
Start with the fundamentals. EXPLAIN ANALYZE everything. Add indexes where they matter. Fix your N+1 queries. The boring basics will take you surprisingly far.
