
PostgreSQL Performance Tips That Helped Me in Real Projects

Years of PostgreSQL experience distilled into the tips that actually matter in production. EXPLAIN ANALYZE, indexing strategies, N+1 fixes, connection pooling, full-text search, and the 80/20 rule of database performance.

admin · Mar 25, 2026 · 15 min read

I have been using PostgreSQL for years now, and I keep learning things the hard way — usually at 2 AM when a production query that used to take 50 milliseconds suddenly takes 30 seconds. Every performance lesson I know came from a real situation, and I want to share the ones that made the biggest difference.

These are not theoretical tips from a textbook. These are things that actually saved me in production.

1. EXPLAIN ANALYZE Is Your Best Friend

Before optimizing anything, you need to know what is actually happening. Never guess. Measure.

SQL
EXPLAIN ANALYZE 
SELECT p.id, p.title, p.slug, p.excerpt, p.created_at
FROM posts p
WHERE p.category_id = 3 AND p.published = true
ORDER BY p.created_at DESC 
LIMIT 20;

The output tells you everything: which indexes PostgreSQL chose to use (or skipped), how many rows it estimated versus how many it actually scanned, the time spent on each operation, and whether it did a sequential scan (reading every row) or an index scan (jumping straight to the matching rows).

Key things to look for:

  • Seq Scan on a large table — usually means a missing index
  • Rows estimated vs actual — a large difference means outdated statistics
  • Nested Loop with high row counts — possible N+1 or missing join index
  • Sort with high cost — might need an index that includes the sort column

I run EXPLAIN ANALYZE on every query that takes more than 100ms. Make it a habit.
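For reference, a problematic plan node often looks something like this (the numbers are illustrative, not from a real run):

SQL
-- Illustrative EXPLAIN ANALYZE fragment; your numbers will differ:
--   Seq Scan on posts  (cost=0.00..4528.00 rows=198 width=64)
--                      (actual time=0.011..142.321 rows=187340 loops=1)
--     Filter: (category_id = 3)
-- A Seq Scan with a Filter on a large table, where the estimated row
-- count (198) is wildly off from the actual count (187340), points at
-- both a missing index and stale statistics.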

2. Index What You Actually Query

This sounds obvious, but I have seen production databases with millions of rows and no indexes beyond the primary key. Here is the thing: PostgreSQL does not automatically create indexes on foreign keys. If you have a posts table with a category_id column and you filter by category, you need to create that index yourself.

SQL
-- Basic index for category lookups
CREATE INDEX idx_posts_category_id ON posts(category_id);

-- Composite index for a common query pattern
CREATE INDEX idx_posts_category_published
  ON posts(category_id, created_at DESC)
  WHERE published = true;

That second index is a partial index — it only includes rows where published = true. Since most queries filter by published status, the index is smaller and faster because it skips all the draft posts.

Composite index column order matters. Put the columns with equality conditions first (category_id = 3) and range/sort columns last (created_at DESC). PostgreSQL can use the index for the equality filter and then return results already sorted.
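To verify the ordering works, you can run the query the index was designed for and check the plan (a sketch, reusing the posts table from above):

SQL
EXPLAIN ANALYZE
SELECT id, title, slug, excerpt, created_at
FROM posts
WHERE category_id = 3 AND published = true
ORDER BY created_at DESC
LIMIT 20;
-- With idx_posts_category_published in place, you should see an
-- Index Scan and no separate Sort node in the plan.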

    When NOT to Index

  • Very small tables (under 1,000 rows) — a sequential scan is often faster
  • Columns with very low cardinality (e.g. a boolean with a 50/50 distribution)
  • Tables with heavy write load — each index slows down INSERT/UPDATE/DELETE
  • Rarely-run queries — the index maintenance cost may not be worth it
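The flip side is indexes you created but never use. PostgreSQL tracks index usage in pg_stat_user_indexes; a quick sketch to spot candidates for removal:

SQL
-- Indexes that have never been scanned since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;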
3. Stop Using SELECT *

    I know SELECT * is convenient during development, but it has real costs in production:

    SQL
    -- This fetches EVERYTHING, including that TEXT column
    -- with 10KB of HTML content you do not need for a listing page
    SELECT * FROM posts WHERE published = true ORDER BY created_at DESC LIMIT 20;

-- This fetches only what the listing page actually needs
SELECT id, title, slug, excerpt, image_url, created_at, reading_time_minutes
FROM posts
WHERE published = true
ORDER BY created_at DESC
LIMIT 20;

    The difference can be dramatic. If your content column averages 10KB and you are fetching 20 rows, that is 200KB of data you are transferring for no reason. For a listing page that only shows titles and excerpts, you are wasting 95% of the bandwidth.

    In JPA/Hibernate, use projections or DTOs instead of fetching entire entities:

    Java
    // Bad: fetches all columns into full Post entities
    List<Post> posts = postRepository.findByPublishedTrue();

// Good: fetches only needed columns
@Query("SELECT new com.myapp.dto.PostSummary(p.id, p.title, p.slug, p.excerpt) " +
       "FROM Post p WHERE p.published = true")
List<PostSummary> findPublishedSummaries();

    4. The N+1 Query Problem

    This is the single most common performance killer in applications using ORMs, and it is insidious because the code looks perfectly reasonable:

    Java
    // Looks innocent, right?
    List<Post> posts = postRepository.findAll();  // 1 query
    for (Post post : posts) {
        String categoryName = post.getCategory().getName();  // N queries!
        String authorName = post.getAuthor().getUsername();   // N more queries!
    }
    

    With lazy loading (the default in JPA), accessing post.getCategory() triggers a separate database query for each post. If you have 50 posts, that is 1 + 50 + 50 = 101 database round trips instead of 1.

    The fix is to tell JPA to load the related entities upfront:

    Java
    // Option 1: @EntityGraph
    @EntityGraph(attributePaths = {"category", "author", "tags"})
    List<Post> findByPublishedTrue();

// Option 2: JOIN FETCH in JPQL
@Query("SELECT p FROM Post p " +
       "JOIN FETCH p.category " +
       "JOIN FETCH p.author " +
       "WHERE p.published = true")
List<Post> findPublishedWithDetails();

    One query instead of 101. This single fix has been the biggest performance improvement in almost every project I have worked on.

    How to detect N+1: Enable SQL logging during development:

    YAML
    spring:
      jpa:
        show-sql: true
        properties:
          hibernate:
            format_sql: true
    

    Watch the console. If you see the same query pattern repeated dozens of times, you have an N+1 problem.
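If scanning the console is too manual, Hibernate can count the statements for you. Enabling hibernate.generate_statistics logs a per-session summary, including the number of JDBC statements executed:

YAML
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true

A request that should run one query but reports 101 JDBC statements is an N+1 staring you in the face.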

    5. Connection Pooling Configuration

    Every database connection has overhead — TCP handshake, SSL negotiation (if enabled), authentication, session setup, memory allocation on both sides. Without pooling, your application creates a new connection for every database operation and destroys it afterward.

    HikariCP (the default in Spring Boot) manages a pool of reusable connections:

    YAML
    spring:
      datasource:
        hikari:
          maximum-pool-size: 10
          minimum-idle: 5
          connection-timeout: 30000
          idle-timeout: 600000
          max-lifetime: 1800000
    

    Sizing the pool correctly matters more than you might think. Too small: requests wait for connections and timeout. Too large: you overwhelm the database with concurrent connections (PostgreSQL's default max is 100).

    A good starting point for a single application server: maximum-pool-size = (2 * number_of_CPU_cores) + number_of_disk_spindles. For a 4-core server with an SSD, that is about 9-10 connections.

    10 connections can handle a surprising amount of traffic if your queries are fast. The bottleneck is usually query performance, not connection count.
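Before settling on a pool size, it helps to check the server-side limit and what is actually connected right now:

SQL
-- Server-wide connection cap (default is typically 100)
SHOW max_connections;

-- How many connections are open right now, grouped by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;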

    6. Use Batch Operations

    Inserting or updating rows one at a time means one database round trip per row. For 1000 rows, that is 1000 round trips, each with its own network latency, transaction overhead, and index maintenance.

    SQL
    -- Slow: 1000 separate INSERT statements
    INSERT INTO tags (name, slug) VALUES ('java', 'java');
    INSERT INTO tags (name, slug) VALUES ('python', 'python');
    -- ... 998 more

-- Fast: single multi-row INSERT
INSERT INTO tags (name, slug) VALUES
  ('java', 'java'),
  ('python', 'python'),
  ('docker', 'docker'),
  -- ...
  ;

    In JPA, enable batching:

    YAML
    spring:
      jpa:
        properties:
          hibernate:
            jdbc:
              batch_size: 50
            order_inserts: true
            order_updates: true
    

    This tells Hibernate to group INSERT and UPDATE statements into batches of 50, dramatically reducing round trips.
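One more knob worth knowing: the PostgreSQL JDBC driver can rewrite batched INSERTs into multi-row INSERTs on the wire via the reWriteBatchedInserts connection parameter. Shown here on a sample JDBC URL — the host and database name are placeholders for your own setup:

YAML
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true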

    7. VACUUM and Dead Tuples

    PostgreSQL uses a mechanism called MVCC (Multi-Version Concurrency Control) for transaction isolation. When you UPDATE or DELETE a row, the old version is not immediately removed — it is marked as a "dead tuple." This allows other concurrent transactions to see the old data if they started before your change.

    Over time, dead tuples accumulate and cause problems:

  • The table grows larger on disk (table bloat)
  • Index scans become slower because they encounter dead tuples
  • Sequential scans read more pages than necessary

Autovacuum is PostgreSQL's built-in cleanup process. It runs automatically and removes dead tuples. But sometimes it falls behind, especially on tables with heavy write activity.

    Check your table health:

    SQL
    SELECT 
      relname AS table_name,
      n_live_tup AS live_rows,
      n_dead_tup AS dead_rows,
      ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
      last_autovacuum,
      last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;
    

    If you see tables with a high dead tuple percentage and no recent autovacuum, you may need to tune the autovacuum settings or run a manual VACUUM.
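When a table has fallen behind, a manual pass cleans it up, and for write-heavy tables you can make autovacuum more aggressive per table. A sketch, using the posts table from earlier:

SQL
-- One-off cleanup: remove dead tuples and refresh planner statistics
VACUUM (VERBOSE, ANALYZE) posts;

-- Make autovacuum kick in after ~5% of rows are dead
-- instead of the default 20%
ALTER TABLE posts SET (autovacuum_vacuum_scale_factor = 0.05);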

    8. Full-Text Search Instead of LIKE

    If you are using LIKE '%keyword%' for search, you are doing a full table scan on every search query:

    SQL
    -- Slow: cannot use any index, scans every row
    SELECT * FROM posts 
    WHERE title ILIKE '%docker%' 
       OR content ILIKE '%docker%';

-- Fast: uses a GIN index on a tsvector column
SELECT * FROM posts
WHERE search_vector @@ plainto_tsquery('simple', 'docker');

    PostgreSQL has built-in full-text search that is surprisingly capable. You create a tsvector column, add a GIN index, and set up a trigger to keep it updated. Searches go from scanning every row to using the index — orders of magnitude faster on large tables.
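A minimal version of that setup, assuming the search_vector column from the query above and a posts table with title and content columns (on PostgreSQL 11 or older, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

SQL
-- 1. Add the tsvector column
ALTER TABLE posts ADD COLUMN search_vector tsvector;

-- 2. Backfill existing rows
UPDATE posts
SET search_vector = to_tsvector('simple',
      coalesce(title, '') || ' ' || coalesce(content, ''));

-- 3. Index it
CREATE INDEX idx_posts_search ON posts USING GIN (search_vector);

-- 4. Keep it updated on INSERT/UPDATE (built-in trigger function)
CREATE TRIGGER posts_search_vector_update
  BEFORE INSERT OR UPDATE ON posts
  FOR EACH ROW EXECUTE FUNCTION
  tsvector_update_trigger(search_vector, 'pg_catalog.simple', title, content);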

    For most applications, PostgreSQL's built-in search is good enough. You do not need Elasticsearch unless you have very specific requirements like fuzzy matching, typo tolerance, or faceted search across millions of documents.

    9. Use Proper Data Types

    Data types affect both storage size and query performance:

    SQL
    -- Bad: storing dates as strings
    CREATE TABLE events (
      event_date VARCHAR(10)  -- '2026-04-15'
    );

-- Good: proper date type
CREATE TABLE events (
  event_date DATE
);

    Why it matters:

  • DATE is 4 bytes. VARCHAR(10) for a date is 10+ bytes.
  • Date comparisons on VARCHAR require string comparison, which is slower and may give wrong results
  • You cannot use date functions on strings without casting
  • Range queries on proper dates use indexes efficiently

Other common mistakes: using TEXT for everything (when VARCHAR with a limit would catch data errors), using BIGINT for everything (when INTEGER is sufficient and half the size), and storing JSON as TEXT instead of using JSONB.
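The JSONB point deserves a quick illustration — unlike JSON stored in TEXT, JSONB can be indexed and queried structurally. A sketch with a hypothetical settings table:

SQL
CREATE TABLE settings (
  user_id INTEGER PRIMARY KEY,
  prefs   JSONB NOT NULL
);

-- A GIN index makes containment queries fast
CREATE INDEX idx_settings_prefs ON settings USING GIN (prefs);

-- Find everyone with dark mode enabled
SELECT user_id FROM settings WHERE prefs @> '{"theme": "dark"}';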

10. Monitor Long-Running Queries

Sometimes a query that normally takes 10ms suddenly takes 30 seconds because of lock contention, bad statistics, or a changed query plan. Set up monitoring:

SQL
-- Find currently running long queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds'
  AND state != 'idle'
ORDER BY duration DESC;
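Once you spot a runaway query, you can stop it using the pid from that result (12345 below is a placeholder):

SQL
-- Politely cancel the running query, keeping the connection alive
SELECT pg_cancel_backend(12345);

-- If the query ignores cancellation, kill the whole connection
SELECT pg_terminate_backend(12345);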

Consider setting log_min_duration_statement in postgresql.conf:

log_min_duration_statement = 500   # log queries taking more than 500ms

This gives you a log of all slow queries without adding overhead to fast ones.

The 80/20 Rule of Database Performance

Most performance issues I have encountered come from three sources:

1. Missing indexes on columns used in WHERE and JOIN clauses
2. N+1 queries from lazy-loaded ORM relationships
3. Fetching too much data (SELECT * when you need 3 columns)

Fix these three things and you have solved 80 percent of PostgreSQL performance problems in a typical web application.

The remaining 20 percent — table partitioning, materialized views, connection pooler tuning, read replicas, query plan analysis — is where things get interesting. But you rarely need those until you have real scale (millions of rows, thousands of concurrent users).

Start with the fundamentals. EXPLAIN ANALYZE everything. Add indexes where they matter. Fix your N+1 queries. The boring basics will take you surprisingly far.

Oğuzhan Berke Özdil
Author

I have been connected to computers since childhood. On this website, I share what I learn and experience while trying to build a strong foundation in software. I completed my BSc in Computer Science at AGH University of Krakow and I am currently pursuing an MSc in Computer Science with a focus on AI & Data Analysis at the same university.