Skip to content

Bulk Insert & Upsert

Problem: you have hundreds or thousands of rows to load — an import job, a sync from an external API — and they must land in one of two ways: insert new rows and update the ones that already exist (upsert), without one statement per row, and without blowing the backend's placeholder limit.

The pieces: insert_many renders one multi-row INSERT, on_conflict_merge turns it into an upsert, and a small chunks() loop keeps each statement under the bind-count ceiling.

insert_many + on_conflict_merge

rust
use chain_builder::{Postgres, QueryBuilder};

let rows = vec![
    vec![("email", "a@x.io"), ("name", "Ann")],
    vec![("email", "b@x.io"), ("name", "Bob")],
];

let (sql, binds) = QueryBuilder::<Postgres>::table("users")
    .insert_many(rows)
    .on_conflict_merge(["email"])
    .to_sql();
// INSERT INTO "users" ("email", "name") VALUES ($1, $2), ($3, $4) ON CONFLICT ("email") DO UPDATE SET "name" = EXCLUDED."name"

One statement, one round trip: existing emails get their name updated from the proposed row, new ones are inserted. The SET list is the inserted columns minus the conflict targets — you never re-assign the key you matched on.

Two insert_many rules to keep in mind (full detail in INSERT · UPDATE · DELETE):

  1. The first row defines the column set (sorted alphabetically, like insert). A key that appears only in a later row is silently dropped.
  2. Ragged rows are NULL-padded, never an error. A later row missing a key binds Value::Null for that slot instead of panicking — a malformed record in an import batch cannot take the process down:
rust
let (sql, binds) = QueryBuilder::<Postgres>::table("u")
    .insert_many([vec![("a", 1i64), ("b", 2i64)], vec![("a", 3i64)]])
    .to_sql();
// INSERT INTO "u" ("a", "b") VALUES ($1, $2), ($3, $4)
// binds == [Value::I64(1), Value::I64(2), Value::I64(3), Value::Null]

For "insert new, leave existing untouched" use on_conflict_do_nothing instead — same shape, different conflict action.

Chunking: respect the placeholder ceiling

Every cell is a bound placeholder, so a batch binds rows × columns parameters — and each backend caps the count per statement:

BackendLimitSource
Postgres65,535wire protocol encodes the parameter count as 16-bit
MySQL65,535prepared-statement placeholder count is 16-bit
SQLite32,766 default (999 before 3.32.0)SQLITE_MAX_VARIABLE_NUMBER compile-time option

Exceed it and the statement fails at execute time. Stay comfortably under it by chunking — with the bonus that chunks keep statement size, lock duration, and memory per round trip bounded even when the hard limit is far away:

rust
use chain_builder::{Postgres, QueryBuilder};

const COLS: usize = 2;          // ("email", "name")
const CHUNK: usize = 1_000;     // 1_000 rows × 2 cols = 2_000 binds ≪ 65_535

async fn upsert_users(
    pool: &sqlx::PgPool,
    rows: &[(String, String)],  // (email, name)
) -> Result<u64, chain_builder::Error> {
    let mut tx = pool.begin().await?;
    let mut affected = 0u64;

    for chunk in rows.chunks(CHUNK) {
        let batch = chunk
            .iter()
            .map(|(email, name)| [("email", email.clone()), ("name", name.clone())]);

        let result = QueryBuilder::<Postgres>::table("users")
            .insert_many(batch)
            .on_conflict_merge(["email"])
            .execute(&mut *tx)
            .await?;
        affected += result.rows_affected();
    }

    tx.commit().await?;
    Ok(affected)
}

Pick CHUNK so that CHUNK × COLS clears the smallest limit you deploy against (999 if old SQLite builds are in play), with margin for any extra binds on the statement (WHERE/RETURNING expressions, etc.). The transaction makes the chunked upsert all-or-nothing; drop it if partial progress is acceptable for your job.

Notes & caveats

  • MySQL ignores your conflict targets. on_conflict_merge(["email"]) compiles to ON DUPLICATE KEY UPDATE … against whatever unique/primary keys the table has, and the SET list covers all inserted columns. If the table carries more than one unique key, the matched row may not be the one ["email"] suggests — see the MySQL merge note before relying on target semantics there.
  • Get the upserted rows back with .returning([...]) on Postgres/SQLite (per chunk). On MySQL returning is a silent no-op — query separately.
  • rows_affected() counts differently per backend. Notably MySQL's ON DUPLICATE KEY UPDATE reports 2 for an updated row and 1 for an inserted one, so the sum above is not a row count there.
  • Don't pad with empty batches: insert_many over zero rows leaves the builder with no columns, which fails as BuildError::EmptyInsert at compile time. chunks() never yields an empty slice, so the loop above is safe — but guard any other path that might hand the builder an empty iterator.