Bulk Insert & Upsert
Problem: you have hundreds or thousands of rows to load — an import job, a sync from an external API — and they must land in one of two ways: insert new rows and update the ones that already exist (upsert), without one statement per row, and without blowing the backend's placeholder limit.
The pieces: insert_many renders one multi-row INSERT, on_conflict_merge turns it into an upsert, and a small chunks() loop keeps each statement under the bind-count ceiling.
insert_many + on_conflict_merge
use chain_builder::{Postgres, QueryBuilder};
let rows = vec![
vec![("email", "a@x.io"), ("name", "Ann")],
vec![("email", "b@x.io"), ("name", "Bob")],
];
let (sql, binds) = QueryBuilder::<Postgres>::table("users")
.insert_many(rows)
.on_conflict_merge(["email"])
.to_sql();
// INSERT INTO "users" ("email", "name") VALUES ($1, $2), ($3, $4) ON CONFLICT ("email") DO UPDATE SET "name" = EXCLUDED."name"One statement, one round trip: existing emails get their name updated from the proposed row, new ones are inserted. The SET list is the inserted columns minus the conflict targets — you never re-assign the key you matched on.
Two insert_many rules to keep in mind (full detail in INSERT · UPDATE · DELETE):
- The first row defines the column set (sorted alphabetically, like
insert). A key that appears only in a later row is silently dropped. - Ragged rows are NULL-padded, never an error. A later row missing a key binds
Value::Nullfor that slot instead of panicking — a malformed record in an import batch cannot take the process down:
let (sql, binds) = QueryBuilder::<Postgres>::table("u")
.insert_many([vec![("a", 1i64), ("b", 2i64)], vec![("a", 3i64)]])
.to_sql();
// INSERT INTO "u" ("a", "b") VALUES ($1, $2), ($3, $4)
// binds == [Value::I64(1), Value::I64(2), Value::I64(3), Value::Null]For "insert new, leave existing untouched" use on_conflict_do_nothing instead — same shape, different conflict action.
Chunking: respect the placeholder ceiling
Every cell is a bound placeholder, so a batch binds rows × columns parameters — and each backend caps the count per statement:
| Backend | Limit | Source |
|---|---|---|
| Postgres | 65,535 | wire protocol encodes the parameter count as 16-bit |
| MySQL | 65,535 | prepared-statement placeholder count is 16-bit |
| SQLite | 32,766 default (999 before 3.32.0) | SQLITE_MAX_VARIABLE_NUMBER compile-time option |
Exceed it and the statement fails at execute time. Stay comfortably under it by chunking — with the bonus that chunks keep statement size, lock duration, and memory per round trip bounded even when the hard limit is far away:
use chain_builder::{Postgres, QueryBuilder};
const COLS: usize = 2; // ("email", "name")
const CHUNK: usize = 1_000; // 1_000 rows × 2 cols = 2_000 binds ≪ 65_535
async fn upsert_users(
pool: &sqlx::PgPool,
rows: &[(String, String)], // (email, name)
) -> Result<u64, chain_builder::Error> {
let mut tx = pool.begin().await?;
let mut affected = 0u64;
for chunk in rows.chunks(CHUNK) {
let batch = chunk
.iter()
.map(|(email, name)| [("email", email.clone()), ("name", name.clone())]);
let result = QueryBuilder::<Postgres>::table("users")
.insert_many(batch)
.on_conflict_merge(["email"])
.execute(&mut *tx)
.await?;
affected += result.rows_affected();
}
tx.commit().await?;
Ok(affected)
}Pick CHUNK so that CHUNK × COLS clears the smallest limit you deploy against (999 if old SQLite builds are in play), with margin for any extra binds on the statement (WHERE/RETURNING expressions, etc.). The transaction makes the chunked upsert all-or-nothing; drop it if partial progress is acceptable for your job.
Notes & caveats
- MySQL ignores your conflict targets.
on_conflict_merge(["email"])compiles toON DUPLICATE KEY UPDATE …against whatever unique/primary keys the table has, and the SET list covers all inserted columns. If the table carries more than one unique key, the matched row may not be the one["email"]suggests — see the MySQL merge note before relying on target semantics there. - Get the upserted rows back with
.returning([...])on Postgres/SQLite (per chunk). On MySQLreturningis a silent no-op — query separately. rows_affected()counts differently per backend. Notably MySQL'sON DUPLICATE KEY UPDATEreports 2 for an updated row and 1 for an inserted one, so the sum above is not a row count there.- Don't pad with empty batches:
insert_manyover zero rows leaves the builder with no columns, which fails asBuildError::EmptyInsertat compile time.chunks()never yields an empty slice, so the loop above is safe — but guard any other path that might hand the builder an empty iterator.
Related pages
- INSERT · UPDATE · DELETE —
insert_many, NULL padding, sorted columns - Upsert & RETURNING — both conflict actions, per-dialect rendering
- Executing with sqlx —
execute, transactions,rows_affected