• tal@lemmy.today

    goes looking for the issue

PostgreSQL has a limit of 65,535 parameters, so bulk inserts can fail with large datasets.
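For concreteness, a driver-agnostic sketch (not from the linked article) of why that cap bites: each row in a multi-row INSERT consumes one bind parameter per column, and a single statement can carry at most 65,535 of them, so the usual workaround is to chunk the batch. The helper names here are just illustrative.

```python
# Sketch of the arithmetic behind the failure: one bind parameter per column
# per row, with a hard cap of 65,535 parameters in a single statement.
MAX_PARAMS = 65_535

def max_rows_per_insert(num_columns: int) -> int:
    """Largest multi-row INSERT that stays under the parameter cap."""
    return MAX_PARAMS // num_columns

def chunks(rows, num_columns):
    """Split a large batch into pieces that each fit in one statement."""
    size = max_rows_per_insert(num_columns)
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# e.g. a 10-column table: 65_535 // 10 = 6_553 rows per multi-row INSERT
```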

Hmm. I would believe that there are efficiency gains from doing one large insert rather than many small ones — like, there are probably optimizations one can take advantage of when rebuilding indexes — and it’d be nice for database users to have a way to leverage that.

On the other hand, I can also believe that DBMSes might hold locks while running a query, and permitting queries of unbounded (or very large) size and complexity might create problems for concurrent users, as a lock might be held for a long time.

    EDIT: Hmm. Lock granularity probably isn’t the issue:

    https://stackoverflow.com/questions/758945/whats-the-fastest-way-to-do-a-bulk-insert-into-postgres

One way to speed things up is to explicitly perform multiple INSERTs or COPYs within a transaction (say 1000). Postgres’s default behavior is to commit after each statement, so by batching the commits, you can avoid some overhead. As the guide in Daniel’s answer says, you may have to disable autocommit for this to work. Also note the comment at the bottom that suggests increasing the size of the wal_buffers to 16 MB may also help.

It is worth mentioning that the limit for how many inserts/copies you can add to the same transaction is likely much higher than anything you’ll attempt. You could add millions and millions of rows within the same transaction and not run into problems.
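In that spirit, here’s a minimal sketch of the “batch the commits” approach, assuming psycopg2 and a hypothetical events(id, payload) table; psycopg2 keeps autocommit off by default, so everything below runs in one transaction until the final commit():

```python
import psycopg2

def load_rows(dsn, rows, batch_size=1000):
    """Many INSERTs, one transaction: commit once instead of per statement."""
    conn = psycopg2.connect(dsn)  # autocommit is off by default
    try:
        with conn.cursor() as cur:
            for i in range(0, len(rows), batch_size):
                cur.executemany(
                    "INSERT INTO events (id, payload) VALUES (%s, %s)",
                    rows[i:i + batch_size],
                )
        conn.commit()  # one commit (one WAL flush) for the whole batch
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

For really big loads, the rest of that thread recommends COPY over INSERT; psycopg2 exposes that via cursor.copy_expert().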

    Any lock granularity issues would also apply to transactions.

There might be concerns about how the query-processing code scales with very large statements.