Opened 4 years ago

Last modified 3 years ago

#282 new defect

Very large pipeline causes very high Postgres server memory usage

Reported by: boraas@… Owned by: jtv
Priority: normal Component: performance
Severity: Ugly Keywords:
Cc:

Description

I'm doing a transaction in which I insert many rows into a table (returning the IDs of the new rows). For highest throughput, I'm using the pipeline object. The documentation says to insert as many queries into the pipeline as possible, and retrieve the results as late as possible. So, I insert all my rows (about ), and then retrieve all the results with the IDs. While the Postgres server is processing all the inserts, I see its resident memory go up and up, apparently unbounded. Server configuration is shared_buffers=4GB and work_mem=10MB. I see its resident memory reach 35GB. Then it returns to a normal 4GB as I retrieve the results and close the pipeline.

This seems unexpected to me. Perhaps a work_mem is being allocated on the server for each query, and isn't deallocated until the results are retrieved? The ID is a single number, so it should be buffer to buffer that much data on the server. So perhaps this is a server memory management issue.

I think I can work around this by keeping say 100 outstanding queries in the pipeline and retrieving old results while inserting new queries. But this contradicts the usage guidelines in the pipeline documentation.

Does this behaviour seem expected? Is this a discussion for the Postgres server?

Change History (4)

comment:1 Changed 4 years ago by anonymous

Sorry, I forgot to add, I'm putting about 2 million INSERTs into the pipeline (1 new row per INSERT).

comment:2 Changed 4 years ago by RichardWef

  • version changed from 4.0 to 3.2
Last edited 3 years ago by jtv (previous) (diff)

comment:3 Changed 3 years ago by jtv

I'm surprised that this happens, unless it's the parser having to go through the whole combined query before it can start executing. Your usage seems extreme though - why not set a lower capacity? Pipelining will save you _some_ overhead, but at some batch size that overhead will become insignificant compared to unavoidable costs.

comment:4 Changed 3 years ago by jtv

  • milestone 3.1 deleted

Milestone 3.1 deleted

Note: See TracTickets for help on using tickets.