Apply job aborting with <1 granule size unable to allocate, ASMM on

oracleoracle-10g-r2

all

My team is trying to resolve an oracle issue we're seeing sporadically that is really causing us trouble.

We have an update process that imports .dmp files via a package that utilizes an Apply job to do so. (not our choice to implement it this way…)

The problem is, we're getting this apply job aborted with an error of 4031 unable to allocate <100 bytes (various values, usually from 60 to 90) from memory, with the shared pool marked as the culprit. ASMM is on. The server is provisioned ~500 MB of memory to run on (these are old Server 2003 environments so memory is in high demand). I am concerned about this error – it seems to me that the error should always report multiples of 4k, as that is our granule size, but perhaps I am mis-understanding the process of freeing up memory. I am also concerned that the server is not freeing up packages as needed.

I believe that the fix is to use dbms_shared_pool.keep("PACKAGE NAME") to pin the package used to run the apply job, so that it never has to allocate the memory again, but we need to test this fix before we deploy it, for various organizational reasons. We also plan to allocate more memory to the oracle database. Setting the minimum size is not a solution, as we've seen this issue when the size of the pool is above the minimum, with ASMM on. We've also tried setting the sga maximum size to a higher value, and pinning the package in question, but again, with no way to reproduce the issue, it is a waiting game to see if it ever happens again.

Our main problem is in reliably reproducing the problem – even when disabling ASMM and setting the shared pool as small as possible, we're not able to reliably reproduce the issue. We've gotten the shared pool small enough that the web interface can't even pull image tags out of the database, or query the login information from the appropriate table, yet the apply job still runs reliably.

We do not have an in-house DBA so naturally we're probably missing some obvious steps.

This issue is present sporadically on 2-3 servers in a 8 server environment. These individual servers all see similar use by different groups. I am unclear as to why only some of the servers encounter this problem, but have not been able to identify any smoking gun cause.

I am hoping for input in either reliably avoiding this issue entirely, or in reliably forcing the shared pool to be unable to allocate bytes to this job, so that we can confirm these fixes actually work.

Please bear in mind that I have read most of the 4031 posts on SO at this point – this problem has been under investigation for weeks. It is likely NOT appropriate to mark this question as duplicate and direct me to another question – I've read it already. It did not answer my question.This is a cross post from plain old Stackoverflow

I have examined the trace files for one of the servers but it was less than enlightening – I am unsure of what to look for.

We've tried flushing the shared pool, but again, with no way to reproduce the issue it turns into an "wait and see" fix, which isn't the answer that our funder is really interested in.

Best Answer

This answer is based on the discussion had in the comments of the original question.

The primary cause of any ORA-4031 is that the shared pool is exhausted and is unable to allocate new connections to the database. This will cause the database to crash. I am not going to comment if the package that is running the import is just the last straw or if it is the entire problem, architecturally, I am just offering a workaround.

Due to the small nature of the server memory, space is extremely restricted. I would recommend assigning specific amounts of memory to the shared_pool_size parameter in the spfile and bumping up memory usage incrementally each time the issue occurs. This, of course, implies an increase to SGA_max_size and SGA_target_size. Initial sizing and increment size will be left for the author to discuss, but I'd imagine 50m jumps would be small enough to keep the server happy and also have reasonable chances of success to workaround the ORA-4031.

If memory of the server becomes exhausted before issues begin to resolve, then I would begin digging around the import package load to see if there are inefficiencies with how that procedure is using the shared_pool.