Why should I consider optimistic versus pessimistic approaches to database updates?

Joe Sam Shirah

In a modern database, possibly the two most important issues are data integrity and concurrency ( multiple users have access to and can update the data ). Either approach can be appropriate, depending on the application, but it is important to be aware of possible consequences to avoid being blindsided.

A pessimistic approach, with locks, is usually seen as good for data integrity, although it can be bad for concurrency, especially the longer a lock is held. In particular, it guarantees against 'lost updates' - defined as an update performed by one process between the time of access and update by another process, which overwrites the interim update. However, other users are blocked from updating the data and possibly reading it as well if the read access also tries to acquire a lock. A notorious problem can arise when a user accesses data for update and then doesn't act on it for a period of time. Another situation that occurred with one of my clients is that a batch ( non-interactive ) process may need to update data while an interactive user has an update lock on the same data. In that case, data integrity goes out the window and, depending on how the application is written, more problems may be introduced. ( No, we did not write the interactive update program and yes, we had recovery procedures in place. )

An optimstic approach can alleviate lock concurrency problems, but requires more code and care for integrity. The "optimistic" definition usually says that expectations of update clashes are rare, but I view them as normal occurrances in a heavily used database. The basics are that any changes between time of access and time of update must be detected and taken into account. This is often done by comparing timestamps, but one must be sure that the timestamp is always changed for an update and, of course, that the table contains a timestamp column. A more involved, but more complete method involves saving the original columns and using them in the 'Where' clause of the Update statement. If the update fails, the data has changed and the latest data should be reaccessed.