Note that serial execution does not mean that each transaction gets the same results regardless of the order of execution.
Consider the following two transactions:
T1: select sum(salary) from Employee where dept='Sales'
T2: insert into Employee values (....,'Sales',...)
If we execute T1 then T2, we get a smaller salary total than if we execute T2 then T1.
In both cases, however, the salary total is consistent with the state of the database at the time the query is executed.
A serial execution of consistent transactions is always consistent.
If transactions execute under a concurrent (nonserial) schedule, the potential exists for conflict among their effects.
In the worst case, the effect of executing the transactions
- is to leave the database in an inconsistent state
- even though each transaction, by itself, is consistent
Concurrency control mechanisms handle such conflicts (see later).
Valid Concurrent Transactions
Not all concurrent executions cause problems.
For example, the schedules
T1: R(X) W(X)           R(Y) W(Y)
T2:           R(X) W(X)

or

T1:           R(X) W(X) R(Y) W(Y)
T2: R(X) W(X)

or ...
leave the database in a consistent state.
Lost Update Problem
Consider the following schedule where the transactions execute in parallel:

T1: R(X)      W(X) R(Y) W(Y)
T2:      R(X)                W(X)

In this scenario:
- T2 reads data (X) that T1 is currently operating on
- T2 then makes changes to X and overwrites T1's result
The result: T1's update to X is lost.
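A minimal simulation of this interleaving, using the X := X+N and X := X+M updates from the running example (with N=5, M=8 as in the worked trace):

```python
N, M = 5, 8                        # update amounts, as in the worked example
db = {"X": 100, "Y": 50}

x1 = db["X"]                       # T1: read(X)  -> 100
x2 = db["X"]                       # T2: read(X)  -> 100, before T1 writes!
db["X"] = x1 + N                   # T1: write(X) -> 105
y1 = db["Y"]                       # T1: read(Y)
db["Y"] = y1 - N                   # T1: write(Y) -> 45
db["X"] = x2 + M                   # T2: write(X) -> 108, T1's +N is lost

print(db)                          # {'X': 108, 'Y': 45}; serial would give X = 113
```

Either serial order would leave X = 113; the interleaving silently drops T1's increment.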
Lost Update Problem (cont)
Consider the following schedule where one transaction fails:

T1: R(X) W(X) abort
T2:                 R(X) W(X)
Transaction T1 aborts after writing X.
The abort will undo the changes to X, but where the undo occurs can affect the results.
Consider three places where undo might occur:
Temporary Update - Case 1
This scenario is ok: the undo occurs before T2 reads X, so T1's effects have been eliminated.
Temporary Update - Case 2
In this scenario, some of T1's effects have been retained: the undo occurs after T2 has read the value that T1 wrote.
Temporary Update - Case 3
In this scenario, T2's effects have been lost, even after commit: the undo occurs after T2's write and overwrites it.
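The three cases can be sketched as a single function, where `undo_point` selects when T1's before-image of X is restored (X=100, N=5, M=8 as in the running example; the encoding is our own):

```python
def run(undo_point):
    db = {"X": 100}
    before = db["X"]            # T1 saves the before-image of X
    db["X"] = before + 5        # T1: write(X), with N = 5
    if undo_point == 1:         # Case 1: undo before T2 reads -> ok
        db["X"] = before
    x2 = db["X"]                # T2: read(X)
    if undo_point == 2:         # Case 2: undo after T2's read -> dirty value kept
        db["X"] = before
    db["X"] = x2 + 8            # T2: write(X), with M = 8; T2 commits
    if undo_point == 3:         # Case 3: undo after T2's write -> T2's update lost
        db["X"] = before
    return db["X"]

print(run(1), run(2), run(3))   # 108 113 100
```

Only Case 1 yields the value a serial T2-only execution would produce; Case 2 bakes T1's aborted update into T2's result, and Case 3 wipes out T2's committed update.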
For ACID, we must ensure that schedules are:
- serializable
The effect of executing n concurrent transactions is the same as the effect of executing them serially in some order.
For assessing the correctness of concurrency control methods, we need a test to check whether they produce serializable schedules.
- recoverable
A failed transaction should not affect the ability to recover the system to a consistent state.
This can be ensured if transactions commit only after all transactions whose changes they read commit.
If a concurrent schedule for transactions T1 ..Tn acts like a serial schedule for T1 ..Tn, then consistency is guaranteed.
To determine this requires a notion of schedule equivalence.
Note: we are not attempting to determine equivalence of entire computations, simply of the interleaved sequences of read/write operations.
A serializable schedule is a concurrent schedule that produces a final state that is the same as that produced by some serial schedule.
There are two primary formulations of serializability:
- conflict serializability (read/write operations occur in the "right" order)
- view serializability (read operations see the correct version of the data)
Consider two transactions T1 and T2 acting on data item X.
Considering only read/write operations, the possibilities are:
T1 first       T2 first       Equivalent?
R1(X) R2(X)    R2(X) R1(X)    yes
R1(X) W2(X)    W2(X) R1(X)    no
W1(X) R2(X)    R2(X) W1(X)    no
W1(X) W2(X)    W2(X) W1(X)    no
If T1 and T2 act on different data items, the result is equivalent regardless of order.
Conflict Serializability (cont)
Two transactions have a potential conflict if
- they perform operations on the same data item
- at least one of the operations is a write operation
Conversely, if two operations in a schedule don't conflict,
we can swap their order without affecting the overall result.
This gives a basis for determining equivalence of schedules.
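The conflict rule can be written directly as a predicate; the `(tx, kind, item)` encoding of operations below is our own convention, not from the notes.

```python
# Two operations conflict iff they come from different transactions,
# touch the same data item, and at least one of them is a write.
def conflicts(op1, op2):
    t1, k1, i1 = op1
    t2, k2, i2 = op2
    return t1 != t2 and i1 == i2 and "W" in (k1, k2)

print(conflicts((1, "R", "X"), (2, "R", "X")))  # False: two reads commute
print(conflicts((1, "R", "X"), (2, "W", "X")))  # True:  read/write, same item
print(conflicts((1, "W", "X"), (2, "W", "Y")))  # False: different items
```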
Conflict Serializability (cont)
If we can transform a schedule
- by swapping the orders of non-conflicting operations
- such that the result is a serial schedule
then the original schedule is conflict serializable.
If a concurrent schedule is equivalent to some (any) serial schedule, then we have a consistency guarantee.
Conflict Serializability (cont)
Example: transform a concurrent schedule to serial schedule
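A sketch of this transformation, under an ad-hoc `(tx, kind, item)` encoding of operations (our own convention): it bubbles adjacent non-conflicting operations toward a chosen serial order and reports whether a serial schedule was reached.

```python
def conflicts(a, b):
    return a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1])

def try_serialise(schedule, order):
    rank = {tx: i for i, tx in enumerate(order)}
    s = list(schedule)
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            a, b = s[i], s[i + 1]
            if rank[a[0]] > rank[b[0]] and not conflicts(a, b):
                s[i], s[i + 1] = b, a          # swap adjacent non-conflicting ops
                changed = True
    # success iff the result is serial in the requested order
    return all(rank[s[i][0]] <= rank[s[i + 1][0]] for i in range(len(s) - 1))

# T2's ops slot between T1's, but every conflict already runs T1-before-T2
sched = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (1, "R", "Y"),
         (2, "W", "X"), (1, "W", "Y")]
print(try_serialise(sched, [1, 2]))            # True: equivalent to serial T1; T2

# the lost-update interleaving cannot be serialised in either order
lost = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(try_serialise(lost, [1, 2]), try_serialise(lost, [2, 1]))   # False False
```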
View Serializability
View serializability is
- an alternative formulation of serializability
- less conservative than conflict serializability (CS)
(some safe schedules are view serializable but not conflict serializable)
- a schedule is "safe" if it is view equivalent to some serial schedule
The intuition: if, in two schedules, transactions always read the result of the same write operations, then the schedules must produce the same result.
View Serializability (cont)
Two schedules S and S' on T1 .. Tn are view equivalent iff
- for each shared data item X
- if Tj reads the initial value of X in S, then it also reads the initial value of X in S'
- if Tj reads X in S and X was produced by Tk, then Tj must also read the value of X produced by Tk in S'
- if Tj performs the final write of X in S, then it must also perform the final write of X in S'
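The three conditions can be sketched as a checker, again under an ad-hoc `(tx, kind, item)` encoding of operations (our own convention); `None` stands for "initial value", and each read is keyed by its position among that transaction's reads of the item.

```python
def view_info(schedule):
    last_writer = {}      # item -> tx of most recent write (None = initial value)
    counts = {}           # (tx, item) -> number of reads so far by that tx
    reads_from = {}       # (tx, item, n-th read) -> tx whose write was read
    for tx, kind, item in schedule:
        if kind == "R":
            n = counts.get((tx, item), 0)
            counts[(tx, item)] = n + 1
            reads_from[(tx, item, n)] = last_writer.get(item)
        else:
            last_writer[item] = tx
    return reads_from, last_writer    # reads-from relation + final writers

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

serial  = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (2, "W", "X")]
swapped = [(2, "R", "X"), (1, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(view_equivalent(serial, serial))    # True
print(view_equivalent(serial, swapped))   # False: T2 reads initial X, not T1's
```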
In practice, we don't test specific schedules for serializability.
However, in designing concurrency control schemes, we need a way of checking whether they produce "safe" schedules.
This is typically achieved by a demonstration that the scheme generates only serializable schedules, and we need a serializability test for this.
There is a simple and efficient test for conflict serializability;
there is a more complex test for view serializability.
Both tests are based on notions of
- building a graph to represent transaction interactions
- testing properties of this graph (checking for cycles)
Testing Serializability (cont)
A precedence graph G = (V,E) for a schedule S consists of
- a vertex in V for each transaction from T1 .. Tn
- an edge in E for each pair Tj and Tk, such that
- there is a pair of conflicting operations between Tj & Tk
- the Tj operation occurs before the Tk operation
Testing Serializability (cont)
If an edge Tj → Tk exists in the precedence graph
- then Tj must appear before Tk in any serial schedule
Thus, the serializability test is reduced to cycle-detection
(and there are cycle-detection algorithms available in many algorithms textbooks)
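The whole test can be sketched end-to-end: build the precedence graph from a schedule of `(tx, kind, item)` operations (our own encoding), then run a DFS-based cycle check.

```python
def precedence_graph(schedule):
    edges = set()
    for i, (t1, k1, x1) in enumerate(schedule):
        for t2, k2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))         # t1's conflicting op comes first
    return edges

def has_cycle(edges):
    nodes = {n for edge in edges for n in edge}
    adj = {n: [b for a, b in edges if a == n] for n in nodes}
    state = dict.fromkeys(nodes, 0)         # 0 unvisited, 1 in progress, 2 done
    def dfs(n):
        state[n] = 1
        for m in adj[n]:
            if state[m] == 1 or (state[m] == 0 and dfs(m)):
                return True
        state[n] = 2
        return False
    return any(state[n] == 0 and dfs(n) for n in nodes)

def serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

ok  = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (2, "W", "X")]
bad = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(serializable(ok), serializable(bad))  # True False
```

The `bad` schedule is the lost-update interleaving: its graph has edges in both directions between T1 and T2, hence a cycle.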
Serializability Test Examples
Serializable schedule (with conflicting operations shown in red):
Precedence graph for this schedule:
No cycles ⇒ serializable (as we already knew)
Serializability Test Examples (cont)
Consider this schedule:
Precedence graph for this schedule:
Has a cycle ⇒ not serializable
Serializability Test Examples (cont)
Consider this 3-transaction schedule:
Precedence graph for this schedule:
No cycles ⇒ serializable
Serializability Test Examples (cont)
Consider this 3-transaction schedule:
Precedence graph for this schedule:
Has a cycle ⇒ not serializable
Having serializability tests is useful theoretically, but they do not provide a practical tool for organising schedules.
Why not practical?
- the # possible schedules for n transactions is O(n!)
- the cost of testing for serializability via graphs is O(n²)
What is required instead are schemes
- that can be applied to each transaction individually
- which guarantee that any combination of transactions is serializable
Concurrency Control (cont)
Approaches to ensuring ACID transactions:
- lock-based
Synchronise transaction execution via locks on some portion of the database.
- version-based
Allow multiple consistent versions of the data to exist, and allow each transaction exclusive access to one version.
- timestamp-based
Organise transaction execution in advance by assigning timestamps to operations.
- validation-based (optimistic concurrency control)
Exploit typical execution-sequence properties of transactions to determine safety dynamically.
Lock-based Concurrency Control
Synchronise access to shared data items via following rules:
- before reading X, get shared (read) lock on X
- before writing X, get exclusive (write) lock on X
- an attempt to get a shared lock on X is blocked if another transaction already has exclusive lock on X
- an attempt to get an exclusive lock on X is blocked if another transaction has any kind of lock on X
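The blocking rules amount to a lock-compatibility table; a minimal sketch (S = shared, X = exclusive; the encoding is our own):

```python
# Compatibility of a requested lock (second) against a held lock (first).
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held):
    # grant iff the request is compatible with every lock already held
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))   # True: shared locks can coexist
print(can_grant("X", ["S"]))        # False: exclusive blocked by any lock
print(can_grant("S", ["X"]))        # False: shared blocked by an exclusive lock
```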
To guarantee serializability, we require an additional constraint on how locks are applied (two-phase locking):
- no transaction can request a lock after it has released one of its locks
Each transaction thus has:
- a growing phase, where locks are acquired
- an action phase, where the "real work" is done
- a shrinking phase, where locks are released
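A minimal check that a transaction's lock/unlock sequence obeys the two-phase constraint (the action encoding is our own):

```python
# actions: list of ("lock" | "unlock", item) in the order the transaction
# issues them; two-phase means no lock request after the first release.
def is_two_phase(actions):
    released = False
    for kind, _ in actions:
        if kind == "unlock":
            released = True
        elif released:                  # a lock after an unlock violates 2PL
            return False
    return True

print(is_two_phase([("lock", "X"), ("lock", "Y"),
                    ("unlock", "X"), ("unlock", "Y")]))          # True
print(is_two_phase([("lock", "X"), ("unlock", "X"),
                    ("lock", "Y")]))                             # False
```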
Appropriate locking can guarantee correctness.
However, it also introduces potential undesirable effects:
- deadlock
No transactions can proceed; each waiting on lock held by another.
- starvation
One transaction is permanently "frozen out" of access to data.
- reduced performance
Locking introduces delays while waiting for locks to be released.
Deadlock occurs when two transactions are waiting for a lock on an item held by the other.
Example: T1 locks X and then requests a lock on Y, while T2 locks Y and then requests a lock on X; each blocks, waiting for the other.
Handling deadlock involves forcing a transaction to "back off".
- select a process to "back off"
- choose on the basis of how far the transaction has progressed, # locks held, ...
- roll back the selected process
- how far does it need to be rolled back? (less roll-back is better)
- prevent starvation
- need methods to ensure that same transaction isn't always chosen
Starvation occurs when one transaction
- waits on a lock indefinitely
- while other transactions continue normally
Multiple locks ⇒ need to decide which to release first.
Solutions:
- implement a fair wait/release strategy (e.g. first-come-first-served)
- use deadlock prevention schemes, such as "wait-die"
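The "wait-die" rule can be sketched as a tiny decision function: transactions carry timestamps (start times, smaller = older); an older requester is allowed to wait, a younger one is rolled back ("dies") and restarts with its original timestamp, so it cannot starve.

```python
# Wait-die: Ti requests a lock held by Tj.
def wait_die(requester_ts, holder_ts):
    return "wait" if requester_ts < holder_ts else "die"

print(wait_die(1, 2))   # "wait": the older transaction may wait
print(wait_die(2, 1))   # "die":  the younger transaction is rolled back
```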
Locking typically reduces concurrency ⇒ reduces throughput.
Granularity of locking can impact performance:
+ lock a small item ⇒ more of database accessible
+ lock a small item ⇒ quick update ⇒ quick lock release
- lock small items ⇒ more locks ⇒ more lock management
Granularity levels: field, row (tuple), table, whole database
Multiple lock-granularities give best scope for optimising performance.
Multi-version Concurrency Control
One approach to reducing the requirement for locks is to
- provide multiple (consistent) versions of the database
- give each transaction access to an "appropriate" version
(i.e. a version that will maintain the serializability of the transaction)
The primary difference between MVCC and standard locking models:
- read locks do not conflict with write locks, so that
- reading never blocks writing, and writing never blocks reading
Database systems using MVCC ensure
- statement-level read consistency
(i.e. once an SQL SELECT statement starts, its view of the data is "frozen")
- readers do not wait for writers or other readers of the same data
- writers do not wait for readers of the same data
- writers only wait for other writers if they attempt to update identical rows in concurrent transactions
- a SELECT statement sees a consistent view of the database
- but it may not see the "current" view of the database
E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples
PostgreSQL uses MVCC to reduce locking requirements.
Consequences:
- several versions of each tuple may exist ⇒ uses more storage
- each transaction needs to check each tuple's visibility
- periodically, clean up "old" tuples (vacuum)
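A much-simplified sketch of tuple visibility in the spirit of PostgreSQL's xmin/xmax bookkeeping (an assumption-laden model: real PostgreSQL also tracks in-progress and aborted transactions, hint bits, etc.):

```python
# A tuple version records the xact that created it (xmin) and, if deleted
# or superseded, the xact that removed it (xmax). A version is visible to
# a snapshot if it was created by a transaction committed at or before the
# snapshot, and not removed by such a transaction.
def visible(version, snapshot_xid, committed):
    created_ok = version["xmin"] in committed and version["xmin"] <= snapshot_xid
    removed = (version["xmax"] is not None
               and version["xmax"] in committed
               and version["xmax"] <= snapshot_xid)
    return created_ok and not removed

committed = {1, 2}
v_old = {"xmin": 1, "xmax": 2, "data": "X=100"}    # superseded by xact 2
v_new = {"xmin": 2, "xmax": None, "data": "X=105"}

print(visible(v_old, 1, committed), visible(v_new, 1, committed))  # True False
print(visible(v_old, 2, committed), visible(v_new, 2, committed))  # False True
```

Each snapshot sees exactly one version of the tuple; old versions that no snapshot can see are what vacuum reclaims.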
Concurrency control is still needed (via implicit locking):
- amount of locking is determined by user-chosen isolation level
- the system then applies appropriate locking automatically
PostgreSQL and MVCC (cont)
A transaction sees a consistent view of the database, but it may not see the "current" view of the database.
E.g. T1 does a select and then concurrent T2 deletes some of T1's selected tuples
This is not a problem unless the transactions communicate outside the database system.
For applications that require that every transaction accesses the current consistent version of the data, explicit locking must be used.