LOGGING or NOLOGGING, that is the question – Part II

By Francisco Munoz Alvarez, Oracle ACE

Redo Generation and Recoverability


The purpose of redo generation is to ensure recoverability. This is why Oracle does not give the DBA much control over redo generation. If the instance crashes, all the changes in the SGA are lost; Oracle then uses the redo entries in the online redo log files to bring the database back to a consistent state. Maintaining the redo log records is expensive, involving latch management operations (CPU) and frequent write access to the redo log files (I/O). You can avoid redo logging for certain operations using the NOLOGGING feature. Regarding redo generation, two questions come up all the time in the OTN Forums:

Why do I have excessive redo generation during an online backup?

When a tablespace is put in backup mode, the redo generation behavior changes, but there is no "excessive" redo generated. Rather, additional information is logged into the online redo log the first time a block is modified in a tablespace that is in hot backup mode.

As long as the tablespace is in backup mode, Oracle writes the entire block to redo the first time that block is modified after ALTER TABLESPACE tbsname BEGIN BACKUP is issued; subsequent changes to the same block generate normal redo. This is done because Oracle cannot guarantee that a block was not being updated while it was being copied as part of the backup. Let's explain this part in more detail:

In hot backup mode, only two things are different:

  • The first time a block is changed in a datafile that is in hot backup mode, the entire block is written to the redo log files, not just the changed bytes. Normally only the changed bytes (a redo change vector) are written; in hot backup mode, the entire block is logged the first time. This is because you can get into a situation where the process copying the datafile and DBWR are working on the same block simultaneously. Say the OS blocking read factor is 512 bytes (the OS reads 512 bytes from disk at a time). The backup program goes to read an 8K Oracle block, and the OS gives it the first 4K. Meanwhile, DBWR has asked to rewrite this block, and the OS schedules that write to occur right now; the entire 8K block is rewritten. The backup program then resumes (this is a multi-tasking OS) and reads the last 4K of the block. The backup program has now captured a fractured block: the head and the tail are from two points in time, and recovery cannot deal with that. Hence, Oracle logs the entire block image so that during recovery the block is completely rewritten from redo and is at least consistent with itself; recovery can proceed from there.
  • The datafile headers, which contain the SCN of the last completed checkpoint, are NOT updated while a file is in hot backup mode. DBWR constantly writes to the datafiles during the hot backup. The SCN recorded in the header tells us how far back in the redo stream one needs to go to recover this file.
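You can see both effects from the data dictionary: datafiles whose tablespaces are in hot backup mode show up in the V$BACKUP view with STATUS = 'ACTIVE', together with the SCN frozen when backup mode began. A small query sketch, assuming a session with access to the V$ views:

```sql
-- Datafiles currently in hot backup mode show STATUS = 'ACTIVE';
-- CHANGE# is the checkpoint SCN frozen when BEGIN BACKUP was issued.
SELECT b.file#, d.name, b.status, b.change#, b.time
  FROM v$backup   b
  JOIN v$datafile d ON d.file# = b.file#
 WHERE b.status = 'ACTIVE';
```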

To limit the effect of this additional logging, you should place only one tablespace at a time in backup mode, and take the tablespace out of backup mode as soon as you have backed it up. This reduces the number of blocks that may have to be fully logged to the minimum possible.
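Putting that advice into practice, a manual hot backup handles one tablespace at a time, along these lines (the tablespace name and file paths below are illustrative):

```sql
-- Freeze the datafile header and switch the USERS tablespace to backup mode
ALTER TABLESPACE users BEGIN BACKUP;

-- Copy the tablespace's datafiles with an OS utility, e.g.:
--   cp /u01/oradata/ORCL/users01.dbf /backup/users01.dbf

-- Take the tablespace out of backup mode as soon as the copy finishes,
-- so that full-block logging for its blocks stops
ALTER TABLESPACE users END BACKUP;
```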

What are the differences between REDO and UNDO? The following table clears up this question:

                   UNDO                                      REDO
Record of          How to undo a change                      How to reproduce a change
Used for           Rollback, read consistency                Rolling forward database changes
Stored in          Undo segments                             Redo log files
Protects against   Inconsistent reads in multiuser systems   Data loss

An undo segment is just a segment, like a table or an index or a hash cluster or a materialized view is a segment. The clue is in the name. And the rule is that if you modify part of a segment, any segment, regardless of its type, you must generate redo so that the modification can be recovered in the event of media or instance failure. Therefore, when you modify EMP, the changes to the EMP blocks are recorded in redo. The modification to EMP also has to be recorded in undo, because you might change your mind and want to reverse the transaction before you commit. Therefore, the modification to EMP causes entries to be made in an undo segment. But that is itself a modification to a segment; this time, an undo segment. Therefore, the changes to the undo segment also have to be recorded in redo, in case you suffer a media or instance failure.
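You can observe this chain (table change, undo for that change, redo protecting both) by watching your own session's 'redo size' statistic before and after a small update. A rough sketch, assuming an EMP table you are allowed to update:

```sql
-- Redo generated by the current session so far
SELECT s.value AS redo_bytes
  FROM v$mystat  s
  JOIN v$statname n ON n.statistic# = s.statistic#
 WHERE n.name = 'redo size';

-- A small update: redo is generated both for the change to EMP
-- and for the change to the undo segment holding its before image
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;

-- Re-running the first query shows an increase noticeably larger than
-- the few bytes actually changed in EMP, because undo blocks were logged too
ROLLBACK;
```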

If your database now crashed and you had to restore a set of datafiles from 10 days ago, including those for the undo tablespace, you would of course do what Oracle always does: start reading from your archived redo, rolling the 10-day-old files forward in time until they were 9, then 8, then 7, then 6 days old, and so on, until you reach the point where the only record of changes to segments (any segment) is in the current online redo log. Then you would use that log to roll the files forward until they were 8 minutes old, 7 minutes, 6, 5, 4, and so on, right up until every change to every segment ever recorded in the redo had been applied. At that point, your undo segments have been re-populated. So now you can start rolling back those transactions which were recorded in the redo logs but which were not committed at the time of the database failure.

I can't emphasize enough, really, that undo segments are just slightly special tables. They are fundamentally not very different from EMP or DEPT, except that new inserts into them can overwrite a previous record, which never happens to EMP, of course. If you generate change vectors when you update EMP, you generate change vectors when you generate undo.

Why do we store the before and after image in redo, and then duplicate half of that by storing the before image again in undo? Because redo is written and generated sequentially and is not cached for long in memory (most log buffers are a few megabytes in size, tops). Using redo to roll back a mere mistake, or a change of mind, whilst theoretically doable, would therefore involve wading sequentially through huge amounts of redo, looking for one little before image in a sea of changes made by lots of people, and all of that wading would be done by reading from disk (as it is in a recovery scenario). Undo, on the other hand, is stored in the buffer cache (just as EMP is stored in the buffer cache), so there is a good chance that reading it will require only logical I/O, not physical. And your transaction is dynamically linked to where it has written its undo, so it can jump straight to its undo without wading through the undo generated by other transactions. In performance terms, there is no comparison.

Splitting 'you need this for recovery' from 'you need this for changes of mind' was a stroke of genius on Oracle's part: other databases merely have 'transaction logs' that serve both purposes, and they suffer in performance and flexibility terms accordingly.

Wait for the next part, in which I will talk about LOGGING and NOLOGGING: what they are, and how, when, and why to use them.
