Overview
--------
diskdump provides a crash dump facility that writes the contents of memory to
a dedicated dump partition.  It consists of a kernel portion and
the diskdumputils package.
The kernel portion inhibits interrupts, freezes all other CPUs, and then for
each page of data, issues the I/O command to the host adapter driver, followed
by calling the interrupt handler of the adapter driver iteratively until
the I/O has completed.
The diskdumputils package is installed on the machine on which you wish to
capture dumps in the event of a system panic.  It loads and configures
the diskdump kernel modules so that, if the machine crashes, memory is dumped
to disk.
When diskdump runs, the system is already in serious trouble, so there is
a risk that user data on the disk device containing the dump partition could
be corrupted.  To guard against this, identifying signatures are written
across the entire dump partition.  When a system panic occurs, the diskdump
module reads the dump partition and checks whether the signatures are intact.
If all of the checked signatures are verified, diskdump presumes that it is
writing to the correct device and that it is very likely to be able to write
the memory dump correctly.

Features
--------
The diskdump service offers several features.  Each feature marked with an X
in the following table is available in RHEL3 and/or RHEL4.  Basic Dump
Facility dumps memory without requiring any module parameters or options.
Partial Dump and Compressed Dump are enabled by specifying module parameters.
Swap Partition Support is enabled by specifying, as a dump device, a swap
partition that works with a supported driver.  The other features are enabled
by specifying their respective options.  See the later descriptions in this
README for further information about each feature.

	Feature			RHEL3	RHEL4	File to Configure
	---------------------------------------------------------------
	Basic Dump Facility	 X	 X	/etc/sysconfig/diskdump
	Partial Dump		 	 X	/etc/modprobe.conf
	Swap Partition Support	 X	 X	/etc/sysconfig/diskdump
	Compressed Dump			 X	/etc/modprobe.conf
	Preserved Dump		 X	 X	/etc/sysconfig/diskdump
	Deferred Savecore		 X	/etc/sysconfig/diskdump
	Message Complement		 X	/etc/sysconfig/diskdump
	diskdump-nospace script  X	 X	/var/crash/scripts
	diskdump-success script  X	 X	/var/crash/scripts
	Multiple Dump Devices		 X	/etc/sysconfig/diskdump
	Dump Filtering			 X


Supported Drivers
-----------------
Diskdump is only supported with the following storage adapters:

	RHEL3			RHEL4
	------------------------------------
	aic7xxx			aic7xxx
	aic79xx			aic79xx
	dpt_i2o
				ipr
	megaraid2		megaraid
	mptfusion		mptfusion
	sym53c8xx		sym53c8xx
	sata_promise		sata_promise
	ata_piix		ata_piix
	CCISS			CCISS
	megaraid_sas		megaraid_sas
				IDE
	qla2xxx			qla2xxx
				lpfc
				stex
				ips
				ibmvscsi
				sata_nv
				aacraid

Supported Kernels
-----------------
diskdump is supported in the following Red Hat kernels, where <kernel-version>
is the version containing this diskdumputils package:

	RHEL3
		kernel*-<kernel-version>.i686.rpm
		kernel*-<kernel-version>.athlon.rpm
		kernel*-<kernel-version>.ia64.rpm
		kernel*-<kernel-version>.x86_64.rpm
		kernel*-<kernel-version>.ia32e.rpm
	RHEL4
		kernel*-<kernel-version>.i686.rpm
		kernel*-<kernel-version>.ia64.rpm
		kernel*-<kernel-version>.x86_64.rpm
		kernel*-<kernel-version>.ppc64.rpm

Setup
-----
1. Dump Device Selection

The first step in the configuration process is to designate a disk device
to dump memory to in the event of a system crash.  The dump device may be
any of the following:
 - a full disk device - RHEL3 only - (e.g. /dev/sda)
 - a partition of a disk device (e.g. /dev/sda4)
 - a swap partition (e.g. /dev/sda2)
However, diskdump accesses the disk driver directly, so it cannot dump memory
to block devices that are configured through a logical volume manager or
device mapper.  Such block devices cannot be used as dump devices for diskdump.

In normal operation, the dump device should be large enough to hold the whole
dump, although the required size can be reduced with certain diskdump features
that are described later.  The dump to be written consists of all of physical
memory plus a header.  To determine the exact size required, refer to
the output of /proc/diskdump after the diskdump module is loaded:

 # modprobe diskdump

 # cat /proc/diskdump

 # sample_rate: 8
 # block_order: 2
 # fallback_on_err: 1
 # allow_risky_dumps: 1
 # total_blocks: 262042
 #

The total_blocks value is given in page-size units, so in this example
the selected device must hold at least (262042 * 4096) bytes on an i386
machine.
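
The size calculation above can be sketched in shell.  This is only an
illustration: the helper names total_blocks and required_bytes are
hypothetical, the sample text is abbreviated from the output shown above, and
the 4096-byte page size is the i386 value assumed in the example.

```shell
# Sketch: compute the minimum dump device size from /proc/diskdump-style text.
# Assumption: the page size is passed in explicitly (4096 on i386).

# Print the total_blocks value found in text read from stdin.
total_blocks() {
    sed -n 's/^[ #]*total_blocks: *\([0-9][0-9]*\).*$/\1/p'
}

# required_bytes <total_blocks> <page_size>
required_bytes() {
    echo $(( $1 * $2 ))
}

# Abbreviated sample of the output shown above.
sample='# sample_rate: 8
# block_order: 2
# total_blocks: 262042
#'

blocks=$(printf '%s\n' "$sample" | total_blocks)
required_bytes "$blocks" 4096    # prints 1073324032
```

On a live system the text would come from /proc/diskdump itself, and the page
size could be taken from "getconf PAGESIZE" instead of being hard-coded.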

Note: during a diskdump operation, memory contents residing on the swap
partition are not preserved.  Therefore the required dump partition size
corresponds to physical memory, not to physical memory plus the size of
the swap partition.

Next, based on the information above, decide which device or devices to use
as dump devices, and register them as follows.

Edit /etc/sysconfig/diskdump in the following format to register a dump
device:

 ----------------
 DEVICE=/dev/sde1
 ----------------

Multiple dump devices can be registered in a colon-separated format like:

 -------------------------
 DEVICE=/dev/sda2:/dev/sdb
 -------------------------

Diskdump attempts to dump memory to the leftmost usable device/partition.
The benefit of designating more than one dump device is redundancy.
For example, if each dump device is controlled by a different driver, then
even if a system panic occurs in a driver that controls one of the registered
devices, memory can still be dumped using another registered device.
In addition, on RHEL4 only, if a device/partition becomes unusable while
memory is being dumped, for example because of an I/O error or a format error,
diskdump abandons that device and looks for another sane device among
the registered devices.  If one is found, diskdump starts the dump over on
that device.  Diskdump repeats this search-and-dump cycle until the dump
completes or no sane devices remain.  Afterwards, if at least one device
contains complete dump data, the diskdump service deletes the incomplete dump
data from the other device(s) the next time it starts.  Otherwise, the data
from the device holding the largest incomplete dump is saved as
vmcore-incomplete, or the dump data can be left on the dump device by
specifying SKIPSAVECORE/PRESERVEDUMP, described later.
Each dump device must be large enough to store the full dump.
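
As a minimal sketch of how the colon-separated DEVICE value reads from left to
right, the hypothetical helper below splits it into one device per line.  It
assumes the value is unquoted, and it does not check whether a device is
usable; that decision is made by diskdump itself.

```shell
# Sketch: list the devices registered in a DEVICE= line, leftmost first.
# Assumption: the value is a plain colon-separated list without quotes.

# dump_devices: reads configuration text on stdin, prints one device per line.
dump_devices() {
    sed -n 's/^ *DEVICE=//p' | tr ':' '\n'
}

# Prints /dev/sda2 and then /dev/sdb, in the order diskdump would try them.
printf 'DEVICE=/dev/sda2:/dev/sdb\n' | dump_devices
```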

If you configure dumping to a swap partition, /var must be mounted locally,
for the reasons described in the remainder of this paragraph.
In the event of a system crash, the memory contents are saved to the swap
partition.  When the system reboots, the diskdumputils commands are run to
convert the saved memory in the swap partition, and move it to a directory
under /var/crash/.  This memory saving operation is run in the boot sequence
prior to both enabling swap and mounting remote filesystems.  If /var was
mounted remotely, the diskdump service would fail because remote file systems
are usually mounted later than the swap initialization in rc.sysinit.

Diskdump can dump memory to only a single dump partition; it cannot split one
dump across several partitions.  For example, even if you configure four 2 GB
swap partitions, diskdump cannot dump 8 GB of memory to them.  Consequently,
for the dump to succeed, you must configure one swap partition that is larger
than the system memory.  That is, if the system has 8 GB of memory, you must
configure a swap partition of more than 8 GB so that diskdump can dump memory
completely.
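
The rule above can be expressed as a small check.  The sketch below is
illustrative only: swap_can_hold_dump is a hypothetical helper, sizes are
taken in kB, and the sample values simply restate the 8 GB example.  It does
not account for the features described later that reduce the dump size.

```shell
# Sketch: a single swap partition must be larger than system memory.

# swap_can_hold_dump <partition_kb> <memory_kb>
# Succeeds only when the partition is strictly larger than memory.
swap_can_hold_dump() {
    [ "$1" -gt "$2" ]
}

mem_kb=$(( 8 * 1024 * 1024 ))    # 8 GB of system memory

# Each partition is judged on its own; several small ones do not add up.
for part_kb in $(( 2 * 1024 * 1024 )) $(( 9 * 1024 * 1024 )); do
    if swap_can_hold_dump "$part_kb" "$mem_kb"; then
        echo "$part_kb kB: large enough"
    else
        echo "$part_kb kB: too small"
    fi
done
```

On a live system, the memory size is available as MemTotal in /proc/meminfo
and swap partition sizes in the Size column of /proc/swaps, both in kB.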


2. Dump Device Formatting

The second step in the configuration process involves formatting the dump
device.
Every dump device needs to be specially formatted for diskdump before it is
used.  Accordingly, a designated dump partition cannot also hold
a conventional filesystem.

The dump device formatting needs to be done once by the system administrator.

 # service diskdump initialformat

The above command cannot format swap partitions or mounted devices.  If such
devices are registered, they are skipped and a message like the following is
shown.

 # service diskdump initialformat
 Formatting dump device:
 /dev/sda2: skipped (swap device)
 /dev/sda3: skipped (mounted device)

When you see the above message but want to format them as dump-dedicated
devices, follow the instructions below. For devices that will be used both
as swap devices and dump devices, only a standard swap header should be
written to the device using mkswap.

First, disable the swap partition with swapoff, or unmount the partition
with umount.  Then, run the command below.

 # diskdumpfmt -f <device>

Note that the above command forcibly formats the specified device without
asking for confirmation.  Check the device name carefully, because the previous
contents of the device cannot be restored after it is formatted.  Running
the above command (i.e. diskdumpfmt -f) for any other purpose is not
recommended, because you may destroy valuable data by mistake.

A device that has already been formatted as a dump device can be reformatted
with the subcommand "format" instead of "initialformat".  That is, the command
below is sufficient.

 # service diskdump format

Device formatting with subcommand "initialformat" or "format" is just a quick
format.  If you want to format all of the registered devices/partitions fully,
you need to run the command below.

 # service diskdump regularformat

The time needed to format a device depends on its size, but a full format with
the subcommand "regularformat" usually takes much longer than a quick format.

3. Enable Diskdump Service

Lastly, start the diskdump service:

 # chkconfig diskdump on
 # service diskdump start

If diskdump startup succeeds, either [OK] or [WARNING] is shown.  However,
if diskdump is already running, the diskdump service just reports that
condition instead of showing [OK]/[WARNING].  In that case, configuration
changes you have made may not yet be in effect on the running system.  To make
sure they take effect, you can always use the subcommand "restart" instead of
"start".  Just run the command below.

 # service diskdump restart

Alternatively, the commands below also work.

 # service diskdump stop
 # service diskdump start

[WARNING] means that some of the registered devices are inappropriate, but at
least one device is available.  If diskdump startup fails, [FAIL] is shown.
[FAIL] means that diskdump could not start because the registered devices are
inappropriate.  See /var/log/messages for further information.

The registered device/partition can be checked through the /proc/diskdump
interface.

 # cat /proc/diskdump
 /dev/sde1 514080 1012095

The first value is the start sector of the registered device.  The second value
is the number of sectors.

If the registered dump device needs to be replaced, edit
/etc/sysconfig/diskdump, format the new dump device as described above, and
then restart the diskdump service.  To restart the service, run the command
below.

 # service diskdump restart

To check whether the diskdump service is enabled, run the command below.

 # service diskdump status

It is a good idea to run the command above after every start/restart.

To check the status of the registered device(s), run the command below.

 # service diskdump devicestatus

The command above helps you find inappropriate devices when multiple devices
are registered in /etc/sysconfig/diskdump.

4. Additional Options

Additional options that enable features such as Preserved Dump, Deferred
Savecore, and Message Complement are available in the diskdump service.
The following options can be set in /etc/sysconfig/diskdump.
 - PRESERVEDUMP		(may be used with other options)
 - SKIPSAVECORE		(may be used with other options)
 - EXPIRATION		(an auxiliary option for only SKIPSAVECORE)
 - MAILTO		(an auxiliary option)
 - FROM			(an auxiliary option)
 - SALVAGEMESSAGE	(an independent option for only Message Complement)
 - INITFMTSILENT	(an independent option for initialformat)

To enable the Preserved Dump feature so that dump data is preserved, edit
/etc/sysconfig/diskdump in the following format:

 ----------------
 PRESERVEDUMP=yes
 ----------------

With this setting, the diskdump service leaves the device unformatted only
when saving the panic dump has failed due to some error.  In that case, you
need to investigate and resolve the cause of the failure, and then run:

 # service diskdump restart

If saving the panic dump fails again with the command above, you must run:

 # service diskdump enabledevice

so that the diskdump service can restart.  The command above enables
the registered device again (i.e. the device can be used as a diskdump-dedicated
device).

Note:
The enabledevice command formats the device even if saving the panic dump
has failed (i.e. the dump data is lost).  If only a single device is
registered, diskdump is not ready to dump memory until the command has
finished.
It is highly recommended that two or more devices be registered when this
parameter is in effect.
In RHEL4, if the dump device is not large enough to hold the dump data,
the dump data is not preserved in the dump device even if PRESERVEDUMP is
specified.  The file created by diskdump is named vmcore-incomplete, and most
debugger subcommands should be usable against this file.  In this case
"service diskdump restart" above succeeds without saving the panic dump.

To enable the Deferred Savecore feature, which skips saving the panic dump so
that you can save it whenever you want, edit /etc/sysconfig/diskdump in
the following format:

 ----------------
 SKIPSAVECORE=yes
 ----------------

With this setting, the diskdump service neither saves the panic dump nor
formats the device, even if the device contains a panic dump.  Consequently,
you must run:

 # service diskdump enabledevice

as in the case of PRESERVEDUMP.
It is highly recommended that two or more devices be registered when this
parameter is in effect.

You can set an expiration period for dump data in /etc/sysconfig/diskdump in
the following format.

 ------------
 EXPIRATION=7
 ------------

With the setting above, the dump device is formatted by the diskdump service
once 7 days have passed since the memory was dumped.  If EXPIRATION is set to
0 or is not set, the dump data never expires.  In that case, you must run:

 # service diskdump enabledevice

as in the case of SKIPSAVECORE.
A negative value is invalid for EXPIRATION; if one is specified, the diskdump
service does not start.
Note that EXPIRATION is valid only when SKIPSAVECORE is in effect.
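
The EXPIRATION rules above can be sketched as follows.  This is not
the diskdump service's own code: valid_expiration and dump_expired are
hypothetical helpers, times are taken as seconds since the epoch, and a day is
assumed to be 86400 seconds.

```shell
# Sketch: EXPIRATION handling as described above.

# valid_expiration <value>
# Accepts an empty value or a non-negative integer; rejects anything else.
valid_expiration() {
    case "$1" in
        *[!0-9]*) return 1 ;;    # a sign or any other non-digit is invalid
        *) return 0 ;;
    esac
}

# dump_expired <dump_epoch> <now_epoch> <expiration_days>
# Succeeds once the given number of days has passed; 0 or empty never expires.
dump_expired() {
    [ "${3:-0}" -gt 0 ] && [ $(( $2 - $1 )) -ge $(( $3 * 86400 )) ]
}

now=1000000
dumped=$(( now - 8 * 86400 ))    # a dump taken 8 days earlier

if dump_expired "$dumped" "$now" 7; then
    echo "expired"               # this branch is taken
else
    echo "still kept"
fi
```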

You can have the diskdump service send notification email by setting
the following in /etc/sysconfig/diskdump.

 ---------------------------------------
 MAILTO="admin@foo.com, tech@ml.foo.com"
 FROM="root@serv1.foo.com"
 ---------------------------------------

If you want to notify someone that a memory dump still remains in
the registered device, set both MAILTO and FROM in /etc/sysconfig/diskdump.
Both parameters must contain valid email addresses.  MAILTO must contain at
least one email address; to specify two or more, separate them with commas.
When both parameters are set appropriately, the diskdump service automatically
sends an email generated from a template whenever saving the panic dump is
skipped because of PRESERVEDUMP and/or SKIPSAVECORE.  The template is located
in /etc/diskdump/mail_template.us, and you can edit it as you wish.
Note that both parameters are valid only when PRESERVEDUMP and/or SKIPSAVECORE
is in effect.

To disable the Message Complement feature, edit /etc/sysconfig/diskdump in
the following format:

 -----------------
 SALVAGEMESSAGE=no
 -----------------

This feature retrieves kernel messages from the log_buf area of the saved
crash dump file and appends them to the end of /var/log/messages.
In most cases these messages are the Oops output, but the last kernel messages
that had not yet been written to /var/log/messages may also be appended.
This feature runs only just after rebooting from a system panic, before syslogd
and klogd are started, so the appended messages do not disturb the time
sequence of the log.

If INITFMTSILENT is set to yes, the diskdump service does not ask whether you
want to format the registered device when initialformat is run.

 -----------------
 INITFMTSILENT=yes
 -----------------

5. Disable Diskdump Service

To disable the diskdump service, run the command below.

 # service diskdump stop

Neither [OK] nor [FAIL] is shown in this case.  To check whether the diskdump
service is disabled, run the command below.

 # service diskdump status

If you see the message "diskdump not enabled", diskdump is disabled.


Testing diskdump
----------------
To test the diskdump functionality, press [Alt] + [SysRq] + [C] or run
the command below.

 # echo c > /proc/sysrq-trigger

After the dump completes, a dump file named vmcore is created during the next
boot sequence and saved in a directory whose name has the following format:

  /var/crash/127.0.0.1-<date>

The file format of vmcore is the same as that created by the netdump facility,
so you can analyze it with the crash(8) command.  However, the file format
differs from netdump's if the vmcore file is created with the Compressed Dump
feature described later.  Analyzing such a vmcore requires crash-4.0-2.15 or
later.
If the dump data in the dump device is incomplete or saving the panic dump
fails, the dump data is saved as an incomplete dump file named
vmcore-incomplete.  The file format of vmcore-incomplete is the same as that
of vmcore.

Note
----
After the initial configuration process, no additional steps are required.
Be sure to keep the designated dump partition sufficiently large.  If there is
not enough space, the dump file is only partially saved, resulting in
a vmcore-incomplete file.

The system normally halts just after completing the dump.  At that point,
the only way to reboot is to reset the machine.  To have the system reboot
automatically instead, edit /etc/sysctl.conf in the following format.

 ------------------
 kernel.panic = 180
 ------------------

And then run the command below to enable the above configuration.

 # sysctl -p

Alternatively, just run the command below.

 # echo 180 > /proc/sys/kernel/panic

Both of the above settings make the system reboot 180 seconds after completing
the dump.  However, the latter is only a temporary setting that lasts until
the next reboot.  Any value other than 180 can be specified as the delay before
reboot.

Diskdump currently provides two customizable script files named
diskdump-nospace and diskdump-success.  Both are initially located in
/usr/share/doc/diskdumputils-<version>/example_scripts and should be placed
in /var/crash/scripts if they need to be run.
diskdump-nospace is run prior to the creation of the vmcore file only if there
is not enough space to hold a complete dump file in /var/crash.  It may be
customized to free enough space for the dump in question to proceed.
diskdump-success is run after the vmcore file is created.  It could be used
to email the system administrator of the machine, gzip the vmcore, and so on.
By default, both scripts simply send a mail message and exit with a zero value.


Tunable Parameters
------------------
Tunable parameters can be set in /etc/modules.conf on RHEL3, or in
/etc/modprobe.conf on RHEL4.  The diskdump module has the following module
parameters:

block_order:	Specifies the dump-time I/O block size.  Default value is 2,
		which sets the I/O block size equal to "page-size << 2", or
		16 kbytes on an i386 machine.  Larger values may give
		better performance, but occupy more module memory.

sample_rate:	Determines how many blocks in the dump partition are verified
		before actual memory dumping begins.  Default value is 8,
		which means one of every "1<<8" (256) blocks is verified.
		Specifying zero means all blocks in the partition are verified,
		and a negative value disables verification.

dump_level:	A memory collection level that specifies which memory pages
		will be dumped.  Default value of 0 dumps all pages of
		physical RAM into the vmcore file.  To avoid excessively
		large vmcore files, page cache pages, zero-filled pages,
		free pages, and user application pages may be eliminated
		from the file.  Specifying one of the dump_level values
		will skip one or more memory page type(s) if that page type
		is marked with an X in the following table:

		dump	cache	cache	zero	free	user	description
		level	page	private	page	page	page
		---------------------------------------------------------
		  0		 				default
		  1	 X	 X
		  2		 	 X
		  3	 X	 X	 X
		  4		 		 X
		  5	 X	 X		 X
		  6	 	 	 X	 X
		  7	 X	 X	 X	 X
		  8		 			 X
		  9	 X	 X			 X
		 10		 	 X		 X
		 11	 X	 X	 X		 X
		 12		 		 X	 X
		 13	 X	 X		 X	 X
		 14		 	 X	 X	 X
		 15	 X	 X	 X	 X	 X	minimum dump
		 17	 X
		 19	 X	 	 X			recommended
		 21	 X	 		 X
		 23	 X	 	 X	 X
		 25	 X	 			 X
		 27	 X	 	 X		 X
		 29	 X	 		 X	 X
		 31	 X	 	 X	 X	 X

		The table above also applies to the Dump Filtering feature
		(dumpfilter command) described later.

compress:	Specifies whether compression of dump data is enabled.
		The default value is 0, which means that memory pages are
		dumped without compression.  If the value is 1, memory pages
		are compressed using the deflate algorithm (gzip).

halt_on_err:	Specifies whether the system halts after a diskdump failure.
		This feature is only for RHEL4.  The default value is 0,
		which means that the system restarts or halts (depending on
		the value of /proc/sys/kernel/panic) after diskdump detects
		an error and aborts.  If the value is 1, the system halts
		after diskdump aborts.  This is useful for investigating
		the cause of a system panic when a serial console is not
		enabled and the only record of the panic is the last messages
		on the VGA screen.  Note that, among those last messages,
		the Call Trace may be the only information that relates to
		the cause of the panic.  Also note that if the panic is caused
		by pushing an NMI button, the last messages on the screen may
		not show any information that relates to the cause of
		the panic.
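
The shift arithmetic given for block_order and sample_rate above can be
checked with a short sketch.  The helper names io_block_bytes and
verify_interval are hypothetical, and the 4096-byte page size is the i386
value used in the descriptions.

```shell
# Sketch: the arithmetic behind block_order and sample_rate.

# io_block_bytes <page_size> <block_order>
# The I/O block size is the page size shifted left by block_order.
io_block_bytes() {
    echo $(( $1 << $2 ))
}

# verify_interval <sample_rate>
# Prints N, meaning one in every N blocks is verified.
# Prints 0 when verification is disabled (negative sample_rate).
verify_interval() {
    if [ "$1" -lt 0 ]; then
        echo 0
    else
        echo $(( 1 << $1 ))
    fi
}

io_block_bytes 4096 2    # default block_order: prints 16384
io_block_bytes 4096 3    # prints 32768
verify_interval 8        # default sample_rate: prints 256
verify_interval 0        # every block: prints 1
```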

The Partial Dump feature, RHEL4 only, provides a memory collection level that
selects how much of physical memory is dumped.  Investigating a kernel issue
does not usually require all of physical memory.  Most of physical memory
typically contains user application data, page cache memory (file data), free
memory pages, and zero-filled pages.  By skipping one or more of those page
types when creating the vmcore file, the crash dump becomes significantly
smaller and the dump procedure less time-consuming.  While the actual vmcore
file size varies with the state of the system and the dump_level specified,
the minimum amount of data required to analyze the dump is always captured.
However, since there may be circumstances where it is necessary to capture all
of physical memory, a dump partition smaller than the actual amount of physical
memory is not recommended.
Note that the partial dump feature has some risks.  Memory management lists
are scanned to determine a page's memory attribute, so if a list has been
corrupted, the scanning process may fail.  For example, when a dump_level from
4-7 or from 12-15 is specified, the kernel's free page linked lists are
scanned; if a list is corrupt, diskdump may hang.  Furthermore, a page type
that has been skipped may turn out to be necessary to fully investigate
the cause of some issues.  Therefore, a memory collection level should be
selected to suit each situation.  The recommended level is 19, because it is
easiest to determine whether a page is zero-filled or a page cache page, and
because no page lists need to be traversed.
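
The dump_level table can be read as a bit mask.  The sketch below encodes that
reading; it is inferred from the table in this README rather than taken from
the diskdump source, and the function name skipped_pages is hypothetical.
Levels that do not appear in the table are reported as unsupported.

```shell
# Sketch: decode a dump_level into the page types it skips.
# Assumed bit meanings, inferred from the table:
#   1 = cache pages, 2 = zero pages, 4 = free pages, 8 = user pages,
#   16 = keep private cache pages (only listed together with bit 1).

# skipped_pages <dump_level>
skipped_pages() {
    level=$1
    # The table lists only 0-15 and the odd values 17-31.
    if [ "$level" -lt 0 ] || [ "$level" -gt 31 ]; then
        echo "unsupported"
        return 1
    fi
    if [ "$level" -ge 16 ] && [ $(( level & 1 )) -eq 0 ]; then
        echo "unsupported"
        return 1
    fi
    out=""
    if [ $(( level & 1 )) -ne 0 ]; then
        out="cache"
        # Without the 16 bit, private cache pages are skipped as well.
        if [ $(( level & 16 )) -eq 0 ]; then out="$out cache-private"; fi
    fi
    if [ $(( level & 2 )) -ne 0 ]; then out="$out zero"; fi
    if [ $(( level & 4 )) -ne 0 ]; then out="$out free"; fi
    if [ $(( level & 8 )) -ne 0 ]; then out="$out user"; fi
    out=${out# }                 # drop the leading space, if any
    echo "${out:-none}"
}

skipped_pages 19    # recommended level: prints "cache zero"
skipped_pages 15    # minimum dump: prints "cache cache-private zero free user"
skipped_pages 0     # default: prints "none"
```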

The benefit of the Compressed Dump feature, RHEL4 only, is that the size of
the dump device can be reduced.  Unless compression is enabled, the dump device
has to be larger than the memory size.  If no device is large enough to hold
the dump, a device smaller than the system memory size can be used by enabling
compression.  Memory pages are compressed and dumped to the dump device, and
the dump file is created without decompression, so space in /var/crash is
saved as well.  As another benefit, both the time to dump memory and the time
to create a dump file may be reduced.  Note that the compression ratio cannot
be known in advance, so a small dump device may still be insufficient even if
compression is enabled.  This option to compress dump data is required in
order to use the Dump Filtering feature described later.

Example:

The following option sets the I/O block size to 32 kbytes and verifies every
block in the partition.  In addition, cache pages and zero pages are skipped
by the partial dump feature, and dump data is compressed.

	options diskdump block_order=3 sample_rate=0 dump_level=19 compress=1

Note that the diskdump service always needs to be restarted after the options
are set as above.  Then, run the command below to check whether the parameters
were set correctly.

 # cat /proc/diskdump

 # sample_rate: 0
 # block_order: 3
 # fallback_on_err: 1
 # halt_on_err: 0
 # allow_risky_dumps: 1
 # dump_level: 19
 # compress: 1
 # total_blocks: 262042
 #
 sde1 514080 1012095
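
The device line at the end of the output above can be converted into a size in
bytes for comparison with the space the dump needs.  This is a rough sketch:
device_bytes and device_sizes are hypothetical helpers, and the 512-byte
sector size is an assumption, not something stated in this README.

```shell
# Sketch: turn "name start_sector sector_count" lines into byte sizes.
# Assumption: sectors are 512 bytes unless another size is given.

# device_bytes <sector_count> [sector_size]
device_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# device_sizes: reads /proc/diskdump-style text on stdin.
# Parameter lines (starting with "#") and empty lines are skipped.
device_sizes() {
    while read -r name start count; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        echo "$name $(device_bytes "$count")"
    done
}

# Prints "sde1 518192640" for the sample device line.
printf '# total_blocks: 262042\n#\nsde1 514080 1012095\n' | device_sizes
```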


Other Utility Program
---------------------
An additional utility program, the dumpfilter command, provides the Dump
Filtering feature.  This feature is almost the same as the Partial Dump feature
mentioned in the previous section, but the command is run from the command
line rather than by the diskdump service.  It eliminates pages that are not
usually required for fault analysis from an original full dump in order to
create a smaller dump file.  However, it cannot eliminate free pages from
the full dump; this is the only difference between the Dump Filtering feature
and the Partial Dump feature.  For example, with dumpfilter, a dump file
created with dump_level 17 is identical to one created with dump_level 21, as
can be seen from the dump_level table in the previous section.  The dump file
created by this command can be analyzed with the crash(8) utility in the same
way as the original vmcore file.
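
The equivalence of levels 17 and 21 for dumpfilter can be illustrated by
clearing the free-page bit of a dump_level.  This sketch assumes the bit-mask
reading of the dump_level table (free pages correspond to the value 4); the
helper name dumpfilter_level is hypothetical and is not an option of
the dumpfilter command.

```shell
# Sketch: the dump_level that dumpfilter effectively applies.

# dumpfilter_level <dump_level>
# Clears the free-page bit (value 4), since free pages cannot be eliminated.
dumpfilter_level() {
    echo $(( $1 & ~4 ))
}

dumpfilter_level 21    # prints 17
dumpfilter_level 17    # prints 17
```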

Additional Notes
----------------
1. When using diskdump on a machine with an RSA II card installed, it is
   safest to set the 'O/S watchdog' and 'NMI reset delay' timers under
   the ASM Control->System Settings->Server Timeouts section of the RSA II
   to 'Disabled'.  Otherwise the dump process may be interrupted, leaving
   behind a vmcore-incomplete file on the system.
