Critical error 8047

How to Troubleshoot Artifactory 7.x Upgrade Issues

David Pinhas
2022-08-31 12:30

When upgrading Artifactory 6.x to 7.x versions, you may encounter some problems. The following are some of the most encountered issues and how to resolve them.

Issue #1: No valid installed license found
Error:
Resolution: This error usually means that the Artifactory license was missing during the Artifactory boot process. Therefore, you’ll need to verify that the artifactory.lic file (or, if yours is an HA setup, the artifactory.cluster.license file) exists under the $JFROG_HOME/artifactory/var/etc/artifactory/ directory. If it does not exist, you’ll need to create it manually and paste the license key inside the artifactory.lic or artifactory.cluster.license file.

Issue #2: Failure to resolve the join key
Error:
Resolution: This error occurs due to the join.key file being missing when the Artifactory service is starting up. Take a look at our wiki page on Managing Keys HERE. Also, check the ~/opt/artifactory/artifactory_home/var/etc/security/ path to verify whether the join.key file exists or not.

Issue #3: Master key mismatch
Error:
Resolution: This behavior occurs when the provided master.key is a mismatch with the existing master.key in the database. If the original master.key is present in the backup folder, you can copy it to $ARTIFACTORY_HOME/var/etc/security and restart the Artifactory service. If the master.key is not in the backup folder, you’ll need to regenerate the master.key by deleting old entries from the database, as well as from the file system. To accomplish this, use the following deletion queries (but, before you do, we highly recommend making a backup of the database):

Step 1: Backup existing entries using the following queries:

  • SELECT * FROM access_configs where DATA like 'JE%'
  • SELECT * FROM access_users_custom_data where PROP_VALUE like 'JE%'
  • SELECT * FROM ACCESS_USERS_CUSTOM_DATA where PROP_KEY LIKE '%_shash'
  • SELECT * FROM access_master_key_status
  • SELECT * FROM configs where DATA like 'JE%'
  • SELECT * FROM master_key_status

Step 2: Delete the entries from the corresponding access tables, as follows:

  • delete from access_configs where data LIKE 'JE%'
  • delete from access_users_custom_data where PROP_VALUE LIKE 'JE%'
  • delete from access_master_key_status where status = 'on'
  • delete from master_key_status where status = 'on'
  • delete from CONFIGS where data LIKE 'JE%'

Once you have successfully deleted the master.key entries, you’ll need to remove the master.key file from the $JFROG_HOME/var/etc/security directory. Thereafter, regenerate the master key using the following command:

Place the generated key in the $JFROG_HOME/var/etc/security directory, renaming it master.key. Additionally, as the database password is encrypted by the master.key file, you’ll need to go to the $ARTIFACTORY_HOME/var/etc/system.yaml file and manually change the password from encrypted to plain text. Afterwards, restart Artifactory for these changes to take effect.

Issue #4: Local server is running as PRO/OSS
Error:
Resolution: This behavior usually occurs when upgrading an HA node as a standalone Artifactory instance. You can identify your Artifactory type (HA or standalone) by going through your Artifactory startup output and checking whether it’s listed as ArtifactoryPro or ArtifactoryHA. To resolve this issue, you’ll need to upgrade your Artifactory node by following the steps that are detailed HERE.

Issue #5: Relation "node_event_cursor" does not exist
Error:
Should you continue to encounter errors regarding missing tables, please contact JFrog Support.

Issue #6: “not permitted for a read-only connection, user or database” error
Error:
Resolution: To overcome errors such as this one, you’ll need to remove the db.lock file from the $JFROG_HOME/var/data/artifactory/derby path. Afterwards, restart the Artifactory service for your changes to take effect. Should you continue to encounter errors, please make sure your ownership/permissions are valid for this folder.

Issue #7: Oracle database support libraries are missing
Error:
Resolution: This behavior usually occurs when the LD_LIBRARY_PATH libraries path hasn’t been configured, as recommended HERE, with the Oracle Instant Client library.

Issue #8: JDBC driver is missing from the required location
Error:

Resolution: This behavior occurs due to the JDBC driver being missing from the required location. Artifactory 7.x is compatible with Java 11 and JDK comes bundled into the application. Accordingly, upon startup, the JDBC driver will be copied from the $JFROG_HOME/artifactory/var/bootstrap/artifactory/tomcat/lib folder to the $JFROG_HOME/artifactory/app/artifactory/tomcat/lib folder.

Whenever you execute an upgrade of Artifactory to 7.x, the following error may be encountered and the Artifactory service will not start:

This error means that the JDBC driver for the external database being used is incompatible with Java 11. To overcome this problem, change the JDBC driver to the compatible one, which can be found in the $JFROG_HOME/artifactory/var/bootstrap/artifactory/tomcat/lib folder. Thereafter, restart Artifactory for your changes to take effect.

Should you encounter the same error after making the change to the compatible JDBC driver, navigate to $JFROG_HOME/artifactory/app/artifactory/tomcat/lib, remove all JDBC drivers (both old and new), and restart Artifactory.

Issue #9: Ports are blocked by firewall
Error:

Resolution: This error indicates that Artifactory ports are all blocked by firewall rules. Resolving this issue can be done by allowing these ports in your firewall. A full list of ports Artifactory uses can be found on our System Resources wiki page.

Should you encounter any issues after performing all of these steps, please contact JFrog Support for further assistance.

Issue #10: When upgrading from Artifactory v6 to Artifactory v7 on a Windows environment, we usually encounter the following error:

The reason why the issue occurred:

The following lines were added to system.yaml after the upgrade (those settings were migrated from version 6's server.xml):

This is the result of parsing the above in server.xml:

Solution:

You can resolve the issue by removing the following lines from system.yaml (located at $JFROG_HOME/artifactory/var/etc/system.yaml).

Thus, it forces Artifactory to use default values for relaxedPathChars and relaxedQueryChars parameters. You may inspect the full list of parameters and their default values on our System YAML wiki page. 

Applications, Illegal Operations, and Everything Else

Low-level control over equipment requires extreme care and caution. Even the smallest error can result in the Blue Screen of Death (BSOD) or the abnormal termination of one or more applications. Driver developers and combat engineers have a great deal in common: neither profession forgives carelessness. The ASPI and SPTI interfaces, despite their high-level wrappers, are equally aggressive. They can freeze the system or shut it down with or without pretext. It takes a long time to master the skill of writing stable and simple code. Until that level has been reached, the only guarantee of survival is the skill of recovering the system after critical errors and various kinds of malfunctions.

Different operating systems react to critical errors differently. For example, Windows NT reserves two regions of its address space for detecting stray pointers. One of them is located at the very bottom of the memory map and is intended for trapping null pointers. The other is located between the heap and the memory area allocated for the operating system itself. It catches attempts to cross the limits of the memory area allocated to user processes. Contrary to common opinion, it is in no way related to the WriteProcessMemory function (see MSDN article Q92764). Each region takes 64 KB, and any attempt to access either of them is interpreted by the system as a critical error. In Windows 9x, there is only one 4 KB region for trapping stray pointers. Therefore, this system has significantly weaker controlling capabilities than Windows NT.
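As a rough illustration of the two Windows NT trap regions described above, the following C sketch classifies a 32-bit address. The region size (64 KB) comes from the text; the exact boundaries used here (user space ending just below 80000000h) are an assumption about a default 32-bit layout, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (not stated in the text): on 32-bit Windows NT the
   user-mode part of the address space ends just below 0x80000000. */
#define GUARD_SIZE      0x10000u      /* 64 KB, as stated in the text */
#define USER_SPACE_END  0x80000000u   /* first address owned by the OS */

/* True if addr falls in the bottom region that traps null pointers. */
bool in_null_guard(uint32_t addr)
{
    return addr < GUARD_SIZE;
}

/* True if addr falls in the region just below the OS area. */
bool in_top_guard(uint32_t addr)
{
    return addr >= USER_SPACE_END - GUARD_SIZE && addr < USER_SPACE_END;
}
```

Under these assumptions, a small offset from a null pointer such as 00000064h lands in the bottom region and is therefore caught immediately.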

In Windows NT, the critical error screen (Fig. 3.1) contains the following information:

  • The address of machine instruction that has caused the current exception

  • A brief description of the exception category (or its code, if category is unknown)

  • The exception parameters (address of the invalid memory cell, type of operation, etc.)

Fig. 3.1: Critical error message displayed by Windows 2000

Operating systems of the Windows 9x family are considerably more informative in this respect (see Fig. 3.2). Besides the exception category, they display the contents of the CPU registers, the state of the stack, and the memory bytes located at the address CS:EIP (i.e., at the current execution address). However, the existence of the Doctor Watson tool, which will be described later in this chapter, diminishes this difference between the two families of operating systems. Therefore, in this case we can only point out that Windows 9x is more user-friendly and ergonomic, since it immediately provides the required minimum of error information, while in Windows NT error reports are created by a separate utility.

Fig. 3.2: Critical error message displayed by Windows 98

If no additional debugger has been installed in the system, the critical error message window has only one button: OK. After the user clicks this button, the application that carried out the illegal operation is terminated. If you wish, you can add a Cancel button to this window. Clicking it starts the debugger or any other utility intended for analyzing the situation. It is important to understand that clicking Cancel doesn't cancel the automatic termination of the faulty application. However, with some skill, you can close the breach manually and continue working normally.

Start the Registry Editor and go to the following registry key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug. If there is no such key, just create it. The Debugger value specifies the path to the debugger with all of the required command-line options; the Auto string value determines whether the debugger must start automatically (1) or the user is offered a choice (0). Finally, the DWORD value UserDebuggerHotKey specifies the scan code of the hotkey for starting the debugger.

Doctor Watson

The Doctor Watson tool is the standard built-in debugger for critical errors that is included with all operating systems of the Windows family. In essence, it is a static tool for collecting all relevant information. Although Doctor Watson provides a detailed report on the causes of a failure, it lacks the active functions that would allow it to influence incorrectly operating programs. Thus, having only Doctor Watson at your disposal, you won't be able to make the application that has caused an error continue operating as if nothing had happened. To achieve this, you'll have to use an interactive debugger. The Microsoft Visual Studio Debugger, supplied as part of Microsoft Visual Studio, is one such tool. It will be considered later in this chapter.

It is a widely held opinion that Doctor Watson is preferable on workstations, while interactive debuggers are best for servers. Those who hold this view generally think that end users cannot understand all of the mysteries of assembler, while interactive debuggers are the tools of choice on servers. This opinion is partially true. However, it isn't wise to ignore the point that not every cause of an error can be detected by static analysis tools. Furthermore, interactive tools simplify the procedure of analysis considerably. On the other hand, Doctor Watson is included with the operating system, while all other tools must be purchased separately. Therefore, it is up to you to choose the preferred debugger for handling critical errors.

To specify Doctor Watson as your default debugger, add the following entry to the system registry or issue the Drwtsn32.exe -i command (to carry out any of these operations, you must have administrative privileges):

Listing 3.1: Installing Doctor Watson as the default debugger
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
"Auto"="1"
"Debugger"="drwtsn32 -p %ld -e %ld -g"
"UserDebuggerHotKey"=dword:00000000

Now any critical error will be followed by a report generated by Doctor Watson, containing a more or less detailed explanation of the error type and its cause.
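The Debugger value in Listing 3.1 contains two %ld placeholders. As commonly documented for AeDebug (this detail is not stated in the text above, so treat it as an assumption), the system substitutes the ID of the faulting process for the first one and an event handle for the second before launching the command. The following C sketch mimics that substitution; the function name and the sample handle value are hypothetical.

```c
#include <stdio.h>

/* The command template from Listing 3.1. */
#define DEBUGGER_TEMPLATE "drwtsn32 -p %ld -e %ld -g"

/* Expand the template with a process ID and an event handle.
   Returns the number of characters written, or -1 if `out` is too small. */
int expand_debugger_cmd(char *out, size_t size, long pid, long event)
{
    int n = snprintf(out, size, DEBUGGER_TEMPLATE, pid, event);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For a process with pid 612 and a made-up event handle of 128, the expanded command would read "drwtsn32 -p 612 -e 128 -g".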

Fig. 3.3: Reaction of Doctor Watson to a critical error

An example of a report created by Doctor Watson is provided below. The comments, which start with a semicolon, were added by the author.

Listing 3.2: An example of a report produced by Doctor Watson (the author's comments start with a semicolon)
Exception in application:
App: (pid=612)
; pid of the process where the exception took place
Time: 14.11.2003 @ 22:51:40.674
; Time when the exception took place
Number: c0000005 (access rights violation)
; Code of the exception category.
; Code decoding can be found in WINNT.H,
; included with the SDK, supplied with any Windows compiler.
; A detailed description of all exceptions can be found
; in the supplementary documentation
; for all Intel and AMD processors, distributed freely
; by the respective manufacturers.
; (Attention: To change the OS exception code to the CPU interrupt vector,
; you must reset the most significant word to zero.)
; In this case, this is 0x5: an attempt to access
; an invalid memory address.

*----> System information <----*
Computer name: KPNC
User name: Kris Kaspersky
Number of processors: 1
Processor type: x86 Family 6 Model 8 Stepping 6
Windows version: 2000: 5.0
Current build: 2195
Service pack: None
Current type: Uniprocessor Free
Registered organization:
Registered user: Kris Kaspersky
; Brief info on the system

*----> Task list <----*
0 Idle.exe
8 System.exe
232 smss.exe
...
1244 os2srv.exe
1164 os2ss.exe
1284 windbg.exe
1180 MSDEV.exe
1312 cmd.exe
612 test.exe
1404 drwtsn32.exe
0 _Total.exe

(00400000 - 00406000)
(77F80000 - 77FFA000)
(77E80000 - 77F37000)
; List of loaded DLLs.
; According to the documentation, the names of the appropriate modules
; must be listed to the right of the addresses. They are
; masked so well, however, that they have become practically invisible.
; Still, it is possible to extract their names from the log file,
; but this can't be done without a few tricks (see the symbol table below).

Memory copy for flow 0x188
; Provided below is a copy of the memory flow that has caused an exception.

eax=00000064 ebx=7ffdf000 ecx=00000000 edx=00000064 esi=00000000 edi=00000000
eip=00401014 esp=0012ff70 ebp=0012ffc0 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202
; Contents of registers and flags

Function: <nosymbols>
; Printout of the failure environment
00400ffc 0000 add [eax],al ds:00000064=??
; Adding the AL value to the cell referenced by EAX.
; The cell address computed by Doctor Watson is equal to 64h,
; which, obviously, doesn't correspond to reality:
; Doctor Watson substitutes the value that the EAX register had
; at the moment of failure into the expression,
; and this value is different from the one
; that the register had at the moment of execution!
; Unfortunately, neither we nor Doctor Watson
; know the run-time value of the EAX register.
00400ffe 0000 add [eax], al ds:00000064=??
; Adding the AL value to the cell referenced by EAX.
; What, again? What a pain! Actually,
; it is the sequence 00 00 00 00 that is encoded this way.
; To all appearances, this sequence is a piece
; of some machine command incorrectly interpreted
; by the disassembling engine of Doctor Watson.
00401000 8b542408 mov edx, [esp+0x8] ss:00f8d547=????????
; Loading a function argument into EDX.
; It is impossible to tell for certain which argument is loaded,
; since we do not know the address
; of the stack frame.
00401004 33c9 xor ecx, ecx
; Resetting ECX to zero
00401006 85d2 test edx, edx
00401008 7e18 jle 00409b22
; If EDX <= 0, jumping to the 409B22h address
0040100a 8b442408 mov eax, [esp+0x8] ss:00f8d547=????????
; Loading the above-mentioned argument into EAX
0040100e 56 push esi
; Saving ESI on the stack, thus moving the stack top pointer
; up by 4 bytes (into the area of lower addresses)
0040100f 8b742408 mov esi, [esp+0x8] ss:00f8d547=????????
; Loading the next argument into ESI.
; Since ESP has just been changed, this isn't the argument
; with which we were dealing before.
00401013 57 push edi
; Saving the EDI register on the stack
FAILURE -> 00401014 0fbe3c31 movsx edi, byte ptr [ecx+esi] ds:00000000=??
; Well, we've got the instruction that has caused the access violation.
; It accesses the cell referenced by the sum of the ECX and ESI registers.
; What are their values? Scroll the screen upwards slightly and find out that
; ECX and ESI are equal to 0, a fact about which
; Doctor Watson informs us: "ds:00000000".
; Note that this information can be trusted, since substitution
; of the effective address was carried out at run time.
; Now, let us recall that ESI contains
; the copy of the argument passed to the function
; and that ECX was explicitly reset to zero. Consequently,
; in the [ECX+ESI] expression,
; the ESI register is the pointer, and ECX is the index.
; Since ESI is equal to zero, this means that our function
; was passed a pointer to an unallocated memory area.
; This usually happens
; either because of an algorithmic error in a program
; or because the virtual memory has been exhausted.
; Unfortunately, Doctor Watson doesn't disassemble
; the parent function, and we have to guess which of the
; two possible variants is true.
; Although it is possible to disassemble the memory dump
; of the process (provided, of course, that it has been saved),
; this isn't what we actually need...
00401018 03c7 add eax, edi
; Add the contents of the EDI register
; to the EAX register and write the result to EAX.
0040101a 41 inc ecx
; Increase ECX by one
0040101b 3bca cmp ecx, edx
0040101d 7cf5 jl 00407014
; While ECX < EDX, jump to 407014
; (obviously, we are dealing with a loop controlled by the ECX counter).
; In the case of interactive debugging, we could forcibly exit the function,
; returning the error flag, so that the parent function
; (and the entire program along with it) can continue execution.
; In this case, only the last operation would be lost,
; while all the other data would remain correct.
0040101f 5f pop edi
00401020 5e pop esi
00401021 c3 ret
; Exiting the function

*----> Backward tracing of the stack <----*
; Stack contents at the moment of failure:
; prints addresses and parameters of previously executed functions.
; In the case of interactive debugging, we can simply pass control to one
; of the upper functions, which is equivalent to a return to the past.
; It is only in reality that smashed porcelain cannot be fixed;
; in the computer universe, everything is possible!
FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
; FramePtr: points to the value of the stack frame,
; above (i.e., in smaller addresses) are the function arguments,
; below are its local variables.
;
; ReturnAd: stores the return address to the parent function.
; If this location contains garbage and back-tracing of the stack
; starts to make a characteristic noise,
; then it is highly likely
; that we are dealing with a stack overflow error
; or, possibly, that your computer is under attack.
;
; Param#: the first four parameters of the function;
; this is the number of parameters
; that Doctor Watson displays on the screen.
; This is an overly stringent limitation,
; since most functions have dozens of parameters
; and the first four do not provide sufficient information.
; However, a missing parameter can easily be retrieved manually
; from the copy of the unprocessed stack.
; To do so, it is enough to go to the address specified in the
; FramePtr field.
;
; Func Name: function name (if it is possible to detect it). In fact,
; it displays only the names of functions imported from other DLLs,
; since it is impossible to find a commercial program
; compiled along with debug info.
;
0012FFC0 77E87903 00000000 00000000 7FFDF000 C0000005 !<nosymbols>
0012FFF0 00000000 00401040 00000000 000000C8 00000100 kernel32!SetUnhandledExceptionFilter
; Functions are listed in the order of their execution.
; The last one that was executed was the same
; kernel32!SetUnhandledExceptionFilter function that handles the current exception.

*----> Copy of unprocessed stack <----*
; The copy of the unprocessed stack contains it "as is."
; It is very helpful when detecting buffer overflow attacks: the entire shellcode
; passed by the intruder will be printed out by Doctor Watson,
; and you'll only have to detect it (for further details,
; see my book "Technique and philosophy of network attacks").
0012ff70 00 00 00 00 00 00 00 00 - 39 10 40 00 00 00 00 00 ........9.@.....
0012ff80 64 00 00 00 f4 10 40 00 - 01 00 00 00 d0 0e 30 00 d.....@.......0.
...
00130090 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
001300a0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

*----> Symbol table <----*
; The symbol table contains the names of all loaded DLLs, along with the names
; of imported functions. Using these addresses as the starting point,
; we can easily restore the list of loaded DLLs.
ntdll.dll
77F81106 00000000 ZwAccessCheckByType
...
77FCEFB0 00000000 fltused
kernel32.dll
77E81765 0000003d IsDebuggerPresent
...
77EDBF7A 00000000 VerSetConditionMask
;
; Thus, let us return to the list of loaded DLLs.
; (00400000 - 00406000) - obviously,
; this is the memory area occupied by the program itself.
; (77F80000 - 77FFA000) - this is NTDLL.DLL
; (77E80000 - 77F37000) - this is KERNEL32.DLL
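The report comment about converting the OS exception code into a CPU interrupt vector amounts to clearing the most significant word of the code. A minimal C sketch of that one step follows; the function name is hypothetical, and the rule is taken from the author's comment, not offered as a general mapping for every exception code.

```c
#include <stdint.h>

/* The exception code shown in the report (access violation). */
#define EXCEPTION_CODE_FROM_REPORT 0xC0000005u

/* Clear the most significant word, as the report comment describes,
   leaving only the low 16 bits of the code. */
uint32_t low_word(uint32_t exception_code)
{
    return exception_code & 0x0000FFFFu;
}
```

Applied to c0000005, this leaves 0x5, the value the comment in the report refers to.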
 

Microsoft Visual Studio Debugger

When you install the Microsoft Visual Studio programming environment, it registers its debugger as the default one for handling critical errors. Although this debugger is very easy to use, it has very limited functions, and doesn't even support such a simple operation as looking for a hex sequence in memory. Its only advantage in comparison to the most advanced (in every respect) option, Microsoft Kernel Debugger, is the ability to trace processes that have generated a critical exception.

In the hands of an experienced professional, Microsoft Visual Studio Debugger is capable of working wonders, and one such wonder is making applications that have executed an illegal operation continue their work, even though the operating system normally closes such applications abnormally without saving their data. In any case, an interactive debugger (and Microsoft Visual Studio Debugger is one) provides much more detailed information on the failure and considerably simplifies the process of detecting its sources. Unfortunately, the limited space allowed in this chapter (even though it already contains a large amount of off-topic information!) prevents the author from providing a detailed description of the entire methodology of debugging. Instead, I must limit myself to a narrow range of the most interesting problems. For more details, see the section "Inhabitants of the Shadowy Zone, or From Morgue to Reanimation".

In order to set Microsoft Visual Studio Debugger as the default debugger for critical errors manually, add the following entries to the system registry:

Listing 3.3: Specifying Microsoft Visual Studio Debugger as your default debugger for critical errors
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
"Auto"="1"
"Debugger"="\"C:\Prg Files\MS VS\Common\MSDev98\Bin\msdev.exe\" -p %ld -e %ld"
"UserDebuggerHotKey"=dword:00000000
 
Listing 3.4: A demo example that causes a critical exception
// The function returns the sum of n char values.
// If it is passed a null pointer, the function will crash,
// although the source of the error is not the function itself
// but the arguments passed to it by the parent function.
int test(char *buf, int n)
{
    int a, sum = 0;              // The sum must start from zero.
    for (a = 0; a < n; a++)
        sum += buf[a];           // Here, the exception is thrown.
    return sum;
}

int main(void)
{
    #define N 100
    char *buf = 0;               // Initializing the pointer to the buffer
    /* buf = malloc(100); */     // "Forgetting" to allocate the memory,
                                 // which is the error
    test(buf, N);                // Passing the null pointer to the function
    return 0;
}
 

Inhabitants of the Shadowy Zone, or From Morgue to Reanimation

Would you like to know how to make an application continue normal operation after a critical error message has appeared? In fact, this is an important and sometimes urgent task. Suppose that an application containing unique data that have not been saved yet has crashed. In the best case, you'll have to enter this information once again, while in the worst case, you have lost the data for good. There are some utilities on the market aimed exactly at solving this problem (Norton Utilities is a typical example). Unfortunately, however, their abilities are far from comprehensive, and, on average, they turn out to be effective in only one case in ten. At the same time, manual reanimation of a faulty program is successful in 75 to 90 per cent of all cases.

Strictly speaking, it is impossible to recover fully the functionality of a crashed program or to roll back all of the actions that preceded the crash. In the best case, you'll be able to save the data before the program totally loses control and starts to behave unpredictably. Even this achievement would have to be counted as a success!

There are at least three different methods of reanimation: a) forcibly exiting the function that has caused a critical exception; b) unwinding the stack and passing control back; c) passing control to the message handler function. Let us consider each of these methods using the example of the testt.exe application, a copy of which can be found on the companion CD.

Jumping ahead a few steps, note that only faults that are caused by algorithmic errors can be reanimated. Errors caused by hardware faults are irrecoverable. If information stored in RAM was corrupted because of a physical defect in the memory, you probably won't be able to recover the crashed application. If, however, the failure did not affect vitally important data structures, there is some hope for successful recovery even in this case.

Forcibly Exiting the Function

Start the test program, enter some text in one or more of the windows, then select the About TestCEdit command from the Help menu. When the dialog opens, click the Make error button. Oops! The program displays a critical error message. If we click OK, all unsaved data will be lost, which isn't what we planned. However, if a debugger has previously been installed in the system, we can still make some attempts at saving the data. To be specific, let's suppose that we have Microsoft Visual Studio Debugger.

Click Cancel, and the debugger will immediately disassemble the function that caused the exception (see the listing provided below).

Listing 3.5: Microsoft Visual Studio Debugger has disassembled the function that has thrown an exception
0040135C push esi
0040135D mov esi, dword ptr [esp+8]
00401361 push edi
00401362 movsx edi, byte ptr [ecx+esi]
00401366 add eax, edi
00401368 inc ecx
00401369 cmp ecx, edx
0040136B jl 00401362
0040136D pop edi
0040136E pop esi
0040136F ret 8
 

Having analyzed the cause of the exception (the function has been passed a pointer to unallocated memory), we draw the conclusion that it is impossible to make the function continue execution, since we do not know the structure of the data passed to it. In such a case, we have to return forcibly to the parent function, without forgetting to set the error flag, which signals to the program that the current operation has not been accomplished. Unfortunately, there are no commonly adopted error flags, and different functions use different conventions. To find out which one applies in each specific case, we must disassemble the parent function and determine which error code it expects.

Place the cursor on the dump window and enter the name of the pointer to the stack top, ESP register, into the address line. Then press <Enter>. The stack contents will be immediately displayed:

Listing 3.6: Searching for the return address from the current function (the third double word, 004012FF)
0012F488 0012FA64 0012FA64 004012FF
0012F494 00000000 00000064 00403458
0012F4A0 FFFFFFFF 0012F4C4 6C291CEA
0012F4AC 00000019 00000000 6C32FAF0
0012F4B8 0012F4C0 0012FA64 01100059
0012F4C4 006403C2 002F5788 00000000
0012F4D0 00640301 77E16383 004C1E20
 

The first two double words correspond to the POP EDI/POP ESI machine commands. Therefore, they are of little or no importance to us. As for the next double word, it contains the return address to the parent procedure (in the example above, it is 004012FF). This is exactly what we need!
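The lookup just described can be expressed as simple arithmetic: the return address sits above the registers that the function has saved on the stack. The C sketch below reproduces this with the first rows of the dump from Listing 3.6; the function names are hypothetical, and the count of two saved registers is taken from the PUSH ESI/PUSH EDI pair in Listing 3.5.

```c
#include <stddef.h>
#include <stdint.h>

/* The first dwords of the dump in Listing 3.6, starting at ESP = 0012F488h. */
const uint32_t stack_dump[] = {
    0x0012FA64, 0x0012FA64, 0x004012FF,
    0x00000000, 0x00000064, 0x00403458
};

/* Return the dword lying above `saved_regs` pushed registers,
   which is where the CALL instruction left the return address. */
uint32_t return_address(const uint32_t *esp, size_t saved_regs)
{
    return esp[saved_regs];
}

/* Address of that stack slot, given the numeric value of ESP. */
uint32_t return_slot(uint32_t esp_value, size_t saved_regs)
{
    return esp_value + 4u * (uint32_t)saved_regs;
}
```

With two saved registers, the slot is at 0012F490h and holds 004012FFh. A function that saves a different number of registers, or that reserves space for locals, would need a different offset.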

Press <Ctrl>+<D>, then click 0x4012FF, and the debugger will display the following disassembled text:

Listing 3.7: Disassembled listing of the parent function
004012FA call 00401350
004012FF cmp eax, 0FFh
00401302 je 0040132D
00401304 push eax
00401305 lea eax, [esp+8]
00401309 push 405054h
0040130E push eax
0040130F call dword ptr ds:[4033B4h]
00401315 add esp, 0Ch
00401318 lea ecx, [esp+4]
0040131C push 0
0040131E push 0
00401320 push ecx
00401321 mov ecx, esi
00401323 call 00401BC4
00401328 pop esi
00401329 add esp, 64h
0040132C ret

0040132D push 0         ; This branch will get control if function 401350h returns FFh.
0040132F push 0
00401331 push 405048h
00401336 mov ecx, esi
00401338 call 00401BC4
0040133D pop esi
0040133E add esp, 64h
00401341 ret
 

Look at this: if the EAX register is equal to FFh, the parent function passes control to the branch at 40132Dh and terminates after several machine commands, passing control to a higher-level function. If, however, EAX != FFh, its value is passed to the function called through 4033B4h. Consequently, we can assume that FFh is the error flag. Let us return to the function being tested by pressing <Ctrl>+<G> and clicking EIP. Then switch to the Registers pane and change the value of EAX to FFh.
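At the source level, the convention inferred from Listing 3.7 might look roughly like the following C sketch. The names sum_bytes and parent are hypothetical stand-ins for the functions at 401350h and the one containing 4012FAh, and the null check is added only to play the role of the manual EAX patch; the actual function performs no such check, which is why it crashes.

```c
#include <stddef.h>

#define ERROR_FLAG 0xFF   /* the value the parent compares EAX against */

/* Hypothetical stand-in for the function at 401350h. */
int sum_bytes(const char *buf, int n)
{
    int sum = 0;
    if (buf == NULL)
        return ERROR_FLAG;      /* what we force by hand in the debugger */
    for (int i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical stand-in for the parent: 1 = result used, 0 = error branch. */
int parent(const char *buf, int n)
{
    int result = sum_bytes(buf, n);
    if (result == ERROR_FLAG)
        return 0;               /* corresponds to the branch at 40132Dh */
    return 1;                   /* normal path: the result is formatted and shown */
}
```

Note one weakness of such a convention: a legitimate sum that happens to equal FFh is indistinguishable from the error flag.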

Now, it is necessary to find a suitable point of return from the function. It is not possible to simply go to the RET machine command, because before returning from the function, it is necessary to balance the stack. Otherwise, the program will crash irreversibly, throwing us off to some unpredictable location.

In the general case, the number of PUSH commands must correspond exactly to the number of POP commands. Also, take into account the fact that, as far as the stack pointer is concerned, PUSH DWORD X is equivalent to SUB ESP, 4, and POP DWORD X to ADD ESP, 4. After analyzing the disassembled listing of the function, it is possible to draw the conclusion that, to balance the good and the bad in this case, we must pop two double words from the stack top. They correspond to the following machine commands: 40135C: PUSH ESI and 401361: PUSH EDI. This can be achieved by passing control to the 40136Dh address, where there are two benevolent POPs that bring the stack to a balanced state. Move the cursor to that position, right-click, and choose the Set Next Statement command from the context menu. Alternatively, you can switch to the Registers window and change the EIP value from 401362h to 40136Dh.
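The balancing rule can be checked with trivial arithmetic. The C sketch below, with hypothetical function names, computes the net change in ESP for a given number of 4-byte PUSH and POP commands and reports whether RET would find the return address at the stack top.

```c
#include <stdint.h>

/* Net change in ESP, in bytes, after a run of PUSH and POP commands.
   Each PUSH subtracts 4 from ESP; each POP adds 4. */
int32_t esp_delta(int pushes, int pops)
{
    return 4 * (pops - pushes);
}

/* The stack is balanced when the pushes and pops cancel out,
   so that RET finds the return address at [ESP]. */
int is_balanced(int pushes, int pops)
{
    return esp_delta(pushes, pops) == 0;
}
```

For the function in Listing 3.5, jumping straight to RET after the two pushes leaves ESP 8 bytes too low, whereas resuming at 40136Dh executes both POP commands first and restores the balance.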

Press <F5> to make the processor continue with program execution. Voila! The faulty program actually continues execution, and you can save your data. (A good-natured complaint about an error in the last operation can be ignored.)

Unwinding the Stack

It is not possible to forcibly exit from the function in every case. Some critical failures influence several nested functions simultaneously. In this case, in order to reanimate the dead program, we have to carry out a deep rollback, continuing program execution from a point at which nothing threatened its operability. The exact depth of the rollback must be selected experimentally. As a rule, it will be from three to five steps. Bear in mind that if nested functions modify global data (for instance, heap data), then any attempt at carrying out a rollback can result in a total crash of the program being debugged. Therefore, it is desirable to guess the rollback depth on the first attempt. If you are in doubt, just remember that an excess is better than a shortage. On the other hand, excessive rollback results in the loss of all unsaved data...

The rollback procedure comprises the following three steps: a) building the tree of calls; b) determining the coordinates of the stack frame for each call; c) restoring the register context of the parent function. A really good debugger will carry out all of these operations for you. The only thing that remains is to write the appropriate values into EIP and ESP. Unfortunately, Microsoft Visual Studio Debugger cannot be called a really effective debugger. It traces the stack reasonably well, although it omits FPO functions (Frame Pointer Omission, that is, functions with an optimized frame), but it doesn't report the coordinates of the stack frame; therefore, the most difficult part of the job must be carried out manually.

Still, even such a call stack is better than nothing. When unwinding the stack manually, we will rely on the fact that frame coordinates are naturally determined by the return address. Let's suppose that the contents of the Call Stack window appear as follows:

Listing 3.8: The contents of the Call Stack window displayed by Microsoft Visual Studio Debugger
TESTCEDIT! 00401362()
MFC42! 6c2922ae()
MFC42! 6c298fc5()
MFC42! 6c292976()
MFC42! 6c291dcc()
MFC42! 6c291cea()
MFC42! 6c291c73()
MFC42! 6c291bfb()
MFC42! 6c291bba()

Let's try to find the addresses 6C2922AEh and 6C298FC5h, corresponding to the two last steps of execution, in the stack contents. Press <Alt>+<6> to switch to the dump window, then use the <Ctrl>+<G> hotkey combination to select the base address and select ESP. Scroll the dump window down, and you'll find both return addresses (6C2922AE at 0012F514 and 6C298FC5 at 0012F544 in the listing provided below):

Listing 3.9: Stack content after unwinding
0012F488 0012FA64 0012FA64 004012FF   0040136F:ret 8; the first return address
0012F494 00000000 00000064 00403458   00401328:pop esi
0012F4A0 FFFFFFFF 0012F4C4 6C291CEA
0012F4AC 00000019 00000000 6C32EAF0
0012F4B8 0012F4C0 0012EA64 01100059
0012F4C4 00320774 002F5788 00000000
0012F4D0 00320701 77E16383 004C1E20
0012F4DC 00320774 002F5788 00000000
0012F4E8 000003E8 0012EA64 004F8CD8
0012F4F4 0012F4DC 002F5788 0012F560
0012F500 77E61D49 6C2923D8 00403458   0040132C:ret;
0012F50C 00000111 0012F540 6C2922AE   6C29237E:pop ebx/pop ebp/ret 1Ch
0012F518 0012FA64 000003E8 00000000
0012F524 004012F0 00000000 0000000C
0012F530 00000000 00000000 0012FA64
0012F53C 000003E8 0012F564 6C298FC5
0012F548 000003E8 00000000 00000000
0012F554 00000000 000003E8 0012FA64

Memory cells above the return addresses (that is, at lower addresses in the dump) hold the register values that are saved when entering the function and restored after exiting it. Memory cells located below the return addresses are occupied by function arguments (if the function has any), or belong to the local variables of the parent function (if the nested function doesn't accept any arguments).

Returning to Listing 3.5, note that the two double words on the top of the stack correspond to the POP EDI and POP ESI machine commands, while the address that directly follows them, 4012FFh, is the one to which the 40136Fh: RET 8 command passes control. To continue stack unwinding, we must disassemble the code at this address:

Listing 3.10: Disassembled listing of the grandmother function
004012FA call 00401350
004012FF cmp eax, 0FFh
00401302 je 0040132D
00401304 push eax
00401305 lea eax, [esp+8]
00401309 push 405054h
0040130E push eax
0040130F call dword ptr ds:[4033B4h]
00401315 add esp, 0Ch
00401318 lea ecx, [esp+4]
0040131C push 0
0040131E push 0
00401320 push ecx
00401321 mov ecx, esi
00401323 call 00401BC4
00401328 pop esi
00401329 add esp, 64h
0040132C ret ; SS:[ESP] = 6C2923D8

By scrolling the window downwards, we will notice the ADD ESP, 64h instruction that closes the current stack frame. Eight more bytes are popped by the 40136Fh: RET 8 instruction, and four bytes are taken by 401328: POP ESI. Thus, the return address is located in the stack at current_ESP + 64h + 8 + 4 = current_ESP + 70h. Going down 70h bytes, you'll see:
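The arithmetic can be cross-checked against the dump itself. The sketch below, written in C purely for illustration, searches a copy of the stack dump from Listing 3.9 for the return address 6C2923D8h. The function name find_dword and the choice of 0012F494h as the starting cell are assumptions made for this example; the text does not name the starting cell explicitly, but in the dump the distance from that cell to the return address is exactly 70h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Double words copied from Listing 3.9, rows 0012F494 to 0012F500. */
enum { DUMP_BASE = 0x0012F494 };
const uint32_t listing_3_9[30] = {
    0x00000000, 0x00000064, 0x00403458,   /* 0012F494 */
    0xFFFFFFFF, 0x0012F4C4, 0x6C291CEA,   /* 0012F4A0 */
    0x00000019, 0x00000000, 0x6C32EAF0,   /* 0012F4AC */
    0x0012F4C0, 0x0012EA64, 0x01100059,   /* 0012F4B8 */
    0x00320774, 0x002F5788, 0x00000000,   /* 0012F4C4 */
    0x00320701, 0x77E16383, 0x004C1E20,   /* 0012F4D0 */
    0x00320774, 0x002F5788, 0x00000000,   /* 0012F4DC */
    0x000003E8, 0x0012EA64, 0x004F8CD8,   /* 0012F4E8 */
    0x0012F4DC, 0x002F5788, 0x0012F560,   /* 0012F4F4 */
    0x77E61D49, 0x6C2923D8, 0x00403458    /* 0012F500 */
};

/* Address of the first double word equal to `value` in a dump that
   starts at `base`, or 0 if the value is not present. */
uint32_t find_dword(uint32_t base, const uint32_t *dump, size_t count,
                    uint32_t value)
{
    assert(dump != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (dump[i] == value)
            return base + (uint32_t)(i * 4);   /* 4 bytes per double word */
    }
    return 0;
}
```

With this data, find_dword(DUMP_BASE, listing_3_9, 30, 0x6C2923D8) yields 0012F504h, which is DUMP_BASE + 64h + 8 + 4.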

Listing 3.11: Return address from the grandmother function
0012F500 77E61D49 6C2923D8 00403458   00401328:pop esi/ret;

The first double word is the value of the ESI register, which we will have to restore manually; the second is the return address from the function. Press <Ctrl>+<G>, enter 0x6C2923D8, and continue to unwind the stack:

Listing 3.12: Disassembled listing of the great-grandmother function
6C2923D8 jmp 6C29237B
6C29237B mov eax, ebx
6C29237D pop esi
6C29237E pop ebx
6C29237F pop ebp
6C292380 ret 1Ch

Now, we have finally got to restoring registers! Move to the right by one double word (it was just popped from the stack by the RET command), switch to the Registers window, and restore the ESI , EBX , and EBP registers by retrieving their saved values from the stack:

Listing 3.13: The contents of the registers saved in the stack along with the return address
0012F500 77E61D49 6C2923D8 00403458   6C29237D:pop esi
0012F50C 00000111 0012F540 6C2922AE   6C29237E:pop ebx/pop ebp/ret 1Ch

As an alternative, you can set the EIP register to the address 6C29237Dh and the ESP register to the address 12F508h, and then press <F5> to continue program execution. This technique actually works. Moreover, the reanimated program doesn't report an execution error from the last operation (as was the case when restoring by means of forcibly exiting the function). Instead, the program simply doesn't execute that command. Very well!

Passing Control to the Message Handler Function

Neither of the above-described methods of reanimating faulty applications is free from limitations and drawbacks. If the stack is seriously damaged by a buffer overflow attack or by algorithmic errors, the contents of vitally important processor registers will be corrupted. In this case, we won't be able to roll back (because the stack contents have been lost) or exit the current function (because EIP points to some unknown location, probably somewhere in outer space). For console applications, there is actually very little that can be done in such situations. GUI applications, however, are a different matter. The concept of event-driven architecture provides any windowing application with some server functions. Even if the current execution context is irreversibly lost, we can pass control to the message-handling loop, thus making the program continue processing user commands.

A classic message-handling loop appears as follows:

Listing 3.14: A classic message-handling loop
while (GetMessage(&msg, NULL, 0, 0))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

All you need to do is pass control to the while loop, without even caring about tuning the stack frame, since optimized programs (which are the overwhelming majority) address their local variables via ESP rather than via EBP. Of course, when accessing the msg variable, the function will ruin the stack contents located below its top. However, this is of little or no importance to us.

You should, however, realize that after you exit the application, it will definitely die (because instead of the address to return from the function, the RET machine command will find some unpredictable trash on top of the stack). However, this will happen after you have saved all of your data, and, therefore, this crash doesn't present any threat. The only exception is a group of freaky applications that forget to close all opened files and delegate this job to the ExitProcess function. However, even in this case, there is a way out: you can modify the return address so that it points to the ExitProcess function!

Let us create the simplest Windows application and experiment with it. Start Visual Studio, choose New → Project → Win32 Application, and then select Typical Hello, World application. Add a new item to the menu, and add the following code to it: char *p; *p = 0; then compile the project with debug info.

Crash the application, then start the debugger. Move the cursor to the first line of the message-handling loop, right-click, and select Set Next Statement from the context menu. Press <F5> to continue program execution, and it will actually continue to work!

Now, compile the project as a release (i.e., without debug info) and try to reanimate the application in naked machine code. Taking advantage of the fact that Windows is a truly multitasking environment, in which the crash of one process doesn't interfere with the operation of others, start your favorite disassembler (IDA Pro, for instance) and analyze the import table of the program being debugged. Even freeware programs such as dumpbin are able to do this. However, the report produced by dumpbin is not as clear and illustrative as the results produced by fully functional disassemblers.

The main goal of our search will be the TranslateMessage/DispatchMessage functions and cross-references to the message-handling loop.

Listing 3.15: Searching TranslateMessage/DispatchMessage functions in the import table
.idata:004040E0 ; BOOL __stdcall TranslateMessage(const MSG *lpMsg)
.idata:004040E0 extrn TranslateMessage:dword ; DATA XREF: _WinMain@16+71 r
.idata:004040E0 ; _WinMain@16+8D r
.idata:004040E4 ; LONG __stdcall DispatchMessageA(const MSG *lpMsg)
.idata:004040E4 extrn DispatchMessageA:dword ; DATA XREF: _WinMain@16+94 r
.idata:004040E8

The DispatchMessage function has only one cross-reference, which obviously leads to the message-handling loop we are after. The disassembled listing of this loop appears as follows:

Listing 3.16: The disassembled listing of the message-handling function
.text:00401050 mov edi, ds:GetMessageA
.text:00401050 ; The first call to GetMessageA
.text:00401050 ; (this isn't the loop itself yet, it is only its threshold).
.text:00401050
.text:00401056 push 0 ; wMsgFilterMax
.text:00401058 push 0 ; wMsgFilterMin
.text:0040105A lea ecx, [esp+2Ch+Msg]
.text:0040105A ; ECX points to the memory area, through which GetMessageA
.text:0040105A ; will return the message. The current ESP value can be any value.
.text:0040105A ; The most important thing here is that it must
.text:0040105A ; point to the actually allocated memory area.
.text:0040105A ; (See memory map, if the ESP value turns out
.text:0040105A ; to be corrupted so that it points nowhere.)
.text:0040105A ;
.text:0040105E push 0 ; hWnd
.text:00401060 push ecx ; lpMsg
.text:00401061 mov esi, eax
.text:00401063 call edi ; GetMessageA
.text:00401063 ; Calling GetMessageA
.text:00401063
.text:00401065 test eax, eax
.text:00401067 jz short loc_4010AD
.text:00401067 ; Checking if there are unprocessed messages in the queue
.text:00401067
.text:00401077 loc_401077: ; CODE XREF: _WinMain@16+A9 j
.text:00401077 ; Starting point of the message loop
.text:00401077
.text:00401077 mov eax, [esp+2Ch+Msg.hwnd]
.text:0040107B lea edx, [esp+2Ch+Msg]
.text:0040107B ; EDX points to the memory area used for passing the messages.
.text:0040107B
.text:0040107F push edx ; lpMsg
.text:00401080 push esi ; hAccTable
.text:00401081 push eax ; hWnd
.text:00401082 call ebx ; TranslateAcceleratorA
.text:00401082 ; Calling the TranslateAcceleratorA function
.text:00401082
.text:00401084 test eax, eax
.text:00401086 jnz short loc_40109A
.text:00401086 ; Checking if there are unprocessed messages in the queue
.text:00401086
.text:00401088 lea ecx, [esp+2Ch+Msg]
.text:0040108C push ecx ; lpMsg
.text:0040108D call ebp ; TranslateMessage
.text:0040108D ; Calling the TranslateMessage function, if there is anything to translate
.text:0040108D
.text:0040108F lea edx, [esp+2Ch+Msg]
.text:00401093 push edx ; lpMsg
.text:00401094 call ds:DispatchMessageA
.text:00401094 ; Dispatching the message
.text:0040109A
.text:0040109A loc_40109A: ; CODE XREF: _WinMain@16+86 j
.text:0040109A push 0 ; wMsgFilterMax
.text:0040109C push 0 ; wMsgFilterMin
.text:0040109E lea eax, [esp+34h+Msg]
.text:004010A2 push 0 ; hWnd
.text:004010A4 push eax ; lpMsg
.text:004010A5 call edi ; GetMessageA
.text:004010A5 ; reading the next message from the message queue
.text:004010A5
.text:004010A7 test eax, eax
.text:004010A9 jnz short loc_401077
.text:004010A9 ; running the message handling loop
.text:004010A9
.text:004010AB pop ebp
.text:004010AC pop ebx
.text:004010AD
.text:004010AD loc_4010AD: ; CODE XREF: _WinMain@16+67 j
.text:004010AD mov eax, [esp+24h+Msg.wParam]
.text:004010B1 pop edi
.text:004010B2 pop esi
.text:004010B3 add esp, 1Ch
.text:004010B6 retn 10h
.text:004010B6 _WinMain@16 endp

We can see that the message-handling loop starts from the address 401050h . This is the address, to which it is necessary to pass control in order to continue the execution of the crashed program. Try it. The program works!

Naturally, the task of reanimating a real-world application is much more complicated, because the message-handling loop in this case will be distributed over a large number of functions. Note that it is very difficult to identify all of these functions in the course of superficial disassembling. Nevertheless, applications based on standard libraries (such as MFC or OVL) have a predictable architecture. Therefore, the reanimation of such applications isn't a hopeless task.

Let's consider the structure of the message-handling loop in MFC. MFC applications spend most of their time in the following function: CWinThread::Run(void). This function periodically polls the queue for the arrival of new messages and sends them to the appropriate handlers. If one of the handlers has caused a critical fault, program execution can be continued using the Run function. This is its main advantage!

The function has no explicit arguments, but accepts a hidden this argument pointing to an instance of the CWinThread class or of a class derived from it, without which the function will be unable to work. Fortunately, the virtual method tables of the CWinThread class contain enough distinguishing marks to allow us to recreate the this pointer manually.

Let's load the Run function into the disassembler and mark all of the calls to the table of virtual methods addressed via the ECX register.

Listing 3.17: A fragment of the disassembled listing of the Run function
.text:6C29919D n2k_Trasnlate_main: ; CODE XREF: MFC42_5715+1F j
.text:6C29919D ; MFC42_5715+67 j ...
.text:6C29919D mov eax, [esi]
.text:6C29919F mov ecx, esi
.text:6C2991A1 call dword ptr [eax+64h] ; CWinThread::PumpMessage(void)
.text:6C2991A4 test eax, eax
.text:6C2991A6 jz short loc_6C2991DA
.text:6C2991A8 mov eax, [esi]
.text:6C2991AA lea ebp, [esi+34h]
.text:6C2991AD push ebp
.text:6C2991AE mov ecx, esi
.text:6C2991B0 call dword ptr [eax+6Ch] ; CWinThread::IsIdleMessage(MSG*)
.text:6C2991B3 test eax, eax
.text:6C2991B5 jz short loc_6C2991BE
.text:6C2991B7 push 1
.text:6C2991B9 mov [esp+14h], ebx
.text:6C2991BD pop edi
.text:6C2991BE
.text:6C2991BE loc_6C2991BE: ; CODE XREF: MFC42_5715+51 j
.text:6C2991BE push ebx ; wRemoveMsg
.text:6C2991BF push ebx ; wMsgFilterMax
.text:6C2991C0 push ebx ; wMsgFilterMin
.text:6C2991C1 push ebx ; hWnd
.text:6C2991C2 push ebp ; lpMsg
.text:6C2991C3 call ds:PeekMessageA
.text:6C2991C9 test eax, eax
.text:6C2991CB jnz short n2k_Trasnlate_main
.text:6C2991CD

Thus, the Run function expects to receive a pointer to the double word pointing to the table of virtual methods, elements 0x19 and 0x1B of which represent the PumpMessage and IsIdleMessage functions (or stubs to them), respectively. If the DLL was not relocated, the addresses of imported functions can be found using the same disassembler. Otherwise, they should be reconstructed using the base address of the module, which the debugger displays in response to the Modules command. Provided that these two functions were not overridden by the programmer, searching for the needed virtual table should be a trivial task.
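The element numbers follow directly from the displacements in the CALL instructions: on 32-bit x86, each entry of the virtual table occupies four bytes. A minimal C sketch of this conversion is shown below; the names PTR_SIZE and vtable_index are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { PTR_SIZE = 4 };   /* size of a pointer on 32-bit x86 */

/* Index of the virtual-table element addressed by an instruction of the
   form `call dword ptr [eax+disp]`, where EAX holds the table address. */
uint32_t vtable_index(uint32_t disp)
{
    assert(disp % PTR_SIZE == 0);   /* table entries are pointer-aligned */
    return disp / PTR_SIZE;
}
```

Here vtable_index(0x64) gives 0x19 (PumpMessage) and vtable_index(0x6C) gives 0x1B (IsIdleMessage).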

For some unknown reason, the MFC42.DLL library doesn't export symbolic names for these functions, so we must get this information on our own. After processing the MFC42.LIB library using the dumpbin utility with the /ARCH command-line option, we will get the ordinals of both functions (5307 for PumpMessage and 4079 for IsIdleMessage). Now, it remains to find these values in the export list of MFC42.DLL (dumpbin /EXPORTS mfc42.dll > mfc42.txt), from which we will discover that the address of the PumpMessage function is 6C291194h, while the address of IsIdleMessage is 6C292583h.

Now, it is necessary to find the pointers to the PumpMessage/IsIdleMessage functions in memory, or, to be more precise, in the data section, the base address of which is contained in the header of the PE file. Bear in mind that in x86 processors, the least significant byte is located at the lower address, which means that the bytes of every number are stored in reverse order. Unfortunately, Microsoft Visual Studio Debugger doesn't support memory searching. Therefore, we must bypass this limitation by copying the contents of the dump to the clipboard, pasting it into a text file, and searching for the addresses there by pressing <F7>. Finally, the required pointers are found at the addresses 403044h/40304Ch (naturally, in your system these addresses may be different). Note that the distance between the pointers is exactly equal to the distance between the pointers at [EAX + 64h] and [EAX + 6Ch], while the order in which they appear in memory is the inverse of the order in which the virtual methods are declared. This is a good sign, which indicates that we are likely on the right path.
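If a raw copy of the memory region is available (saved to a file, for instance), the search can also be automated. The following C sketch is only an illustration of the byte-order rule, not a replacement for a debugger's search command; the function name find_le32 is hypothetical. It builds the byte pattern of a 32-bit value, least significant byte first, and scans a buffer for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first occurrence of the 32-bit `value`, stored least
   significant byte first, inside `buf`; -1 if the value is absent. */
long find_le32(const uint8_t *buf, size_t len, uint32_t value)
{
    uint8_t pat[4];
    assert(buf != NULL || len == 0);
    for (int i = 0; i < 4; i++)
        pat[i] = (uint8_t)(value >> (8 * i));   /* pat[0] is the low byte */
    for (size_t off = 0; off + 4 <= len; off++) {
        if (buf[off] == pat[0] && buf[off + 1] == pat[1] &&
            buf[off + 2] == pat[2] && buf[off + 3] == pat[3])
            return (long)off;
    }
    return -1;
}
```

For example, the address of PumpMessage, 6C291194h, must be looked for in a byte dump as the sequence 94 11 29 6C.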

Listing 3.18: The addresses of the IsIdleMessage/PumpMessage functions located in the data section
00403044 6C2911D4 6C292583 6C291194 ; IsIdleMessage/PumpMessage
00403050 6C2913D0 6C299144 6C297129
0040305C 6C297129 6C297129 6C291A47

The pointers referring to the 403048h/40304Ch addresses, obviously, are the candidates for membership in the virtual methods table of the CWinThread class, for which we are looking. By extending the search range to the entire address space of the process being debugged, we will find the following two stubs:

Listing 3.19: Stubs to the IsIdleMessage/PumpMessage functions located in the data segment
00401A20 jmp dword ptr ds:[403044h] ; IsIdleMessage
00401A26 jmp dword ptr ds:[403048h] ;
00401A2C jmp dword ptr ds:[40304Ch] ; PumpMessage

We are getting closer! We have found the stubs to the virtual functions instead of the functions themselves. Unraveling this complicated puzzle, let us try to find the references to 401A26h/401A2Ch, which pass control to the code provided above:

Listing 3.20: Virtual table of the CWinThread class
00403490 00401A9E 00401040 004015F0   0x0, 0x1, 0x2 elements
0040349C 00401390 004015F0 00401A98   0x3, 0x4, 0x5 elements
004034A8 00401A92 00401A8C 00401A86   0x6, 0x7, 0x8 elements
004034B4 00401A80 00401A7A 00401A74   0x9, 0xA, 0xB elements
004034C0 00401010 00401A6E 00401A68   0xC, 0xD, 0xE elements
004034CC 00401A62 00401A5C 00401A56   0xF, 0x10, 0x11 elements
004034D8 00401A50 00401A4A 00401A44   0x12, 0x13, 0x14 elements
004034E4 00401A3E 004010B0 00401A38   0x15, 0x16, 0x17 elements
004034F0 00401A32 00401A2C 00401A26   0x18, 0x19, 0x1A elements (PumpMessage)
004034FC 00401A20 00401A1A 00401A14   0x1B, 0x1C, 0x1D elements (IsIdleMessage)

Even a beginner will easily recognize the virtual functions table in this data structure. The pointers to stubs to PumpMessage/IsIdleMessage are divided by exactly one element, as required by the task conditions. Let us suppose that this virtual table is the one that we need. To check if this assumption is correct, count 0x19 elements upwards from 4034F4h , and try to find the pointer that refers to its starting point. If you are lucky and it turns out to be of the CWinThread class, the program will be able to continue its operation correctly:
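Counting the elements upwards and looking for a reference to the table can be expressed in a few lines of C. This is a sketch under the same assumptions as the text (four-byte table entries, and the object storing the address of its virtual table in its first double word); vtable_base and find_object are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start of a virtual table, given the address of one of its slots and
   the index of that slot (entries are 4 bytes each). */
uint32_t vtable_base(uint32_t slot_addr, uint32_t index)
{
    return slot_addr - index * 4;
}

/* Address of the first double word in a memory image (starting at
   `base`) that holds the pointer `vtbl`, or 0 if there is none. Such a
   double word is a candidate for the beginning of a class instance. */
uint32_t find_object(uint32_t base, const uint32_t *mem, size_t count,
                     uint32_t vtbl)
{
    assert(mem != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (mem[i] == vtbl)
            return base + (uint32_t)(i * 4);
    }
    return 0;
}
```

With the slot of PumpMessage at 4034F4h and index 0x19, vtable_base returns 403490h, the first address of the table shown above. The value returned by find_object is only a candidate: whether it really is a CWinThread instance still has to be checked by hand.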

Listing 3.21: The instance of CWinThread, manually located in memory
004050B8 00403490 00000001 00000000
004050C4 00000000 00000000 00000001

Actually, something very similar to the truth can be found in memory. Let us write the value 4050B8h into the ECX register and locate the Run function in memory (as already mentioned, its address, 6C299164h, is known, provided that the function hasn't been overridden). Then press <Ctrl>+<G>, enter "0x6C299164", and choose the Set Next Statement command from the right-click menu. The program, having escaped with a slight fright, continues execution, while you have a good reason to be happy and go have a rest.

Hung applications that react neither to keyboard input nor to mouse clicks can be reanimated in a similar way.

How to Process Memory Dump

In the software department, the entire floor was sown with the confetti from punch cards, and there were some guys crawling over the printout of a crash dump about 20 meters in length, trying to locate an error in the memory manager. The head of the department approached the president and informed him that there was some hope that the task could be achieved before dinner.

J.Antonov. The Youth of Gates

A memory dump (also known as a core dump or crash dump), which is saved by the system in the event of a critical error, isn't the most useful tool for detecting the cause of the crash. However, there is often nothing else at the disposal of the system administrator. What is a crash dump? It is the last moan of the operating system at the moment of an irreversible fault, before it dies altogether. Digging through it is unlikely to please you. On the contrary, it is highly probable that you won't be able to detect the actual cause of the failure. Suppose, for instance, that an incorrectly written driver has invaded the memory region belonging to another driver and ruined its data structures, turning all of the numbers there topsy-turvy. At the moment when the victim dies, the faulty driver may already have stopped and, in this case, it will be practically impossible to determine from the memory dump alone that it was the one that actually crashed the system.

Nevertheless, it doesn't make any sense to ignore the dump's existence. After all, it provided the only debugging method before the arrival of interactive debuggers. Contemporary programmers are spoiled by the availability of visual analysis tools. However, these tools don't give them much self-confidence in situations where pitiless entropy leaves them alone, face to face with their errors. But enough waxing lyrical. Let's take a closer look at this question.

First and foremost, it is necessary to edit the system configuration (Control Panel → System) and make sure that the dump settings correspond to our requirements (Advanced → Startup and Recovery). Windows 2000 supports three types of memory dumps: small memory dump, kernel memory dump, and complete memory dump. To change the dump settings, you must have administrative privileges.

A small memory dump takes only 64 K (instead of 2 MB, as the context menu states) and includes: a) a copy of the BSOD; b) a list of loaded drivers; c) the context of the crashed process with all of its threads; d) the first 16 K of the kernel stack of the crashed process. It's a disappointingly small amount of information, isn't it? Direct dump analysis provides us only with the address at which the error occurred and the name of the driver to which that address belongs. Provided that the system configuration hasn't changed since the moment of failure, we can start the debugger and disassemble the suspected driver. However, this is unlikely to produce a valuable result. After all, the content of the data segment at the moment of failure is unknown to us. Furthermore, we cannot even say for sure that we see the same machine commands as those that caused the failure. Therefore, the small memory dump might be useful only for system administrators, for whom it is sufficient to know the name of the unstable driver. As practice has shown, this information is sufficient in the vast majority of cases. The administrator is expected to send a complaint, along with an error report and the memory dump, to the driver developers and to replace the driver with a newer, more stable and reliable one. By default, a small memory dump is written to the directory %SystemRoot%\Minidump, where it is assigned a name starting with the string Mini, followed by the current date and the number of the failure for the current day. For example, Mini110701-69.dmp is the 69th system dump saved on November 7, 2001.
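When many small dumps accumulate, their names can be decoded automatically. The C sketch below assumes the naming scheme illustrated by the example above, MiniMMDDYY-NN.dmp (two digits each for the month, the day, and the year, then the sequence number); the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct minidump_name {
    int month;   /* 1..12 */
    int day;     /* 1..31 */
    int year2;   /* last two digits of the year */
    int seq;     /* number of the failure for that day */
};

/* Parse a file name of the form "MiniMMDDYY-NN.dmp".
   Returns 1 on success and 0 if the name does not match. */
int parse_minidump_name(const char *name, struct minidump_name *out)
{
    int end = 0;   /* set by %n only if the whole pattern matched */
    assert(name != NULL && out != NULL);
    if (sscanf(name, "Mini%2d%2d%2d-%d.dmp%n",
               &out->month, &out->day, &out->year2, &out->seq, &end) != 4)
        return 0;
    return end > 0 && name[end] == '\0';
}
```

For "Mini110701-69.dmp", the function fills in month 11, day 7, year 01, and sequence number 69.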

A kernel memory dump contains significantly more comprehensive information about the failure. It includes the entire memory allocated to the system kernel and its components (drivers, the Hardware Abstraction Layer (HAL), and so on), as well as a copy of the BSOD. The size of the kernel dump depends on the number of installed drivers and varies from system to system. The Help system states that this value can vary from 50 to 800 MB. Eight hundred MB is too much to look realistic. A size of approximately 50 to 100 MB seems more likely. The technical documentation states that the approximate size of the kernel dump is about one third of the amount of RAM physically installed in the computer. This is the best compromise between disk space overhead, the speed of dump creation, and the information value of the dump. This option does actually provide you with the required minimum of information. Using it, it is possible to locate practically all typical errors of drivers and other kernel components, including those that are due to hardware malfunction (however, the investigator must have some experience with studying memory crash dumps). By default, the kernel dump is written into the file named %SystemRoot%\Memory.dmp. Depending on the current settings, the new dump will either overwrite the existing one or be added to its tail.

A full memory dump includes the entire content of the physical memory, both the memory occupied by kernel components and that occupied by application processes. A full memory dump turns out to be especially useful when debugging ASPI/SPTI applications, which, due to their specific features, are capable of crashing the kernel even from the application level. Despite its large size, the full memory dump is the favorite option of all system programmers (most administrators prefer the small memory dump). This isn't surprising, if we recall that hard disks passed the 100 GB threshold long ago. From the programmer's point of view, it is much better to have an unneeded full memory dump than to end up suffering because of its absence. By default, the full memory dump will be saved in the file named %SystemRoot%\Memory.dmp. Depending on the current system settings, it will either overwrite the existing file or be appended to its end.

Having chosen the preferred type of memory dump, let's simulate a system crash for testing purposes. This will help us acquire the skills needed to recover a system under fire. For this purpose, we'll need the following:

  • Windows Driver Development Kit (DDK), distributed by Microsoft for free and providing detailed technical documentation of the system kernel, several different C/C++ compilers, an assembler, and some advanced tools for memory dump analysis.

  • The W2K_KILL.SYS or any other killer driver, such as BSOD.EXE by Mark Russinovich, which allows you to get the dump at any given moment, without needing to wait for a critical error to occur (the freeware version of BSOD.EXE can be downloaded from http://www.sysinternals.com ).

  • Symbol files, which kernel debuggers require to function normally and which make the disassembled code more readable. Symbol files are included in the green MSDN distribution set. In principle, you can get by without them. However, the environment variable _NT_SYMBOL_PATH must be defined anyway; otherwise, the i386kd.exe debugger won't work.

  • One or more of the books describing the system kernel architecture. The best is Windows 2000 Internals by Mark Russinovich and David Solomon. This book will be interesting both for system programmers and for administrators.

After installing DDK on your computer, close all applications and start the killer driver. The system will crash, display a BSOD informing of the causes of failure (see Fig. 3.4), and write the dump (the process might be accompanied by a rattling sound).

Fig. 3.4: Blue Screen Of Death (BSOD), signaling the irrecoverable system failure and providing brief information about it

For most administrators, the appearance of a BSOD means only one thing: the system was feeling so bad that it preferred death to the infamy of unstable operation. As for the enigmatic characters, they remain a total mystery, but not for true professionals!

Let's start from the top left position on the screen, and trace all BSOD elements, one by one.

  • *** STOP: actually means that the system has stopped. It doesn't carry any other useful information.

  • 0x0000001E: this is the Bug Check code that classifies the failure. The Bug Check codes are decoded in the DDK. In our case, the code is 0x1E, KMODE_EXCEPTION_NOT_HANDLED, which is specified by the line directly below. Brief explanations of the most typical Bug Check codes are provided in Table 3.1. Of course, it cannot serve as a replacement for the companion documentation. It will, however, spare you the need to download 70 MB of the DDK.

  • The numbers in brackets are the four Bug Check parameters. Their meaning depends on the specific Bug Check code, and they have no meaning outside its context. With regard to KMODE_EXCEPTION_NOT_HANDLED, the first Bug Check parameter contains the number of the exception that was thrown. According to Table 1, this is STATUS_ACCESS_VIOLATION, that is, access to an invalid memory address. The fourth Bug Check parameter specifies the exact address. In this case, it is equal to zero, which means that some machine instruction attempted to access memory through a null pointer, that is, a pointer that references an unallocated memory region. The address of that instruction is contained in the second Bug Check parameter. The third Bug Check parameter is undefined in this case.

  • *** Address 0xBE80B00: this is the address at which the failure took place. In this particular case, it is identical to the second Bug Check parameter. This, however, isn't always the case (Bug Check codes are not actually intended to store any addresses).

  • base at 0xBE80A00: contains the base loading address of the module that violated the system operating order, by which it is possible to restore the data about that module. (Attention: it isn't always possible to determine the base address correctly.) Using any suitable debugger (for instance, Soft-Ice from NuMega or i386kd from Microsoft), let's issue a command that produces the listing of all loaded drivers with their brief characteristics (in i386kd, this is achieved using the !drivers command). As a possible alternative, you can use the drivers.exe utility supplied as part of NTDDK. No matter which method you choose, the result will be approximately as follows:

      kd> !drivers
      !drivers
      Loaded System Driver Summary
      Base       Code Size          Data Size          Driver Name    Creation Time
      80400000   142dc0 (1291 kb)   4d680 (309 kb)     ntoskrnl.exe   Wed Dec 08 02:41:11 1999
      80062000   cc20 (51 kb)       32c0 (12 kb)       hal.dll        Wed Nov 03 04:14:22 1999
      f4010000   1760 (5 kb)        1000 (4 kb)        BOOTVID.DLL    Thu Nov 04 04:24:33 1999
      bffd8000   21ee0 (135 kb)     59a0 (22 kb)       ACPI.sys       Thu Nov 11 04:06:04 1999
      be193000   16f60 (91 kb)      ccc0 (51 kb)       kmixer.sys     Wed Nov 10 09:52:30 1999
      bddb4000   355e0 (213 kb)     10ac0 (66 kb)      ATMFD.DLL      Fri Nov 12 06:48:40 1999
      be80a000   200 (0 kb)         a00 (2 kb)         w2k_kill.sys   Mon Aug 28 02:40:12 2000
      TOTAL:     835ca0 (8407 kb)   326180 (3224 kb)   (0 kb 0 kb)

  • Note the string w2k_kill.sys in the last row, located at the base address 0xBE80A000. This driver is exactly the one that we need! This step, though, isn't necessary, since the name of the faulty driver is displayed on the BSOD anyway.

  • Two lines at the bottom of the screen display the progress of the dump creation, entertaining the administrator by displaying a sequence of swiftly changing digits.
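
The decoding described in the list above can be sketched in a few lines of Python. This is only an illustration: the function name parse_stop_line and the regular expression are hypothetical helpers, not part of any debugger, and the sample string uses the values of this particular crash.

```python
import re

# Hypothetical helper: matches "<code> (<p1>, <p2>, <p3>, <p4>)" in hex.
STOP_RE = re.compile(r"(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")

def parse_stop_line(text):
    """Return (bug_check_code, [parameters]) as integers."""
    match = STOP_RE.search(text)
    if match is None:
        raise ValueError("no Bug Check code found")
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in match.group(2).split(",")]
    return code, params

code, params = parse_stop_line(
    "0x0000001E (0xC0000005, 0xBE80B000, 0x00000000, 0x00000000)"
)
exception_code = params[0]    # first parameter: code of the exception
fault_instruction = params[1] # second parameter: address of the instruction
accessed_address = params[3]  # fourth parameter: address that was accessed

print(hex(code), hex(exception_code), hex(fault_instruction), hex(accessed_address))
```

The meaning assigned to each parameter here holds only for Bug Check code 0x1E; other codes interpret the four parameters differently.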

Below, you will find the physical meanings of the most common Bug Check hex codes with brief explanations. The popularity rating of the Bug Check codes was composed by counting the number of times they were referenced in Internet conferences (thanks to Google).

  • 0x0A, symbolic name: IRQL_NOT_LESS_OR_EQUAL

    A driver attempted to access a memory page at DISPATCH_LEVEL or a higher IRQL, which resulted in a crash because the Virtual Memory Manager (VMM) operates at a lower level.

    The possible source of the failure is the BIOS, a driver, or a system service (this is especially typical for anti-virus scanners and FM tuners).

    Check the cable terminators on SCSI drives and the Master/Slave settings on IDE drives. Try to disable the memory caching option in the BIOS.

    If this doesn't help, check the four Bug Check parameters, which contain the address of the accessed memory, the IRQ level, the access type (read/write), and the address of the driver's machine instruction.

  • 0x1E, symbolic name: KMODE_EXCEPTION_NOT_HANDLED

    A kernel component has thrown an exception and then failed to handle it; the code of the exception is contained in the first Bug Check parameter. It usually takes one of the following values:

    • (0x80000003) STATUS_BREAKPOINT: A software breakpoint was encountered, a debugging leftover that the driver developer neglected to remove.

    • (0xC0000005) STATUS_ACCESS_VIOLATION: Access to an invalid address (the fourth Bug Check parameter specifies the exact address); a developer error.

    • (0xC000021A) STATUS_SYSTEM_PROCESS_TERMINATED: Failure of the CSRSS and/or Winlogon processes. Both kernel components and user-mode applications can cause this error. As a rule, this happens if the machine is infected by a virus or when the integrity of system files has been violated.

    • (0xC0000221) STATUS_IMAGE_CHECKSUM_MISMATCH: The integrity of one or more system files has been violated. The second Bug Check parameter contains the address of the machine instruction that has thrown the exception.

  • 0x24, symbolic name: NTFS_FILE_SYSTEM

    There is a problem with the NTFS.SYS driver. As a rule, this happens as a result of physical disk corruption or, more rarely, under an acute shortage of physical memory.

  • 0x2E, symbolic name: DATA_BUS_ERROR

    The driver accessed a non-existent physical address. If this isn't the driver's fault, it means that RAM or the processor cache memory (or video memory) is malfunctioning or was overclocked to unsupported frequencies.

  • 0x35, symbolic name: NO_MORE_IRP_STACK_LOCATIONS

    A higher-level driver called a lower-level driver via the IoCallDriver interface, but there was no free space in the IRP stack, so it was impossible to pass the entire IRP. This is a deadly situation that has no direct solution; the only way out is to remove some of the least important drivers, in which case you may hope to get the system up and running again.

  • 0x3F, symbolic name: NO_MORE_SYSTEM_PTES

    Excessive fragmentation of the PTE table makes it impossible to allocate the memory block requested by the driver. As a rule, this situation is characteristic of audio/video drivers that manipulate vast memory blocks. Usually, such drivers fail to release allocated memory blocks in due time. To solve the problem, try to increase the number of PTEs (up to 50,000 at maximum) by editing the following registry entry: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\SystemPages.

  • 0x50, symbolic name: PAGE_FAULT_IN_NONPAGED_AREA

    An attempt to access a non-existent memory page, which is usually caused either by hardware malfunction (as a rule, the faulty component is a RAM chip, or video/cache memory), or by an incorrectly designed service (this is typical for many anti-virus scanners), or by the corruption of the NTFS-formatted volume (run chkdsk with /f and /r command-line options). Also try to disable memory caching in BIOS.

  • 0x58, symbolic name: FTDISK_INTERNAL_ERROR

    Failure in the course of loading a RAID array. When trying to boot the system from the primary disk, the system has detected its corruption, after which it tried to access the mirror, but there was no partition table there.

  • 0x76, symbolic name: PROCESS_HAS_LOCKED_PAGES

    The driver failed to release locked pages after completion of the I/O operation. To find the name of the faulty driver, open the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management branch of the system registry, find the TrackLockedPages DWORD parameter, and set its value to 1. Reboot the system, and it will then save the traced stack. If the faulty driver causes the error again, there will be a BSOD with a Bug Check code equal to 0xCB. This will help detect the driver that causes this error.

  • 0x77, symbolic name: KERNEL_STACK_INPAGE_ERROR

    The memory page with the kernel data is not available for technical reasons. If the first Bug Check parameter is not equal to zero, it can take one of the following values:

    • (0xC000009A) STATUS_INSUFFICIENT_RESOURCES: system resources are not sufficient.

    • (0xC000009C) STATUS_DEVICE_DATA_ERROR: disk read/write error (or possibly a bad sector).

    • (0xC000009D) STATUS_DEVICE_NOT_CONNECTED: the system cannot see the drive (controller malfunction, bad contact).

    • (0xC000016A) STATUS_DISK_OPERATION_FAILED: disk I/O error (bad sector or malfunctioning controller).

    • (0xC0000185) STATUS_IO_DEVICE_ERROR: incorrect termination of a SCSI drive or an IRQ conflict of IDE drives.

    A zero value of the first Bug Check parameter indicates an unknown hardware problem.

    Such messages can appear if the system is infected by viruses, in the event of disk corruption, or in the case of RAM failure. Start the Recovery Console and run the ChkDsk command with the /r command-line option.

  • 0x7A, symbolic name: KERNEL_DATA_INPAGE_ERROR

    A kernel memory page is not available for technical reasons. The second Bug Check parameter contains the exchange status, and the fourth contains the virtual address of the page that couldn't be loaded.

    Possible reasons for the failure are bad sectors occupied by the pagefile.sys file, failures of the disk controller, or virus infection.

  • 0x7B, symbolic name: INACCESSIBLE_BOOT_DEVICE

    The boot device is unavailable because the partition table is corrupted or doesn't correspond to the content of the boot.ini file.

    This message may appear after the replacement of a motherboard with an integrated IDE controller or the replacement of a SCSI controller, because each controller requires its native drivers. Thus, after installing a hard disk with the Windows NT operating system on a computer containing incompatible equipment, the OS won't start and needs to be reinstalled. Experienced administrators, however, can reinstall the disk drivers after booting into the Recovery Console.

    It is also recommended to test the hardware and scan the system for viruses.

  • 0x7F, symbolic name: UNEXPECTED_KERNEL_MODE_TRAP

    A processor exception was not handled by the operating system. As a rule, this situation is caused by hardware malfunction, incorrect CPU overclocking, incompatibility of the CPU with installed drivers, or algorithmic errors in drivers.

    Check that your hardware is working properly and remove all unnecessary drivers. The first Bug Check parameter contains the exception number and can take the following values:

    • 0x00: attempt to divide by zero

    • 0x01: system debugger exception

    • 0x03: breakpoint exception

    • 0x04: overflow

    • 0x05: generated by the BOUND instruction

    • 0x06: invalid opcode

    • 0x08: Double Fault

    Descriptions of all other exceptions can be found in the technical documentation for Intel and AMD processors.

  • 0xC2, symbolic name: BAD_POOL_CALLER

    The current thread has made an incorrect pool request, which is usually due to an algorithmic error by the driver developer. However, to all appearances, the system itself isn't bug-free, since to eliminate this error, Microsoft recommends the installation of SP2.

  • 0xCB, symbolic name: DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS

    After completing the input/output procedure, the driver is unable to release locked pages (see PROCESS_HAS_LOCKED_PAGES ).

    The first Bug Check parameter contains the called address, while the second Bug Check parameter specifies the calling address. The last, fourth, parameter points to the UNICODE string with the driver name.

  • 0xD1, symbolic name: DRIVER_IRQL_NOT_LESS_OR_EQUAL

    Same as IRQL_NOT_LESS_OR_EQUAL.

  • 0xE2, symbolic name: MANUALLY_INITIATED_CRASH

    A manually generated system failure initiated by pressing the <Ctrl>+<Scroll Lock> hotkey combination, provided that the registry parameter CrashOnCtrlScroll located under HKLM\System\CurrentControlSet\Services\i8042prt\Parameters contains a nonzero value.

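
For quick triage, the codes listed above can be collected into a lookup table. The following Python sketch is an illustration only: the dictionary merely restates the names given in this list, and the functions bug_check_name and describe are hypothetical helpers, not part of any debugger or the DDK.

```python
# Bug Check codes and symbolic names exactly as listed in the text above.
BUG_CHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x24: "NTFS_FILE_SYSTEM",
    0x2E: "DATA_BUS_ERROR",
    0x35: "NO_MORE_IRP_STACK_LOCATIONS",
    0x3F: "NO_MORE_SYSTEM_PTES",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x58: "FTDISK_INTERNAL_ERROR",
    0x76: "PROCESS_HAS_LOCKED_PAGES",
    0x77: "KERNEL_STACK_INPAGE_ERROR",
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xC2: "BAD_POOL_CALLER",
    0xCB: "DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0xE2: "MANUALLY_INITIATED_CRASH",
}

# Exception codes listed under 0x1E (first Bug Check parameter).
EXCEPTION_NAMES = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000021A: "STATUS_SYSTEM_PROCESS_TERMINATED",
    0xC0000221: "STATUS_IMAGE_CHECKSUM_MISMATCH",
}

def bug_check_name(code):
    """Return the symbolic name of a Bug Check code, if it is in the table."""
    return BUG_CHECK_NAMES.get(code, "UNKNOWN (0x%08X)" % code)

def describe(code, params):
    """Name the Bug Check code; for 0x1E, also name the exception."""
    name = bug_check_name(code)
    if code == 0x1E and params:
        exception = EXCEPTION_NAMES.get(params[0])
        if exception is not None:
            return "%s (%s)" % (name, exception)
    return name

print(describe(0x1E, [0xC0000005, 0xBE80B000, 0, 0]))
```

The table covers only the codes discussed here; anything else is reported as unknown rather than guessed.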

Recovering the System after Critical Failure

Unnatural, practically sexual inclination to the F8 button appeared in Rabbit with a good reason.

14,400 bauds and 19,200 users

Operating systems of the Windows NT family can tolerate even critical faults, even if they occur at the most unsuitable moments (for example, in the course of disk defragmentation). The fault-tolerant file system driver does everything on its own (although it would be wise to run ChkDsk anyway).

If you have chosen the Full memory dump or Kernel memory dump option, then, after the next successful boot, the hard disk will keep working its read/write heads for a long time, even if nothing appears to be accessing it. Don't worry! Windows is simply moving the dump from virtual memory to its permanent location. If you start Task Manager, you'll see a new process in the list: SaveDump.exe. This is the process that carries out the task. The need for such a two-step scheme of saving the dump is explained by the fact that the operability of file system drivers isn't guaranteed at the moment of a critical error, and the operating system can't risk using them. Instead, it limits itself to temporarily storing the dump in virtual memory. By the way, if the available amount of virtual memory turns out to be insufficient (Advanced Performance → Virtual memory), it will be impossible to save the dump.

If the system fails to boot, and this error is persistent, don't forget that you have the <F8> key at your disposal. Choose the Last Known Good Configuration menu option. Starting the system in safe mode, with the required minimum of vitally important system services and drivers, is a more radical step. System reinstallation is the last resort, and it isn't recommended unless absolutely necessary. It is better to try to start the Recovery Console and move the dump to another machine, where you'll be able to investigate it.

Loading the Crash Dump

To load the crash dump into the Windows Debugger (windbg.exe), choose the Crash Dump option from the File menu, or press the <Ctrl>+<D> hotkey combination. If you are working with the i386kd.exe debugger, use the -z command-line option followed by the fully qualified path name of the dump file. The name of the dump file must be separated from the option by one or more blanks, and the _NT_SYMBOL_PATH environment variable must specify the full path to the symbol files. Otherwise, the debugger will terminate abnormally. As an alternative, you can specify the symbol path with the -y command-line option. In this case, the command line will appear approximately as follows: i386kd -z C:\WINNT\memory.dmp -y C:\WINNT\Symbols. Note that it is necessary to call the debugger from the Checked Build Environment/Free Build Environment console located in the Windows 2000 DDK folder. Otherwise, you'll fail.
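
If the debugger is launched from a script, the two ways of supplying the symbol path described above could be wrapped as follows. This is a minimal Python sketch: the function names are hypothetical, it assumes that i386kd is reachable from the current environment, and it uses only the -z and -y options and the _NT_SYMBOL_PATH variable mentioned in the text.

```python
import os
import subprocess

def build_i386kd_command(dump_path, symbol_path=None, debugger="i386kd"):
    """Build the argument list: -z <dump>, optionally -y <symbols>."""
    args = [debugger, "-z", dump_path]
    if symbol_path is not None:
        args += ["-y", symbol_path]
    return args

def run_with_symbol_variable(dump_path, symbol_path):
    """Alternative: pass the symbol path through _NT_SYMBOL_PATH."""
    env = dict(os.environ, _NT_SYMBOL_PATH=symbol_path)
    return subprocess.call(build_i386kd_command(dump_path), env=env)

command = build_i386kd_command(r"C:\WINNT\memory.dmp", r"C:\WINNT\Symbols")
print(" ".join(command))
```

Passing the arguments as a list keeps the dump path separate from the option, so the requirement of a blank between -z and the file name is satisfied automatically.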

Associating DMP files with the i386kd debugger is a good idea. After you do so, you'll be able to call the debugger by simply pressing the <Enter> key in FAR Manager. The choice of debugging tools, though, is a matter of personal preference. Some people prefer KAnalyze, while others are quite content with the simple DumpChk. The range of analysis tools from which you can choose is broad (for instance, the DDK contains four such tools). Thus, for the sake of definiteness, let us choose i386kd.exe, also known as Kernel Debugger.

As soon as the Kernel Debugger console appears on the screen (Kernel Debugger is a console application preferred by those who spent their youth sitting at terminals), the debugger will disassemble the current machine instruction and drag us into the depths of machine code. Enter u from the keyboard to make the debugger continue disassembling the code.

According to the symbolic identifiers PspUnhandledExceptionInSystemThread and KeBugCheckEx, we are somewhere deep in the kernel, or, to be more precise, somewhere in the vicinity of the code that displays the BSOD:

Listing 3.22: The results of disassembling the memory dump from the current address
8045249c 6a01             push    0x1
kd>u
PspUnhandledExceptionInSystemThread:
80452484 8B442404         mov     eax, dword ptr [esp+4]
80452488 8B00             mov     eax, dword ptr [eax]
8045248A FF7018           push    dword ptr [eax+18h]
8045248D FF7014           push    dword ptr [eax+14h]
80452490 FF700C           push    dword ptr [eax+0Ch]
80452493 FF30             push    dword ptr [eax]
80452495 6A1E             push    1Eh
80452497 E8789AFDFF       call    KeBugCheckEx
8045249C 6A01             push    1
8045249E 58               pop     eax
8045249F C20400           ret     4

There is nothing interesting in the stack either (look for yourself; to view the stack contents, issue the kb command):

Listing 3.23: The stack contents don't provide any clues to the actual nature of the critical error
kd> kb
ChildEBP RetAddr  Args to Child
f403f71c 8045251c f403f744 8045cc77 f403f74c ntoskrnl!PspUnhandledExceptionInSystemThread+0x18
f403fddc 80465b62 80418ada 00000001 00000000 ntoskrnl!PspSystemThreadStartup+0x5e
00000000 00000000 00000000 00000000 00000000 ntoskrnl!KiThreadStartup+0x16

This turn of events is mystifying. You can disassemble the kernel as many times as you like, but it won't bring you any closer to the solution. This is logical, since the current address (8045249Ch) is far beyond the limits of the killer driver (0BE80A000h). So let's go another way. Do you recall the address that was displayed on the BSOD? If you don't, this isn't a problem! Unless the system settings explicitly prohibit it, copies of all BSODs are saved in the system log. Let's open it (Control Panel → Administrative Tools → Event Viewer):

Listing 3.24: A BSOD copy saved in the system log
The system was rebooted after a critical error:
0x0000001e (0xc0000005, 0xbe80b000, 0x00000000, 0x00000000).
Microsoft Windows 2000 [v15.2195]
Memory dump was saved: C:\WINNT\MEMORY.DMP.

Based on the category of the critical error (0x1E), we can easily determine the address of the killer instruction: 0xBE80B000 (the second parameter in the listing above). Now issue the u BE80B000 command to view its contents, and you'll see:

Listing 3.25: The results of disassembling the memory dump at the address reported by the BSOD
kd>u 0xBE80B000
be80b000 a100000000       mov     eax, [00000000]
be80b005 c20800           ret     0x8
be80b008 90               nop
be80b009 90               nop
be80b00a 90               nop
be80b00b 90               nop
be80b00c 90               nop
be80b00d 90               nop

This looks much closer to the truth. The first instruction, mov eax, [00000000], reads the memory cell at address zero, which raises the critical exception that crashes the system. Now we know for certain which branch of the program has caused this exception.
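
Matching an address to a driver can also be done mechanically against the base addresses reported by the !drivers command. The Python sketch below is a rough illustration: the function name locate is hypothetical, the table holds only the drivers shown in the abbreviated output earlier in this section, and picking the nearest lower base address is a heuristic that does not prove the address really lies inside that module.

```python
# (base address, driver name) pairs taken from the !drivers output.
DRIVERS = [
    (0x80400000, "ntoskrnl.exe"),
    (0x80062000, "hal.dll"),
    (0xF4010000, "BOOTVID.DLL"),
    (0xBFFD8000, "ACPI.sys"),
    (0xBE193000, "kmixer.sys"),
    (0xBDDB4000, "ATMFD.DLL"),
    (0xBE80A000, "w2k_kill.sys"),
]

def locate(address, drivers=DRIVERS):
    """Return (driver name, offset from its base) for the nearest lower base."""
    candidates = [(base, name) for base, name in drivers if base <= address]
    if not candidates:
        return None
    base, name = max(candidates)
    return name, address - base

for address in (0x8045249C, 0xBE80B000):
    name, offset = locate(address)
    print("%08X -> %s+0x%X" % (address, name, offset))
```

With the addresses from this example, the current address maps to the kernel image, while the address reported by the BSOD maps to w2k_kill.sys, which agrees with the reasoning in the text.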

What should we do if we don't have a copy of the BSOD at our disposal? In fact, a copy of the BSOD is always available. You only need to know where to look for it. Try opening the dump file using any hex editor, and you'll find the following strings.

Listing 3.26: A copy of a BSOD in the program dump header
[image: hex dump of the dump file header]

All main Bug Check parameters can be recognized immediately: 1E 00 00 00 is the failure category code, 0x1E (in x86 processors, the least significant byte is located at the lowest address, so all numbers appear with their bytes in reverse order); 05 00 00 C0 is the STATUS_ACCESS_VIOLATION exception code; and 00 B0 80 BE specifies the address of the machine instruction that threw this exception. The combination 0F 00 00 00 93 08 is easily recognized as the system build number (just write it in decimal notation).
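
The byte reversal is easy to check with Python's struct module. In this sketch, the helper names le32 and le16 are hypothetical, and each byte group quoted above is decoded on its own; nothing is assumed about where the groups sit inside the dump header.

```python
import struct

def le32(hex_bytes):
    """Decode four bytes, written as in a hex editor, as a little-endian DWORD."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]

def le16(hex_bytes):
    """Decode two bytes as a little-endian WORD."""
    return struct.unpack("<H", bytes.fromhex(hex_bytes))[0]

bug_check_code = le32("1E 00 00 00")
exception_code = le32("05 00 00 C0")
fault_address = le32("00 B0 80 BE")
build_number = le16("93 08")

print("0x%X 0x%X 0x%X build %d" % (
    bug_check_code, exception_code, fault_address, build_number))
```

The build number decodes to 2195, the same value that appears in the system log entry "Microsoft Windows 2000 [v15.2195]".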

To view the Bug Check parameters in a more readable format, you can use the following debugger command: dd KiBugCheckData

Listing 3.27: Bug Check parameters displayed in a more readable format
kd> dd KiBugCheckData
dd KiBugCheckData
8047e6c0  0000001e c0000005 be80b000 00000000
8047e6d0  00000000 00000000 00000001 00000000
8047e6e0  00000000 00000000 00000000 00000000
8047e6f0  00000000 00000000 00000000 00000000
8047e700  00000000 00000000 00000000 00000000
8047e710  00000000 00000000 00000000 00000000
8047e720  00000000 00000000 00000000 00000000
8047e730  00000000 e0ffffff edffffff 00020000

The list of other useful commands includes:

  • !drivers: displays the list of drivers that were loaded at the moment of failure

  • !arbiter: displays all arbiters along with their arbitration ranges

  • !filecache: displays information about the file system cache and PT

  • !vm: produces a report on virtual memory usage, etc.

Unfortunately, it is impossible to provide a complete listing of the commands here. If you need it, you'll find such a listing in the manual for your preferred debugger.

Naturally, it is much more difficult to detect the actual cause of a system crash in the real world. This is because any real driver consists of a large set of functions interacting with one another according to some intricate scheme. These functions form complicated hierarchies, sometimes crossed by tunnels of global variables, turning the driver into a labyrinth. Let us consider an example. The construction mov eax, [ebx], where ebx == 0, behaves exactly as designed by obediently throwing an exception, and there is no point in arguing with it! It is necessary to locate the code that writes a zero value into EBX, which isn't an easy task. Of course, it is possible to scroll the screen upwards, hoping that the program code executes linearly in this section, but no one can guarantee that this is actually the case. There is also no way to trace back. Roughly speaking, the address of the previously executed machine instruction is unknown, so it isn't recommended to rely on screen scrolling.

Having loaded the driver being tested into any intelligent disassembler that automatically restores cross-references (such as IDA PRO), we will get a more or less complete idea of the topology of the program's control branches. Naturally, disassembling, because of its static nature, doesn't guarantee that control hasn't been passed somewhere else. It does, however, narrow the search range. Generally speaking, there are lots of good books about disassembling (for instance, I have written one myself: Hacker Disassembling Uncovered by Kris Kaspersky); therefore, I won't concentrate on this topic here. I'll simply wish you good luck.

Fig. 3.5: The i386kd debugger at work; despite its minimalistic interface, it is a powerful and convenient instrument, allowing you to carry out prodigious tasks by pressing a couple of shortcut keys or keyboard combinations (one of which calls up your own script)
Fig. 3.6: Windbg with loaded memory dump. Note that the debugger automatically highlights the Bug Check codes without waiting for us to instruct it to do so, and when attempting to disassemble the instruction that has caused the critical exception, the screen displays the string specifying the name of the killer driver: Module Load: W2K_KILL.SYS, a nice touch
fls IS-ETS-KDC-PC-ASCII-01) : NODE_VXKIT-EWKRET01 : WRT_8088 : Writer run terminated. [External loader error.]


The following error is in the Loader Log:
 

**** 10:29:00 UTY0800 CLI error: DBCHCL returned 543.
     text: CLI2: TDWALLETERROR(543): Teradata Wallet error. The data file is
     locked. (database is locked) 


This is a Teradata client error.  This is known to occur with earlier patch releases of the Teradata client (TTU) 16.20.