Finding Windows CE bugs with help from "Dr. Watson"
by Abraham Kcholi and Gad Meir (Dec. 8, 2006)

Foreword: The key to fixing a bug discovered in the field is accurate information about the system's state when the bug manifests itself. In this whitepaper, Abraham Kcholi and Gad Meir show how the Windows CE Error Reporting module can be used to retrieve that information.

With the Error Reporting module installed on a Windows CE device, an exception causes a "dump file" to be saved, according to Kcholi and Meir. Using a trivial example of a divide-by-zero exception, the authors explain how the dump file can then be analyzed on a workstation using WinDbg, which can point out the exact source line where the exception occurred.





Introduction

Because we believe that we are perfect, it follows that we create perfect software. Therefore, it is the hardware's fault when our systems crash. By "systems," of course, we're referring to the combination of hardware, operating system, and applications that comprise the whole embedded system.

The scenario goes like this... We deliver the system to the client, get paid (hopefully), and then a week later, we get a nervous call in the middle of the night: "Your system crashed."

Trying to get oriented and open our eyes, we start to query the person on the other end of the line regarding what really occurred, and we end up realizing that something caused the system to crash. We promise to start investigating the problem first thing in the morning. But our beauty sleep has now evaporated, and it's time to go and trace that crash.

If only we had incorporated the Windows Error Reporting (WER) module into our system! This would have let us retrieve the state of our device at the time the program crashed. More than that, we could have uploaded the resulting dump file from the device, loaded it into WinDbg, and determined the exact line where our mischievous code broke down.

Motivation

As is usually the case, demonstrating this new feature of Windows CE 5.0 is the best way to explain what it does and illustrate its usefulness. Our scenario assumes you developed a program or module running on a Windows CE 5.0 based device, the program is installed on thousands of units, and complaints are flowing in from end users that the application sometimes crashes. Wouldn't it be nice to know exactly why each crash happens, to the level of having a stack trace with the source code line number and the value of local variables at the point of the crash? Well, WER gives you just that.

Our application can be any application, whether native or managed. To demonstrate that it could be any type of application, we will use a simple console application running on a Pocket PC device, but it could be a Windows application, or a special purpose Windows CE module.

The process we are going to describe is CPU agnostic and can be used for any type of hardware running Windows CE 5.0.

Here is the source code of our sample culprit application:


Figure 1
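A culprit of this kind can be very small. The sketch below is a hypothetical stand-in, not the authors' listing: the function name and the variable "total" are assumptions, and only the local "i" is taken from the discussion later in the article.

```c
/*
 * Hypothetical stand-in for the culprit: nothing checks that i is non-zero.
 * A console program that calls average_per_unit(100, 0) performs an integer
 * division by zero, which is the kind of fault the article goes on to analyze.
 */
int average_per_unit(int total, int i)
{
    return total / i;   /* faults when i == 0 */
}
```

With a non-zero "i" the function behaves normally, which is exactly why a bug like this can slip through testing and only show up in the field.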

After deploying the application onto the device, running the application will obviously cause a divide-by-zero exception. Since our device has the Error Reporting feature incorporated into its image, a polite message will pop up, asking the user if he or she would like to share their unfortunate experience of that offensive application with Microsoft.


Figure 2

There are two links on the page. Let's examine the second link a little bit further: the link indicated by "To View technical information contained in this error report."


Figure 3

It looks like two files are about to be uploaded. Clicking each of the links, to find out more about what is sent, yields:


Figure 4


Figure 5

The first looks like a report, and the second looks like a memory dump.

Later in the article, we'll see how this information can travel from Microsoft buckets directly to your product support FTP. For now, let's dig a little bit into the CE device file system.

We're interested specifically in the My Device/Windows/System/DumpFiles folder. It is a hidden folder, so you'll need to enable the option to show all files in File Explorer to view it.


Figure 6

In this folder, you are going to find another folder with the prefix "CE," the date of the application crash in the format "MMDDYY," and a sequence number, in case you are lucky enough to have several applications crash on the same day. In that folder, you can find one or two files that are actually the sources of the data you have seen previously in the Windows CE ER reports.
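If you collect dump folders from many devices, it can help to pick the name apart programmatically. The following is a minimal sketch under stated assumptions: it assumes the sequence number is separated from the date by a hyphen and has one to four digits (for example "Ce120806-01"), which the description above does not spell out. The type and function names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int month;     /* 1..12 */
    int day;       /* 1..31 */
    int year;      /* two digits, as in MMDDYY */
    int sequence;  /* distinguishes several crashes on the same day */
} DumpFolderName;

/* Converts two ASCII digits to an integer. */
static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/*
 * Parses a name of the ASSUMED form "Ce" + MMDDYY + "-" + sequence.
 * Returns 1 and fills *out on success, 0 if the name does not match.
 */
int parse_dump_folder_name(const char *name, DumpFolderName *out)
{
    size_t len = strlen(name);
    size_t i;
    int month, day;

    if (len < 10 || len > 13)
        return 0;                       /* prefix + date + '-' + 1..4 digits */
    if (tolower((unsigned char)name[0]) != 'c' ||
        tolower((unsigned char)name[1]) != 'e')
        return 0;
    for (i = 2; i < 8; i++)
        if (!isdigit((unsigned char)name[i]))
            return 0;
    if (name[8] != '-')
        return 0;
    for (i = 9; i < len; i++)
        if (!isdigit((unsigned char)name[i]))
            return 0;

    month = two_digits(name + 2);
    day = two_digits(name + 4);
    if (month < 1 || month > 12 || day < 1 || day > 31)
        return 0;

    out->month = month;
    out->day = day;
    out->year = two_digits(name + 6);
    out->sequence = atoi(name + 9);
    return 1;
}
```

If the folders on your device use a different separator or sequence width, only the length check and the offsets need to change.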

Since those files are deleted after you make up your mind about sending the report to Microsoft, let's copy that folder to a safe place for further examination (copy and paste it to another folder or to an SD card).


Figure 7

Assuming you got the dump somehow -- from Microsoft, from your customer, or you grabbed it yourself from the customer's device during maintenance -- let's see what can be learned from that dump.

Some sort of analysis tool is needed in order to analyze the dump, and the best one is WinDbg. WinDbg is included in the "Debugging Tools for Windows" package, freely available from the Microsoft WHDC site. We'll talk more about that package later, but first we must set the stage.

We need access to the source code of our application, the symbol files produced by the compiler, and the application image (the .exe file). Since we created that program in the first place, getting these onto our workstation is probably straightforward. So, assuming our workstation is installed and configured with the needed tools, let's start the analysis process:
  1. Start WinDbg
  2. Drag and drop the dump onto the WinDbg window
Assuming everything is set up correctly, the result would be as shown in Figure 8.


Figure 8

It becomes clear that it may be a divide-by-zero exception:



The assembly snippet shows the exact instruction that caused the crash. However, that's just the beginning. Let's click the stack trace button:


Figure 9

As can be seen, we can tell the exact source file and line number that are causing the problem -- and that's not the end of the story. If we move to the stack frame in our program, we can open up a new source window with the faulty line clearly marked:


Figure 10

And, last but not least, the locals window is going to give us the local values at that frame, including the value of "i":


Figure 11

It's very tempting to change the value of "i" and retry the application in the debugger, but there are practical reasons why you can't do that.

First, the host workstation we are using to analyze the data is most likely an x86-based computer, whereas the target device may be ARM-based, MIPS-based, and so on. Second, although it looks like a live debugging session, you are actually examining a memory dump created automatically for you by the Windows CE WER function. Nevertheless, if you can get the dump to your analyzing machine, you can tell exactly what happened to your app at the moment of the crash, which is the primary motivation for the article you are reading right now.

By the way, if you are too lazy to remember all those debugger commands, just remember one: !analyze -v. The following output of that command might explain why it is probably the most useful command in WinDbg:


Figure 12

Adding Error reporting to the image

Among Windows CE 5.0's most interesting new features is a set of error reporting components that we can add to our image. There are four components that can be added to the image from the catalog; the report upload component is available either with or without a graphical user interface. Figure 13 shows the error reporting catalog items.


Figure 13

With error reporting incorporated into our device, when a program crashes, the device will automatically save the state of the device at the point in time the program crashed. The error report generator will save a dump file, which includes some very useful information that should be helpful in eliminating bugs that escaped the testing process.

Error Report Generator

The Error Report Generator is the component responsible for the creation of dump files using the configuration options set in the registry.

The dump file formats are compatible with the requirements of Microsoft's Watson website. This enables the uploading server to handle classification of -- and reporting of -- the uploaded dump files.

To generate an error report dump file, at least 128KB of memory must be reserved. The OAL developer initializes the size of the memory to be reserved by setting a variable named dwNKDrWatsonSize. This is done in the OEMInit function, as shown in Figure 14.


Figure 14

The kernel will use this size to reserve a block of memory at the end of the main memory. The Sysgen variable SYSGEN_WATSON_DMPGEN must be set to include the Error Report Generator in the image.
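As a rough illustration of the reservation step, the sketch below assigns the minimum size inside OEMInit. It is written to compile on its own, so the DWORD typedef and the definition of dwNKDrWatsonSize are stand-ins (assumptions): in a real OAL the variable comes from the kernel and is only assigned, and OEMInit contains the rest of the platform initialization. The constant name is hypothetical.

```c
#include <stdint.h>

typedef uint32_t DWORD;   /* stand-in for the Windows CE DWORD type */

/*
 * Stand-in definition so the sketch is self-contained. In a real OAL the
 * variable is provided by the kernel and is only assigned in OEMInit.
 */
DWORD dwNKDrWatsonSize = 0;

/* Hypothetical name; 128 KB is the minimum stated in the article. */
#define WATSON_MIN_RESERVE_BYTES (128u * 1024u)

void OEMInit(void)
{
    /* ... other platform initialization would go here ... */

    /* Ask the kernel to reserve memory for error report generation. */
    dwNKDrWatsonSize = WATSON_MIN_RESERVE_BYTES;
}
```

Leaving the variable at zero reserves nothing, so a value below the 128 KB minimum is best avoided.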

The HKEY_LOCAL_MACHINE\System\ErrorReporting\DumpSettings registry key holds the registry values for error report generation. Figure 15 is a sample of such a registry setting.


Figure 15

The Error Report Transfer Driver transfers registry setting values to the aforementioned reserved memory. The Error Report Generator then retrieves these settings from memory, in order to generate the appropriate dump file. These settings tell the Error Report Generator where to generate the dump file and what type of dump to create. In this sample, the dump type is the system dump, and the maximum disk space to use is four times the size of the reserved memory.

While developing an OS design, the developer sets the type of crash dump to be generated. Three types of dump can be generated, all of which follow the same file format:
  • Context dumps, 4 KB to 64 KB
    • Information about the crashing system
    • The exception that initiated the crash
    • The context record of the faulting thread
    • A module list, limited to the owner process of the faulting thread
    • A thread list, limited to the owner process of the faulting thread
    • The call stack of the faulting thread
    • 64 bytes of memory above and below the instruction pointer of the faulting thread
    • Stack memory dump of the faulting thread, truncated to fit a 64 KB limit

  • System dumps, 64 KB to several MB
    • All information in a Context dump
    • Call stacks and context records for all threads
    • Complete module, process, and thread lists for the entire device
    • 2048 bytes of memory above and below the instruction pointer of the faulting thread
    • Global variables for the process that was current at the time of the crash

  • Complete dumps, the size of all physical memory plus at least 64 KB
    • All information in a context dump
    • A complete dump of all used memory
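The trade-off between the three dump types can be summarized in a few lines of code. This is only a sketch of the list above: the enum and function names are hypothetical and are not taken from the Windows CE headers.

```c
typedef enum {
    DUMP_CONTEXT,   /* 4 KB to 64 KB */
    DUMP_SYSTEM,    /* 64 KB to several MB */
    DUMP_COMPLETE   /* all physical memory plus at least 64 KB */
} DumpType;

/*
 * Bytes of memory captured above and below the faulting thread's
 * instruction pointer. A complete dump carries everything in a context
 * dump (hence 64) and a copy of all used memory besides.
 */
unsigned ip_window_bytes(DumpType type)
{
    switch (type) {
    case DUMP_SYSTEM:
        return 2048;
    case DUMP_CONTEXT:
    case DUMP_COMPLETE:
    default:
        return 64;
    }
}

/*
 * Picks the smallest dump type that still contains what the analysis needs:
 * records for all threads require a system dump, and a copy of all used
 * memory requires a complete dump.
 */
DumpType smallest_sufficient_dump(int need_all_threads, int need_all_memory)
{
    if (need_all_memory)
        return DUMP_COMPLETE;
    if (need_all_threads)
        return DUMP_SYSTEM;
    return DUMP_CONTEXT;
}
```

On a device with limited storage, starting from the smallest sufficient type keeps more crash reports within the disk budget.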

The Error Report Generator generates files in a well-defined format. Each file starts with a single MINIDUMP_HEADER structure, followed by a number of MINIDUMP_DIRECTORY entries. Each entry describes a data type (such as system info or exception info), the size in bytes of the data stored in the file, and a Relative Virtual Address (RVA), relative to the beginning of the file, that points to where the data begins.

All the relevant structures can be found in $(_COMMONOAKROOT)\INC\DwCeDump.h.
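To make the header-plus-directory layout concrete, here is a minimal sketch of how a tool might locate one data stream in such a file. The two structures are simplified, hypothetical stand-ins with assumed field names and order; they are not the definitions from DwCeDump.h, which should be used for real files.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, ASSUMED layout of the file header. */
typedef struct {
    uint32_t signature;          /* file magic */
    uint32_t number_of_streams;  /* count of directory entries */
    uint32_t directory_rva;      /* offset of the first directory entry */
} SketchHeader;

/* Simplified, ASSUMED layout of one directory entry. */
typedef struct {
    uint32_t stream_type;  /* e.g. system info or exception info */
    uint32_t data_size;    /* size of the stream data in bytes */
    uint32_t rva;          /* offset of the data from the start of the file */
} SketchDirectory;

/*
 * Searches the directory for a stream of the wanted type.
 * Returns 1 and fills *out when found and fully inside the file, else 0.
 */
int find_stream(const uint8_t *file, size_t file_len,
                uint32_t wanted, SketchDirectory *out)
{
    SketchHeader hdr;
    uint32_t i;

    if (file_len < sizeof hdr)
        return 0;
    memcpy(&hdr, file, sizeof hdr);

    /* The whole directory must fit inside the file. */
    if (hdr.directory_rva > file_len)
        return 0;
    if (hdr.number_of_streams >
        (file_len - hdr.directory_rva) / sizeof(SketchDirectory))
        return 0;

    for (i = 0; i < hdr.number_of_streams; i++) {
        SketchDirectory entry;
        memcpy(&entry,
               file + hdr.directory_rva + (size_t)i * sizeof entry,
               sizeof entry);
        if (entry.stream_type != wanted)
            continue;
        /* Reject entries whose data would run past the end of the file. */
        if (entry.rva > file_len || entry.data_size > file_len - entry.rva)
            return 0;
        *out = entry;
        return 1;
    }
    return 0;
}
```

Copying each structure out of the buffer with memcpy avoids alignment problems, and the bounds checks matter because a dump taken from a crashing system may be truncated.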

Error Report Transfer Driver

The Error Report Transfer Driver moves the registry values (needed by the Error Report Generator) from the registry to the reserved memory block, and moves the generated files from reserved memory into persistent files.

After transferring a dump file to persistent storage, the Error Report Transfer Driver launches the Report Upload Client specified in the registry.

The Sysgen variable "SYSGEN_WATSON_XFER" must be set to include the Error Report Transfer Driver in the image.

The HKEY_LOCAL_MACHINE\Drivers\BuiltIn\ErrorReporting registry key holds the registry values for the Error Report Transfer Driver. Figure 16 shows a sample of such a registry setting, in which the time interval for transfer polling is set to 5 minutes and the poll priority is set to 249.


Figure 16

Error Report Control Panel

The Error Reporting Control Panel allows the user of a display-based device to configure options for dump file generation by way of a Control Panel applet. The options available to the user are:
  • Enable/disable error reporting -- on a display-based device, error reporting is enabled by default. On a headless device, error reporting is disabled by default.

  • Control the amount of storage space allocated for dump files -- the control panel dialog box contains a set of radio buttons that allow the user to select the amount of storage space for storing dump files, as can be seen in Figure 17.

  • Enable user notification dialogs


Figure 17

The Sysgen variable "SYSGEN_WATSON_CTLPNL" must be set to include the Error Reporting Control Panel in the image.

The registry settings contained in the HKLM\System\ErrorReporting\DumpSettings registry key and in the HKLM\System\ErrorReporting\UploadSettings registry key are used by the Error Reporting Control Panel to set the initial values in the control panel dialog.

Report Upload Client

The upload client is responsible for uploading the generated dump file to the watson.microsoft.com error reporting web site. It is, however, possible to upload this file to another web site, but that involves code changes. For example, the function FValidBucketResponseURL, implemented in (_PUBLICROOT) \WCESHELLFE\OAK\WATSON\DWUI\DWUIDLGS.CPP, must be changed so that it validates a website other than the one mentioned above.

Another file you want to look at is (_PUBLICROOT) \COMMON\OAK\INC\DWPUBLIC.H. Here, you can define a valid response server (VALID_RESPONSE_SERVER) for your server, and, of course, you need to create an upload website capable of receiving bucket parameters, grouping minidumps into buckets, and responding to the upload client. While all this is possible, it might not be worth the trouble.
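The kind of check involved can be sketched as follows. This is not the code of FValidBucketResponseURL, whose signature and logic are not shown in this article; the function and macro names below are hypothetical, and the sketch simply accepts a response URL only when its host matches a configured server name.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical constant, playing the role of a configured response server. */
#define SKETCH_VALID_RESPONSE_SERVER "watson.microsoft.com"

/* Case-insensitive comparison of host[0..host_len) with expected. */
static int host_equals(const char *host, size_t host_len, const char *expected)
{
    size_t i;

    if (strlen(expected) != host_len)
        return 0;
    for (i = 0; i < host_len; i++) {
        if (tolower((unsigned char)host[i]) !=
            tolower((unsigned char)expected[i]))
            return 0;
    }
    return 1;
}

/*
 * Returns 1 if url is an http or https URL whose host is exactly
 * valid_server, 0 otherwise.
 */
int is_valid_response_url(const char *url, const char *valid_server)
{
    const char *authority;
    size_t authority_len, host_len;

    if (strncmp(url, "https://", 8) == 0)
        authority = url + 8;
    else if (strncmp(url, "http://", 7) == 0)
        authority = url + 7;
    else
        return 0;

    /* Reject "user@host" forms, which can disguise the real host. */
    authority_len = strcspn(authority, "/?#");
    if (memchr(authority, '@', authority_len) != NULL)
        return 0;

    /* The host ends at a port separator, path, query, or fragment. */
    host_len = strcspn(authority, ":/?#");
    return host_len > 0 && host_equals(authority, host_len, valid_server);
}
```

Comparing the whole host name, rather than searching for the server name anywhere in the URL, keeps a look-alike such as "watson.microsoft.com.example.org" from passing the check.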

Minidumps and Buckets

A minidump is a dump file generated on the device by Dr. Watson, containing the most important parts of a crashed application. Its "mini" name results from the fact that it contains only what is needed to identify and analyze the crashed application. A bucket represents a unique bug or problem and identifies the component responsible for the bug. Bucketing helps the upload server to organize uploaded minidumps: minidumps describing the same problem are grouped together in what is termed a bucket.

The structure DMPFILEINFO contains all the information needed to group a minidump file in a bucket:


// Structure to contain information regarding the dump file
typedef struct tagDMPFILEINFO
{
    WORD   wBucketParams;                       // how many bucket parameters are being used
    LPWSTR rgwzBucketParams[MAX_BUCKETPARAMS];  // bucket parameters for generic mode
    LPWSTR pwzQueryString;                      // additional query string
    LPWSTR pwzAppName;                          // name to display in the UI

    LPWSTR pwzFilesToKeep;                      // files to include in log but not delete
    LPWSTR pwzFilesToDelete;                    // files to include in log but delete when finished
    BOOL   fGenericParams;                      // TRUE indicates the bucket parameters are generic parameters
} DMPFILEINFO, *PDMPFILEINFO;

Figure 18
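To illustrate how bucket parameters might drive the grouping, the sketch below treats two dumps as belonging to the same bucket when all of their parameters match. It uses a simplified, hypothetical stand-in for DMPFILEINFO with narrow strings so that it compiles anywhere; the array size and the example parameter values in the usage note are assumptions, and a real upload server may well apply different rules.

```c
#include <stddef.h>
#include <string.h>

/* ASSUMED size; the real MAX_BUCKETPARAMS value is not given in the article. */
#define SKETCH_MAX_BUCKET_PARAMS 10

/* Simplified stand-in for the bucket-related fields of DMPFILEINFO. */
typedef struct {
    unsigned short bucket_param_count;
    const char *bucket_params[SKETCH_MAX_BUCKET_PARAMS];
} SketchDumpInfo;

/*
 * Returns 1 when both dumps carry the same number of bucket parameters
 * and every parameter matches, 0 otherwise.
 */
int same_bucket(const SketchDumpInfo *a, const SketchDumpInfo *b)
{
    unsigned short i;

    if (a->bucket_param_count != b->bucket_param_count)
        return 0;
    if (a->bucket_param_count > SKETCH_MAX_BUCKET_PARAMS)
        return 0;
    for (i = 0; i < a->bucket_param_count; i++) {
        if (a->bucket_params[i] == NULL || b->bucket_params[i] == NULL)
            return 0;
        if (strcmp(a->bucket_params[i], b->bucket_params[i]) != 0)
            return 0;
    }
    return 1;
}
```

For example, two dumps whose parameters are both {"myapp.exe", "1.0.0.0", "0x0000a1b2"} would land in the same bucket, while a dump with a different third parameter would start a new one.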

How does it work?

When an application crashes, Dr. Watson goes into action and calls a function GenerateDumpFileContent implemented in (_WINCEROOT) PRIVATE\WINCEOS\UTILS\USREXCEPTDMP\UDUMPGEN.CPP.

This function does most of the work. To make sure Dr. Watson is not preempted and can complete its job, it sets its thread to the highest priority and sets its quantum to run to completion. It then gathers system, module, exception, process, and thread information into a CRASH_DATA structure defined in the same file. This structure actually defines a collection of structures. Once the crash data has been collected, the function resets the thread to its original state and writes the crash data to a dump file. That's it.
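The save, raise, collect, and restore sequence can be sketched as shown below. This is a simulation under stated assumptions, not the code in UDUMPGEN.CPP: the thread is modeled as a plain structure instead of a real thread handle, and all names are hypothetical. The two constants reflect Windows CE conventions, where priority 0 is the highest and a quantum of 0 means run to completion.

```c
/* Simulated thread scheduling state; a real implementation would use a thread handle. */
typedef struct {
    int priority;         /* 0 is the highest priority on Windows CE */
    unsigned quantum_ms;  /* 0 means run to completion */
} SketchThread;

/* Hypothetical record of what was observed while collecting. */
typedef struct {
    int priority_during_collection;
    unsigned quantum_during_collection;
    int collected;
} SketchCrashData;

#define SKETCH_HIGHEST_PRIORITY  0
#define SKETCH_RUN_TO_COMPLETION 0u

/* Stand-in for gathering system, module, exception, process, and thread info. */
static void collect_crash_data(const SketchThread *self, SketchCrashData *data)
{
    data->priority_during_collection = self->priority;
    data->quantum_during_collection = self->quantum_ms;
    data->collected = 1;
}

/*
 * Raises the calling thread so collection cannot be preempted, collects
 * the crash data, then restores the original scheduling state. Writing
 * the data to the dump file would follow the restore.
 */
void generate_dump_sketch(SketchThread *self, SketchCrashData *data)
{
    SketchThread saved = *self;            /* remember the original state */

    self->priority = SKETCH_HIGHEST_PRIORITY;
    self->quantum_ms = SKETCH_RUN_TO_COMPLETION;

    collect_crash_data(self, data);

    *self = saved;                         /* back to the original state */
}
```

Restoring the saved state before the file is written keeps the time spent at the highest priority as short as possible.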

Epilogue

This article is by no means a comprehensive view on the subject of post mortem debugging and error reporting of retail devices. However, it should be viewed as a teaser for the reader to delve into the subject and take a look at the sources available. The following locations are a good place to begin:
  • (_PUBLICROOT) \WCESHELLFE\OAK\WATSON\DWUI
  • (_WINCEROOT) PRIVATE\WINCEOS\COREOS\NK\OSAXS
  • (_WINCEROOT) PRIVATE\WINCEOS\UTILS\USREXCEPTDMP
We hope that error reporting will become part of the retail images you create, mainly so you can provide better and more robust systems for your clients.



About the authors:

Abraham Kcholi (left) holds a B.Sc. degree in Pure Mathematics from London School of Economics. He has many years of experience in developing Windows CE, GIS, and realtime systems for various applications including military ones.

Gad Meir (right) has worked in the computer industry for many years (his second computer language was PDP-8 assembly). His main specialty is using MSF (Microsoft Solutions Framework) principles to identify errors and problems in development and deployment processes and procedures. His main goal is to prevent these types of problems in the project's planning phase. Unfortunately, he is usually summoned for the post mortem phase, a fact that leads directly to his second specialty, which is analyzing dumps, blue screens, and other low-level plumbing tasks. Currently, Gad is the R&D Manager at IDAG Ltd.



Use of this site is governed by our Terms of Service and Privacy Policy. Except where otherwise specified, the contents of this site are copyright © 1999-2008 Ziff Davis Enterprise Holdings Inc. All Rights Reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Enterprise is prohibited. Windows is a trademark or registered trademark of Microsoft Corporation in the United States and/or other countries and is used by WindowsForDevices under license from owner. All other marks are the property of their respective owners. WindowsForDevices is an independent publication not affiliated with Microsoft Corporation.