The LLDB Debugger

Variable display

LLDB was recently modified to allow users to define custom formatting options for variable display.

Usually, when you type frame variable or run some expression, LLDB will automatically choose a format to display your results on a per-type basis, as in the following example:

(lldb) frame variable -T sp
(SimpleWithPointers) sp = {
    (int *) x = 0x0000000100100120
    (float *) y = 0x0000000100100130
    (char *) z = 0x0000000100100140 "6"
}

However, in certain cases, you may want to associate a different display format with certain datatypes. To do so, you need to give hints to the debugger as to how datatypes should be displayed.
A new type command has been introduced in LLDB which lets you do just that.

Using it you can obtain a format like this one for sp, instead of the default shown above:

(lldb) frame variable sp
(SimpleWithPointers) sp = (x=0x0000000100100120 -> -1, y=0x0000000100100130 -> -2, z="3")

There are two kinds of printing options: summary and format. While a detailed description of both will be given below, one can briefly say that a summary is mainly used for aggregate types, while a format is attached to primitive types.

To reflect this, the type command has two subcommands:

type format

type summary

These commands are meant to bind printing options to types. When variables are printed, LLDB will first check if custom printing options have been associated to a variable's type and, if so, use them instead of picking the default choices.

The two commands type format and type summary each have four subcommands:

add: associates a new printing option to one or more types

delete: deletes an existing association

list: provides a listing of all associations

clear: deletes all associations

type format

Type formats enable you to quickly override the default format for displaying primitive types (the usual basic C/C++/ObjC types: int, float, char, ...).

If for some reason you want all int variables in your program to print out as hex, you can add a format to the int type.

This is done by typing

(lldb) type format add -f hex int
at the LLDB command line.

The -f option accepts a format name. It is followed by the list of types to which you want the new format applied.

A frequent scenario is that your program has a typedef for a numeric type that you know represents something that must be printed in a certain way. Again, you can add a format just to that typedef by using type format add with the name alias.

But things can quickly get hierarchical. Let's say you have a situation like the following:

typedef int A;
typedef A B;
typedef B C;
typedef C D;

and you want to show all A's as hex, all C's as pointers and leave the defaults untouched for other types.

If you simply type

(lldb) type format add -f hex A
(lldb) type format add -f pointer C

values of type B will be shown as hex and values of type D as pointers.

This is because by default LLDB cascades formats through typedef chains. In order to avoid that you can use the option -C no to prevent cascading, thus making the two commands required to achieve your goal:

(lldb) type format add -C no -f hex A
(lldb) type format add -C no -f pointer C
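
The effect of cascading can be pictured with a small Python model. This is only a sketch of the behaviour described above, not LLDB's actual lookup code: the TYPEDEFS table and the find_format() helper are hypothetical names introduced for illustration.

```python
# Toy model of format lookup through a typedef chain (not LLDB's real code).
TYPEDEFS = {"A": "int", "B": "A", "C": "B", "D": "C"}  # alias -> underlying type


def find_format(type_name, formats):
    """formats maps a type name to a (format_name, cascades) pair."""
    current = type_name
    first = True
    while current is not None:
        entry = formats.get(current)
        # An entry applies to the type it was added for, and to aliases of
        # that type only when it was added with cascading enabled.
        if entry is not None and (first or entry[1]):
            return entry[0]
        current = TYPEDEFS.get(current)
        first = False
    return "default"


cascading = {"A": ("hex", True), "C": ("pointer", True)}      # plain add
exact_only = {"A": ("hex", False), "C": ("pointer", False)}   # add -C no

for name in "ABCD":
    print(name, find_format(name, cascading), find_format(name, exact_only))
```

With the cascading entries, B and D inherit hex and pointer; with the -C no entries, they fall back to the default.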

Two additional options that you will want to look at are -p and -r. These two options prevent LLDB from applying a format for type T to values of type T* and T& respectively.

(lldb) type format add -f float32[] int
(lldb) fr var pointer *pointer -T
(int *) pointer = {1.46991e-39 1.4013e-45}
(int) *pointer = {1.53302e-42}
(lldb) type format add -f float32[] int -p
(lldb) fr var pointer *pointer -T
(int *) pointer = 0x0000000100100180
(int) *pointer = {1.53302e-42}

As the previous example highlights, you will most probably want to use -p for your formats.
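
To see where the odd numbers in the first half of the example come from, you can reproduce the reinterpretation outside the debugger. The Python sketch below assumes a little-endian target and assumes that the pointed-to int holds 1094 (a value chosen because its bits reproduce the output shown); as_float32_array() is a hypothetical helper, not an LLDB function.

```python
import struct


def as_float32_array(raw):
    """Reinterpret raw bytes as consecutive 32-bit floats, without any cast."""
    count = len(raw) // 4
    values = struct.unpack("<%df" % count, raw)
    return "{" + " ".join("%g" % v for v in values) + "}"


# The 8 bytes of the pointer itself, as stored on a little-endian machine.
pointer_bytes = struct.pack("<Q", 0x0000000100100180)
# The 4 bytes of the pointed-to int (1094 is an assumed value).
pointee_bytes = struct.pack("<i", 1094)

print(as_float32_array(pointer_bytes))
print(as_float32_array(pointee_bytes))
```

Without -p, the format meant for int is applied to the bytes of the address, which is rarely what you want.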

If you need to delete a custom format simply type type format delete followed by the name of the type to which the format applies. To delete ALL formats, use type format clear. To see all the formats defined, type type format list.

If all you need to do, however, is display one variable in a custom format, while leaving the others of the same type untouched, you can simply type:

(lldb) frame variable counter -f hex

This has the effect of displaying the value of counter as a hexadecimal number, and will keep showing it this way until you either pick a different format or let your program run again.

Finally, this is a list of formatting options available out of which you can pick:

Format name | Abbreviation | Description
default | (none) | the default LLDB algorithm is used to pick a format
boolean | B | show this as a true/false boolean, using the customary rule that 0 is false and everything else is true
binary | b | show this as a sequence of bits
bytes | y | show the bytes one after the other; e.g. (int) s.x = 07 00 00 00
bytes with ASCII | Y | show the bytes, but try to print them as ASCII characters; e.g. (int *) c.sp.x = 50 f8 bf 5f ff 7f 00 00 P.._....
character | c | show the bytes printed as ASCII characters; e.g. (int *) c.sp.x = P\xf8\xbf_\xff\x7f\0\0
printable character | C | show the bytes printed as printable ASCII characters; e.g. (int *) c.sp.x = P.._....
complex float | F | interpret this value as the real and imaginary part of a complex floating-point number; e.g. (int *) c.sp.x = 2.76658e+19 + 4.59163e-41i
c-string | s | show this as a 0-terminated C string
signed decimal | i | show this as a signed integer number (this does not perform a cast, it simply shows the bytes as signed integer)
enumeration | E | show this as an enumeration, printing the value's name if available or the integer value otherwise; e.g. (enum enumType) val_type = eValue2
hex | x | show this in hexadecimal notation (this does not perform a cast, it simply shows the bytes as hex)
float | f | show this as a floating-point number (this does not perform a cast, it simply interprets the bytes as an IEEE754 floating-point value)
octal | o | show this in octal notation
OSType | O | show this as a MacOS OSType; e.g. (float) *c.sp.y = '\n\x1f\xd7\n'
unicode16 | U | show this as UTF-16 characters; e.g. (float) *c.sp.y = 0xd70a 0x411f
unicode32 | (none) | show this as UTF-32 characters; e.g. (float) *c.sp.y = 0x411fd70a
unsigned decimal | u | show this as an unsigned integer number (this does not perform a cast, it simply shows the bytes as unsigned integer)
pointer | p | show this as a native pointer (unless this is really a pointer, the resulting address will probably be invalid)
char[] | (none) | show this as an array of characters; e.g. (char) *c.sp.z = {X}
int8_t[], uint8_t[], int16_t[], uint16_t[], int32_t[], uint32_t[], int64_t[], uint64_t[], uint128_t[] | (none) | show this as an array of the corresponding integer type; e.g. (int) sarray[0].x = {1 0 0 0} or (int) sarray[0].x = {0x00000001}
float32[], float64[] | (none) | show this as an array of the corresponding floating-point type; e.g. (int *) pointer = {1.46991e-39 1.4013e-45}
complex integer | I | interpret this value as the real and imaginary part of a complex integer number; e.g. (int *) pointer = 1048960 + 1i
character array | a | show this as a character array; e.g. (int *) pointer = \x80\x01\x10\0\x01\0\0\0
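
Several descriptions above stress that a format does not perform a cast but only shows the same bytes in a different way. The following Python sketch illustrates that idea with the standard struct module. It assumes a little-endian target and uses the float value 9.99 (an assumption about the value behind the *c.sp.y examples); none of the names below are part of LLDB.

```python
import struct

# The four bytes of a 32-bit float, as stored on a little-endian machine.
raw = struct.pack("<f", 9.99)

# "bytes": one byte after the other.
as_bytes = " ".join("%02x" % b for b in raw)
# A single 32-bit unit, as in the unicode32 example.
as_uint32 = "0x%08x" % struct.unpack("<I", raw)[0]
# Two 16-bit units, as in the unicode16 example.
as_uint16 = " ".join("0x%04x" % h for h in struct.unpack("<2H", raw))
# The int 7 shown with the bytes format.
int_seven = " ".join("%02x" % b for b in struct.pack("<i", 7))

print(as_bytes)
print(as_uint32)
print(as_uint16)
print(int_seven)
```

The stored bits never change; only the unit size and notation used to print them do.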

type summary

Type formats work by showing a different kind of display for the value of a variable. However, they only work for basic types. When you want to display a class or struct in a custom format, you cannot do that using formats.

A different feature, type summaries, works by extracting information from classes, structures, ... (aggregate types) and arranging it in a user-defined format, as in the following example:

before adding a summary...
(lldb) fr var -T one
(i_am_cool) one = {
    (int) integer = 3
    (float) floating = 3.14159
    (char) character = 'E'
}

after adding a summary...
(lldb) fr var one
(i_am_cool) one = int = 3, float = 3.14159, char = 69

There are two ways to use type summaries: the first one is to bind a summary string to the datatype; the second is to bind a Python script to the datatype. Both options are enabled by the type summary add command.

In the example, the command we typed was:

(lldb) type summary add -f "int = ${var.integer}, float = ${var.floating}, char = ${var.character%u}" i_am_cool

Initially, we will focus on summary strings, and then describe the Python binding mechanism.

Summary Strings

While you may already have guessed a lot about the format of summary strings from the above example, a detailed description of their format follows.

Summary strings can contain plain text, control characters and special symbols that have access to information about the current object and the overall program state.

Normal characters are any text that doesn't contain a '{', '}', '$', or '\' character.

Variable names are enclosed between a "${" prefix and a "}" suffix. In other words, a variable looks like "${frame.pc}".

Basically, all the variables described in Frame and Thread Formatting are accepted. Also acceptable are the control characters and scoping features described in that page. Additionally, ${var and ${*var become acceptable symbols in this scenario.

The simplest thing you can do is grab a member variable of a class or structure by typing its expression path. In the previous example, the expression path for the floating member is simply .floating. Thus, to ask the summary string to display floating you would type ${var.floating} (${var is a placeholder token replaced with whatever variable is being displayed).

If you have code like the following:
struct A {
    int x;
    int y;
};
struct B {
    A x;
    A y;
    int z;
};
the expression path for the y member of the x member of an object of type B would be .x.y and you would type ${var.x.y} to display it in a summary string for type B.

As you could be using a summary string for displaying objects of both type T and type T* (unless -p is used to prevent this), the expression paths do not differentiate between . and ->, and the above expression path .x.y would be just as good if you were displaying a B*, or even if the actual definition of B were:
struct B {
    A *x;
    A y;
    int z;
};

This is unlike the behaviour of frame variable which, on the contrary, will enforce the distinction. As hinted above, the rationale for this choice is that waiving this distinction enables one to write a summary string once for type T and use it for both T and T* instances. As a summary string is mostly about extracting nested members' information, a pointer to an object is just as good as the object itself for the purpose.

Of course, you can have multiple entries in one summary string, as shown in the previous example.

As you can see, the last expression path also contains a %u symbol which is nowhere to be found in the actual member variable name. The symbol is reminiscent of a printf() format symbol, and in fact it has a similar effect. If you add a % sign followed by any one format name or abbreviation from the above table after an expression path, the resulting object will be displayed using the chosen format.
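
As a rough mental model of how such a summary string is expanded, consider the Python sketch below. It is a simplified illustration, not LLDB's parser: it only handles ${var...} tokens with an optional %u or %x suffix, represents the object as a nested dictionary, and the names TOKEN, FORMATS and expand() are hypothetical.

```python
import re

# Matches ${var}, ${var.a.b} and ${var.a.b%fmt}.
TOKEN = re.compile(r"\$\{var((?:\.\w+)*)(?:%(\w+))?\}")


def to_int(value):
    # Characters are shown by their character code, numbers as they are.
    return ord(value) if isinstance(value, str) else int(value)


FORMATS = {
    "u": lambda v: str(to_int(v)),
    "x": lambda v: hex(to_int(v)),
}


def expand(summary, var):
    def replace(match):
        value = var
        # ".integer" -> ["", "integer"]; skip the empty leading piece.
        for member in match.group(1).split(".")[1:]:
            value = value[member]
        fmt = match.group(2)
        return FORMATS[fmt](value) if fmt else str(value)
    return TOKEN.sub(replace, summary)


one = {"integer": 3, "floating": 3.14159, "character": "E"}
text = expand(
    "int = ${var.integer}, float = ${var.floating}, char = ${var.character%u}",
    one,
)
print(text)
```

The output matches the summary shown earlier for the i_am_cool object, with the character printed as 69 because of the %u suffix.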

You can also use some other special format markers, not available for type formatters, but which carry a special meaning when used in this context:

Symbol Description
%S Use this object's summary (the default for aggregate types)
%V Use this object's value (the default for non-aggregate types)
%@ Use a language-runtime specific description (for C++ this does nothing, for Objective-C it calls the NSPrintForDebugger API)
%L Use this object's location (memory address, register name, ...)

As previously said, pointers and values are treated the same way when getting to their members in an expression path. However, if your expression path leads to a pointer, LLDB will not automatically dereference it. In order to obtain the dereferenced value for a pointer, your expression path must start with ${*var instead of ${var. Because there is no need to dereference pointers along your way, the dereferencing symbol only applies to the result of the whole expression path traversal.
e.g.
(lldb) fr var -T c
(Couple) c = {
    (SimpleWithPointers) sp = {
        (int *) x = 0x00000001001000b0
        (float *) y = 0x00000001001000c0
        (char *) z = 0x00000001001000d0 "X"
    }
    (Simple *) s = 0x00000001001000e0
}

If one types the following commands:

(lldb) type summary add -f "int = ${*var.sp.x}, float = ${*var.sp.y}, char = ${*var.sp.z%u}, Simple = ${*var.s}" Couple
(lldb) type summary add -c -p Simple

the output becomes:
(lldb) fr var c
(Couple) c = int = 9, float = 9.99, char = 88, Simple = (x=9, y=9.99, z='X')

Option -c to type summary add tells LLDB not to look for a summary string, but instead to just print a listing of all the object's children on one line, as shown in the summary for object Simple.

We are using the -p flag here to show that aggregate types can be dereferenced as well as basic types. The following command sequence would work just as well and produce the same output:

(lldb) type summary add -f "int = ${*var.sp.x}, float = ${*var.sp.y}, char = ${*var.sp.z%u}, Simple = ${var.s}" Couple
(lldb) type summary add -c Simple
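
The two rules above (pointers met along the path are followed silently, while the final result is dereferenced only when ${*var is used) can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: Pointer and resolve() are hypothetical names, objects are plain dictionaries, and the address given to c_ptr is made up.

```python
class Pointer:
    """Hypothetical stand-in for a pointer value: an address plus a target."""

    def __init__(self, address, target):
        self.address = address
        self.target = target


def resolve(var, path, dereference=False):
    """Walk a dotted path; pointers met along the way are followed silently."""
    value = var
    for member in path.split("."):
        if not member:
            continue
        if isinstance(value, Pointer):  # '.' and '->' behave the same
            value = value.target
        value = value[member]
    if isinstance(value, Pointer):
        # Only the final result is affected by the leading '*'.
        return value.target if dereference else hex(value.address)
    return value


couple = {"sp": {"x": Pointer(0x1001000B0, 9), "y": Pointer(0x1001000C0, 9.99)}}
c_ptr = Pointer(0x7FFF5FBFF000, couple)  # made-up address for a Couple*

print(resolve(couple, ".sp.x"))                    # like ${var.sp.x}
print(resolve(couple, ".sp.x", dereference=True))  # like ${*var.sp.x}
print(resolve(c_ptr, ".sp.y", dereference=True))   # same path through a Couple*
```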

Bitfields and array syntax

Sometimes, a basic type's value actually represents several different values packed together in a bitfield. With the classical view, there is no way to look at them. Hexadecimal display can help, but if the bits actually span byte boundaries, the help is limited. Binary view would show it all without ambiguity, but is often too detailed and hard to read for real-life scenarios. To cope with the issue, LLDB supports native bitfield formatting in summary strings. If your expression path leads to a so-called scalar type (the usual int, float, char, short, long, long long, double, long double and unsigned variants), you can ask LLDB to only grab some bits out of the value and display them in any format you like. The syntax is similar to that used for arrays, except that you can also give a pair of indices separated by a -.
e.g.
(lldb) fr var float_point
(float) float_point = -3.14159

(lldb) type summary add -f "Sign: ${var[31]%B} Exponent: ${var[30-23]%x} Mantissa: ${var[0-22]%u}" float

(lldb) fr var float_point
(float) float_point = -3.14159 Sign: true Exponent: 0x00000080 Mantissa: 4788184
In this example, LLDB shows the internal representation of a float variable by extracting bitfields out of a float object.

As far as the syntax is concerned, it looks much like the normal C array syntax, but also allows you to specify 2 indices, separated by a - symbol (a range). Ranges can be given either with the lower or the higher index first, and range extremes are always included in the bits extracted.
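
The extraction itself is ordinary shifting and masking. The Python sketch below reproduces the three fields from the float example; it assumes the raw bit pattern 0xC0490FD8, which is consistent with the sign, exponent and mantissa printed above, and bits() is a hypothetical helper rather than an LLDB function.

```python
import struct


def bits(value, first, second):
    """Extract an inclusive bit range; the bounds may come in either order."""
    low, high = sorted((first, second))
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


raw = 0xC0490FD8  # assumed bit pattern of float_point

sign = bool(bits(raw, 31, 31))
exponent = bits(raw, 30, 23)   # like ${var[30-23]}
mantissa = bits(raw, 0, 22)    # like ${var[0-22]}
value = struct.unpack("<f", struct.pack("<I", raw))[0]

print("Sign: %s Exponent: 0x%08x Mantissa: %d"
      % (str(sign).lower(), exponent, mantissa))
print("%g" % value)
```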

LLDB also lets you use a similar syntax to display array members inside a summary string. For instance, you may want to display all arrays of a given type using a more compact notation than the default, and then just delve into individual array members that prove interesting to your debugging task. You can tell LLDB to format arrays in special ways, possibly independent of the way the array members' datatype is formatted.
e.g.
(lldb) fr var sarray
(Simple [3]) sarray = {
    [0] = {
        x = 1
        y = 2
        z = '\x03'
    }
    [1] = {
        x = 4
        y = 5
        z = '\x06'
    }
    [2] = {
        x = 7
        y = 8
        z = '\t'
    }
}

(lldb) type summary add -f "${var[].x}" "Simple [3]"

(lldb) fr var sarray
(Simple [3]) sarray = [1,4,7]

The [] symbol amounts to: if var is an array and I know its size, apply this summary string to every element of the array. Here, we are asking LLDB to display .x for every element of the array, and in fact this is what happens. If you find some of those integers anomalous, you can then inspect that one item in greater detail, without the array format getting in the way:
(lldb) fr var sarray[1]
(Simple) sarray[1] = {
    x = 4
    y = 5
    z = '\x06'
}

You can also ask LLDB to only print a subset of the array range by using the same syntax used to extract bits for bitfields:

(lldb) type summary add -f "${var[1-2].x}" "Simple [3]"

(lldb) fr var sarray
(Simple [3]) sarray = [4,7]

The same logic works if you are printing a pointer instead of an array. In that case, however, the empty square brackets operator [] cannot be used and you need to give exact range limits.
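
The selection rules for [] and [a-b] can be sketched in Python as follows. This is only an illustration of the behaviour shown in the examples above: render() is a hypothetical helper, array elements are plain dictionaries, and accepting the two bounds in either order is an assumption carried over from the bitfield syntax.

```python
import re


def render(elements, spec, member):
    """Render `member` of the elements selected by '[]' or '[a-b]'."""
    match = re.fullmatch(r"\[(?:(\d+)-(\d+))?\]", spec)
    if match is None:
        raise ValueError("unsupported index: " + spec)
    if match.group(1) is None:
        selected = elements                  # '[]': every element
    else:
        low, high = sorted(int(g) for g in match.groups())
        selected = elements[low:high + 1]    # both ends are included
    return "[" + ",".join(str(e[member]) for e in selected) + "]"


sarray = [{"x": 1, "y": 2}, {"x": 4, "y": 5}, {"x": 7, "y": 8}]

print(render(sarray, "[]", "x"))
print(render(sarray, "[1-2]", "x"))
```

The empty form needs to know how many elements there are, which is why it is only available when the array size is known.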

In general, LLDB needs the square brackets operator [] in order to handle arrays and pointers correctly, and for pointers it also needs a range. However, a few special cases are defined to make your life easier:

  • you can print a 0-terminated string (C-string) using the %s format, omitting square brackets, as in:
(lldb) type summary add -f "${var%s}" "char *"
This works for char* and char[] objects, and uses the \0 terminator when possible to terminate the string, instead of relying on array length.
  • any one of the array formats (int8_t[], float32[], ...), and the y, Y and a formats work to print an array of a non-aggregate type, even if square brackets are omitted.
(lldb) type summary add -f "${var%int32_t[]}" "int [10]"
This feature, however, is not enabled for pointers because there is no way for LLDB to detect the end of the pointed data.
This also does not work for other formats (e.g. boolean), and you must specify the square brackets operator to get the expected output.

Python scripting

Most of the time, summary strings prove good enough for the job of summarizing the contents of a variable. However, as soon as you need to do more than picking some values and rearranging them for display, summary strings stop being an effective tool. This is because summary strings lack the power to actually perform some computation on the value of variables.

To solve this issue, you can bind some Python scripting code as a summary for your datatype, and that script has the ability to both extract children variables as the summary strings do and to perform active computation on the extracted values. As a small example, let's say we have a Rectangle class:

class Rectangle
{
private:
    int height;
    int width;
public:
    Rectangle() : height(3), width(5) {}
    Rectangle(int H) : height(H), width(H*2-1) {}
    Rectangle(int H, int W) : height(H), width(W) {}
    int GetHeight() { return height; }
    int GetWidth() { return width; }
};

Summary strings are effective at reducing the screen real estate used by the default viewing mode, but are not effective if we want to display the area, perimeter and diagonal length of Rectangle objects.

To obtain this, we can simply attach a small Python script to the Rectangle class, as shown in this example:

(lldb) type summary add -P Rectangle
Enter your Python command(s). Type 'DONE' to end.
def function (valobj,dict):
    height_val = valobj.GetChildMemberWithName('height')
    width_val = valobj.GetChildMemberWithName('width')
    height_str = height_val.GetValue()
    width_str = width_val.GetValue()
    height = int(height_str)
    width = int(width_str)
    area = height*width
    perimeter = 2*height + 2*width
    diag = sqrt(height*height+width*width)
    return 'Area: ' + str(area) + ', Perimeter: ' + str(perimeter) + ', Diagonal: ' + str(diag)
    DONE
(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> from math import sqrt
>>> quit()
(lldb) frame variable
(Rectangle) r1 = Area: 20, Perimeter: 18, Diagonal: 6.40312423743
(Rectangle) r2 = Area: 72, Perimeter: 36, Diagonal: 13.416407865
(Rectangle) r3 = Area: 16, Perimeter: 16, Diagonal: 5.65685424949

In this scenario, you need to enter the interactive interpreter to import the function sqrt() from the math library. As the example shows, everything you enter into the interactive interpreter is saved for you to use it in scripts. This way you can define your own utility functions and use them in your summary scripts if necessary.

In order to write effective summary scripts, you need to know the LLDB public API, which is the way Python code can access the LLDB object model. For further details on the API you should look at this page, or at the LLDB doxygen documentation when it becomes available.

As a brief introduction, your script is encapsulated into a function that is passed two parameters: valobj and dict.

dict is an internal support parameter used by LLDB and you should not use it.
valobj is the object encapsulating the actual variable being displayed, and its type is SBValue. The most important thing you can do with an SBValue is retrieve its children objects, by calling GetChildMemberWithName(), passing it the child's name as a string, or ask it for its value, by calling GetValue(), which returns a Python string.

If you need to delve into several levels of hierarchy, as you can do with summary strings, you must use the method GetValueForExpressionPath(), passing it an expression path just like those you could use for summary strings. However, if you need to access array slices, you cannot do that (yet) via this method call, and you must use GetChildMemberWithName() querying it for the array items one by one.

Other than interactively typing a Python script there are two other ways for you to input a Python script as a summary:

  • using the -s option to type summary add and typing the script code as an option argument; as in:
(lldb) type summary add -s "height = int(valobj.GetChildMemberWithName('height').GetValue());width = int(valobj.GetChildMemberWithName('width').GetValue()); return 'Area: ' + str(height*width)" Rectangle
  • using the -F option to type summary add and giving the name of a Python function with the correct prototype. Most probably, you will define (or have already defined) the function in the interactive interpreter, or somehow loaded it from a file.
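
A function meant to be named with -F could look like the sketch below. It mirrors the interactive Rectangle example, but imports sqrt itself so that no separate interpreter step is needed. The FakeValue class is a hypothetical stand-in for SBValue that lets you try the function outside LLDB, and the dimensions 4 and 5 are inferred from the area and perimeter printed for r1; both are assumptions made for illustration.

```python
from math import sqrt


def rectangle_summary(valobj, dict):
    # GetValue() returns a string, so convert before doing arithmetic.
    height = int(valobj.GetChildMemberWithName('height').GetValue())
    width = int(valobj.GetChildMemberWithName('width').GetValue())
    area = height * width
    perimeter = 2 * (height + width)
    diag = sqrt(height * height + width * width)
    return 'Area: %d, Perimeter: %d, Diagonal: %.5f' % (area, perimeter, diag)


class FakeValue:
    """Hypothetical stand-in for SBValue, for running outside LLDB."""

    def __init__(self, value=None, children=None):
        self._value = value
        self._children = children or {}

    def GetChildMemberWithName(self, name):
        return self._children[name]

    def GetValue(self):
        return self._value


r1 = FakeValue(children={'height': FakeValue('4'), 'width': FakeValue('5')})
print(rectangle_summary(r1, {}))
```

Testing the function against a stand-in object first can save a few debugger round trips while you get the string formatting right.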

Regular expression typenames

As you noticed, in order to associate the custom summary string to the array types, one must give the array size as part of the typename. This can quickly become tiresome when using arrays of different sizes, Simple [3], Simple [9], Simple [12], ...

If you use the -x option, type names are treated as regular expressions instead of type names. This would let you rephrase the above example for arrays of type Simple [3] as:

(lldb) type summary add -f "${var[].x}" -x "Simple \[[0-9]+\]"
(lldb) fr var sarray
(Simple [3]) sarray = [1,4,7]
The above scenario works for Simple [3] as well as for any other array of Simple objects.

While this feature is mostly useful for arrays, you could also use regular expressions to catch other type sets grouped by name. However, as regular expression matching is slower than normal name matching, LLDB will first try to match by name in any way it can, and only when this fails, will it resort to regular expression matching. Thus, if your type has a base class with a cascading summary, this will be preferred over any regular expression match for your type itself.
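
You can check which type names a pattern covers before handing it to LLDB. The Python sketch below uses the standard re module with the pattern from the example; it assumes the whole type name must match, which the text above does not state, and matches() is a hypothetical helper.

```python
import re

# Same pattern as in the type summary add -x command above.
pattern = re.compile(r"Simple \[[0-9]+\]")


def matches(type_name):
    # Assumption: the pattern has to cover the whole type name.
    return pattern.fullmatch(type_name) is not None


for name in ["Simple [3]", "Simple [9]", "Simple [12]", "Simple", "Simple []"]:
    print(name, matches(name))
```

The square brackets must be escaped because they would otherwise start a character class.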

Named summaries

For a given datatype, there may be different meaningful summary representations. However, currently, only one summary can be associated to a given datatype. If you need to temporarily override the association for a variable, without changing the summary string bound to the datatype, you can use named summaries.

Named summaries work by attaching a name to a summary string when creating it. Then, when there is a need to attach the summary string to a variable, the frame variable command supports a --summary option that tells LLDB to use the named summary given instead of the default one.

(lldb) type summary add -f "x=${var.integer}" --name NamedSummary
(lldb) fr var one
(i_am_cool) one = int = 3, float = 3.14159, char = 69
(lldb) fr var one --summary NamedSummary
(i_am_cool) one = x=3

When defining a named summary, binding it to one or more types becomes optional. Even if you bind the named summary to a type, and later change the summary string for that type, the named summary will not be changed by that. You can delete named summaries by using the type summary delete command, as if the summary name were the datatype that the summary is applied to.

A summary attached to a variable using the --summary option has the same semantics as a custom format attached using the -f option: it stays attached until you attach a new one, or until you let your program run again.

Finding summaries 101

While the rules for finding an appropriate format for a type are relatively simple (just go through typedef hierarchies), summaries follow a more complicated process in finding the right summary string for a variable. Namely, what happens is:

  • If there is a summary for the type of the variable, use it
  • If this object is a pointer, and there is a summary for the pointee type that does not skip pointers, use it
  • If this object is a reference, and there is a summary for the referenced type that does not skip references, use it
  • If this object is an Objective-C class with a parent class, look at the parent class (and parent of parent, ...)
  • If this object is a C++ class with base classes, look at base classes (and bases of bases, ...)
  • If this object is a C++ class with virtual base classes, look at the virtual base classes (and bases of bases, ...)
  • If this object's type is a typedef, go through typedef hierarchy (LLDB might not be able to do this if the compiler has not emitted enough information. If the required information to traverse typedef hierarchies is missing, type cascading will not work. The clang compiler, part of the LLVM project, emits the correct debugging information for LLDB to cascade)
  • If everything has failed, repeat the above search, looking for regular expressions instead of exact matches
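
The search order above can be approximated with the following Python sketch. It is a deliberately simplified model, not LLDB's implementation: the TYPES table, candidates() and find_summary() are hypothetical, references and Objective-C classes are left out, and regular expressions are assumed to match whole type names.

```python
import re

# Hypothetical type descriptions: each type lists its related types.
TYPES = {
    "Derived": {"bases": ["Base"]},
    "Base": {},
    "Derived *": {"pointee": "Derived"},
    "Alias": {"typedef_of": "Derived"},
    "Simple [3]": {},
}


def candidates(type_name):
    """Yield (name, via_pointer) pairs in the order of the list above."""
    info = TYPES.get(type_name, {})
    yield type_name, False
    if "pointee" in info:
        yield info["pointee"], True
    pending = list(info.get("bases", []))
    while pending:                       # bases, then bases of bases
        base = pending.pop(0)
        yield base, False
        pending.extend(TYPES.get(base, {}).get("bases", []))
    if "typedef_of" in info:
        yield from candidates(info["typedef_of"])


def find_summary(type_name, summaries):
    """summaries is a list of (name_or_regex, text, skip_pointers) tuples."""
    for use_regex in (False, True):      # exact names first, then regexes
        for name, via_pointer in candidates(type_name):
            for key, text, skip_pointers in summaries:
                is_regex = isinstance(key, re.Pattern)
                if is_regex != use_regex:
                    continue
                hit = key.fullmatch(name) if is_regex else key == name
                if hit and not (via_pointer and skip_pointers):
                    return text
    return None


rules = [(re.compile(r"Deriv.*"), "regex summary", False),
         ("Base", "base summary", False)]
print(find_summary("Derived", rules))
```

In this model the base class summary wins for Derived, because the whole exact-name pass runs before any regular expression is tried.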

TODOs

  • There's no way to do multiple dereferencing, and you need to be careful what the dereferencing operation is binding to in complicated scenarios
  • type format add does not support the -x option
  • Object location cannot be printed in the summary string