                          XAA.HOWTO

  This file describes how to add basic XAA support to a chipset driver.

0)  What is XAA
1)  XAA Initialization and Shutdown
2)  The Primitives
  2.0  Generic Flags
  2.1  Screen to Screen Copies
  2.2  Solid Fills
  2.3  Solid Lines
  2.4  Dashed Lines
  2.5  Color Expansion Fills
    2.5.1 Screen to Screen Color Expansion
    2.5.2 CPU to Screen Color Expansion
      2.5.2.1 The Direct Method
      2.5.2.2 The Indirect Method
  2.6  8x8 Mono Pattern Fills
  2.7  8x8 Color Pattern Fills
  2.8  Image Writes
    2.8.1 The Direct Method
    2.8.2 The Indirect Method
  2.9 Clipping
3)  The Pixmap Cache
4)  Offscreen Pixmaps

/********************************************************************/

0) WHAT IS XAA
	
   XAA (the XFree86 Acceleration Architecture) is a device dependent
layer that encapsulates the unaccelerated framebuffer rendering layer,
intercepting rendering commands sent to it from higher levels of the
server.  For rendering tasks where hardware acceleration is not 
possible, XAA allows the requests to proceed to the software rendering
code.  Otherwise, XAA breaks the sometimes complicated X primitives
into simpler primitives more suitable for hardware acceleration and
will use accelerated functions exported by the chipset driver to 
render these.

   XAA provides a simple, easy to use driver interface that allows
the driver to communicate its acceleration capabilities and restrictions
back to XAA.  XAA will use the information provided by the driver
to determine whether or not acceleration will be possible for a
particular X primitive.



1) XAA INITIALIZATION AND SHUTDOWN

   All relevant prototypes and defines are in xaa.h.

   To Initialize the XAA layer, the driver should allocate an XAAInfoRec
via XAACreateInfoRec(), fill it out as described in this document
and pass it to XAAInit().  XAAInit() must be called _after_ the 
framebuffer initialization (usually cfb?ScreenInit or similar) since 
it is "wrapping" that layer.  XAAInit() should be called _before_ the 
cursor initialization (usually miDCInitialize) since the cursor
layer needs to "wrap" all the rendering code including XAA.

   When shutting down, the driver should free the XAAInfoRec
structure in its CloseScreen function via XAADestroyInfoRec().
The prototypes for the functions mentioned above are as follows:

   XAAInfoRecPtr XAACreateInfoRec(void);
   Bool XAAInit(ScreenPtr, XAAInfoRecPtr);
   void XAADestroyInfoRec(XAAInfoRecPtr);

   The driver informs XAA of its acceleration capabilities by
filling out an XAAInfoRec structure and passing it to XAAInit().
The XAAInfoRec structure contains many fields, most of which are
function pointers and flags.  Each primitive will typically have
two functions and a set of flags associated with it, but it may
have more.  These two functions are the "SetupFor" and "Subsequent" 
functions.  The "SetupFor" function tells the driver that the 
hardware should be initialized for a particular type of graphics 
operation.  After the "SetupFor" function, one or more calls to the 
"Subsequent" function will be made to indicate that an instance
of the particular primitive should be rendered by the hardware.
The details of each instance (width, height, etc...) are given
with each "Subsequent" function.   The set of flags associated
with each primitive lets the driver tell XAA what its hardware
limitations are (e.g. it doesn't support a planemask, it can only
do one of the raster-ops, etc...).
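
   The SetupFor/Subsequent split described above can be sketched in
plain C as follows.  This is only an illustration of the calling
pattern: ExEngine, ex_setup_for_solid_fill and
ex_subsequent_solid_fill_rect are hypothetical stand-ins (a tiny
software "framebuffer" instead of real hardware), not XAA types or
functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a chipset drawing engine; not an XAA type. */
enum { EX_FB_W = 8, EX_FB_H = 8 };

typedef struct {
    unsigned int fg;                      /* color latched by SetupFor */
    unsigned char fb[EX_FB_H][EX_FB_W];   /* tiny software "framebuffer" */
} ExEngine;

/* SetupFor: program the per-operation state once. */
static void ex_setup_for_solid_fill(ExEngine *e, unsigned int color)
{
    e->fg = color;
}

/* Subsequent: render one instance using the state latched above. */
static void ex_subsequent_solid_fill_rect(ExEngine *e,
                                          int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= EX_FB_W && y + h <= EX_FB_H);
    for (int row = y; row < y + h; row++)
        memset(&e->fb[row][x], (int)e->fg, (size_t)w);
}
```

   One SetupFor call may be followed by any number of Subsequent
calls, which is why the color is stored rather than passed again.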

  Of the XAAInfoRec fields, one is required.  This is the
Sync function.  XAA initialization will fail if this function
is not provided.

void Sync(ScrnInfoPtr pScrn)			/* Required */

   Sync will be called when XAA needs to be certain that all
   graphics coprocessor operations are finished, such as when
   the framebuffer must be written to or read from directly
   and it must be certain that the accelerator will not be
   overwriting the area of interest.

   One needs to make certain that the Sync function not only
   waits for the accelerator fifo to empty, but that it waits for
   the rendering of that last operation to complete.

   It is guaranteed that no direct framebuffer access will
   occur after a "SetupFor" or "Subsequent" function without
   the Sync function being called first.



2)  THE PRIMITIVES

2.0  Generic Flags

  Each primitive type has a set of flags associated with it which
allow the driver to tell XAA what the hardware limitations are.
The common ones are as follows:

/* Foreground, Background, rop and planemask restrictions */

   GXCOPY_ONLY

     This indicates that the accelerator only supports GXcopy
     for the particular primitive.

   ROP_NEEDS_SOURCE

     This indicates that the accelerator doesn't support a
     particular primitive with rops that don't involve the source.
     These rops are GXclear, GXnoop, GXinvert and GXset. If neither
     this flag nor GXCOPY_ONLY is defined, it is assumed that the
     accelerator supports all 16 raster operations (rops) for that
     primitive.

   NO_PLANEMASK

     This indicates that the accelerator does not support a hardware
     write planemask for the particular primitive.

   RGB_EQUAL

     This indicates that the particular primitive requires the red, 
     green and blue bytes of the foreground color (and background color,
     if applicable) to be equal. This is useful for 24bpp when a graphics
     coprocessor is used in 8bpp mode, which is not uncommon in older
     hardware since some have no support for or only limited support for 
     acceleration at 24bpp. This way, many operations will be accelerated 
     for the common case of "grayscale" colors.  This flag should only
     be used in 24bpp.

  In addition to the common ones listed above which are possible for
nearly all primitives, each primitive may have its own flags specific
to that primitive.  If such flags exist they are documented in the
descriptions of those primitives below.
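
  As a rough illustration of how these restrictions could be
interpreted, the sketch below tests a rop and a color against a flags
word.  The rop codes are the standard X.h values; the EX_* flag bit
values and both function names are assumptions made for this example
only (the real flag values come from xaa.h), and this is not XAA's
actual checking code.

```c
#include <assert.h>
#include <stdbool.h>

/* Raster-op codes with the values used in X.h. */
enum { EX_GXclear = 0x0, EX_GXcopy = 0x3, EX_GXnoop = 0x5,
       EX_GXxor = 0x6, EX_GXinvert = 0xa, EX_GXset = 0xf };

/* Hypothetical bit values for illustration; see xaa.h for the real ones. */
enum { EX_GXCOPY_ONLY = 1 << 0, EX_ROP_NEEDS_SOURCE = 1 << 1 };

/* Returns true if "rop" is usable under the given restriction flags. */
static bool ex_rop_allowed(unsigned int flags, int rop)
{
    assert(rop >= 0 && rop <= 0xf);
    if ((flags & EX_GXCOPY_ONLY) && rop != EX_GXcopy)
        return false;
    if (flags & EX_ROP_NEEDS_SOURCE) {
        /* These four rops do not involve the source. */
        if (rop == EX_GXclear || rop == EX_GXnoop ||
            rop == EX_GXinvert || rop == EX_GXset)
            return false;
    }
    return true;
}

/* RGB_EQUAL: red, green and blue bytes of a 24bpp color must match. */
static bool ex_rgb_equal(unsigned int color)
{
    unsigned int r = (color >> 16) & 0xff;
    unsigned int g = (color >> 8) & 0xff;
    unsigned int b = color & 0xff;
    return r == g && g == b;
}
```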
 



2.1  Screen to Screen Copies

   The SetupFor and Subsequent ScreenToScreenCopy functions provide
   an interface for copying rectangular areas from video memory to
   video memory.  To accelerate this primitive the driver should
   provide both the SetupFor and Subsequent functions and indicate
   the hardware restrictions via the ScreenToScreenCopyFlags.  The
   NO_PLANEMASK, GXCOPY_ONLY and ROP_NEEDS_SOURCE flags as described
   in Section 2.0 are valid as well as the following:

    NO_TRANSPARENCY
     
      This indicates that the accelerator does not support skipping
      of color keyed pixels when copying from the source to the destination.

    TRANSPARENCY_GXCOPY_ONLY

      This indicates that the accelerator supports skipping of color keyed
      pixels only when the rop is GXcopy.

    ONLY_LEFT_TO_RIGHT_BITBLT

      This indicates that the hardware only accepts blitting when the
      x direction is positive.

    ONLY_TWO_BITBLT_DIRECTIONS

      This indicates that the hardware can only cope with blitting when
      the direction of x is the same as the direction in y.


void SetupForScreenToScreenCopy( ScrnInfoPtr pScrn,
			int xdir, int ydir,
			int rop,
			unsigned int planemask,
			int trans_color )

    When this is called, SubsequentScreenToScreenCopy will be called
    one or more times directly after.  If ydir is 1, then the accelerator
    should copy starting from the top (minimum y) of the source and
    proceed downward.  If ydir is -1, then the accelerator should copy
    starting from the bottom of the source (maximum y) and proceed
    upward.  If xdir is 1, then the accelerator should copy each
    y scanline starting from the leftmost pixel of the source.  If
    xdir is -1, it should start from the rightmost pixel.  
       If trans_color is not -1 then trans_color indicates that the
    accelerator should not copy pixels with the color trans_color
    from the source to the destination, but should skip them. 
    Trans_color is always -1 if the NO_TRANSPARENCY flag is set.
 

void SubsequentScreenToScreenCopy(ScrnInfoPtr pScrn,
			int x1, int y1,
			int x2, int y2,
			int width, int height)

    Copy a rectangle "width" x "height" from the source (x1,y1) to the 
    destination (x2,y2) using the parameters passed by the last
    SetupForScreenToScreenCopy call. (x1,y1) and (x2,y2) always denote 
    the upper left hand corners of the source and destination regardless 
    of which xdir and ydir values are given by SetupForScreenToScreenCopy.  
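
    Some blitters want the coordinates of the first pixel copied
    rather than the upper left corner.  Assuming such hardware, a
    driver could derive the starting corner from xdir and ydir as in
    the sketch below.  ExBlitStart and ex_blit_start are hypothetical
    names for this example, not part of XAA.

```c
#include <assert.h>

/* Hypothetical result type: first source and destination pixels. */
typedef struct { int sx, sy, dx, dy; } ExBlitStart;

/*
 * XAA always passes upper left corners.  For a right-to-left or
 * bottom-to-top copy, move to the opposite edge of the rectangle.
 */
static ExBlitStart ex_blit_start(int xdir, int ydir,
                                 int x1, int y1, int x2, int y2,
                                 int w, int h)
{
    assert(xdir == 1 || xdir == -1);
    assert(ydir == 1 || ydir == -1);
    assert(w > 0 && h > 0);

    ExBlitStart s = { x1, y1, x2, y2 };
    if (xdir == -1) {           /* start from the rightmost pixel */
        s.sx += w - 1;
        s.dx += w - 1;
    }
    if (ydir == -1) {           /* start from the bottom scanline */
        s.sy += h - 1;
        s.dy += h - 1;
    }
    return s;
}
```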



2.2 Solid Fills

   The SetupFor and Subsequent SolidFill(Rect/Trap) functions provide
   an interface for filling rectangular areas of the screen with a
   foreground color.  To accelerate this primitive the driver should
   provide both the SetupForSolidFill and SubsequentSolidFillRect 
   functions and indicate the hardware restrictions via the SolidFillFlags.
   The driver may optionally provide a SubsequentSolidFillTrap if
   it is capable of rendering the primitive correctly.  
   The GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags
   as described in Section 2.0 are valid.

  
void SetupForSolidFill(ScrnInfoPtr pScrn, 
                       int color, int rop, unsigned int planemask)

    SetupForSolidFill indicates that any combination of the following 
    may follow it.

	SubsequentSolidFillRect
	SubsequentSolidFillTrap


 
void SubsequentSolidFillRect(ScrnInfoPtr pScrn, int x, int y, int w, int h)

     Fill a rectangle of dimensions "w" by "h" with origin at (x,y) 
     using the color, rop and planemask given by the last 
     SetupForSolidFill call.

void SubsequentSolidFillTrap(ScrnInfoPtr pScrn, int y, int h, 
	int left, int dxL, int dyL, int eL,
	int right, int dxR, int dyR, int eR)

     These parameters describe a trapezoid via a version of
     Bresenham's parameters. "y" is the top line. "h" is the
     number of spans to be filled in the positive Y direction.
     "left" and "right" indicate the starting X values of the
     left and right edges.  dy/dx describes the edge slope.
     These are not the deltas between the beginning and ending
     points on an edge.  They merely describe the slope. "e" is
     the initial error term.  It's the relationships between dx,
     dy and e that define the edge.
	If your engine does not do bresenham trapezoids or does
     not allow the programmer to specify the error term then
     you are not expected to be able to accelerate them.


2.3  Solid Lines

    XAA provides an interface for drawing thin lines.  In order to
    draw X lines correctly a high degree of accuracy is required.
    This usually limits line acceleration to hardware which has a
    Bresenham line engine, though depending on the algorithm used,
    other line engines may come close if they accept 16 bit line 
    deltas.  XAA has both a Bresenham line interface and a two-point
    line interface for drawing lines of arbitrary orientation.  
    Additionally there is a SubsequentSolidHorVertLine which will
    be used for all horizontal and vertical lines.  Horizontal and
    vertical lines are handled separately since hardware that doesn't
    have a line engine (or has one that is unusable due to precision
    problems) can usually draw these lines by some other method such
    as drawing them as thin rectangles.  Even for hardware that can
    draw arbitrary lines via the Bresenham or two-point interfaces,
    the SubsequentSolidHorVertLine is used for horizontal and vertical
    lines since most hardware is able to render the horizontal lines
    and sometimes the vertical lines faster by other methods (Hint:
    try rendering horizontal lines as flattened rectangles).  If you have 
    not provided a SubsequentSolidHorVertLine but you have provided 
    Bresenham or two-point lines, a SubsequentSolidHorVertLine function 
    will be supplied for you.

    The flags field associated with Solid Lines is SolidLineFlags and 
    the GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags as
    described in Section 2.0 are valid restrictions.  

    Some line engines have line biases hardcoded to comply with
    Microsoft line biasing rules.  A tell-tale sign of this is the
    hardware lines not matching the software lines in the zeroth and
    fourth octants.  The driver can set the flag:
	
	MICROSOFT_ZERO_LINE_BIAS

    in the AccelInfoRec.Flags field to adjust the software lines to
    match the hardware lines.   This is in the generic flags field
    rather than the SolidLineFlags since this flag applies to all
    software zero-width lines on the screen and not just the solid ones.


void SetupForSolidLine(ScrnInfoPtr pScrn, 
                       int color, int rop, unsigned int planemask)

    SetupForSolidLine indicates that any combination of the following 
    may follow it.

	SubsequentSolidBresenhamLine
	SubsequentSolidTwoPointLine
        SubsequentSolidHorVertLine 	


void SubsequentSolidHorVertLine( ScrnInfoPtr pScrn,
        			int x, int y, int len, int dir )

    All vertical and horizontal solid thin lines are rendered with
    this function.  The line starts at coordinate (x,y) and extends
    "len" pixels inclusive, in the direction indicated by "dir".
    The direction is either DEGREES_0 or DEGREES_270.  That is, it
    always extends to the right or down.



void SubsequentSolidTwoPointLine(ScrnInfoPtr pScrn,
        	int x1, int y1, int x2, int y2, int flags)

    Draw a line from (x1,y1) to (x2,y2).  If the flags field contains
    the flag OMIT_LAST, the last pixel should not be drawn.  Otherwise,
    the pixel at (x2,y2) should be drawn.

    If you use the TwoPoint line interface there is a good possibility
    that your line engine has hard-coded line biases that do not match
    the default X zero-width lines.  If so, you may need to set the
    MICROSOFT_ZERO_LINE_BIAS flag described above.  Note that since
    any vertex in the 16-bit signed coordinate system is valid, your
    line engine is expected to handle 16-bit values if you have hardware
    line clipping enabled.  If your engine cannot handle 16-bit values,
    you should not use hardware line clipping.


void SubsequentSolidBresenhamLine(ScrnInfoPtr pScrn,
        int x, int y, int major, int minor, int err, int len, int octant)

    "X" and "y" are the starting point of the line.  "Major" and "minor" 
    are the major and minor step constants.  "Err" is the initial error
    term.  "Len" is the number of pixels to be drawn (inclusive). "Octant"
    can be any combination of the following flags OR'd together:

      Y_MAJOR		Y is the major axis (X otherwise)
      X_DECREASING	The line is drawn from right to left
      Y_DECREASING	The line is drawn from bottom to top
	  
    The major, minor and err terms are the "raw" Bresenham parameters
    consistent with a line engine that does:

	e = err;
	while(len--) {
	   DRAW_POINT(x,y);
	   e += minor;
	   if(e >= 0) {
		e -= major;
		TAKE_ONE_STEP_ALONG_MINOR_AXIS;
	   }
	   TAKE_ONE_STEP_ALONG_MAJOR_AXIS;
	}

    IBM 8514 style Bresenham line interfaces require their parameters
    modified in the following way:

	Axial = minor;
	Diagonal = minor - major;
	Error = minor + err;
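
    The line engine loop and the 8514 conversion above can be written
    out as compilable C for experimenting with parameter values.  This
    is a software model only.  The EX_* octant bit values, ExPoint,
    Ex8514Params and both function names are assumptions made for the
    example; real drivers take the octant flags from the XAA headers.

```c
#include <assert.h>

/* Hypothetical bit values for the octant flags (illustration only). */
enum { EX_Y_MAJOR = 1, EX_X_DECREASING = 2, EX_Y_DECREASING = 4 };

typedef struct { int x, y; } ExPoint;

/* Software model of the raw Bresenham engine; returns points written. */
static int ex_run_line(int x, int y, int major, int minor, int err,
                       int len, int octant, ExPoint *out)
{
    int sx = (octant & EX_X_DECREASING) ? -1 : 1;
    int sy = (octant & EX_Y_DECREASING) ? -1 : 1;
    int e = err, n = 0;

    assert(len >= 0);
    while (len--) {
        out[n].x = x;                       /* DRAW_POINT(x,y) */
        out[n].y = y;
        n++;
        e += minor;
        if (e >= 0) {
            e -= major;
            if (octant & EX_Y_MAJOR) x += sx; else y += sy;  /* minor */
        }
        if (octant & EX_Y_MAJOR) y += sy; else x += sx;      /* major */
    }
    return n;
}

/* Parameters in the IBM 8514 style, derived as described above. */
typedef struct { int axial, diagonal, error; } Ex8514Params;

static Ex8514Params ex_to_8514(int major, int minor, int err)
{
    Ex8514Params p = { minor, minor - major, minor + err };
    return p;
}
```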

SolidBresenhamLineErrorTermBits

    This field allows the driver to tell XAA how many bits large its
    Bresenham parameter registers are.  Many engines have registers that
    only accept 12 or 13 bit Bresenham parameters, and the parameters
    for clipped lines may overflow these if they are not scaled down.
    If this field is not set, XAA will assume the engine can accommodate
    16 bit parameters, otherwise, it will scale the parameters to the
    size specified.


2.4  Dashed Lines

    The same degree of accuracy required by the solid lines is required
    for drawing dashed lines as well.  The dash pattern itself is a
    buffer of binary data where ones are expanded into the foreground
    color and zeros either correspond to the background color or
    indicate transparency depending on whether or not DoubleDash or
    OnOffDashes are being drawn.  

    The flags field associated with Dashed Lines is DashedLineFlags and 
    the GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags as
    described in Section 2.0 are valid restrictions.  Additionally, the
    following flags are valid:

      NO_TRANSPARENCY

	This indicates that the driver cannot support dashed lines
	with transparent backgrounds (OnOffDashes).

      TRANSPARENCY_ONLY

	This indicates that the driver cannot support dashes with
	both a foreground and background color (DoubleDashes).

      LINE_PATTERN_POWER_OF_2_ONLY

	This indicates that only patterns with a power of 2 length
	can be accelerated.

      LINE_PATTERN_LSBFIRST_MSBJUSTIFIED
      LINE_PATTERN_LSBFIRST_LSBJUSTIFIED
      LINE_PATTERN_MSBFIRST_MSBJUSTIFIED
      LINE_PATTERN_MSBFIRST_LSBJUSTIFIED

	These describe how the line pattern should be packed.
	The pattern buffer is DWORD padded.  LSBFIRST indicates
	that the pattern runs from the LSB end to the MSB end.
	MSBFIRST indicates that the pattern runs from the MSB end
	to the LSB end.  When the pattern does not completely fill
	the DWORD padded buffer, the pattern will be justified 
	towards the MSB or LSB end based on the flags above.


    The following field indicates the maximum length dash pattern that
    should be accelerated.

	int DashPatternMaxLength


void SetupForDashedLine(ScrnInfoPtr pScrn,
		int fg, int bg, int rop, unsigned int planemask,
        	int length, unsigned char *pattern)

    
    SetupForDashedLine indicates that any combination of the following 
    may follow it.

	SubsequentDashedBresenhamLine
	SubsequentDashedTwoPointLine

    If "bg" is -1, then the background (pixels corresponding to clear
    bits in the pattern) should remain unmodified. "Bg" indicates the
    background color otherwise.  "Length" indicates the length of
    the pattern in bits and "pattern" points to the DWORD padded buffer
    holding the pattern which has been packed according to the flags
    set above.  

    
void SubsequentDashedTwoPointLine( ScrnInfoPtr pScrn,
        int x1, int y1, int x2, int y2, int flags, int phase)

void SubsequentDashedBresenhamLine(ScrnInfoPtr pScrn,
        int x1, int y1, int major, int minor, int err, int len, int octant,
        int phase)
  
    These are the same as the SubsequentSolidTwoPointLine and
    SubsequentSolidBresenhamLine functions except for the addition
    of the "phase" field which indicates the offset into the dash 
    pattern that the pixel at (x1,y1) corresponds to.

    As with the SubsequentSolidBresenhamLine, there is an
 
	int DashedBresenhamLineErrorTermBits 
   
    field which indicates the size of the error term registers
    used with dashed lines.  This is usually the same value as
    the field for the solid lines (because it's usually the same
    register).
       
      

2.5   Color Expansion Fills

    When filling a color expansion rectangle, the accelerator
    paints each pixel depending on whether or not a bit in a
    corresponding bitmap is set or clear. Opaque expansions are 
    when a set bit corresponds to the foreground color and a clear 
    bit corresponds to the background color.  A transparent expansion
    is when a set bit corresponds to the foreground color and a
    clear bit indicates that the pixel should remain unmodified.
   
    The graphics accelerator usually has access to the source 
    bitmap in one of two ways: 1) the bitmap data is sent serially
    to the accelerator by the CPU through some memory mapped aperture
    or 2) the accelerator reads the source bitmap out of offscreen
    video memory.  Some types of primitives are better suited towards 
    one method or the other.  Type 2 is useful for reusable patterns
    such as stipples which can be cached in offscreen memory.  The
    aperture method can be used for stippling but the CPU must pass
    the data across the bus each time a stippled fill is to be performed.  
    For expanding 1bpp client pixmaps or text strings to the screen,
    the aperture method is usually superior because the intermediate
    copy in offscreen memory needed by the second method would only be 
    used once.  Unfortunately, many accelerators can only do one of these
    methods and not both.  

    XAA provides both ScreenToScreen and CPUToScreen color expansion 
    interfaces for doing color expansion fills.  The ScreenToScreen
    functions can only be used with hardware that supports reading
    of source bitmaps from offscreen video memory, and these are only
    used for cacheable patterns such as stipples.  There are two
    variants of the CPUToScreen routines - a direct method intended
    for hardware that has a transfer aperture, and an indirect method
    intended for hardware without transfer apertures or hardware
    with unusual transfer requirements.  Hardware that can only expand
    bitmaps from video memory should supply ScreenToScreen routines
    but also ScanlineCPUToScreen (indirect) routines to optimize transfers 
    of non-cacheable data.  Hardware that can only accept source bitmaps
    through an aperture should supply CPUToScreen (or ScanlineCPUToScreen) 
    routines. Hardware that can do both should provide both ScreenToScreen 
    and CPUToScreen routines.

    For both ScreenToScreen and CPUToScreen interfaces, the GXCOPY_ONLY,
    ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags described in
    Section 2.0 are valid as well as the following:

    /* bit order requirements (one of these must be set) */
   
    BIT_ORDER_IN_BYTE_LSBFIRST

      This indicates that the least significant bit in each byte of the
      source data corresponds to the leftmost of that block of 8 pixels.
      This is the preferred format.

    BIT_ORDER_IN_BYTE_MSBFIRST    

      This indicates that the most significant bit in each byte of the
      source data corresponds to the leftmost of that block of 8 pixels.

    /* transparency restrictions */

    NO_TRANSPARENCY

      This indicates that the accelerator cannot do a transparent expansion.

    TRANSPARENCY_ONLY

      This indicates that the accelerator cannot do an opaque expansion.
      In cases where the background needs to be filled, XAA will
      render the primitive in two passes when using the CPUToScreen
      interface, but will not do so with the ScreenToScreen interface 
      since that would require caching of two patterns.  Some 
      ScreenToScreen hardware may be able to render two passes at the
      driver level and remove the TRANSPARENCY_ONLY restriction if
      it can render pixels corresponding to the zero bits.
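
    To make the two bit orders concrete, the sketch below shows which
    bit of a source byte is the leftmost pixel under each convention,
    and how a byte in one order can be converted to the other by
    reversing its bits.  The function names are hypothetical and not
    part of XAA.

```c
#include <stdint.h>

/* Reverse the bit order of one byte (MSBFIRST <-> LSBFIRST). */
static uint8_t ex_reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap pairs   */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap bits    */
    return b;
}

/* Value (0 or 1) of the leftmost pixel in a block of 8 pixels. */
static int ex_leftmost_bit(uint8_t b, int msbfirst)
{
    return msbfirst ? (b >> 7) & 1 : b & 1;
}
```

    A byte holding only the leftmost pixel is 0x80 in MSBFIRST order
    and 0x01 in LSBFIRST order.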



2.5.1  Screen To Screen Color Expansion

    The ScreenToScreenColorExpandFill routines provide an interface
    for doing expansion blits from source patterns stored in offscreen
    video memory.

    void SetupForScreenToScreenColorExpandFill (ScrnInfoPtr pScrn,
        			int fg, int bg, 
				int rop, unsigned int planemask)


    Ones in the source bitmap will correspond to the fg color.
    Zeros in the source bitmap will correspond to the bg color
    unless bg = -1.  In that case the pixels corresponding to the
    zeros in the bitmap shall be left unmodified by the accelerator.

    For hardware that doesn't allow an easy implementation of skipleft, the
    driver can replace CacheMonoStipple function with one that stores multiple
    rotated copies of the stipple and select between them. In this case the
    driver should set CacheColorExpandDensity to tell XAA how many copies of
    the pattern are stored in the width of a cache slot. For instance if the
    hardware can specify the starting address in bytes, then 8 rotated copies
    of the stipple are needed and CacheColorExpandDensity should be set to 8.

    void SubsequentScreenToScreenColorExpandFill( ScrnInfoPtr pScrn,
				int x, int y, int w, int h,
				int srcx, int srcy, int offset )

   
    Fill a rectangle "w" x "h" at location (x,y).  The source pitch
    between scanlines is the framebuffer pitch (pScrn->displayWidth
    pixels) and srcx and srcy indicate the start of the source pattern 
    in units of framebuffer pixels. "Offset" indicates the bit offset
    into the pattern that corresponds to the pixel being painted at
    "x" on the screen.  Some hardware accepts source coordinates in
    units of bits which makes implementation of the offset trivial.
    In that case, the bit address of the source bit corresponding to
    the pixel painted at (x,y) would be:
	
     (srcy * pScrn->displayWidth + srcx) * pScrn->bitsPerPixel + offset

    It should be noted that the offset assumes LSBFIRST hardware.  
    For MSBFIRST hardware, the driver may need to implement the 
    offset by blitting only from byte boundaries and hardware clipping.
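
    For hardware that takes source coordinates in bits, the bit
    address formula above can be wrapped in a small helper.  The
    function name is hypothetical; the parameters mirror srcx, srcy,
    offset, pScrn->displayWidth and pScrn->bitsPerPixel.

```c
/*
 * Bit address of the source bit that corresponds to the pixel
 * painted at (x,y), for hardware addressing the source in bits.
 */
static long long ex_source_bit_address(int srcx, int srcy,
                                       int displayWidth, int bitsPerPixel,
                                       int offset)
{
    return ((long long)srcy * displayWidth + srcx) * bitsPerPixel + offset;
}
```

    For example, with an 8bpp screen 1024 pixels wide, srcx = 16,
    srcy = 2 and offset = 3 give (2 * 1024 + 16) * 8 + 3 = 16515.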



2.5.2  CPU To Screen Color Expansion


    The CPUToScreenColorExpandFill routines provide an interface for 
    doing expansion blits from source patterns stored in system memory.
    There are two varieties of this primitive, a CPUToScreenColorExpandFill
    and a ScanlineCPUToScreenColorExpandFill.  With the 
    CPUToScreenColorExpandFill method, the source data is sent serially
    through a memory mapped aperture.  With the Scanline version, the
    data is rendered a scanline at a time into intermediate buffers with
    a call to SubsequentColorExpandScanline following each scanline.

    These two methods have separate flags fields, the
    CPUToScreenColorExpandFillFlags and ScanlineCPUToScreenColorExpandFillFlags
    respectively.  Flags specific to one method or the other are described 
    in sections 2.5.2.1 and 2.5.2.2 but for both cases the bit order and
    transparency restrictions listed at the beginning of section 2.5 are 
    valid as well as the following:
    
    /* clipping  (optional) */
    
    LEFT_EDGE_CLIPPING
 
      This indicates that the accelerator supports omission of up to
      31 pixels on the left edge of the rectangle to be filled.  This
      is beneficial since it allows transfer of the source bitmap to
      always occur from DWORD boundaries. 

    LEFT_EDGE_CLIPPING_NEGATIVE_X

      This flag indicates that the accelerator can render color expansion
      rectangles even if the value of x origin is negative (off of
      the screen on the left edge).

    /* misc */

    TRIPLE_BITS_24BPP

      When enabled (must be in 24bpp mode), color expansion functions
      are expected to require three times the amount of bits to be
      transferred so that 24bpp grayscale colors can be used with color
      expansion in 8bpp coprocessor mode. Each bit is expanded to 3
      bits when writing the monochrome data.
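
      The bit tripling itself can be pictured with the sketch below,
      which expands one source byte into 24 bits.  It assumes LSBFIRST
      bit order, with the low byte of the result being the first one
      sent; the function name is hypothetical.

```c
#include <stdint.h>

/* Expand each bit of "b" into three identical adjacent bits. */
static uint32_t ex_triple_bits(uint8_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            out |= (uint32_t)0x7 << (3 * i);   /* bit i -> bits 3i..3i+2 */
    return out;
}
```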


2.5.2.1  The Direct Method


    Using the direct method of color expansion XAA will send all
    bitmap data to the accelerator serially through a memory mapped
    transfer window defined by the following two fields:

      unsigned char *ColorExpandBase

        This indicates the memory address of the beginning of the aperture.

      int ColorExpandRange

        This indicates the size in bytes of the aperture.

    The driver should specify how the transferred data should be padded.
    There are options for both the padding of each Y scanline and for the
    total transfer to the aperture.
    One of the following two flags must be set:

      CPU_TRANSFER_PAD_DWORD

        This indicates that the total transfer (sum of all scanlines) sent
        to the aperture must be DWORD padded.  This is the default behavior.

      CPU_TRANSFER_PAD_QWORD 

	This indicates that the total transfer (sum of all scanlines) sent
	to the aperture must be QWORD padded.  With this set, XAA will send
        an extra DWORD to the aperture when needed to ensure that only
        an even number of DWORDs are sent.

    And then there are the flags for padding of each scanline:

      SCANLINE_PAD_DWORD

	This indicates that each Y scanline should be DWORD padded.
        This is the only option available and is the default.
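
    As a rough model of the padding rules above, the number of DWORDs
    written for a "w" by "h" transfer can be estimated as below.  The
    sketch ignores skipleft and TRIPLE_BITS_24BPP, and the function
    name is hypothetical.

```c
#include <assert.h>

/*
 * DWORDs sent for a w x h monochrome transfer: every scanline is
 * DWORD padded, and with QWORD transfer padding one extra DWORD is
 * added when the total would otherwise be odd.
 */
static unsigned int ex_transfer_dwords(int w, int h, int pad_qword)
{
    assert(w >= 0 && h >= 0);
    unsigned int per_line = ((unsigned int)w + 31) / 32;
    unsigned int total = per_line * (unsigned int)h;
    if (pad_qword && (total & 1))
        total++;
    return total;
}
```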

    Finally, there is the CPU_TRANSFER_BASE_FIXED flag which indicates
    that the aperture is a single register rather than a range of
    registers, and XAA should write all of the data to the first DWORD.
    If the ColorExpandRange is not large enough to accommodate scanlines
    the width of the screen, this option will be forced. That is, the
    ColorExpandRange must be:

        ((virtualX + 31)/32) * 4   bytes or more.

        ((virtualX + 62)/32 * 4) if LEFT_EDGE_CLIPPING_NEGATIVE_X is set.
  
    If the TRIPLE_BITS_24BPP flag is set, the required area should be 
    multiplied by three.
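
    The minimum ColorExpandRange described by the formulas above can
    be computed as follows.  The function name and its boolean
    parameters are hypothetical; they stand for whether
    LEFT_EDGE_CLIPPING_NEGATIVE_X and TRIPLE_BITS_24BPP are set.

```c
#include <assert.h>

/* Smallest aperture size, in bytes, that avoids the forced fixed base. */
static unsigned int ex_min_expand_range(int virtualX,
                                        int negative_x, int triple_bits)
{
    assert(virtualX > 0);
    unsigned int vx = (unsigned int)virtualX;
    unsigned int bytes = negative_x ? (vx + 62) / 32 * 4
                                    : (vx + 31) / 32 * 4;
    return triple_bits ? bytes * 3 : bytes;
}
```

    For a virtual width of 1024 this gives 128 bytes, 132 bytes with
    LEFT_EDGE_CLIPPING_NEGATIVE_X, and 384 bytes with TRIPLE_BITS_24BPP.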
     
    
void SetupForCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
        		int fg, int bg,
			int rop,
			unsigned int planemask)

  
 
     Ones in the source bitmap will correspond to the fg color.
     Zeros in the source bitmap will correspond to the bg color
     unless bg = -1.  In that case the pixels corresponding to the
     zeros in the bitmap shall be left unmodified by the accelerator.


void SubsequentCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
			int x, int y, int w, int h,
			int skipleft )

     When this function is called, the accelerator should be setup
     to fill a rectangle of dimension "w" by "h" with origin at (x,y)
     in the fill style prescribed by the last call to 
     SetupForCPUToScreenColorExpandFill.  XAA will pass the data to 
     the aperture immediately after this function is called.  If the 
     skipleft is non-zero (and LEFT_EDGE_CLIPPING has been enabled), then 
     the accelerator _should_not_ render skipleft pixels on the leftmost
     edge of the rectangle.  Some engines have an alignment feature
     like this built in, some others can do this using a clipping
     window.

     It can be arranged for XAA to call Sync() after it is through 
     calling the Subsequent function by setting SYNC_AFTER_COLOR_EXPAND 
     in the CPUToScreenColorExpandFillFlags.  This can provide the driver 
     with an opportunity to reset a clipping window if needed.

    
2.5.2.2  The Indirect Method 

     Using the indirect method, XAA will render the bitmap data a scanline
     at a time to one or more buffers.  These buffers may be memory
     mapped apertures or just intermediate storage.

     int NumScanlineColorExpandBuffers

       This indicates the number of buffers available.

     unsigned char **ScanlineColorExpandBuffers

       This is an array of pointers to the memory locations of each buffer.
       Each buffer is expected to be large enough to accommodate scanlines
       the width of the screen.  That is:

        ((virtualX + 31)/32) * 4   bytes or more.

        ((virtualX + 62)/32 * 4) if LEFT_EDGE_CLIPPING_NEGATIVE_X is set.
  
     Scanlines are always DWORD padded.
     If the TRIPLE_BITS_24BPP flag is set, the required area should be 
     multiplied by three.


void SetupForScanlineCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
        		int fg, int bg,
			int rop,
			unsigned int planemask)
 
     Ones in the source bitmap will correspond to the fg color.
     Zeros in the source bitmap will correspond to the bg color
     unless bg = -1.  In that case the pixels corresponding to the
     zeros in the bitmap shall be left unmodified by the accelerator.

     
void SubsequentScanlineCPUToScreenColorExpandFill(ScrnInfoPtr pScrn,
			int x, int y, int w, int h,
			int skipleft )

void SubsequentColorExpandScanline(ScrnInfoPtr pScrn, int bufno)


    When SubsequentScanlineCPUToScreenColorExpandFill is called, XAA 
    will begin transferring the source data a scanline at a time, calling  
    SubsequentColorExpandScanline after each scanline.  If more than
    one buffer is available, XAA will cycle through the buffers.
    Subsequent scanlines will use the next buffer and go back to the
    buffer 0 again when the last buffer is reached.  The index into
    the ScanlineColorExpandBuffers array is presented as "bufno"
    with each SubsequentColorExpandScanline call.
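
    The buffer cycling described above can be modelled as follows.
    This is a sketch, not XAA's code: both function names are invented,
    and it assumes each rectangle starts with buffer 0.  In a real
    driver each step would correspond to XAA filling
    ScanlineColorExpandBuffers[bufno] and then calling
    SubsequentColorExpandScanline(pScrn, bufno).

```c
#include <assert.h>

/* Advance to the next buffer, wrapping back to 0 after the last one. */
static int
NextBufferIndex(int bufno, int numBuffers)
{
    assert(numBuffers > 0 && bufno >= 0 && bufno < numBuffers);

    bufno++;
    if (bufno >= numBuffers)
        bufno = 0;
    return bufno;
}

/*
 * Record in order[] the bufno that would be passed for each of the
 * "h" scanlines of one rectangle.  Hypothetical helper.
 */
static void
ModelScanlineOrder(int h, int numBuffers, int *order)
{
    int line, bufno = 0;

    for (line = 0; line < h; line++) {
        order[line] = bufno;
        bufno = NextBufferIndex(bufno, numBuffers);
    }
}
```

    With three buffers and five scanlines the sequence is 0, 1, 2, 0, 1.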

    The skipleft field is the same as for the direct method.

    The indirect method can be used to send the source data directly
    to a memory mapped aperture represented by a single color expand
    buffer, a scanline at a time, but more commonly it is used to place
    the data into offscreen video memory so that the accelerator can 
    blit it to the visible screen from there.  In the case where the
    accelerator permits rendering into offscreen video memory while
    the accelerator is active, several buffers can be used so that
    XAA can be placing source data into the next buffer while the
    accelerator is blitting the current buffer.  For cases where
    the accelerator requires some special manipulation of the source
    data first, the buffers can be in system memory.  The CPU can
    manipulate these buffers and then send the data to the accelerator.



2.6   8x8 Mono Pattern Fills

    XAA provides support for two types of 8x8 hardware patterns -
    "Mono" patterns and "Color" patterns.  Mono pattern data is
    64 bits of color expansion data with ones indicating the
    foreground color and zeros indicating the background color.
    The source bitmaps for the 8x8 mono patterns can be presented
    to the graphics accelerator in one of two ways.  They can be
    passed as two DWORDS to the 8x8 mono pattern functions or
    they can be cached in offscreen memory and their locations
    passed to the 8x8 mono pattern functions.  In addition to the
    GXCOPY_ONLY, ROP_NEEDS_SOURCE, NO_PLANEMASK and RGB_EQUAL flags
    defined in Section 2.0, the following are defined for the
    Mono8x8PatternFillFlags:

    HARDWARE_PATTERN_PROGRAMMED_BITS

      This indicates that the 8x8 patterns should be packed into two
      DWORDS and passed to the 8x8 mono pattern functions.  The default
      behavior is to cache the patterns in offscreen video memory and
      pass the locations of these patterns to the functions instead.
      The pixmap cache must be enabled for the default behavior (8x8 
      pattern caching) to work.  See Section 3 for how to enable the
      pixmap cache. The pixmap cache is not necessary for 
      HARDWARE_PATTERN_PROGRAMMED_BITS.

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN

      If the hardware supports programmable pattern offsets then
      this option should be set. See the table below for further
      information.

    HARDWARE_PATTERN_SCREEN_ORIGIN

      Some hardware wants the pattern offset specified with respect to the
      upper left-hand corner of the primitive being drawn.  Other hardware 
      needs the option HARDWARE_PATTERN_SCREEN_ORIGIN set to indicate that 
      all pattern offsets should be referenced to the upper left-hand 
      corner of the screen.  HARDWARE_PATTERN_SCREEN_ORIGIN is preferable 
      since this is more natural for the X-Window system and offsets will 
      have to be recalculated for each Subsequent function otherwise.

    BIT_ORDER_IN_BYTE_MSBFIRST
    BIT_ORDER_IN_BYTE_LSBFIRST

      As with other color expansion routines this indicates whether the
      most or the least significant bit in each byte from the pattern is 
      the leftmost on the screen.
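
      The two bit orders differ only in which end of the byte maps to
      the leftmost pixel.  The sketch below shows how pixel column 0..7
      of one pattern row is read under each order, and how a byte can be
      converted from one order to the other.  The enum and function
      names are invented for this example; they are not the xaa.h flags.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the two bit order flags, for this example only. */
typedef enum { DEMO_MSBFIRST, DEMO_LSBFIRST } DemoBitOrder;

/* Return the pattern bit for screen column "column" of one row byte. */
static int
PatternBit(uint8_t rowByte, int column, DemoBitOrder order)
{
    assert(column >= 0 && column < 8);

    if (order == DEMO_MSBFIRST)
        return (rowByte >> (7 - column)) & 1;   /* bit 7 is leftmost */
    return (rowByte >> column) & 1;             /* bit 0 is leftmost */
}

/* Reverse the bits of a byte, converting between the two orders. */
static uint8_t
ReverseBits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));
    return b;
}
```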

    TRANSPARENCY_ONLY
    NO_TRANSPARENCY

      This means the same thing as for the color expansion rect routines
      except that for TRANSPARENCY_ONLY XAA will not render the primitive
      in two passes since this is more easily handled by the driver.
      It is recommended that TRANSPARENCY_ONLY hardware handle rendering
      of opaque patterns in two passes (the background can be filled as
      a rectangle in GXcopy) in the Subsequent function so that the
      TRANSPARENCY_ONLY restriction can be removed. 
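
      The two pass approach recommended above might look like the
      following.  This is a software model only: the small in-memory
      framebuffer and all Demo* names are stand-ins for real hardware
      operations, the rop is assumed to be GXcopy, and the pattern rows
      are assumed to be MSB-first.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_W 16
#define DEMO_H 16

static uint32_t demoFb[DEMO_H][DEMO_W];  /* stand-in for video memory */

/* Stand-in for a hardware solid fill. */
static void
DemoSolidFill(int color, int x, int y, int w, int h)
{
    int i, j;

    for (j = y; j < y + h; j++)
        for (i = x; i < x + w; i++)
            demoFb[j][i] = (uint32_t)color;
}

/* Stand-in for hardware that can only draw the ones of a pattern. */
static void
DemoTransparentPatternFill(const uint8_t pat[8], int fg,
                           int x, int y, int w, int h)
{
    int i, j;

    for (j = y; j < y + h; j++)
        for (i = x; i < x + w; i++)
            if ((pat[j & 7] >> (7 - (i & 7))) & 1)
                demoFb[j][i] = (uint32_t)fg;
}

/*
 * What a Subsequent function could do on TRANSPARENCY_ONLY hardware:
 * for an opaque pattern (bg != -1) fill the background first, then
 * draw the foreground bits transparently on top.
 */
static void
DemoMonoPatternFillRect(const uint8_t pat[8], int fg, int bg,
                        int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= DEMO_W && y + h <= DEMO_H);

    if (bg != -1)
        DemoSolidFill(bg, x, y, w, h);
    DemoTransparentPatternFill(pat, fg, x, y, w, h);
}
```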



    Additional information about cached patterns...
    For the case where HARDWARE_PATTERN_PROGRAMMED_BITS is not set and 
    the pattern must be cached in offscreen memory, the first pattern
    starts at the cache slot boundary which is set by the 
    CachePixelGranularity field used to configure the pixmap cache.
    One should ensure that the CachePixelGranularity reflects any 
    alignment restrictions that the accelerator may put on 8x8 pattern 
    storage locations.  When HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set 
    there is only one pattern stored.  When this flag is not set,
    all 64 pre-rotated copies of the pattern are cached in offscreen memory.
    The MonoPatternPitch field can be used to specify the X position pixel
    granularity that each of these patterns must align on.  If the
    MonoPatternPitch is not supplied, the patterns will be densely packed
    within the cache slot.  The behavior of the default XAA 8x8 pattern
    caching mechanism is to store all 8x8 patterns linearly in video memory.
    If the accelerator needs the patterns stored in a more unusual fashion,
    the driver will need to provide its own 8x8 mono pattern caching 
    routines for XAA to use. 
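
    To make "pre-rotated" and "packed into two DWORDS" concrete, here
    is a sketch of both operations on a pattern held as eight row
    bytes.  The conventions are assumptions of this example, not
    statements about XAA internals: rows are MSB-first, an offset of
    (xoff, yoff) means source pixel (xoff, yoff) ends up at (0, 0), and
    row 0 goes into the least significant byte of the first DWORD.
    Both function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8x8 mono pattern so source pixel (xoff,yoff) lands at (0,0). */
static void
RotatePattern8x8(const uint8_t src[8], int xoff, int yoff, uint8_t dst[8])
{
    int row;

    assert(xoff >= 0 && xoff < 8 && yoff >= 0 && yoff < 8);

    for (row = 0; row < 8; row++) {
        unsigned int b = src[(row + yoff) & 7];

        /* MSB-first: rotating left by xoff moves column xoff to column 0 */
        dst[row] = (uint8_t)(((b << xoff) | (b >> (8 - xoff))) & 0xFF);
    }
}

/* Pack eight row bytes into two DWORDs (assumed layout, see above). */
static void
PackPattern8x8(const uint8_t rows[8], uint32_t *dword0, uint32_t *dword1)
{
    *dword0 = (uint32_t)rows[0] | ((uint32_t)rows[1] << 8) |
              ((uint32_t)rows[2] << 16) | ((uint32_t)rows[3] << 24);
    *dword1 = (uint32_t)rows[4] | ((uint32_t)rows[5] << 8) |
              ((uint32_t)rows[6] << 16) | ((uint32_t)rows[7] << 24);
}
```

    Calling RotatePattern8x8 for every (xoff, yoff) pair yields the 64
    copies mentioned above.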

    The following table describes the meanings of the "patx" and "paty"
    fields in both the SetupFor and Subsequent functions.

    With HARDWARE_PATTERN_SCREEN_ORIGIN
    -----------------------------------

    HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_PROGRAMMED_ORIGIN

	SetupFor: patx and paty are the first and second DWORDS of the
		  8x8 mono pattern.

	Subsequent: patx and paty are the x,y offset into that pattern.
		    All Subsequent calls will have the same offset in 
		    the case of HARDWARE_PATTERN_SCREEN_ORIGIN so only
		    the offset specified by the first Subsequent call 
		    after a SetupFor call will need to be observed.

    HARDWARE_PATTERN_PROGRAMMED_BITS only

	SetupFor: patx and paty hold the first and second DWORDS of
		  the 8x8 mono pattern pre-rotated to match the desired
		  offset.

	Subsequent: These just hold the same patterns and can be ignored.

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

	SetupFor: patx and paty hold the x,y coordinates of the offscreen
		  memory location where the 8x8 pattern is stored.  The
		  bits are stored linearly in memory at that location.

	Subsequent: patx and paty hold the offset into the pattern.
		    All Subsequent calls will have the same offset in 
		    the case of HARDWARE_PATTERN_SCREEN_ORIGIN so only
		    the offset specified by the first Subsequent call 
		    after a SetupFor call will need to be observed.

    Neither programmed bits nor origin

	SetupFor: patx and paty hold the x,y coordinates of the offscreen 	
		  memory location where the pre-rotated 8x8 pattern is
		  stored.

	Subsequent: patx and paty are the same as in the SetupFor function
		    and can be ignored.
		  

    Without HARDWARE_PATTERN_SCREEN_ORIGIN
    -------------------------------------- 

    HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_PROGRAMMED_ORIGIN

	SetupFor: patx and paty are the first and second DWORDS of the
		  8x8 mono pattern.

	Subsequent: patx and paty are the x,y offset into that pattern.

    HARDWARE_PATTERN_PROGRAMMED_BITS only

	SetupFor: patx and paty hold the first and second DWORDS of
		  the unrotated 8x8 mono pattern.  This can be ignored. 

	Subsequent: patx and paty hold the rotated 8x8 pattern to be 
		    rendered.

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

	SetupFor: patx and paty hold the x,y coordinates of the offscreen
		  memory location where the 8x8 pattern is stored.  The
		  bits are stored linearly in memory at that location.

	Subsequent: patx and paty hold the offset into the pattern.

    Neither programmed bits nor origin

	SetupFor: patx and paty hold the x,y coordinates of the offscreen 	
		  memory location where the unrotated 8x8 pattern is
		  stored.  This can be ignored.

	Subsequent: patx and paty hold the x,y coordinates of the
		    rotated 8x8 pattern to be rendered.
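
    The eight cases in the table collapse to a few rules, which the
    sketch below encodes.  The DEMO_* bit values are placeholders chosen
    for this example; a real driver would test the flags from xaa.h.
    The enum, struct and function names are likewise hypothetical.

```c
#include <assert.h>

/* Placeholder bits standing in for the HARDWARE_PATTERN_* flags. */
#define DEMO_PROGRAMMED_BITS    0x1u
#define DEMO_PROGRAMMED_ORIGIN  0x2u
#define DEMO_SCREEN_ORIGIN      0x4u

typedef enum {
    DEMO_DWORDS,            /* two DWORDs of unrotated pattern bits    */
    DEMO_DWORDS_ROTATED,    /* two DWORDs of pre-rotated pattern bits  */
    DEMO_CACHE_XY,          /* offscreen x,y of the unrotated pattern  */
    DEMO_CACHE_XY_ROTATED,  /* offscreen x,y of a pre-rotated pattern  */
    DEMO_OFFSET             /* x,y offset into the pattern             */
} DemoPatMeaning;

typedef struct {
    DemoPatMeaning setupFor;    /* meaning of patx/paty in SetupFor   */
    DemoPatMeaning subsequent;  /* meaning of patx/paty in Subsequent */
} DemoPatArgs;

/* Look up what patx and paty mean for a given flag combination. */
static DemoPatArgs
DecodePatArgs(unsigned int flags)
{
    int bits = (flags & DEMO_PROGRAMMED_BITS) != 0;
    DemoPatArgs r;

    assert((flags & ~7u) == 0);

    if (flags & DEMO_PROGRAMMED_ORIGIN) {
        /* Hardware applies the offset itself. */
        r.setupFor = bits ? DEMO_DWORDS : DEMO_CACHE_XY;
        r.subsequent = DEMO_OFFSET;
    } else if (flags & DEMO_SCREEN_ORIGIN) {
        /* One rotation serves all rectangles; Subsequent repeats it. */
        r.setupFor = bits ? DEMO_DWORDS_ROTATED : DEMO_CACHE_XY_ROTATED;
        r.subsequent = r.setupFor;
    } else {
        /* Rotation changes per rectangle; SetupFor can be ignored. */
        r.setupFor = bits ? DEMO_DWORDS : DEMO_CACHE_XY;
        r.subsequent = bits ? DEMO_DWORDS_ROTATED : DEMO_CACHE_XY_ROTATED;
    }
    return r;
}
```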



void SetupForMono8x8PatternFill(ScrnInfoPtr pScrn, int patx, int paty,
        int fg, int bg, int rop, unsigned int planemask)

    SetupForMono8x8PatternFill indicates that any combination of the
    following may follow it.

	SubsequentMono8x8PatternFillRect
	SubsequentMono8x8PatternFillTrap

    The fg, bg, rop and planemask fields have the same meaning as the
    ones used for the other color expansion routines.  Patx's and paty's
    meaning can be determined from the table above.

 
void SubsequentMono8x8PatternFillRect( ScrnInfoPtr pScrn,
        	int patx, int paty, int x, int y, int w, int h)

     Fill a rectangle of dimensions "w" by "h" with origin at (x,y) 
     using the parameters given by the last SetupForMono8x8PatternFill
     call.  The meanings of patx and paty can be determined by the
     table above.

void SubsequentMono8x8PatternFillTrap( ScrnInfoPtr pScrn,
     			   int patx, int paty, int y, int h, 
     			   int left, int dxL, int dyL, int eL,
     			   int right, int dxR, int dyR, int eR )

     The meanings of patx and paty can be determined by the table above.
     The rest of the fields have the same meanings as those in the 
     SubsequentSolidFillTrap function. 



2.7   8x8 Color Pattern Fills
  
    8x8 color pattern data is 64 pixels of full color data that
    is stored linearly in offscreen video memory.  8x8 color patterns 
    are useful as a substitute for 8x8 mono patterns when tiling,
    doing opaque stipples, or in the case where transparency is
    supported, regular stipples.  8x8 color pattern fills also have
    the additional benefit of being able to tile full color 8x8
    patterns instead of just 2 color ones like the mono patterns.
    However, full color 8x8 patterns aren't used very often in the
    X Window system so you might consider passing this primitive
    by if you already can do mono patterns, especially if they 
    require a lot of cache area.  Color8x8PatternFillFlags is
    the flags field for this primitive and the GXCOPY_ONLY,
    ROP_NEEDS_SOURCE and NO_PLANEMASK flags as described in
    Section 2.0 are valid as well as the following:


    HARDWARE_PATTERN_PROGRAMMED_ORIGIN

      If the hardware supports programmable pattern offsets then
      this option should be set.  

    HARDWARE_PATTERN_SCREEN_ORIGIN

      Some hardware wants the pattern offset specified with respect to the
      upper left-hand corner of the primitive being drawn.  Other hardware 
      needs the option HARDWARE_PATTERN_SCREEN_ORIGIN set to indicate that 
      all pattern offsets should be referenced to the upper left-hand 
      corner of the screen.  HARDWARE_PATTERN_SCREEN_ORIGIN is preferable 
      since this is more natural for the X-Window system and offsets will 
      have to be recalculated for each Subsequent function otherwise.

    NO_TRANSPARENCY
    TRANSPARENCY_GXCOPY_ONLY

      These mean the same as for the ScreenToScreenCopy functions.


    The following table describes the meanings of patx and paty passed
    to the SetupFor and Subsequent functions:

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN && HARDWARE_PATTERN_SCREEN_ORIGIN
	
	SetupFor: patx and paty hold the x,y location of the unrotated 
		  pattern.

	Subsequent: patx and paty hold the pattern offset.  For the case
		    of HARDWARE_PATTERN_SCREEN_ORIGIN all Subsequent calls
		    have the same offset so only the first call will need
		    to be observed.

    
    HARDWARE_PATTERN_PROGRAMMED_ORIGIN only

	SetupFor: patx and paty hold the x,y location of the unrotated
		  pattern.

	Subsequent: patx and paty hold the pattern offset. 

    HARDWARE_PATTERN_SCREEN_ORIGIN

	SetupFor: patx and paty hold the x,y location of the rotated pattern.

	Subsequent: patx and paty hold the same location as the SetupFor
		    function so these can be ignored.

    neither flag

	SetupFor: patx and paty hold the x,y location of the unrotated
		  pattern.  This can be ignored.

	Subsequent: patx and paty hold the x,y location of the rotated
		    pattern.

    Additional information about cached patterns...
    All 8x8 color patterns are cached in offscreen video memory so
    the pixmap cache must be enabled to use them. The first pattern
    starts at the cache slot boundary which is set by the 
    CachePixelGranularity field used to configure the pixmap cache.
    One should ensure that the CachePixelGranularity reflects any 
    alignment restrictions that the accelerator may put on 8x8 pattern 
    storage locations.  When HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set 
    there is only one pattern stored.  When this flag is not set,
    all 64 rotations of the pattern are accessible but it is assumed
    that the accelerator is capable of accessing data stored on 8
    pixel boundaries.  If the accelerator has stricter alignment
    requirements than this the driver will need to provide its own
    8x8 color pattern caching routines. 


void SetupForColor8x8PatternFill(ScrnInfoPtr pScrn, int patx, int paty,
        	int rop, unsigned int planemask, int trans_color)

    SetupForColor8x8PatternFill indicates that any combination of the
    following may follow it.

	SubsequentColor8x8PatternFillRect
	SubsequentColor8x8PatternFillTrap	(not implemented yet)

    For the meanings of patx and paty, see the table above.  Trans_color
    means the same as for the ScreenToScreenCopy functions.


 
void SubsequentColor8x8PatternFillRect( ScrnInfoPtr pScrn,
        	int patx, int paty, int x, int y, int w, int h)

     Fill a rectangle of dimensions "w" by "h" with origin at (x,y) 
     using the parameters given by the last SetupForColor8x8PatternFill
     call.  The meanings of patx and paty can be determined by the
     table above.

void SubsequentColor8x8PatternFillTrap( ScrnInfoPtr pScrn,
     			   int patx, int paty, int y, int h, 
     			   int left, int dxL, int dyL, int eL,
     			   int right, int dxR, int dyR, int eR )

    For the meanings of patx and paty, see the table above. 
    The rest of the fields have the same meanings as those in the 
    SubsequentSolidFillTrap function. 



2.8  Image Writes

    XAA provides a mechanism for transferring full color pixel data from
    system memory to video memory through the accelerator.  This is 
    useful for dealing with alignment issues and performing raster ops
    on the data when writing it to the framebuffer.  As with color
    expansion rectangles, there is a direct and indirect method.  The
    direct method sends all data through a memory mapped aperture.
    The indirect method sends the data to an intermediate buffer a
    scanline at a time.

    The direct and indirect methods have separate flags fields, the
    ImageWriteFlags and ScanlineImageWriteFlags respectively.
    Flags specific to one method or the other are described in sections 
    2.8.1 and 2.8.2 but for both cases the GXCOPY_ONLY, ROP_NEEDS_SOURCE
    and NO_PLANEMASK flags described in Section 2.0 are valid as well as
    the following:

    NO_GXCOPY

      In order to have accelerated image transfers faster than the 
      software versions for GXcopy, the engine needs to support clipping,
      be using the direct method and have a large enough image transfer
      range so that CPU_TRANSFER_BASE_FIXED doesn't need to be set.
      If these are not supported, then it is unlikely that transferring
      the data through the accelerator will be of any advantage for the
      simple case of GXcopy.  In fact, it may be much slower.  For such
      cases it's probably best to set the NO_GXCOPY flag so that 
      Image writes will only be used for the more complicated rops.

    /* transparency restrictions */

    NO_TRANSPARENCY
     
      This indicates that the accelerator does not support skipping
      of color keyed pixels when copying from the source to the destination.

    TRANSPARENCY_GXCOPY_ONLY

      This indicates that the accelerator supports skipping of color keyed
      pixels only when the rop is GXcopy.

    /* clipping  (optional) */
    
    LEFT_EDGE_CLIPPING
 
      This indicates that the accelerator supports omission of up to
      3 pixels on the left edge of the rectangle to be filled.  This
      is beneficial since it allows transfer from the source pixmap to
      always occur from DWORD boundaries. 

    LEFT_EDGE_CLIPPING_NEGATIVE_X

      This flag indicates that the accelerator can fill areas with
      image write data even if the value of x origin is negative (off of
      the screen on the left edge).
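
      One way to picture where the skipped pixels come from: the source
      pointer is moved back to the previous DWORD boundary, and the
      pixels between that boundary and the first wanted pixel are the
      ones the accelerator must omit.  The sketch below covers 8, 16 and
      32 bits per pixel only; the struct and function names are made up
      and this is not XAA's own code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t alignedByteOffset;  /* DWORD aligned offset to start reading */
    int    skipleft;           /* pixels to omit on the left edge       */
} DemoLeftEdge;

/*
 * srcByteOffset is the byte offset of the first wanted pixel within
 * the source scanline.  bytesPerPixel must be 1, 2 or 4.
 */
static DemoLeftEdge
AlignLeftEdge(size_t srcByteOffset, int bytesPerPixel)
{
    DemoLeftEdge r;

    assert(bytesPerPixel == 1 || bytesPerPixel == 2 || bytesPerPixel == 4);
    assert(srcByteOffset % (size_t)bytesPerPixel == 0);

    r.alignedByteOffset = srcByteOffset & ~(size_t)3;
    r.skipleft = (int)((srcByteOffset - r.alignedByteOffset) /
                       (size_t)bytesPerPixel);
    return r;
}
```

      At 8 bits per pixel this yields at most 3 skipped pixels, which
      matches the limit given for LEFT_EDGE_CLIPPING.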


2.8.1 The Direct Method

    Using the direct method of ImageWrite, XAA will send all
    pixel data to the accelerator serially through a memory mapped
    transfer window defined by the following two fields:

      unsigned char *ImageWriteBase

        This indicates the memory address of the beginning of the aperture.

      int ImageWriteRange

        This indicates the size in bytes of the aperture.

    The driver should specify how the transferred data should be padded.
    There are options for both the padding of each Y scanline and for the
    total transfer to the aperture.
    One of the following two flags must be set:

      CPU_TRANSFER_PAD_DWORD

        This indicates that the total transfer (sum of all scanlines) sent
        to the aperture must be DWORD padded.  This is the default behavior.

      CPU_TRANSFER_PAD_QWORD 

        This indicates that the total transfer (sum of all scanlines) sent
        to the aperture must be QWORD padded.  With this set, XAA will send
        an extra DWORD to the aperture when needed to ensure that only
        an even number of DWORDs are sent.

    And then there are the flags for padding of each scanline:

      SCANLINE_PAD_DWORD

	This indicates that each Y scanline should be DWORD padded.
        This is the only option available and is the default.

    Finally, there is the CPU_TRANSFER_BASE_FIXED flag which indicates
    that the aperture is a single register rather than a range of
    registers, and XAA should write all of the data to the first DWORD.
    XAA will automatically select CPU_TRANSFER_BASE_FIXED if the 
    ImageWriteRange is not large enough to accommodate an entire scanline.
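
    The padding rules can be checked with a little arithmetic.  The
    sketch below counts how many DWORDs would be written for a
    rectangle, and tests whether one padded scanline fits in the
    aperture.  Both functions are hypothetical helpers, they ignore any
    skipleft pixels, and the fit test assumes that the scanline in
    question is one line of the rectangle being written.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total DWORDs transferred for a w x h rectangle with SCANLINE_PAD_DWORD,
 * optionally rounded up to an even count for CPU_TRANSFER_PAD_QWORD.
 */
static size_t
ImageWriteDwords(int w, int h, int bitsPerPixel, int padQword)
{
    size_t dwordsPerLine, total;

    assert(w > 0 && h > 0);
    assert(bitsPerPixel > 0 && bitsPerPixel % 8 == 0);

    dwordsPerLine = ((size_t)w * (size_t)(bitsPerPixel / 8) + 3) / 4;
    total = dwordsPerLine * (size_t)h;

    if (padQword && (total & 1))
        total++;                /* one extra DWORD to make it even */
    return total;
}

/* Nonzero if one padded scanline does not fit in the aperture. */
static int
ScanlineExceedsRange(int w, int bitsPerPixel, int imageWriteRange)
{
    size_t bytesPerLine;

    assert(w > 0 && imageWriteRange >= 0);
    assert(bitsPerPixel > 0 && bitsPerPixel % 8 == 0);

    bytesPerLine = (((size_t)w * (size_t)(bitsPerPixel / 8) + 3) / 4) * 4;
    return (size_t)imageWriteRange < bytesPerLine;
}
```

    For example, a 3x3 rectangle at 8 bits per pixel needs 3 DWORDs
    with DWORD padding and 4 with QWORD padding.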


void SetupForImageWrite(ScrnInfoPtr pScrn, int rop, unsigned int planemask,
        			int trans_color, int bpp, int depth)

     If trans_color is not -1 then trans_color indicates the transparency
     color key and pixels with color trans_color passed through the 
     aperture should not be transferred to the screen but should be
     skipped.  Bpp and depth indicate the bits per pixel and depth of
     the source pixmap.  Trans_color is always -1 if the NO_TRANSPARENCY
     flag is set.


void SubsequentImageWriteRect(ScrnInfoPtr pScrn, 
				int x, int y, int w, int h, int skipleft)

     
     Data passed through the aperture should be copied to a rectangle
     of width "w" and height "h" with origin (x,y).  If LEFT_EDGE_CLIPPING
     has been enabled, skipleft will correspond to the number of pixels
     on the left edge that should not be drawn.  Skipleft is zero 
     otherwise.

     It can be arranged for XAA to call Sync() after it is through 
     calling the Subsequent functions by setting SYNC_AFTER_IMAGE_WRITE 
     in the ImageWriteFlags.  This can provide the driver with an
     opportunity to reset a clipping window if needed.

2.8.2  The Indirect Method

     Using the indirect method, XAA will render the pixel data a scanline
     at a time to one or more buffers.  These buffers may be memory
     mapped apertures or just intermediate storage.

     int NumScanlineImageWriteBuffers

       This indicates the number of buffers available.

     unsigned char **ScanlineImageWriteBuffers

       This is an array of pointers to the memory locations of each buffer.
       Each buffer is expected to be large enough to accommodate scanlines
       the width of the screen.  That is:

         pScrn->virtualX * pScrn->bitsPerPixel/8   bytes or more.

       If LEFT_EDGE_CLIPPING_NEGATIVE_X is set, add an additional 4
       bytes to that requirement in 8 and 16bpp, 12 bytes in 24bpp.
  
     Scanlines are always DWORD padded.
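
     A helper for the buffer size might look like this.  It is a
     sketch with an invented name.  The extra bytes for negative X
     origins are only given in the text for 8, 16 and 24 bits per
     pixel, so for other depths this example adds nothing; that choice
     is an assumption and should be checked against the hardware.  The
     final round up to a DWORD multiple is also this sketch's choice, in
     keeping with the "or more" wording above.

```c
#include <assert.h>

/* Minimum size in bytes of one scanline image write buffer. */
static long
ScanlineImageWriteBufferBytes(int virtualX, int bitsPerPixel, int negativeX)
{
    long bytes;

    assert(virtualX > 0);
    assert(bitsPerPixel >= 8 && bitsPerPixel % 8 == 0);

    bytes = (long)virtualX * (bitsPerPixel / 8);

    if (negativeX) {
        if (bitsPerPixel == 8 || bitsPerPixel == 16)
            bytes += 4;
        else if (bitsPerPixel == 24)
            bytes += 12;
        /* other depths: no figure given in the text, nothing added */
    }

    /* Scanlines are DWORD padded, so round up to a multiple of 4. */
    return (bytes + 3) & ~3L;
}
```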

void SetupForScanlineImageWrite(ScrnInfoPtr pScrn, int rop, 
				unsigned int planemask, int trans_color, 
				int bpp, int depth)

     If trans_color is not -1 then trans_color indicates the transparency
     color key and pixels with color trans_color in the buffer should not 
     be transferred to the screen but should be skipped.  Bpp and depth
     indicate the bits per pixel and depth of the source pixmap.
     Trans_color is always -1 if the NO_TRANSPARENCY flag is set.


void SubsequentScanlineImageWriteRect(ScrnInfoPtr pScrn, 
				int x, int y, int w, int h, int skipleft)

     
void SubsequentImageWriteScanline(ScrnInfoPtr pScrn, int bufno)


    When SubsequentScanlineImageWriteRect is called, XAA will begin
    transferring the source data a scanline at a time, calling
    SubsequentImageWriteScanline after each scanline.  If more than
    one buffer is available, XAA will cycle through the buffers.
    Subsequent scanlines will use the next buffer and go back to
    buffer 0 again when the last buffer is reached.  The index into
    the ScanlineImageWriteBuffers array is presented as "bufno"
    with each SubsequentImageWriteScanline call.

    The skipleft field is the same as for the direct method.

    The indirect method can be used to send the source data directly
    to a memory mapped aperture represented by a single image write
    buffer, a scanline at a time, but more commonly it is used to place
    the data into offscreen video memory so that the accelerator can 
    blit it to the visible screen from there.  In the case where the
    accelerator permits rendering into offscreen video memory while
    the accelerator is active, several buffers can be used so that
    XAA can be placing source data into the next buffer while the
    accelerator is blitting the current buffer.  For cases where
    the accelerator requires some special manipulation of the source
    data first, the buffers can be in system memory.  The CPU can
    manipulate these buffers and then send the data to the accelerator.


2.9 Clipping

    XAA supports hardware clipping rectangles.  To use clipping
    in this way it is expected that the graphics accelerator can
    clip primitives with vertices anywhere in the 16 bit signed 
    coordinate system. 

void SetClippingRectangle ( ScrnInfoPtr pScrn,
        		int left, int top, int right, int bottom)

void DisableClipping (ScrnInfoPtr pScrn)

    When SetClippingRectangle is called, all hardware rendering
    following it should be clipped to the rectangle specified
    until DisableClipping is called.
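
    The Set/Disable pairing behaves like a small piece of state in the
    accelerator.  The sketch below models that state in software.  The
    Demo* names are invented, and treating all four edges as inclusive
    is an assumption of this example that should be verified against
    the real hardware and XAA behaviour.

```c
#include <assert.h>

/* Software stand-in for the accelerator's clip registers. */
typedef struct {
    int enabled;
    int left, top, right, bottom;
} DemoClipState;

static void
DemoSetClippingRectangle(DemoClipState *c,
                         int left, int top, int right, int bottom)
{
    assert(left <= right && top <= bottom);

    c->left = left;
    c->top = top;
    c->right = right;
    c->bottom = bottom;
    c->enabled = 1;
}

static void
DemoDisableClipping(DemoClipState *c)
{
    c->enabled = 0;
}

/* Would a pixel at (x,y) be drawn under the current clip state? */
static int
DemoPixelVisible(const DemoClipState *c, int x, int y)
{
    if (!c->enabled)
        return 1;
    return x >= c->left && x <= c->right &&
           y >= c->top && y <= c->bottom;
}
```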

    The ClippingFlags field indicates which operations this sort
    of Set/Disable pairing can be used with.  Any of the following
    flags may be OR'd together.

	HARDWARE_CLIP_SCREEN_TO_SCREEN_COLOR_EXPAND
	HARDWARE_CLIP_SCREEN_TO_SCREEN_COPY
	HARDWARE_CLIP_MONO_8x8_FILL
	HARDWARE_CLIP_COLOR_8x8_FILL
	HARDWARE_CLIP_SOLID_FILL
	HARDWARE_CLIP_DASHED_LINE
	HARDWARE_CLIP_SOLID_LINE



3)  XAA PIXMAP CACHE

   /* NOTE:  XAA has no knowledge of framebuffer particulars so until
	the framebuffer is able to render into offscreen memory, usage
	of the pixmap cache requires that the driver provide ImageWrite
	routines or a WritePixmap or WritePixmapToCache replacement so
	that patterns can even be placed in the cache.

      ADDENDUM: XAA can now load the pixmap cache without requiring
	that the driver supply an ImageWrite function, but this can
	only be done on linear framebuffers.  If you have a linear
	framebuffer, set LINEAR_FRAMEBUFFER in the XAAInfoRec.Flags
	field and XAA will then be able to upload pixmaps into the
	cache without the driver providing functions to do so.
   */


   The XAA pixmap cache provides a mechanism for caching of patterns
   in offscreen video memory so that tiled fills and in some cases
   stippling can be done by blitting the source patterns from offscreen
   video memory. The pixmap cache also provides the mechanism for caching 
   of 8x8 color and mono hardware patterns.  Any unused offscreen video
   memory gets used for the pixmap cache and that information is 
   provided by the XFree86 Offscreen Memory Manager. XAA registers a 
   callback with the manager so that it can be informed of any changes 
   in the offscreen memory configuration.  The driver writer does not 
   need to deal with any of this since it is all automatic.  The driver 
   merely needs to initialize the Offscreen Memory Manager as described 
   in the DESIGN document and set the PIXMAP_CACHE flag in the 
   XAAInfoRec.Flags field.  The Offscreen Memory Manager initialization 
   must occur before XAA is initialized or else pixmap cache 
   initialization will fail.  

   PixmapCacheFlags is an XAAInfoRec field which allows the driver to
   control pixmap cache behavior to some extent.  Currently only one
   flag is defined:

   DO_NOT_BLIT_STIPPLES

     This indicates that the stippling should not be done by blitting
     from the pixmap cache.  This does not apply to 8x8 pattern fills. 


   CachePixelGranularity is an optional field.  If the hardware requires
   that 8x8 patterns have some particular pixel alignment it should
   be reflected in this field.  Ignoring this field or setting it to
   zero or one means there are no alignment issues.
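
   The effect of a granularity can be illustrated by rounding an X
   position up to the next allowed boundary.  AlignToGranularity is a
   made-up helper for this example, not an XAA function; it treats 0
   and 1 as "no restriction", as described above.

```c
#include <assert.h>

/* Round x up to the next multiple of granularity (0 or 1: unchanged). */
static int
AlignToGranularity(int x, int granularity)
{
    assert(x >= 0 && granularity >= 0);

    if (granularity <= 1)
        return x;
    return ((x + granularity - 1) / granularity) * granularity;
}
```

   With a granularity of 8, a pattern that would otherwise start at
   x = 13 starts at x = 16 instead.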


4)  OFFSCREEN PIXMAPS

   XAA has the ability to store pixmap drawables in offscreen video 
   memory and render into them with full hardware acceleration.  Placement
   of pixmaps in the cache is done automatically on a first-come basis and 
   only if there is room.  To enable this feature, set the OFFSCREEN_PIXMAPS
   flag in the XAAInfoRec.Flags field.  This is only available when a
   ScreenToScreenCopy function is provided, when the Offscreen memory 
   manager has been initialized and when the LINEAR_FRAMEBUFFER flag is
   also set.

   int maxOffPixWidth
   int maxOffPixHeight

       These two fields allow the driver to limit the maximum dimensions
     of an offscreen pixmap.  If one of these is not set, it is assumed
     that there is no limit on that dimension.  Note that if an offscreen
     pixmap with a particular dimension is allowed, then your driver will be
     expected to render primitives as large as that pixmap.  
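
   The size limits amount to a simple test like the one below.  This is
   a sketch: the struct and function are stand-ins invented for this
   example, and it assumes that an unset field holds zero.

```c
#include <assert.h>

/* Stand-in for the two XAAInfoRec fields; 0 is taken to mean "not set". */
typedef struct {
    int maxOffPixWidth;
    int maxOffPixHeight;
} DemoOffscreenLimits;

/* Nonzero if a w x h pixmap is within the driver's stated limits. */
static int
FitsOffscreenLimits(const DemoOffscreenLimits *lim, int w, int h)
{
    assert(w > 0 && h > 0);

    if (lim->maxOffPixWidth > 0 && w > lim->maxOffPixWidth)
        return 0;
    if (lim->maxOffPixHeight > 0 && h > lim->maxOffPixHeight)
        return 0;
    return 1;
}
```

   A driver whose engine can only address 2048 pixels horizontally
   would set maxOffPixWidth to 2048 and leave maxOffPixHeight unset.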

$XFree86: xc/programs/Xserver/hw/xfree86/xaa/XAA.HOWTO,v 1.12 2000/04/12 14:44:42 tsi Exp $