DynaPDF Manual - Page 648

Function Reference
The above values are already pre-computed and can be taken from the structure TPDFImage.
An image can be used several times in a document, e.g. if it represents a logo or other repeating
content. To avoid extracting unnecessary duplicates, a duplicate check should be performed
before extracting an image. If the image is not an inline image, the member ObjectPtr of the structure
TPDFImage represents a unique pointer to the image object.
Inline images are fully defined in the content stream and cannot be used repeatedly. A duplicate check
is not required for these images. The only way to compare two inline images is to compare the
image data.
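The duplicate check described above can be sketched as a set of already seen object pointers. This is only a minimal illustration: ImageInfo is a hypothetical stand-in for the few TPDFImage fields that matter here, and modelling an inline image as object_ptr = None is an assumption, not the way DynaPDF reports inline images.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ImageInfo:
    """Hypothetical stand-in for the TPDFImage members used in this sketch."""
    object_ptr: Optional[int]  # mirrors ObjectPtr; None models an inline image (assumption)
    width: int
    height: int


class DuplicateFilter:
    """Remembers the object pointers of images that were already extracted."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def should_extract(self, image: ImageInfo) -> bool:
        # Inline images cannot be reused, so no duplicate check is needed.
        if image.object_ptr is None:
            return True
        # A pointer seen before means the same image object is used again.
        if image.object_ptr in self._seen:
            return False
        self._seen.add(image.object_ptr)
        return True
```

An application would call should_extract() once per image reported by the content parser and extract only when it returns True.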
A duplicate check should also be performed for templates. Templates are normally used for
repeating content such as fixed page backgrounds, logos, and so on. To skip a template, the
TBeginTemplate callback function must return 1.
Physical organization of images
The visible appearance of an image and the physical structure in which it is stored in a PDF file
can differ in confusing ways. Images can be split into bands or tiles. There are various reasons for
doing this, but once an image has been split into smaller pieces it is very difficult to restore the
original image. DynaPDF does not provide algorithms that try to identify pieces of a larger image.
It is usually best to ignore images which are less than two units high. For example, applications like
Microsoft Word often split images into separate scan lines or smaller pieces if the image contains
transparent areas. The resulting PDF file then contains hundreds or thousands of very small images.
Because such small pieces are not meaningful when viewed alone, the application can either try to
reconstruct the original image or, if this is not possible, skip such images.
If the content parser returns lots of very small images then it is usually best to provide a fallback
that renders the entire page with RenderPage() or RenderPageToImage().
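The decision between extracting images and falling back to page rendering might be sketched as follows. The two-unit height limit comes from the text above; the function names, the (width, height) tuples, and the threshold of 100 fragments are illustrative assumptions, since what counts as "lots" depends on the application. The sketch contains no DynaPDF calls; the actual rendering would be done with RenderPage() or RenderPageToImage().

```python
from typing import List, Sequence, Tuple

Size = Tuple[int, int]  # (width, height) of one image reported by the parser

MIN_HEIGHT = 2          # from the text: ignore images less than two units high
MAX_SMALL_IMAGES = 100  # assumption: tune this limit for the application


def partition_images(sizes: Sequence[Size],
                     min_height: int = MIN_HEIGHT) -> Tuple[List[Size], List[Size]]:
    """Split the images into usable ones and fragments that are too small."""
    usable = [s for s in sizes if s[1] >= min_height]
    fragments = [s for s in sizes if s[1] < min_height]
    return usable, fragments


def needs_page_render(sizes: Sequence[Size],
                      max_small: int = MAX_SMALL_IMAGES,
                      min_height: int = MIN_HEIGHT) -> bool:
    """Return True if the page contains so many fragments that rendering
    the whole page is the more sensible fallback."""
    _, fragments = partition_images(sizes, min_height)
    return len(fragments) > max_small
```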
Image coordinate space
An image occupies a rectangle in image space w units wide and h units high, where w and h are the
width and height of the image in samples or pixels. Each sample occupies one square unit. The
coordinate origin (0, 0) is at the upper-left corner of the image, with coordinates ranging from 0 to w
horizontally and 0 to h vertically.
The image’s sample data is ordered by row, with the horizontal coordinate varying most rapidly.
The upper-left corner of the first sample is at coordinates (0, 0), the second at (1, 0), and so on
through the last sample of the first row, whose upper-left corner is at (w - 1, 0) and whose upper-right
corner is at (w, 0). The next samples after that are at coordinates (0, 1), (1, 1), and so on to the final
sample of the image, whose upper-left corner is at (w - 1, h - 1) and whose lower-right corner is at
(w, h).
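Because each sample occupies one square unit and the samples are stored row by row, the corners of any sample follow directly from its position in the data. The helper below is a small sketch of that arithmetic; the name sample_corners is hypothetical and not part of DynaPDF.

```python
from typing import Tuple

Point = Tuple[int, int]


def sample_corners(index: int, w: int, h: int) -> Tuple[Point, Point]:
    """Return the (upper-left, lower-right) corners in image space of the
    sample at the given zero-based position in row-major order."""
    if w <= 0 or h <= 0:
        raise ValueError("image width and height must be positive")
    if not 0 <= index < w * h:
        raise ValueError("sample index out of range")
    # The horizontal coordinate varies most rapidly.
    row, col = divmod(index, w)
    # Each sample covers the unit square starting at its upper-left corner.
    return (col, row), (col + 1, row + 1)
```

For an image that is 4 samples wide and 3 samples high, the last sample of the first row (index 3) spans (3, 0) to (4, 1), and the final sample (index 11) spans (3, 2) to (4, 3).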
The correspondence between image space and user space is constant: the unit square of user space,
bounded by user coordinates (0, 0) and (1, 1), corresponds to the boundary of the image in image