DynaPDF Manual - Page 514


at this position. The half space width should be used because fonts in documents that emulate
space characters with kerning space often contain no space character. In this case DynaPDF sets a
default space width, which can be too large if a condensed font is used.
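The half-space-width rule can be sketched as follows. This is a minimal illustration in Python, not DynaPDF's API: the record layout `(gap, text)`, the function name `join_kerning_records`, and the convention that a positive gap means extra room in front of the text are all assumptions made for this example.

```python
def join_kerning_records(records, space_width):
    """Join kerning records into one string.

    records     -- list of (gap, text) tuples (hypothetical layout); gap is the
                   extra horizontal room in front of the text, positive = wider
    space_width -- width of a space character in the same units as gap
    """
    threshold = space_width / 2.0  # half the space width, as the text recommends
    parts = []
    for gap, text in records:
        # Only a gap of at least half a space is treated as a word boundary;
        # smaller gaps are ordinary kerning inside a word.
        if parts and gap >= threshold:
            parts.append(" ")
        parts.append(text)
    return "".join(parts)
```

With a space width of 2.78, a gap of 2.8 is treated as a word boundary while a gap of 0.3 is treated as plain kerning. Using the full space width as the threshold would miss spaces in documents whose emulated spaces are slightly narrower than the font's nominal space.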
However, the array form is just one possible way to apply kerning between characters, and for
several reasons it is sometimes not used. Many PDF drivers update the text position with text
positioning operators instead. This technique not only produces much larger content streams, it
also splits text records into separate ones. This makes it much harder to identify word boundaries
because each record is returned in a separate GetPageText() call. We now need the coordinates to
determine whether the text must be assigned to the same line. If the text is not rotated, this is
not a big deal, but if the coordinate system is rotated or contains other transformations, some
further math is required to determine whether a text record must be assigned to the current
line.
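One way to do this "further math" is sketched below in Python. It is an illustration under stated assumptions, not DynaPDF code: each record is assumed to come with a combined matrix `(a, b, c, d, e, f)` in the usual PDF order that maps text space to page space, and the name `same_line` and the `tolerance` parameter are hypothetical. The idea is to measure the distance of the new record's origin from the current baseline, perpendicular to the writing direction, so the test also works for rotated text.

```python
import math

def same_line(m_current, m_next, tolerance):
    """Return True if the record placed by m_next starts on the baseline of m_current.

    Both matrices are (a, b, c, d, e, f) tuples; (a, b) is the image of the
    text-space x axis (the writing direction) and (e, f) is the text origin.
    """
    a1, b1, _, _, e1, f1 = m_current
    a2, b2, _, _, e2, f2 = m_next
    len1 = math.hypot(a1, b1)
    len2 = math.hypot(a2, b2)
    if len1 == 0.0 or len2 == 0.0:
        return False  # degenerate matrix, no usable writing direction

    # The two writing directions must be parallel and point the same way.
    cross = (a1 * b2 - b1 * a2) / (len1 * len2)
    dot = (a1 * a2 + b1 * b2) / (len1 * len2)
    if abs(cross) > 1e-3 or dot <= 0.0:
        return False

    # Perpendicular distance of the next origin from the current baseline.
    distance = abs(a1 * (f2 - f1) - b1 * (e2 - e1)) / len1
    return distance <= tolerance
```

For unrotated text this reduces to comparing the y coordinates. A sensible tolerance is a small fraction of the font size, which is left to the caller here.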
We now take a look into a PDF content stream to see how an arbitrary text can be stored in a PDF
file. The same text can be stored in many different ways, and it is important to understand that
many variants are possible and exist in real PDF files.
The rendered result of the string "The fox eats the lazy mouse." looks quite normal:
The fox eats the lazy mouse.
However, a PDF driver does not necessarily store this text in one record; there are many possible
variants:
%This is the easiest variant, one record contains the entire text line.
%It would be returned in one GetPageText() call as one coherent kerning
%record.
(The fox eats the lazy mouse.)Tj
%This version emulates the spaces with kerning space.
%It would be returned in one GetPageText() call with 6 kerning records.
[(The)-280(fox)-280(eats)-280(the)-280(lazy)-280(mouse.)]TJ
%This version uses PDF positioning operators to emulate spaces.
%It produces 6 separate GetPageText() calls.
(The)Tj
2.8 0 Td
(fox)Tj
2.8 0 Td
(eats)Tj
2.8 0 Td
(the)Tj
2.8 0 Td
(lazy)Tj
2.8 0 Td
(mouse.)Tj
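To make the relation between the second variant and the rendered text concrete, the following Python sketch reads a TJ operand and rebuilds the text line. It is a simplified illustration, not a PDF parser: it ignores escape sequences and nested parentheses in strings, and the names `parse_tj_array` and `tj_to_text` as well as the font size of 10 and the space width of 2.78 in the usage note are assumptions. It relies on the PDF rule that a number in a TJ array is expressed in thousandths of a text space unit and is subtracted from the current position, so -280 at font size 10 (with no horizontal scaling) opens a gap of 2.8 units.

```python
import re

# A literal string without escapes or nested parentheses, or a number.
TJ_TOKEN = re.compile(r"\(([^()\\]*)\)|(-?\d+(?:\.\d+)?)")

def parse_tj_array(operand):
    """Split a TJ operand such as '[(The)-280(fox)]' into strings and floats."""
    items = []
    for text, number in TJ_TOKEN.findall(operand):
        items.append(float(number) if number else text)
    return items

def tj_to_text(operand, font_size, space_width):
    """Rebuild the text, treating gaps of at least half a space as a space."""
    parts = []
    for item in parse_tj_array(operand):
        if isinstance(item, float):
            # Negative numbers move the next glyph to the right.
            gap = -item / 1000.0 * font_size
            if gap >= space_width / 2.0:
                parts.append(" ")
        else:
            parts.append(item)
    return "".join(parts)
```

Calling `tj_to_text("[(The)-280(fox)-280(eats)-280(the)-280(lazy)-280(mouse.)]", 10.0, 2.78)` yields the original sentence with its spaces restored.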