<?php
//getting new instance
$pdfFile = PDF_new();
//an empty filename builds the document in memory (read back later with PDF_get_buffer)
PDF_open_file($pdfFile, "");
//document info
pdf_set_info($pdfFile, "Author", "Ahmed Elbshry");
pdf_set_info($pdfFile, "Creator", "Ahmed Elbshry");
pdf_set_info($pdfFile, "Title", "PDFlib");
pdf_set_info($pdfFile, "Subject", "Using PDFlib");
//start our page and define the width and height of the document (A4, in points)
pdf_begin_page($pdfFile, 595, 842);
//check if Arial font is found, or exit
if($font = PDF_findfont($pdfFile, "Arial", "winansi", 1)) {
PDF_setfont($pdfFile, $font, 12);
} else {
echo ("Font Not Found!");
PDF_end_page($pdfFile);
PDF_close($pdfFile);
PDF_delete($pdfFile);
exit();
}
//start writing from the point 50,780
PDF_show_xy($pdfFile, "This Text In Arial Font", 50, 780);
PDF_end_page($pdfFile);
PDF_close($pdfFile);
//store the pdf document in $pdf
$pdf = PDF_get_buffer($pdfFile);
//get the length of the buffer to tell the browser about it
$pdflen = strlen($pdf);
//telling the browser about the pdf document
header("Content-type: application/pdf");
header("Content-length: $pdflen");
header("Content-Disposition: inline; filename=phpMade.pdf");
//output the document
print($pdf);
//delete the object
PDF_delete($pdfFile);
?>
PDF Functions
Note on deprecated PDFlib functions
Since PHP 4.0.5, the PHP extension for PDFlib has been officially supported by PDFlib GmbH. This means that all the functions described in the PDFlib Reference Manual (PDFlib V3.0 or later) are supported by PHP 4 with exactly the same meaning and the same parameters. However, with PDFlib V5.0.4 or later, all parameters must be specified. For compatibility reasons, the PDFlib implementation still supports most of the deprecated functions, but they should be replaced by their newer versions. PDFlib GmbH will not provide any support for problems arising from the use of these deprecated functions. The documentation in this section marks the old functions as "deprecated" and names the function that should be used instead.
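To illustrate the note above, here is a minimal sketch of a one-page document written only with non-deprecated functions from the list below (PDF_begin_document, PDF_begin_page_ext, PDF_load_font, PDF_fit_textline, PDF_end_page_ext, PDF_end_document). These replace PDF_open_file, PDF_begin_page, PDF_findfont, PDF_show_xy, PDF_end_page and PDF_close. The sketch assumes the PECL pdf extension backed by PDFlib 6 or later and uses the built-in Helvetica font. The helper name build_hello_pdf() is hypothetical, and the helper returns null when the extension is not loaded.

```php
<?php
// Hypothetical helper: builds a one-page PDF in memory using only
// non-deprecated PDFlib calls. Returns null if the PDFlib extension
// (PECL "pdf") is not loaded.
function build_hello_pdf(string $text): ?string
{
    if (!extension_loaded('pdf')) {
        return null;
    }
    $p = PDF_new();
    PDF_begin_document($p, "", "");                  // "" = build in memory (replaces PDF_open_file)
    PDF_set_info($p, "Title", "PDFlib");
    PDF_begin_page_ext($p, 595, 842, "");            // A4 portrait in points (replaces PDF_begin_page)
    $font = PDF_load_font($p, "Helvetica", "winansi", "");  // replaces PDF_findfont
    PDF_setfont($p, $font, 12);
    PDF_fit_textline($p, $text, 50, 780, "");        // replaces PDF_show_xy
    PDF_end_page_ext($p, "");                        // replaces PDF_end_page
    PDF_end_document($p, "");                        // replaces PDF_close
    $buffer = PDF_get_buffer($p);
    PDF_delete($p);
    return $buffer;
}
```

The returned string can then be sent with the same Content-type and Content-length headers as in the example at the top of this page.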
Table of Contents
- PDF_activate_item - Activates a structure element or other content item
- PDF_add_annotation - [Deprecated] Adds an annotation
- PDF_add_bookmark - [Deprecated] Adds a bookmark for the current page
- PDF_add_launchlink - [Deprecated] Adds a launch annotation to the current PDF page
- PDF_add_locallink - [Deprecated] Adds a link annotation to the current PDF page
- PDF_add_nameddest - Creates a named destination
- PDF_add_note - [Deprecated] Adds an annotation to the current PDF page
- PDF_add_outline - [Deprecated] Adds a bookmark for the current page
- PDF_add_pdflink - [Deprecated] Adds a file link annotation to the current PDF page
- PDF_add_table_cell - Adds a cell to a new or existing table
- PDF_add_textflow - Creates a textflow or adds text to an existing textflow
- PDF_add_thumbnail - [Deprecated] Adds a thumbnail to the current PDF page
- PDF_add_weblink - [Deprecated] Adds a web link to the current PDF page
- PDF_arc - Draws a counterclockwise circular arc
- PDF_arcn - Draws a clockwise circular arc
- PDF_attach_file - [Deprecated] Adds a file attachment to the PDF page
- PDF_begin_document - Creates a new PDF file
- PDF_begin_font - Starts a Type 3 font definition
- PDF_begin_glyph - Starts a glyph definition for Type 3 fonts
- PDF_begin_item - Opens a structure element or other content item
- PDF_begin_layer - Starts a layer
- PDF_begin_page_ext - Starts a new page
- PDF_begin_page - [Deprecated] Starts a new PDF page
- PDF_begin_pattern - Starts a new PDF pattern
- PDF_begin_template_ext - Starts a PDF template definition
- PDF_begin_template - [Deprecated] Starts a new PDF template
- PDF_circle - Draws a circle in a PDF document
- PDF_clip - Clips to the current PDF path
- PDF_close_image - Closes an image in a PDF document
- PDF_close_pdi_page - Closes the PDF page
- PDF_close_pdi - [Deprecated] Closes the input PDF file
- PDF_close - [Deprecated] Closes the PDF file
- PDF_closepath_fill_stroke - Closes the path, strokes its outline and fills the shape
- PDF_closepath_stroke - Closes the path and strokes its outline
- PDF_closepath - Closes the current PDF path
- PDF_concat - Concatenates a matrix with the CTM
- PDF_continue_text - Outputs text on the next PDF line
- PDF_create_3dview - Creates a 3D view
- PDF_create_action - Creates an action for objects or events
- PDF_create_annotation - Creates a rectangular annotation
- PDF_create_bookmark - Creates a bookmark
- PDF_create_field - Creates a form field
- PDF_create_fieldgroup - Creates a group of fields in a form
- PDF_create_gstate - Creates a graphics state object
- PDF_create_pvf - Creates a PDFlib virtual file
- PDF_create_textflow - Creates a textflow object
- PDF_curveto - Draws a Bézier curve
- PDF_define_layer - Creates a layer definition
- PDF_delete_pvf - Deletes a PDFlib virtual file
- PDF_delete_table - Deletes a table
- PDF_delete_textflow - Deletes a textflow object
- PDF_delete - Deletes a PDFlib object
- PDF_encoding_set_char - Adds a glyph name and/or Unicode value
- PDF_end_document - Closes a PDF file
- PDF_end_font - Ends a Type 3 font definition
- PDF_end_glyph - Ends a glyph definition for Type 3 fonts
- PDF_end_item - Closes a structure element or other content item
- PDF_end_layer - Deactivates all active layers
- PDF_end_page_ext - Ends a page
- PDF_end_page - Ends the current PDF page
- PDF_end_pattern - Ends the PDF pattern
- PDF_end_template - Ends the PDF template
- PDF_endpath - Ends the current path
- PDF_fill_imageblock - Fills an image block with variable data
- PDF_fill_pdfblock - Fills a PDF block with variable data
- PDF_fill_stroke - Fills and strokes the current PDF path
- PDF_fill_textblock - Fills a text block with variable data
- PDF_fill - Fills the current PDF path with the current color
- PDF_findfont - [Deprecated] Prepares a font for later use
- PDF_fit_image - Places an image or PDF template
- PDF_fit_pdi_page - Places an imported PDF page
- PDF_fit_table - Places a table on the page
- PDF_fit_textflow - Formats a textflow within a rectangular area
- PDF_fit_textline - Places a single line of text
- PDF_get_apiname - Gets the name of the API function that failed
- PDF_get_buffer - Reads the buffer containing the generated PDF file
- PDF_get_errmsg - Gets the text of an error
- PDF_get_errnum - Gets an error number
- PDF_get_font - [Deprecated] Loads a font
- PDF_get_fontname - [Deprecated] Reads the font name
- PDF_get_fontsize - [Deprecated] Font handling
- PDF_get_image_height - [Deprecated] Returns the height of an image
- PDF_get_image_width - [Deprecated] Returns the width of an image
- PDF_get_majorversion - [Deprecated] Returns the PDFlib major version number
- PDF_get_minorversion - [Deprecated] Returns the PDFlib minor version number
- PDF_get_parameter - Gets a string parameter
- PDF_get_pdi_parameter - [Deprecated] Gets a string parameter from the PDI document
- PDF_get_pdi_value - [Deprecated] Gets a numerical parameter from the input PDF document
- PDF_get_value - Gets a numerical parameter
- PDF_info_font - Retrieves detailed information about a loaded font
- PDF_info_matchbox - Retrieves matchbox information
- PDF_info_table - Retrieves table information
- PDF_info_textflow - Retrieves the state of a textflow
- PDF_info_textline - Performs textline formatting and retrieves the metrics
- PDF_initgraphics - Resets the PDF graphics state
- PDF_lineto - Draws a PDF line
- PDF_load_3ddata - Loads a 3D model
- PDF_load_font - Searches for and prepares a font
- PDF_load_iccprofile - Searches for and prepares an ICC profile
- PDF_load_image - Opens an image file
- PDF_makespotcolor - Makes a PDF spot color
- PDF_moveto - Sets the current PDF point
- PDF_new - Creates a new PDFlib object
- PDF_open_ccitt - [Deprecated] Opens an image containing raw CCITT data
- PDF_open_file - [Deprecated] Opens a new PDF file
- PDF_open_gif - [Deprecated] Opens a GIF image
- PDF_open_image_file - [Deprecated] Reads an image from a file
- PDF_open_image - [Deprecated] Opens an image
- PDF_open_jpeg - [Deprecated] Opens a JPEG image
- PDF_open_memory_image - [Not supported] Opens an image created in memory by PHP
- PDF_open_pdi_page - Prepares a page
- PDF_open_pdi - [Deprecated] Opens a PDF file
- PDF_open_tiff - [Deprecated] Opens a TIFF image
- PDF_pcos_get_number - Gets the value of a pCOS path
- PDF_pcos_get_stream - Gets the contents of a pCOS path
- PDF_pcos_get_string - Gets the value of a pCOS path
- PDF_place_image - [Deprecated] Places an image on the page
- PDF_place_pdi_page - [Deprecated] Places a page in the document
- PDF_process_pdi - Processes an imported PDF document
- PDF_rect - Draws a rectangle
- PDF_restore - Restores the previous PDF graphics state
- PDF_resume_page - Resumes a page
- PDF_rotate - Sets the rotation
- PDF_save - Saves the current graphics state
- PDF_scale - Sets the document scaling
- PDF_set_border_color - [Deprecated] Sets the border color around links and annotations
- PDF_set_border_dash - [Deprecated] Sets the dash style of the border around links and annotations
- PDF_set_border_style - [Deprecated] Sets the border style around links and annotations
- PDF_set_char_spacing - [Deprecated] Sets the character spacing
- PDF_set_duration - [Deprecated] Sets the duration between two pages
- PDF_set_gstate - Activates a graphics state object
- PDF_set_horiz_scaling - [Deprecated] Sets the horizontal text scaling
- PDF_set_info_author - [Deprecated] Fills the author field of the document
- PDF_set_info_creator - [Deprecated] Fills the creator field of the document
- PDF_set_info_keywords - [Deprecated] Fills the keywords field of the document
- PDF_set_info_subject - [Deprecated] Fills the subject field of the document
- PDF_set_info_title - [Deprecated] Fills the title field of the document
- PDF_set_info - Fills a field of the PDF document info
- PDF_set_layer_dependency - Defines relationships among layers
- PDF_set_leading - [Deprecated] Sets the distance between two lines of text
- PDF_set_parameter - Sets a string parameter
- PDF_set_text_matrix - [Deprecated] Sets the text matrix
- PDF_set_text_pos - Sets the text position
- PDF_set_text_rendering - [Deprecated] Sets the text rendering mode
- PDF_set_text_rise - [Deprecated] Sets the text rise
- PDF_set_value - Sets a numerical parameter
- PDF_set_word_spacing - [Deprecated] Sets the spacing between two words
- PDF_setcolor - Sets the stroke and fill color
- PDF_setdash - Sets a simple dash pattern
- PDF_setdashpattern - Sets a dash pattern
- PDF_setflat - Sets the flatness
- PDF_setfont - Sets the current font
- PDF_setgray_fill - [Deprecated] Sets the fill color to a gray level
- PDF_setgray_stroke - [Deprecated] Sets the stroke color to a gray level
- PDF_setgray - [Deprecated] Sets the stroke and fill color to a gray level
- PDF_setlinecap - Sets the linecap parameter
- PDF_setlinejoin - Sets the linejoin parameter
- PDF_setlinewidth - Sets the line width
- PDF_setmatrix - Sets the transformation matrix
- PDF_setmiterlimit - Sets the miter limit
- PDF_setpolydash - [Deprecated] Sets a complex dash pattern
- PDF_setrgbcolor_fill - [Deprecated] Sets the RGB fill color
- PDF_setrgbcolor_stroke - [Deprecated] Sets the RGB stroke color
- PDF_setrgbcolor - [Deprecated] Sets the RGB fill and stroke color
- PDF_shading_pattern - Defines a shading pattern
- PDF_shading - Defines a blend (gradient)
- PDF_shfill - Fills an area with a shading
- PDF_show_boxed - [Deprecated] Outputs text in a box
- PDF_show_xy - Outputs text at a given position
- PDF_show - Outputs text at the current position
- PDF_skew - Skews the coordinate system
- PDF_stringwidth - Returns the width of a text in the current font
- PDF_stroke - Strokes the path
- PDF_suspend_page - Suspends a page
- PDF_translate - Translates the origin of the coordinate system
- PDF_utf16_to_utf8 - Converts a UTF-16 string to UTF-8
- PDF_utf32_to_utf16 - Converts a UTF-32 string to UTF-16
- PDF_utf8_to_utf16 - Converts a UTF-8 string to UTF-16
09-Oct-2008 11:20
20-Jan-2008 03:16
/*
Folks, there is an excellent tutorial by Rasmus Lerdorf (it does not work in Internet Explorer) at
http://talks.php.net/show/osconpdf/
in which the creator of PHP nicely explains text, fonts, images and their attributes with working snippets.
Another tutorial can be found at
www.devshed.com/c/a/PHP/Building-PDF-Documents-with-PHP-5
Below are the sizes of some common PDF page formats.
The origin is at the lower left corner and the basic unit is the DTP point (pt):
1 pt = 1/72 inch = 0.35277777778 mm
Some common page sizes (width x height, in pt)
Format      Width   Height
US-Letter     612      792
US-Legal      612     1008
US-Ledger    1224      792
11x17         792     1224
A0           2380     3368
A1           1684     2380
A2           1190     1684
A3            842     1190
A4            595      842
A5            421      595
A6            297      421
B5            501      709
*/
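Because 1 pt = 1/72 inch, sizes given in millimetres or inches can be converted to the point values that PDF_begin_page and similar calls expect. This is a minimal sketch, and the helper names in_to_pt() and mm_to_pt() are hypothetical:

```php
<?php
// Hypothetical helpers: convert physical lengths to PDF points (1 pt = 1/72 inch).
const PT_PER_INCH = 72.0;
const MM_PER_INCH = 25.4;

function in_to_pt(float $inches): float
{
    return $inches * PT_PER_INCH;
}

function mm_to_pt(float $mm): float
{
    return $mm / MM_PER_INCH * PT_PER_INCH;
}

// ISO A4 is 210 mm x 297 mm; rounding gives the 595 x 842 listed in the table.
$a4 = [round(mm_to_pt(210)), round(mm_to_pt(297))];
// US-Letter is 8.5 in x 11 in.
$letter = [in_to_pt(8.5), in_to_pt(11)];
printf("A4: %d x %d pt, Letter: %d x %d pt\n", $a4[0], $a4[1], $letter[0], $letter[1]);
// prints "A4: 595 x 842 pt, Letter: 612 x 792 pt"
```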
10-Jan-2008 09:54
For those of us that do not want to pay for a commercial license to use PDFlib I suggest TCPDF:
http://tcpdf.sf.net
TCPDF is an open-source PHP class for generating PDF files on the fly without requiring external extensions. It has already been adopted by a large number of PHP projects such as phpMyAdmin, Drupal, Joomla, Xoops, TCExam, etc.
Starting with version 2.1, TCPDF supports UTF-8 Unicode and bidirectional languages such as Arabic and Hebrew.
21-Nov-2007 02:06
To get this to work on Windows, do not use escapeshellcmd().
From the online help:
Following characters are preceded by a backslash: #&;`|*?~<>^()[]{}$\, \x0A and \xFF. ' and " are escaped only if they are not paired. In Windows, all these characters plus % are replaced by a space instead.
So you are probably passing broken paths to pdf2text.exe.
Removing escapeshellcmd() worked for me. Just make darned sure you are in control of what is being passed to your system call.
18-Nov-2007 09:25
To extend alex's earlier example: you can use a couple of keys inside the PDF document to get the total number of pages, without using any extension. I would have added the whole code, but the site keeps saying "line is too long... yadayada".
Open the document for reading with fopen($file, "rb");
and test roughly the first 1000 bytes against the following regex:
<?php
$fp = fopen($file, "rb");
$contents = fread($fp, 1000);
if (preg_match('/\/N\s+([0-9]+)/', $contents, $found)) {
    $pages = (int) $found[1];
}
?>
If that doesn't return anything, you have to read the rest of the file:
<?php
$contents .= stream_get_contents($fp);
fclose($fp);
// the /s modifier lets the /Kids array span several lines
preg_match_all('/\/Type\s*\/Pages\s*\/Kids\s+\[.*?\]\s*\/Count\s+([0-9]+)/s', $contents, $matches);
?>
This may return more than one, so look through for the highest value, which is the total number of pages in your doc.
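Since the note could not include the whole code, here is a minimal sketch of how the two steps might be combined into one function. It is a text-matching heuristic that assumes an uncompressed page tree: in PDF 1.5+ files the /Pages dictionary can sit inside a compressed object stream, where it will not be found. It also accepts /Count before or after /Type /Pages within the same dictionary. The function names pdf_page_count() and pdf_file_page_count() are hypothetical.

```php
<?php
// Hypothetical helper: estimates the page count of a PDF by text matching.
// Step 1: linearized files carry "/N <pages>" in the linearization
//         dictionary near the start of the file.
// Step 2: otherwise, take the highest /Count of the /Type /Pages nodes
//         (the root of the page tree holds the total).
function pdf_page_count(string $data): ?int
{
    $head = substr($data, 0, 1024);
    if (preg_match('/\/Linearized\b[^>]*?\/N\s+(\d+)/', $head, $m)) {
        return (int) $m[1];
    }
    $counts = [];
    $patterns = [
        '/\/Type\s*\/Pages\b[^>]*?\/Count\s+(\d+)/',   // /Count after /Type /Pages
        '/\/Count\s+(\d+)[^<>]*?\/Type\s*\/Pages\b/',  // /Count before /Type /Pages
    ];
    foreach ($patterns as $re) {
        if (preg_match_all($re, $data, $m)) {
            $counts = array_merge($counts, array_map('intval', $m[1]));
        }
    }
    return $counts ? max($counts) : null;
}

function pdf_file_page_count(string $filename): ?int
{
    $data = file_get_contents($filename);
    return $data === false ? null : pdf_page_count($data);
}
```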
06-Nov-2007 12:37
The other issue with DOMpdf is that it has some pretty painful flaws.
You have to supply full paths to everything (images, includes, JavaScript files, etc.). And boy, do I mean everything.
Even then, it is not 100% sound. It cannot handle complex sites; instead it breaks the design and gives you about a million broken images.
Don't get me wrong, it's GREAT for simpler, lower-end sites, but if your site has, say, JavaScript navigation, Flash and a bunch of container divs, it's really not going to do the job.
The library above seems to be the best fit, since about the only way to get high-end sites to work is to write the PDF out manually using the functions above.
Sorry to bust anyone's bubble. Good luck.
24-Oct-2007 12:23
http://www.fpdf.org/ is also quite good. No library installation is required.
-Shelon Padmore
23-Oct-2007 10:13
There is an XPDF Win32 binary package on SourceForge that works for pdftotext.
I tried the PHP code below, but it didn't work.
23-Aug-2007 02:08
domPDF is not a great PDF creator because it doesn't support foreign characters.
15-Aug-2007 11:00
I seriously tried to get PDF parsing to work for the full-text search index of a document management system. But none of the pdf2text functions below worked for my test cases (among them a PDF file generated by OpenOffice and a file generated by FPDF).
But I found a REALLY WORKING SOLUTION! On Linux systems, install the XPDF package. It comes with a tool called pdftotext. Use PHP code similar to the following to get the text content of your PDF files:
<?php
$file = "test.pdf";
$outpath = preg_replace("/\.pdf$/", "", $file).".txt";
system("pdftotext ".escapeshellarg($file), $ret); // quote the whole path as one argument
if ($ret == 0)
{
$value = file_get_contents($outpath);
unlink($outpath);
print $value;
}
if ($ret == 127)
print "Could not find pdftotext tool.";
if ($ret == 1)
print "Could not find pdf file.";
?>
The solution works on all my test cases and is much more powerful than any of the previous pure PHP functions posted here, although it is only available on Linux.
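Here is a variant of the same idea, sketched under the assumption that pdftotext (from XPDF or Poppler) is on the PATH. Passing "-" as the output file makes pdftotext write the text to standard output, so no temporary .txt file is needed. escapeshellarg() quotes the filename as a single argument instead of escaping individual characters the way escapeshellcmd() does. The function name pdf_to_text() is hypothetical.

```php
<?php
// Hypothetical wrapper around the pdftotext command-line tool.
// Returns the extracted text, or null if the file is missing or
// pdftotext fails (for example because it is not installed).
function pdf_to_text(string $file): ?string
{
    if (!is_file($file)) {
        return null;
    }
    // escapeshellarg() wraps the whole path in quotes, so spaces and
    // shell metacharacters in the filename reach pdftotext unchanged.
    // The trailing "-" asks pdftotext to write to standard output.
    $cmd = 'pdftotext ' . escapeshellarg($file) . ' -';
    $lines = [];
    exec($cmd, $lines, $status);
    if ($status !== 0) {
        return null;
    }
    return implode("\n", $lines);
}
```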
15-Aug-2007 01:49
http://www.digitaljunkies.ca/dompdf/index.php
PHP5 class that converts HTML to PDF. From the website:
"At its heart, dompdf is (mostly) CSS2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes."
19-Jul-2007 01:19
The easiest way to get the text of a PDF is to install xpdf (on Red Hat: yum -y install xpdf),
then run pdftotext your.pdf, which will generate your.txt.
03-Jul-2007 04:28
For FPDF there is also an add-on (FPDI) available, which lets you import existing PDF documents:
http://www.setasign.de/products/pdf-php-solutions/fpdi/
01-Jun-2007 02:22
Totally free open source alternative is also available without any license cost at
http://fpdf.org/
03-May-2007 07:51
For those of us that do not want to pay for a commercial license to use PDFlib in a closed-source project, there are at least two good alternatives: FPDF and TCPDF
http://www.fpdf.org/
PHP4 and PHP5 support
http://sourceforge.net/projects/pdf-php
PHP5 support only
30-Mar-2007 07:09
I am trying to extract the text from PDF files to feed a search engine (an intranet tool). I tried several of the "PDF2TXT" functions posted below, but they do not produce the expected result. At the very least, all words need to be separated by spaces (they are then used as keywords), and the "junk" codes (for example binary data, pictures...) must be removed. I started modifying the interesting function posted by Sven, and here is my current version, which starts to work quite well (with PDF version 1.2). Sorry for my rather different programming style. Luc
<?php
// Patch for pdf2txt() posted by Sven Schuberth
// Add/replace the following code (I cannot post the full program, size limitation)
// handles version 1.2
// New version of handleV2($data), only one line changed
function handleV2($data){
// grab objects and then grab their contents (chunks)
$a_obj = getDataArray($data,"obj","endobj");
// initialise the counter and the results used below
$j = 0;
$a_chunks = array();
$result_data = "";
foreach((array)$a_obj as $obj){
$a_filter = getDataArray($obj,"<<",">>");
if (is_array($a_filter)){
$j++;
$a_chunks[$j]["filter"] = $a_filter[0];
$a_data = getDataArray($obj,"stream\r\n","endstream");
if (is_array($a_data)){
$a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
}
}
}
// decode the chunks
foreach($a_chunks as $chunk){
// look at each chunk and decide how to decode it - by looking at the contents of the filter
$a_filter = explode("/",$chunk["filter"]); // split() was removed in PHP 7
if (isset($chunk["data"]) && $chunk["data"]!=""){
// look at the filter to find out which encoding has been used
if (strpos($chunk["filter"],"FlateDecode")!==false){
$data =@ gzuncompress($chunk["data"]);
if (trim($data)!=""){
// CHANGED HERE, before: $result_data .= ps2txt($data);
$result_data .= PS2Text_New($data);
} else {
//$result_data .= "x";
}
}
}
}
return $result_data;
}
// New function - Extract text from PS codes
function ExtractPSTextElement($SourceString)
{
$CurStartPos = 0;
$Result = '';
while (($CurStartText = strpos($SourceString, '(', $CurStartPos)) !== FALSE)
{
// New text element found
if ($CurStartText - $CurStartPos > 8) $Spacing = ' ';
else {
$SpacingSize = (float) trim(substr($SourceString, $CurStartPos, $CurStartText - $CurStartPos));
if ($SpacingSize < -25) $Spacing = ' '; else $Spacing = '';
}
$CurStartText++;
$StartSearchEnd = $CurStartText;
while (($CurStartPos = strpos($SourceString, ')', $StartSearchEnd)) !== FALSE)
{
if (substr($SourceString, $CurStartPos - 1, 1) != '\\') break;
$StartSearchEnd = $CurStartPos + 1;
}
if ($CurStartPos === FALSE) break; // something wrong happened
// Remove ending '-'
if (substr($Result, -1, 1) == '-')
{
$Spacing = '';
$Result = substr($Result, 0, -1);
}
// Add to result
$Result .= $Spacing . substr($SourceString, $CurStartText, $CurStartPos - $CurStartText);
$CurStartPos++;
}
// Add line breaks (otherwise, result is one big line...)
return $Result . "\n";
}
// Global table for codes replacement
$TCodeReplace = array ('\(' => '(', '\)' => ')');
// New function, replacing the old "ps2txt" function
function PS2Text_New($PS_Data)
{
global $TCodeReplace;
// Catch up some codes
if (ord($PS_Data[0]) < 10) return '';
if (substr($PS_Data, 0, 8) == '/CIDInit') return '';
// Some text inside (...) can be found outside the [...] sets and would then be ignored
// => not processing [...] at all is the easiest solution
$Result = ExtractPSTextElement($PS_Data);
// echo "Code=$PS_Data\nRES=$Result\n\n";
// Remove/translate some codes
return strtr($Result, $TCodeReplace);
}
?>
29-Mar-2007 08:38
I've improved the pdf2txt code snippet for PDF version 1.2.
Now it's possible to translate PDF versions > 1.2 into plain text.
Sven
<?php
// Function : pdf2txt()
// Arguments : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
// their translation to plain text - returning the plain
// text at the end
// Authors : Jonathan Beckett, 2005-05-02
// : Sven Schuberth, 2007-03-29
function pdf2txt($filename){
$data = getFileData($filename);
$s=strpos($data,"%")+1;
$version=substr($data,$s,strpos($data,"%",$s)-$s); // text between the first two "%" signs
if(substr_count($version,"PDF-1.2")==0)
return handleV3($data);
else
return handleV2($data);
}
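// handles version 1.2
function handleV2($data){
// grab objects and then grab their contents (chunks)
$a_obj = getDataArray($data,"obj","endobj");
// initialise the counter and the results used below
$j = 0;
$a_chunks = array();
$result_data = "";
foreach((array)$a_obj as $obj){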
$a_filter = getDataArray($obj,"<<",">>");
if (is_array($a_filter)){
$j++;
$a_chunks[$j]["filter"] = $a_filter[0];
$a_data = getDataArray($obj,"stream\r\n","endstream");
if (is_array($a_data)){
$a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
}
}
}
// decode the chunks
foreach($a_chunks as $chunk){
// look at each chunk and decide how to decode it - by looking at the contents of the filter
$a_filter = explode("/",$chunk["filter"]); // split() was removed in PHP 7
if (isset($chunk["data"]) && $chunk["data"]!=""){
// look at the filter to find out which encoding has been used
if (strpos($chunk["filter"],"FlateDecode")!==false){
$data =@ gzuncompress($chunk["data"]);
if (trim($data)!=""){
$result_data .= ps2txt($data);
} else {
//$result_data .= "x";
}
}
}
}
return $result_data;
}
//handles versions >1.2
function handleV3($data){
// grab objects and then grab their contents (chunks)
$a_obj = getDataArray($data,"obj","endobj");
$result_data="";
foreach((array)$a_obj as $obj){
//check whether it contains a string
if(substr_count($obj,"/GS1")>0){
//the strings are between ( and )
preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
if(is_array($field))
foreach($field as $data)
$result_data.=$data[1];
}
}
return $result_data;
}
function ps2txt($ps_data){
$result = "";
$a_data = getDataArray($ps_data,"[","]");
if (is_array($a_data)){
foreach ($a_data as $ps_text){
$a_text = getDataArray($ps_text,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
} else {
// the data may just be in raw format (outside of [] tags)
$a_text = getDataArray($ps_data,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
return $result;
}
function getFileData($filename){
$handle = fopen($filename,"rb");
$data = fread($handle, filesize($filename));
fclose($handle);
return $data;
}
function getDataArray($data,$start_word,$end_word){
$start = 0;
$end = 0;
$a_result = null; // stays null when nothing is found (callers test is_array())
while ($start!==false && $end!==false){
$start = strpos($data,$start_word,$end);
if ($start!==false){
$end = strpos($data,$end_word,$start);
if ($end!==false){
// data is between start and end
$a_result[] = substr($data,$start,$end-$start+strlen($end_word));
}
}
}
return $a_result;
}
?>
22-Aug-2006 05:35
Here is a function to test whether a file is a PDF without using any external library.
<?php
define('PDF_MAGIC', "\x25\x50\x44\x46\x2D"); // the bytes of "%PDF-"
function is_pdf($filename) {
return file_get_contents($filename, false, null, 0, strlen(PDF_MAGIC)) === PDF_MAGIC;
}
?>
It's not checking if the whole file is valid, just if the correct header is present at the beginning of the file.
17-Jul-2006 11:01
domPDF is also a great PDF creation interface. It basically renders your HTML and CSS and then builds the PDF from that with absolute positions, and so on...
12-Jan-2006 09:55
I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.
This is my code:
<?php
// $file holds the path of an existing PDF on the server
header("Pragma: public");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: ".filesize($file));
header('Content-Disposition: inline; filename="'.basename($file).'"');
// this script does not handle Range requests, so do not advertise them
header("Accept-Ranges: none");
readfile($file);
exit();
?>
It would work fine in Mozilla Firefox (1.0.7) but with IE (6.0.2800.1106) it would not bring up the Adobe Reader plugin and instead ask me to save it or open it as a PHP file.
Oddly enough, when I turned off zlib.output_compression it started working. I guess the compression confuses IE. I tried leaving out the Content-Length header, thinking the size might not match (uncompressed size vs. the compressed size actually received), but without it Firefox broke too.
What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:
<?php
ini_set('zlib.output_compression','Off');
?>
Maybe this will help someone. Will post over in the PDF section as well.
03-Nov-2005 08:01
I was searching for a low-cost/open-source way of combining static HTML files [as templates] with dynamic output from Perl or PHP routines, etc. Sooner or later I found that this was the most stable, fastest and most customizable way to produce usable PDFs with nice formatting:
1] Create the HTML page output [Perl HTML output, direct HTML output from any application, PHP echoes, etc.] [sort these HTML files locally]
2] Convert all the HTML [including web image links, tables, font formatting, etc.] to [E]PS files with the Perl application html2ps [as mentioned below]
http://user.it.uu.se/~jan/html2ps.html [sort all PS files by their future page positions in the PDF]
3] Use the free ps2pdf/ps2pdfwr Linux application
http://www.ps2pdf.com/convert/index.htm [it uses Ghostscript, the ghostview libraries, etc.]
It has great formatting options such as headers, footers, page numbering, etc.
[sort the PDF files]
4] Merge all the PDF files into one PDF file with pdftk [PDF Toolkit], which also offers optional compression/encryption, background stamps, etc.
Why use different scripts?
- The Perl/PHP combination is great: in my experience Perl is faster at some tasks, such as conversion to PS files.
- PS to PDF is quicker than direct PHP to PDF [in my experience!].
- I have full control over every file: whenever I change the HTML files used as templates, I only need an editor or another application [online or offline].
P.S. I had to build an open-source solution for creating simple analysis reports based on things like:
- a first page [name / title / # / date]
- some static info [introduction, copyrights, etc.]
- some dynamic info [output from PHP database queries] combined with HTML tags/images, etc.
All of this is mixed together [and kept in separate files for transparency]. The three-step approach (data -> HTML, HTML -> PS, PS -> PDF) is also easier and quicker to program or adjust at every step.
Correct me if I'm wrong [mail me to]
ing. Valentijn Langendorff
Design & Technologist
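The pipeline above (HTML to PS with html2ps, PS to PDF with ps2pdf, then merging with pdftk) could be driven from PHP roughly as follows. This sketch only builds the command lines. It assumes all three tools are installed and on the PATH, and the function name build_pdf_pipeline() is hypothetical. Each command could then be run with exec() and its exit status checked before moving to the next step.

```php
<?php
// Hypothetical helper: builds the shell commands for the
// html -> ps -> pdf -> merged pdf pipeline described above.
function build_pdf_pipeline(array $htmlFiles, string $output): array
{
    $commands = [];
    $pdfFiles = [];
    foreach ($htmlFiles as $html) {
        $base = preg_replace('/\.html?$/i', '', $html);
        $ps = $base . '.ps';
        $pdf = $base . '.pdf';
        // html2ps writes PostScript to standard output, so redirect it.
        $commands[] = 'html2ps ' . escapeshellarg($html) . ' > ' . escapeshellarg($ps);
        // Ghostscript's ps2pdf converts each PostScript file to PDF.
        $commands[] = 'ps2pdf ' . escapeshellarg($ps) . ' ' . escapeshellarg($pdf);
        $pdfFiles[] = escapeshellarg($pdf);
    }
    // pdftk concatenates the PDFs in the given order.
    $commands[] = 'pdftk ' . implode(' ', $pdfFiles) . ' cat output ' . escapeshellarg($output);
    return $commands;
}

foreach (build_pdf_pipeline(['cover.html', 'report.html'], 'report-final.pdf') as $cmd) {
    echo $cmd, "\n";
}
```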
08-Oct-2005 04:30
After a whole day of figuring out how PDFlib works, I concluded that drawing with it is rather tedious: a single line can take something like four lines of code. So I wrote my own functions to make life easier and the code easier to understand, modify and draw with. I also wrote a function that draws a rectangle with rounded corners, optionally filled ;)
You can get it from http://www.deulos.com/pdf_php.php
Feel free to make suggestions or whatever you like ;o)
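The author's helper functions are not reproduced here. A rounded rectangle can be sketched with the path functions from the list above (PDF_moveto, PDF_lineto, PDF_arc, PDF_closepath). In this sketch the path is first computed as a list of operations, so the geometry can be checked without the extension. The names rounded_rect_path() and draw_path() are hypothetical. PDF_arc is assumed to draw counterclockwise from angle alpha to beta, in degrees, as its list entry describes.

```php
<?php
// Hypothetical helper: returns the path of a rectangle with rounded
// corners as a list of [operation, arguments...]. (x, y) is the lower
// left corner, because the PDF origin is at the bottom left of the page.
function rounded_rect_path(float $x, float $y, float $w, float $h, float $r): array
{
    $r = min($r, $w / 2, $h / 2);   // the radius cannot exceed half a side
    return [
        ['moveto', $x + $r, $y],
        ['lineto', $x + $w - $r, $y],
        ['arc', $x + $w - $r, $y + $r, $r, 270, 360],      // lower right corner
        ['lineto', $x + $w, $y + $h - $r],
        ['arc', $x + $w - $r, $y + $h - $r, $r, 0, 90],    // upper right corner
        ['lineto', $x + $r, $y + $h],
        ['arc', $x + $r, $y + $h - $r, $r, 90, 180],       // upper left corner
        ['lineto', $x, $y + $r],
        ['arc', $x + $r, $y + $r, $r, 180, 270],           // lower left corner
        ['closepath'],
    ];
}

// Replays the path on a PDFlib document, then fills and/or strokes it.
function draw_path($p, array $path, bool $fill = false): void
{
    foreach ($path as $op) {
        $name = array_shift($op);
        switch ($name) {
            case 'moveto':    PDF_moveto($p, $op[0], $op[1]); break;
            case 'lineto':    PDF_lineto($p, $op[0], $op[1]); break;
            case 'arc':       PDF_arc($p, $op[0], $op[1], $op[2], $op[3], $op[4]); break;
            case 'closepath': PDF_closepath($p); break;
        }
    }
    if ($fill) {
        PDF_fill_stroke($p);
    } else {
        PDF_stroke($p);
    }
}
```

Inside an open page, something like draw_path($pdf, rounded_rect_path(50, 50, 200, 100, 10), true) would then draw a filled box with 10 pt corners.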
Some code that can be very helpful for beginners:
<?php
// Declare PDF File
$pdf = pdf_new();
PDF_open_file($pdf, ""); // empty filename: build the PDF in memory
// Set Document Properties
PDF_set_info($pdf, "author", "Alexander Pas");
PDF_set_info($pdf, "title", "PDF by PHP Example");
PDF_set_info($pdf, "creator", "Alexander Pas");
PDF_set_info($pdf, "subject", "Testing Code");
// Get fonts to use
pdf_set_parameter($pdf, "FontOutline", "Arial=arial.ttf"); // get a custom font
$font1 = PDF_findfont($pdf, "Helvetica-Bold", "winansi", 0); // declare default font
$font2 = PDF_findfont($pdf, "Arial", "winansi", 1); // declare custom font & embed into file
/*
You can safely use the following 14 standard fonts (the default fonts)
Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
Symbol, ZapfDingbats
*/
// make the images
$image1 = PDF_open_image_file($pdf, "gif", "image.gif", "", 0); //supported filetypes are: jpeg, tiff, gif, png.
//Make First Page
PDF_begin_page($pdf, 450, 450); // page width and height.
$bookmark = PDF_add_bookmark($pdf, "Front", 0, 1); // add a top-level bookmark (no parent, shown open).
PDF_setfont($pdf, $font1, 12); // use this font from now on.
PDF_show_xy($pdf, "First Page!", 5, 225); // show this text; x and y are measured from the bottom left corner.
pdf_place_image($pdf, $image1, 255, 5, 1); // the last number scales it.
PDF_end_page($pdf); // End of Page.
//Make Second Page
PDF_begin_page($pdf, 450, 225); // page width and height.
$bookmark1 = PDF_add_bookmark($pdf, "Chapter1", $bookmark, 1); // add a nested bookmark (can be nested multiple times).
PDF_setfont($pdf, $font2, 12); // use this font from now on.
PDF_show_xy($pdf, "Chapter1!", 225, 5);
PDF_add_bookmark($pdf, "Chapter1.1", $bookmark1, 1); // add a nested bookmark (inside an already nested one).
PDF_setfont($pdf, $font1, 12);
PDF_show_xy($pdf, "Chapter1.1", 225, 5);
PDF_end_page($pdf);
// Finish the PDF File
PDF_close($pdf); // End Of PDF-File.
$output = PDF_get_buffer($pdf); // assemble the file in a variable.
// Output Area
header("Content-type: application/pdf"); //set filetype to pdf.
header("Content-Length: ".strlen($output)); //content length
header("Content-Disposition: attachment; filename=test.pdf"); // you can use inline or attachment.
echo $output; // actual print area!
// Cleanup
PDF_delete($pdf);
?>
05-Sep-2005 07:22
Yet another addition to the PDF text extraction code last posted by jorromer. That code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. It also handles closing brackets in the text stream, which the previous version ignored. My regular expression skills are somewhat lacking, so a more skilled programmer may be able to improve it. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...
<?php
function pdf2string($sourcefile) {
$fp = fopen($sourcefile, 'rb');
$content = fread($fp, filesize($sourcefile));
fclose($fp);
$searchstart = 'stream';
$searchend = 'endstream';
$pdfText = '';
$pos = 0;
$pos2 = 0;
$startpos = 0;
while ($pos !== false && $pos2 !== false) {
$pos = strpos($content, $searchstart, $startpos);
$pos2 = strpos($content, $searchend, $startpos + 1);
if ($pos !== false && $pos2 !== false){
// skip the end-of-line marker (CRLF or LF) that follows "stream"
$start = $pos + strlen($searchstart);
if ($content[$start] == "\r") $start++;
if ($content[$start] == "\n") $start++;
// drop the end-of-line marker that precedes "endstream"
$end = $pos2;
if ($content[$end - 1] == "\n") $end--;
if ($content[$end - 1] == "\r") $end--;
$textsection = substr($content, $start, $end - $start);
$data = @gzuncompress($textsection);
$pdfText .= pdfExtractText($data);
$startpos = $pos2 + strlen($searchend) - 1;
}
}
return preg_replace('/(\s)+/', ' ', $pdfText);
}
function pdfExtractText($psData){
if (!is_string($psData)) {
return '';
}
$text = '';
// Handle brackets in the text stream that could be mistaken for
// the end of a text field. I'm sure you can do this as part of the
// regular expression, but my skills aren't good enough yet.
$psData = str_replace('\)', '##ENDBRACKET##', $psData);
$psData = str_replace('\]', '##ENDSBRACKET##', $psData);
preg_match_all(
'/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
$psData,
$matches
);
for ($i = 0; $i < sizeof($matches[0]); $i++) {
if ($matches[3][$i] != '') {
// Run another match over the contents.
preg_match_all('/\(([^)]*)\)/si', $matches[3][$i], $subMatches);
foreach ($subMatches[1] as $subMatch) {
$text .= $subMatch;
}
} else if ($matches[4][$i] != '') {
$text .= ($matches[1][$i] == 'Tc' ? ' ' : '') . $matches[4][$i];
}
}
// Translate special characters and put back brackets.
$trans = array(
'...' => '…',
'\205' => '…',
'\221' => chr(145),
'\222' => chr(146),
'\223' => chr(147),
'\224' => chr(148),
'\226' => '-',
'\267' => '•',
'\(' => '(',
'\[' => '[',
'##ENDBRACKET##' => ')',
'##ENDSBRACKET##' => ']',
chr(133) => '-',
chr(141) => chr(147),
chr(142) => chr(148),
chr(143) => chr(145),
chr(144) => chr(146),
);
$text = strtr($text, $trans);
return $text;
}
?>
If you want to display the number of pages (for example: page 1 of 3) then the following code could be helpful:
<?php
// ...
$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");
$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");
// ... create all pages
$pdf->resume_page("pagenumber 1");
// ... add the number of pages to page 1
$pdf->end_page_ext("");
$pdf->resume_page("pagenumber 2");
// ... add the number of pages to page 2
$pdf->end_page_ext("");
// ...
?>
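Here is a fuller sketch of the same technique. It assumes the object-oriented PDFlib class from the PDFlib distribution, in a version that provides suspend_page() and resume_page(), and uses the built-in Helvetica font. The function name build_numbered_pdf() is hypothetical, and it returns null when the class is unavailable. Every page is created and suspended first, so the total count is known when the pages are resumed.

```php
<?php
// Hypothetical helper: creates $count pages and stamps "Page n of N" on
// each one after all pages exist. Requires the PDFlib class.
function build_numbered_pdf(int $count): ?string
{
    if (!class_exists('PDFlib')) {
        return null;
    }
    $p = new PDFlib();
    $p->begin_document("", "");                   // "" = build in memory
    $font = $p->load_font("Helvetica", "winansi", "");

    // Pass 1: create every page, add its content, then suspend it.
    for ($i = 1; $i <= $count; $i++) {
        $p->begin_page_ext(842, 595, "");          // A4 landscape, as in the note
        $p->setfont($font, 12);
        $p->fit_textline("Content of page $i", 50, 500, "");
        $p->suspend_page("");
    }

    // Pass 2: the total is now known, so resume each page and number it.
    for ($i = 1; $i <= $count; $i++) {
        $p->resume_page("pagenumber $i");
        $p->setfont($font, 10);
        $p->fit_textline("Page $i of $count", 50, 30, "");
        $p->end_page_ext("");
    }

    $p->end_document("");
    return $p->get_buffer();
}
```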
07-Jun-2005 07:51
I recently used mattb's code below to extract text from PDF files, and modified it to extract only text fields.
Hope this helps someone.
Here is the function:
<?php