I need to extract the headings from my PDF file, that is, the lines that start with a # symbol, using PHP, but I don’t know how to do it. Here is a link to the PDF file:
https://afxwebdesign.com/order.pdf
I have tried this script:
<?php
// Load the PDF file
$pdfFile = 'order.pdf';

// Use a PDF parsing library like TCPDF or FPDI to extract text
// Code snippet using TCPDF
require_once('tcpdf.php');
require_once('vendor/setasign/fpdi/src/autoload.php');

use setasign\Fpdi\Tcpdf\Fpdi;

$pdf = new Fpdi();
$pageCount = $pdf->setSourceFile($pdfFile);

for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
    $templateId = $pdf->importPage($pageNo);
    // This is the call that fails (see the error below)
    $text = $pdf->getPageContent($pageNo);

    // Match lines that start with a single # followed by a non-# character
    preg_match_all('/^#[^#].*$/m', $text, $headings);
    foreach ($headings[0] as $heading) {
        echo $heading . "\n";
    }
}

$pdf->close();
?>
But it’s not working – it throws this error:
Fatal error: Uncaught Error: Call to undefined method setasign\Fpdi\Tcpdf\Fpdi::getPageContent() in C:\xampp\htdocs\pdfextract\index.php:17
Stack trace:
#0 {main}
  thrown in C:\xampp\htdocs\pdfextract\index.php on line 17
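To rule out the regular expression as the cause, here is a minimal sketch that runs the same pattern on a hard-coded string instead of on PDF output. The sample text is made up for illustration and is not taken from order.pdf. It only shows what the pattern from my script would match once the page text is available as a plain string:

```php
<?php
// Hypothetical sample text standing in for the text of one PDF page
// (an assumption for illustration, not the real content of order.pdf).
$text = "# Order details\nCustomer: Jane Doe\n## Shipping\n#Items\nTotal: 3";

// Same pattern as in the script above: with the /m flag, ^ and $ anchor
// at each line, so this matches lines starting with exactly one '#'.
preg_match_all('/^#[^#].*$/m', $text, $headings);

foreach ($headings[0] as $heading) {
    echo $heading . "\n";
}
```

On this sample the pattern picks up the two lines that begin with a single #, and skips the line that begins with ##. So the pattern seems to behave as intended, and the problem appears to be getting the page text out of the PDF in the first place.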