Questions tagged [pdfbox]

The Apache PDFBox library is an open source Java tool for working with PDF documents. The project supports creating new PDF documents, manipulating existing documents, and extracting content from documents. Apache PDFBox also includes several command line utilities.

Features:

  • PDF to text extraction
  • Merge PDF Documents
  • PDF Document Encryption/Decryption
  • Lucene Search Engine Integration
  • Fill in form data FDF and XFDF
  • Create a PDF from a text file
  • Create images from PDF pages
  • Print a PDF
  • PDF/A validation

Official Website: http://pdfbox.apache.org/

Latest release: 2.0.21 released on 2020-08-20

3085 questions
76
votes
7 answers

How to merge two PDF files into one in Java?

I want to merge many PDF files into one using PDFBox and this is what I've done: PDDocument document = new PDDocument(); for (String pdfFile: pdfFiles) { PDDocument part = PDDocument.load(pdfFile); List list =…
Lipis
  • 19,958
  • 18
  • 88
  • 117
75
votes
19 answers

Parsing PDF files (especially with tables) with PDFBox

I need to parse a PDF file which contains tabular data. I'm using PDFBox to extract the file text to parse the result (String) later. The problem is that the text extraction doesn't work as I expected for tabular data. For example, I have a file…
Matheus Moreira
  • 2,219
  • 4
  • 23
  • 31
62
votes
5 answers

Convert PDF files to images with PDFBox

Can someone give me an example of how to use Apache PDFBox to convert a PDF file into separate images (one for each page of the PDF)?
user3423568
  • 651
  • 1
  • 6
  • 4
56
votes
6 answers

convert pdf to svg

I want to convert PDF to SVG. Please suggest some libraries or executables that can do this efficiently. I have written my own Java program using the Apache PDFBox and Batik libraries - PDDocument document = PDDocument.load( pdfFile…
user434541
  • 1,245
  • 2
  • 13
  • 21
44
votes
6 answers

How to generate multiple lines in PDF using Apache pdfbox

I am using PDFBox to generate PDF files using Java. The problem is that when I add long text content to the document, it is not displayed properly. Only a part of it is displayed, and that on a single line. I want the text to span multiple lines. My…
Ronald James
  • 607
  • 1
  • 5
  • 11
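
One common way to approach the multi-line problem described above is to split the text into lines before drawing it. The following is a minimal sketch of greedy word wrapping, not PDFBox's own API: `LineWrapper`, `wrap`, and the `widthOf` callback are hypothetical names introduced for illustration. In a real program the callback would presumably measure the string with the chosen font; here it is left abstract so the sketch runs without any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper: greedy word wrapping against a maximum line width.
final class LineWrapper {

    // widthOf is an assumed callback that returns the rendered width of a string
    // in the same unit as maxWidth (for example, points).
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                // The word does not fit: close the current line and start a new one.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            } else {
                // The word fits (or it is the first word on the line, which is kept
                // even if it is wider than maxWidth).
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Each returned line would then be drawn on its own, moving the text position down by the chosen line height between lines.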
42
votes
2 answers

How to center a text using PDFBox

My question is very simple: how can I center a text on a PDF, using PDFBox? I don't know the string in advance, I can't find the middle by trial. The string doesn't always have the same width. I need either: A method that can center the text,…
SteeveDroz
  • 5,148
  • 5
  • 28
  • 57
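
The centering arithmetic behind this question can be sketched without any library. The example below assumes the font reports string widths in 1/1000 text-space units, as PDFBox's `PDFont.getStringWidth` does; `TextCentering` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper: horizontal centering arithmetic for a line of text.
final class TextCentering {

    // Converts a width given in 1/1000 text-space units to points at the given font size.
    static double textWidthPt(double glyphUnits, double fontSize) {
        return glyphUnits / 1000.0 * fontSize;
    }

    // Returns the x offset at which text of the given width starts so that it is
    // horizontally centered on a page of the given width (both in points).
    static double centeredX(double pageWidthPt, double textWidthPt) {
        return (pageWidthPt - textWidthPt) / 2.0;
    }
}
```

For instance, a string measuring 5000 units at font size 12 is 60 points wide, so on a 600-point-wide page it would start at x = 270.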
42
votes
5 answers

PDF find out if text is underlined or a table cell

I have been playing around with PdfBox and PDFTextStripperByArea method. I was able to extract information if the text is bold or italic, but I'm unable to get the underline information. As far as I understand it in PDF, underline is done by drawing…
Drejc
  • 13,466
  • 15
  • 65
  • 101
34
votes
5 answers

How to get raw text from pdf file using java

I have some PDF files. Using PDFBox I have converted them into text and stored the text in files. Now I want to remove the following from the text files: hyperlinks, all special characters, blank lines, headers and footers of the PDF files, and “1)”, “2)”, “a)”, “bullets”, etc. I…
user2609542
  • 701
  • 3
  • 12
  • 19
32
votes
2 answers

How to create Table using Apache PDFBox

We are planning to migrate our pdf generation utilities from iText to PDFBox (Due to licensing issues in iText). With some effort, I was able to write and position text, draw lines etc. But creating Tables with text embedded in Table cells is a…
Anil
  • 1,585
  • 2
  • 16
  • 25
31
votes
3 answers

PDFBox - find page dimensions

How can I find (in mm) the width and the height of a PDF page using PDFBox? Currently, I'm using this: System.out.println(page.getMediaBox().getHeight()); System.out.println(page.getMediaBox().getWidth()); but the result is (not in mm): 842.0 595.22
John Smith
  • 1,146
  • 4
  • 15
  • 35
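
The values in the question above are in PDF user-space units, which by default are points (1/72 inch). A minimal sketch of the conversion to millimetres follows, assuming the default user unit applies; `PageUnits` and `pointsToMm` are hypothetical names, not part of PDFBox.

```java
import java.util.Locale;

// Hypothetical helper: convert points (1/72 inch) to millimetres.
final class PageUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double pointsToMm(double points) {
        return points * MM_PER_INCH / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Values taken from the question: media box height and width in points.
        double heightPt = 842.0;
        double widthPt = 595.22;
        System.out.println(String.format(Locale.ROOT, "%.2f mm x %.2f mm",
                pointsToMm(widthPt), pointsToMm(heightPt)));
    }
}
```

With the numbers from the question this gives roughly 210 mm by 297 mm, which matches an A4 page.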
30
votes
8 answers

extract images from pdf using pdfbox

I'm trying to extract images from a PDF using PDFBox. The example PDF is here. But I'm getting blank images only. The code I'm trying: public static void main(String[] args) { PDFImageExtract obj = new PDFImageExtract(); try { …
Pradyut Bhattacharya
  • 4,881
  • 13
  • 46
  • 78
29
votes
3 answers

Can't add an image to a pdf using PDFBox

I'm writing a Java app that creates a PDF from scratch using the PDFBox library. I need to place a JPG image on one of the pages. I'm using this code: PDDocument document = new PDDocument(); PDPage page = new…
Davide Gualano
  • 12,155
  • 9
  • 43
  • 63
29
votes
5 answers

How to extract text from a PDF file with Apache PDFBox

I would like to extract text from a given PDF file with Apache PDFBox. I wrote this code: PDFTextStripper pdfStripper = null; PDDocument pdDoc = null; COSDocument cosDoc = null; File file = new File(filepath); PDFParser parser = new PDFParser(new…
Benben
  • 1,235
  • 5
  • 17
  • 30
25
votes
3 answers

How to add PDFBox to an Android project or suggest alternative

I'm attempting to open an existing PDF file and then add another page to the PDF document from within an Android application. On the added page, I need to add some text and an image. I want to give PDFBox a try. Other solutions such as…
Dittimon
  • 964
  • 2
  • 13
  • 27
25
votes
4 answers

Apache PDFBox Java library - Is there an API for creating tables?

I am using the Apache PDFBox Java library to create PDFs. Is there a way to create a data table using PDFBox? If there is no such API, I would need to draw the table manually using drawLine etc. Any suggestions on how to go about this?
Keshav
  • 4,248
  • 8
  • 26
  • 48