Python library to extract 'epub' information

Question

I'm trying to create an epub uploader to iBook in Python. I need a Python library to extract book information. Before implementing this myself, I wonder if anyone knows of an existing Python library that does it.

I am voting to leave this question open, since it seems that at the time of asking, there was no library to implement the required functionality, and I think that the accepted answer contains valuable code. — Gustav Bertram, Dec 05 '13 at 09:09
The comment is not for you, but for the people voting to close the question. There is no reason to unaccept the answer, particularly as it solved your problem. — Gustav Bertram, Dec 10 '13 at 13:42
Closing does not mean deleting, the answer is attracting link only answers and maybe spam in future. — bummi, May 11 '15 at 05:19

Hugh Bothwell · Accepted Answer · 2010-06-25T14:06:41.037

An .epub file is a zip archive. It contains a META-INF directory holding a file named container.xml, which points to another file (usually named Content.opf) that indexes all the other files making up the e-book (summary based on http://www.jedisaber.com/eBooks/tutorial.asp ; full spec at http://www.idpf.org/2007/opf/opf2.0/download/ ).
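
To check this layout on a particular file, a minimal standard-library sketch might look like the following. The names `find_opf_path` and `make_sample_epub` are made up for illustration, and the tiny in-memory archive is a hypothetical stand-in; a real .epub contains many more entries.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml
CONTAINER_NS = {'n': 'urn:oasis:names:tc:opendocument:xmlns:container'}

def find_opf_path(epub):
    """Return the path of the package (.opf) file named in container.xml."""
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = root.find('n:rootfiles/n:rootfile', CONTAINER_NS)
    if rootfile is None:
        raise ValueError('container.xml has no rootfile entry')
    return rootfile.get('full-path')

def make_sample_epub():
    """Build a hypothetical, minimal epub-like archive in memory (demo only)."""
    container = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/Content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('mimetype', 'application/epub+zip')
        zf.writestr('META-INF/container.xml', container)
        zf.writestr('OEBPS/Content.opf',
                    '<package xmlns="http://www.idpf.org/2007/opf"/>')
    buf.seek(0)
    return buf

print(find_opf_path(make_sample_epub()))  # prints OEBPS/Content.opf
```

`zipfile.ZipFile` accepts either a file name or a file-like object, so the same function should work on a path to an .epub on disk.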

The following Python code will extract the basic meta-information from an .epub file and return it as a dict.

import zipfile
from lxml import etree

def get_epub_info(fname):
    ns = {
        'n':'urn:oasis:names:tc:opendocument:xmlns:container',
        'pkg':'http://www.idpf.org/2007/opf',
        'dc':'http://purl.org/dc/elements/1.1/'
    }

    # read from the .epub file (a zip archive); the with-block closes it
    with zipfile.ZipFile(fname) as zf:
        # find the contents metafile
        txt = zf.read('META-INF/container.xml')
        tree = etree.fromstring(txt)
        cfname = tree.xpath('n:rootfiles/n:rootfile/@full-path', namespaces=ns)[0]

        # grab the metadata block from the contents metafile
        cf = zf.read(cfname)
        tree = etree.fromstring(cf)
        p = tree.xpath('/pkg:package/pkg:metadata', namespaces=ns)[0]

    # repackage the data; a missing Dublin Core element yields None
    res = {}
    for s in ['title', 'language', 'creator', 'date', 'identifier']:
        values = p.xpath('dc:%s/text()' % s, namespaces=ns)
        res[s] = values[0] if values else None

    return res

Sample output:

{
    'date': '2009-12-26T17:03:31',
    'identifier': '25f96ff0-7004-4bb0-b1f2-d511ca4b2756',
    'creator': 'John Grisham',
    'language': 'UND',
    'title': 'Ford County'
}
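
If lxml is not available, roughly the same lookup can be done with the standard library's xml.etree.ElementTree. The sketch below is a variant, not the code above: the name `get_epub_info_stdlib` is made up, and it returns a list per field (empty when the element is absent), on the assumption that a book may carry several dc:creator or dc:identifier entries.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
    'pkg': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}
FIELDS = ('title', 'language', 'creator', 'date', 'identifier')

def get_epub_info_stdlib(fname):
    """Hypothetical stdlib-only variant: map each field to a list of values."""
    with zipfile.ZipFile(fname) as zf:
        # container.xml names the package (.opf) file
        container = ET.fromstring(zf.read('META-INF/container.xml'))
        rootfile = container.find('n:rootfiles/n:rootfile', NS)
        if rootfile is None:
            raise ValueError('container.xml has no rootfile entry')
        package = ET.fromstring(zf.read(rootfile.get('full-path')))

    # the metadata block is a direct child of the package element
    metadata = package.find('pkg:metadata', NS)
    if metadata is None:
        raise ValueError('package file has no metadata block')

    info = {}
    for field in FIELDS:
        elems = metadata.findall('dc:%s' % field, NS)
        # keep every non-empty value instead of only the first one
        info[field] = [e.text.strip() for e in elems
                       if e.text and e.text.strip()]
    return info
```

Returning lists trades a little convenience for not silently dropping co-authors or secondary identifiers; callers that want a single value can take the first item when the list is non-empty.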

Sure enough, epubs are zip files with a different extension. :) — Brōtsyorfuzthrāx, Sep 20 '18 at 04:14

Alex Martelli · Answer 2 · 2010-06-25T01:09:23.880

Something like epub-tools, for example? But that's mostly about writing epub format (from various possible sources), as is epubtools (similar spelling, different project). For reading it, I'd try the companion project threepress, a Django app for showing epub books on a browser -- haven't looked at that code, but I imagine that in order to show the book it must surely first be able to read it;-).

epub-tools and epubtools seem to be epub generators. – xiamx Jun 26 '10 at 21:31
@xiamx, yes, "mostly about writing" as I said -- so, have you tried the threepress code? – Alex Martelli Jun 27 '10 at 02:08

score 2 · Answer 3 · answered Jun 05 '12 at 12:09

Check out the epub module. It looks like an easy option.

marbdq

Nicholas O'Deegan · Answer 4 · 2013-02-09T21:27:03.400

I wound up here after looking for something similar and was inspired by Mr. Bothwell's code snippet to start my own project. If anyone is interested ... http://epubzilla.odeegan.com/

Quite useful link. – embert Apr 20 '14 at 13:20
Downvoting cause site fails to load. Discarded project I guess. – OMY Nov 02 '20 at 09:03
