invoice2data 0.4.4__tar.gz → 0.4.6__tar.gz
- {invoice2data-0.4.4/src/invoice2data.egg-info → invoice2data-0.4.6}/PKG-INFO +110 -114
- {invoice2data-0.4.4 → invoice2data-0.4.6}/README.md +54 -98
- invoice2data-0.4.6/pyproject.toml +199 -0
- invoice2data-0.4.6/setup.cfg +4 -0
- invoice2data-0.4.6/src/invoice2data/__init__.py +3 -0
- invoice2data-0.4.6/src/invoice2data/__main__.py +353 -0
- invoice2data-0.4.6/src/invoice2data/extract/__init__.py +1 -0
- invoice2data-0.4.6/src/invoice2data/extract/invoice_template.py +386 -0
- invoice2data-0.4.6/src/invoice2data/extract/loader.py +153 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/__init__.py +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/__interface__.py +1 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/lines.py +120 -30
- invoice2data-0.4.6/src/invoice2data/extract/parsers/regex.py +119 -0
- invoice2data-0.4.6/src/invoice2data/extract/parsers/static.py +27 -0
- invoice2data-0.4.6/src/invoice2data/extract/plugins/__init__.py +1 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/plugins/__interface__.py +1 -2
- invoice2data-0.4.6/src/invoice2data/extract/plugins/lines.py +31 -0
- invoice2data-0.4.6/src/invoice2data/extract/plugins/tables.py +221 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/au/au.com.opal.yml +5 -5
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.accor.invest.ibis.yml +11 -11
- invoice2data-0.4.6/src/invoice2data/extract/templates/be/be.accor.invest.novotel.yml +72 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.boucherie.pochet.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.cebeo.yml +4 -4
- invoice2data-0.4.6/src/invoice2data/extract/templates/be/be.eg_retail.yml +87 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.facture-dacompte.yml +8 -8
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.factuur.yml +6 -6
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.regularisation.yml +7 -7
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.melchior-vins.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.proximus.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.scarlet.yml +8 -8
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.securex.social.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/ch/ch.pcengines.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.AzureInterior.yml +12 -12
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.amazon.aws.yml +7 -7
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.apple.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.apps4rent.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.binarylife.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.bloomberg.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.cloudflare.yml +80 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.cloudns.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.datadoghq.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.digitalocean.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.envato.yml +1 -1
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.eur.aliexpress.json +44 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.expressvpn.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.expressvpn_prio6.yml +3 -3
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.flipkart.WSRetail.json +14 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.ftserussell.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.github.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.globalsign.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.google.adwords.hk.yml +4 -4
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.hetzner.yml +90 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.hobohost.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.jamiepro.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.linode.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.microsoftonline.hk-v2017.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.microsoftonline.hk.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.mongodb.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.namesilo.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.newrelic.yml +7 -7
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nl.lenovo.digitalriver.yml +12 -12
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nmmn.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nodisto.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nyse.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.oyo.invoice.yml +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.packtpub.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.pixartprinting.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.runbox.yml +65 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.sammymaystone.yml +6 -6
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.scaleway.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.textmaster.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.tmx.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.travis-ci.yml +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.de.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.uk.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.upwork.yml +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.usersnap.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.vultr.yml +81 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.amazon.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.bettina-kast.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.digikey.com.yml +18 -18
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.hosteurope.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.notebooksbilligerBillPay.yml +5 -5
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.ovh.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.qualityhosting.yml +5 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.united-domains.yml +4 -4
- invoice2data-0.4.6/src/invoice2data/extract/templates/es/com.mob-barcelona.caterina.yml +19 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/es/com.pepephone.yml +6 -3
- invoice2data-0.4.6/src/invoice2data/extract/templates/es/es.amazon.yml +18 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/es/es.digimobile.yml +19 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/es/es.supplies24.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/co.mooncard.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.adobe.ie.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.akretion.fr.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.amazon.aws.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.ateliercopieservice.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.chauffeur-prive.yml +6 -6
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.coriolis.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.easyjet.fr.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.eaudugrandlyon.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.godaddy.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.google.ie.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.hootsuite.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.jeanbesson.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.ldlc.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.linkedin.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.mention.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.microsoft.ie.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.myflyingbox.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.officetimeline.yml +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.orange-business.mobile.yml +3 -3
- invoice2data-0.4.6/src/invoice2data/extract/templates/fr/com.ovh.fr.yml +17 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.rs-online.fr.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.saur.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.soyoustart.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.vinci-autoroutes.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/dolibarr.generique.yml +7 -7
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/eu.trainline.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.actn.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.airfrance.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.also.yml +5 -5
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.amazon.yml +4 -6
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.assurance-epargne-pension.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.bouyguestelecom.adsl-fiber.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.bouyguestelecom.mobile.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.butagaz.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.chronopost.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.dirafi.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.domaine-achat.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.easytrip.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.edf.entreprises.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.edf.pme.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.finagaz.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.fountain.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.adsl-fiber.yml +6 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.mobile.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.mobile2.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.futur.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.ge-iroise.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/fr/fr.google.yml +19 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.greffe-tc-lyon.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.hiscox.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.internetsatellite.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.jpg.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.kubii.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.laposte.boutique.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.laposte.coliposte.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.lecab.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.leroymerlin.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.maaf.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mediapart.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.moneo-resto.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mouser.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mycelium-roulement.yml +2 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.napsis.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.nexity.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.orange.fibre.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.orange.fixedline.yml +4 -4
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.prestaclic.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.publicationannoncelegale.yml +6 -6
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sfr.adsl-fiber.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sfr.mobile.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sosh.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.teledec.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.topoffice.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/net.online.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/net.scaleway.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.accor.rhine.opco hotels.json +171 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.action.yml +16 -16
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.agrisneltank.json +66 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.albron.yml +16 -16
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.anwb.yml +48 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.argos.json +65 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.be.coolblue.yml +7 -7
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.begra.yml +10 -10
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.blokker.yml +13 -14
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bouwmans.yml +85 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bp.yml +59 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.buijtendijk.yml +104 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bunq.yml +26 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.cpe.yml +2 -2
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.esso_eg_services.yml +72 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.esso_eg_services_v2.yml +81 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.fedex.json +51 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.ferbox.yml +9 -10
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.fletcher.yml +103 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.gamma.yml +16 -16
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.goos.yml +8 -9
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.gulf.yml +119 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ipparking.paleiskwartier.yml +75 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.karwei.yml +15 -15
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.kav.yml +10 -11
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.koffiehenk.yml +9 -9
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.kuwait-q8.json +84 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.makro.json +143 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.marktplaats.json +106 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.megekko.json +143 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.momentsenmore.yml +74 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ns.invoice.yml +79 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.odido.json +116 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ok.yml +87 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.parkmobile.yml +47 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.praxis.yml +14 -14
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.reclameland.yml +19 -20
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.saeco.philips.eluscious.yml +15 -15
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.shell_nederland.yml +74 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.shell_schellenkens.yml +76 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.simpel.yml +1 -1
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.tango.json +67 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_express.yml +91 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_ototol.yml +109 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_servauto_ned.json +84 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.transip.yml +8 -8
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.tuynder.yml +11 -12
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.valk.exclusief.hotel.json +198 -0
- invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.valk.exclusief.restaurant.json +166 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.vistaprint.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.vodafone.yml +1 -1
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.wasco.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.weid.yml +10 -11
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.yezzer.yml +7 -8
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.zinkunie.yml +2 -2
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.bmw-fs.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.insert.subiekt-gt.yml +10 -10
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.insert.subiekt-nexo.yml +3 -3
- invoice2data-0.4.6/src/invoice2data/extract/templates/pl/pl.ksef.yml +29 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.orlen.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.p4.yml +3 -3
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.paypro.yml +4 -4
- invoice2data-0.4.6/src/invoice2data/extract/utils.py +33 -0
- invoice2data-0.4.6/src/invoice2data/input/__init__.py +1 -0
- invoice2data-0.4.6/src/invoice2data/input/gvision.py +110 -0
- invoice2data-0.4.6/src/invoice2data/input/ocrmypdf.py +126 -0
- invoice2data-0.4.6/src/invoice2data/input/pdfminer_wrapper.py +49 -0
- invoice2data-0.4.6/src/invoice2data/input/pdfplumber.py +66 -0
- invoice2data-0.4.6/src/invoice2data/input/pdftotext.py +76 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/input/tesseract.py +68 -53
- invoice2data-0.4.6/src/invoice2data/input/text.py +16 -0
- invoice2data-0.4.6/src/invoice2data/output/__init__.py +1 -0
- invoice2data-0.4.6/src/invoice2data/output/to_csv.py +55 -0
- invoice2data-0.4.6/src/invoice2data/output/to_json.py +61 -0
- invoice2data-0.4.6/src/invoice2data/output/to_xml.py +101 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6/src/invoice2data.egg-info}/PKG-INFO +110 -114
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/SOURCES.txt +51 -6
- invoice2data-0.4.6/src/invoice2data.egg-info/entry_points.txt +2 -0
- invoice2data-0.4.6/src/invoice2data.egg-info/requires.txt +46 -0
- invoice2data-0.4.6/tests/test_cli.py +444 -0
- invoice2data-0.4.6/tests/test_extraction.py +72 -0
- invoice2data-0.4.6/tests/test_gvision.py +89 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/tests/test_invoice_template.py +55 -52
- {invoice2data-0.4.4 → invoice2data-0.4.6}/tests/test_lib.py +61 -43
- invoice2data-0.4.6/tests/test_loader.py +182 -0
- invoice2data-0.4.6/tests/test_main.py +18 -0
- invoice2data-0.4.4/MANIFEST.in +0 -4
- invoice2data-0.4.4/setup.cfg +0 -78
- invoice2data-0.4.4/setup.py +0 -17
- invoice2data-0.4.4/src/invoice2data/__init__.py +0 -1
- invoice2data-0.4.4/src/invoice2data/extract/invoice_template.py +0 -285
- invoice2data-0.4.4/src/invoice2data/extract/loader.py +0 -109
- invoice2data-0.4.4/src/invoice2data/extract/parsers/regex.py +0 -84
- invoice2data-0.4.4/src/invoice2data/extract/parsers/static.py +0 -19
- invoice2data-0.4.4/src/invoice2data/extract/plugins/__init__.py +0 -0
- invoice2data-0.4.4/src/invoice2data/extract/plugins/lines.py +0 -15
- invoice2data-0.4.4/src/invoice2data/extract/plugins/tables.py +0 -99
- invoice2data-0.4.4/src/invoice2data/extract/templates/fr/com.ovh.fr.yml +0 -18
- invoice2data-0.4.4/src/invoice2data/extract/templates/nl/nl.bunq.yml +0 -26
- invoice2data-0.4.4/src/invoice2data/input/__init__.py +0 -0
- invoice2data-0.4.4/src/invoice2data/input/gvision.py +0 -87
- invoice2data-0.4.4/src/invoice2data/input/ocrmypdf.py +0 -146
- invoice2data-0.4.4/src/invoice2data/input/pdfminer_wrapper.py +0 -56
- invoice2data-0.4.4/src/invoice2data/input/pdfplumber.py +0 -48
- invoice2data-0.4.4/src/invoice2data/input/pdftotext.py +0 -57
- invoice2data-0.4.4/src/invoice2data/input/text.py +0 -5
- invoice2data-0.4.4/src/invoice2data/main.py +0 -327
- invoice2data-0.4.4/src/invoice2data/output/__init__.py +0 -0
- invoice2data-0.4.4/src/invoice2data/output/to_csv.py +0 -60
- invoice2data-0.4.4/src/invoice2data/output/to_json.py +0 -61
- invoice2data-0.4.4/src/invoice2data/output/to_xml.py +0 -69
- invoice2data-0.4.4/src/invoice2data.egg-info/entry_points.txt +0 -2
- invoice2data-0.4.4/src/invoice2data.egg-info/requires.txt +0 -11
- invoice2data-0.4.4/tests/test_cli.py +0 -334
- invoice2data-0.4.4/tests/test_extraction.py +0 -63
- invoice2data-0.4.4/tests/test_loader.py +0 -117
- /invoice2data-0.4.4/LICENSE.txt → /invoice2data-0.4.6/LICENSE.md +0 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/au/au.com.telstra.yml +0 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.namecheap.yml +0 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.farnell.yml +0 -0
- /invoice2data-0.4.4/src/invoice2data/extract/__init__.py → /invoice2data-0.4.6/src/invoice2data/py.typed +0 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/dependency_links.txt +0 -0
- {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/top_level.txt +0 -0
@@ -1,39 +1,100 @@
-Metadata-Version: 2.
+Metadata-Version: 2.4
 Name: invoice2data
-Version: 0.4.
+Version: 0.4.6
 Summary: Python parser to extract data from pdf invoice
-Home-page: https://github.com/invoice-x/invoice2data
 Author: Manuel Riel
-License: MIT
-
-
-
-
-
-Classifier:
-Classifier: Operating System :: POSIX
-Classifier: Operating System :: Unix
-Classifier: Operating System :: Microsoft :: Windows
+License: MIT
+Project-URL: homepage, https://github.com/invoice-x/invoice2data
+Project-URL: repository, https://github.com/invoice-x/invoice2data
+Project-URL: documentation, https://invoice2data.readthedocs.io
+Project-URL: Changelog, https://github.com/invoice-x/invoice2data/releases
+Keywords: python,data-mining,accounting,invoice,pdf,parcing
+Classifier: Programming Language :: Python :: 3
 Classifier: License :: OSI Approved :: MIT License
-Classifier:
+Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
 Classifier: Topic :: Office/Business :: Financial
 Classifier: Topic :: Office/Business :: Financial :: Accounting
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Financial and Insurance Industry
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Office/Business :: Financial :: Accounting
+Classifier: Topic :: Office/Business :: Financial
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Development Status :: 5 - Production/Stable
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
-
-
+License-File: LICENSE.md
+Requires-Dist: click>=8.0.1
+Requires-Dist: dateparser>=1.2.0
+Requires-Dist: PyYAML>=6.0
+Requires-Dist: regex>=2025.2.10; python_version >= "3.12"
+Requires-Dist: regex>=2024.4.16; python_version < "3.12"
+Provides-Extra: defusedxml
+Requires-Dist: defusedxml==0.7.1; extra == "defusedxml"
+Provides-Extra: dev
+Requires-Dist: pygments==2.18.0; extra == "dev"
+Requires-Dist: cffi==1.17.1; extra == "dev"
+Requires-Dist: furo==2024.5.6; extra == "dev"
+Requires-Dist: mypy==1.10.1; extra == "dev"
+Requires-Dist: myst-parser==3.0.1; extra == "dev"
+Requires-Dist: pre-commit==3.5.0; extra == "dev"
+Requires-Dist: pre-commit-hooks==4.6.0; extra == "dev"
+Requires-Dist: safety==3.2.4; extra == "dev"
+Requires-Dist: sphinx==7.1.2; extra == "dev"
+Requires-Dist: sphinx-autobuild==2021.3.14; extra == "dev"
+Requires-Dist: sphinx-click==6.0.0; extra == "dev"
+Requires-Dist: typeguard==4.3.0; extra == "dev"
+Requires-Dist: xdoctest>=0.15.10; extra == "dev"
+Provides-Extra: googlevision
+Requires-Dist: google-cloud-storage==2.18.2; extra == "googlevision"
+Requires-Dist: google-cloud-vision==3.8.1; extra == "googlevision"
+Provides-Extra: ocr
+Requires-Dist: ghostscript==0.7; extra == "ocr"
+Provides-Extra: ocrmypdf
+Requires-Dist: ocrmypdf>=14.4.0; extra == "ocrmypdf"
+Provides-Extra: pdfminer-six
+Requires-Dist: pdfminer-six==20231228; extra == "pdfminer-six"
+Provides-Extra: pdfplumber
+Requires-Dist: pdfplumber==0.11.4; extra == "pdfplumber"
+Provides-Extra: pyyaml
+Requires-Dist: pyyaml==6.0.2; extra == "pyyaml"
+Dynamic: license-file
 
 # Data extractor for PDF invoices - invoice2data
 
+[][read the docs]
 [](https://github.com/invoice-x/invoice2data/actions)
 [](https://pypi.python.org/pypi/invoice2data)
 [](https://pypi.python.org/pypi/invoice2data)
-
-
-
+[][license]
+[][tests]
+[][codecov]
+[][pre-commit]
+[](https://github.com/astral-sh/ruff)
+
+[pypi status]: https://pypi.org/project/invoice2data/
+[read the docs]: https://invoice2data.readthedocs.io/
+[tests]: https://github.com/invoice-x/invoice2data/actions?workflow=Tests
+[codecov]: https://app.codecov.io/gh/invoice-x/invoice2data
+[pre-commit]: https://github.com/pre-commit/pre-commit
+[ruff badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
+[ruff project]: https://github.com/charliermarsh/ruff
+
+A command line tool and Python library that automates the extraction of key information from invoices to support your accounting
+process. The library is very flexible and can be used on other types of business documents as well.
+
+In essence, invoice2data simplifies the process of getting data from invoices by:
+
+Automating text extraction: No more manual copying and pasting.
+Using templates for structure: Handles different invoice layouts.
+Providing structured output: Makes the data ready for analysis or further processing.
+This makes it a valuable tool for businesses and developers dealing with a large volume of invoices, saving time and reducing errors associated with manual data entry.
 
 1. extracts text from PDF files using different techniques, like
 `pdftotext`, `text`, `ocrmypdf`, `pdfminer`, `pdfplumber` or OCR -- `tesseract`, or
@@ -59,92 +120,11 @@ Go from PDF files to this:
|
|
|
59
120
|
{'date': (2014, 8, 3), 'invoice_number': '42183017', 'amount': 4.11, 'desc': 'Invoice 42183017 from Amazon Web Services'}
|
|
60
121
|
{'date': (2015, 1, 28), 'invoice_number': '12429647', 'amount': 101.0, 'desc': 'Invoice 12429647 from Envato'}
|
|
61
122
|
|
|
62
|
-
```mermaid
|
|
63
|
-
flowchart LR
|
|
64
|
-
|
|
65
|
-
InvoiceFile[fa:fa-file-invoice Invoicefile\n\npdf\nimage\ntext] --> Input-module(Input Module\n\npdftotext\ntext\npdfminer\npdfplumber\ntesseract\ngvision)
|
|
66
|
-
|
|
67
|
-
Input-module --> |Extracted Text| C{keyword\nmatching}
|
|
68
|
-
|
|
69
|
-
Invoice-Templates[(fa:fa-file-lines Invoice Templates)] --> C{keyword\nmatching}
|
|
70
|
-
|
|
71
|
-
C --> |Extracted Text + fa:fa-file-circle-check Template| E(Template Processing\n apply options from template\nremove accents, replaces etc...)
|
|
72
|
-
|
|
73
|
-
E --> |Optimized String|Plugins&Parsers(Call plugins + parsers)
|
|
74
|
-
|
|
75
|
-
subgraph Plugins&Parsers
|
|
76
|
-
|
|
77
|
-
direction BT
|
|
78
|
-
|
|
79
|
-
tables[fa:fa-table tables] ~~~ lines[fa:fa-grip-lines lines]
|
|
80
|
-
|
|
81
|
-
lines ~~~ regex[fa:fa-code regex]
|
|
82
|
-
|
|
83
|
-
regex ~~~ static[fa:fa-check static]
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
end
|
|
88
|
-
|
|
89
|
-
Plugins&Parsers --> |output| result[result\nfa:fa-file-csv,\njson,\nXML]
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
click Invoice-Templates https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md
|
|
94
|
-
|
|
95
|
-
click result https://github.com/invoice-x/invoice2data#usage
|
|
96
|
-
|
|
97
|
-
click Input-module https://github.com/invoice-x/invoice2data#installation-of-input-modules
|
|
98
|
-
|
|
99
|
-
click E https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#options
|
|
100
|
-
|
|
101
|
-
click tables https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#tables
|
|
102
|
-
|
|
103
|
-
click lines https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#lines
|
|
104
|
-
|
|
105
|
-
click regex https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#regex
|
|
106
|
-
|
|
107
|
-
click static https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#parser-static
|
|
108
|
-
|
|
109
|
-
```
|
|
110
|
-
|
|
111
|
-
## Installation
|
|
112
|
-
|
|
113
|
-
1. Install pdftotext
|
|
114
|
-
|
|
115
|
-
If possible get the latest
|
|
116
|
-
[xpdf/poppler-utils](https://poppler.freedesktop.org/) version. It's
|
|
117
|
-
included with macOS Homebrew, Debian and Ubuntu. Without it, `pdftotext`
|
|
118
|
-
won't parse tables in PDF correctly.
|
|
119
|
-
|
|
120
|
-
2. Install `invoice2data` using pip
|
|
121
|
-
|
|
122
|
-
pip install invoice2data
|
|
123
|
-
|
|
124
|
-
### Installation of input modules
|
|
125
|
-
|
|
126
|
-
An [tesseract](https://github.com/tesseract-ocr/tessdoc/blob/main/FAQ.md#how-do-i-get-tesseract) wrapper is included in auto language mode. It will test your input files against the languages installed on your system. To use it tesseract and imagemagick needs to be installed.
|
|
127
|
-
tesseract supports multiple OCR engine modes. By default the available engine installed on the system will be used.
|
|
128
|
-
|
|
129
|
-
Languages:
|
|
130
|
-
tesseract-ocr recognize more than [100 languages](https://github.com/tesseract-ocr/tessdata)
|
|
131
|
-
For Linux users, you can often find packages that provide language packs:
|
|
132
|
-
|
|
133
|
-
```
|
|
134
|
-
# Display a list of all Tesseract language packs
|
|
135
|
-
apt-cache search tesseract-ocr
|
|
136
|
-
|
|
137
|
-
# Debian/Ubuntu users
|
|
138
|
-
apt-get install tesseract-ocr-chi-sim # Example: Install Chinese Simplified language pack
|
|
139
|
-
|
|
140
|
-
# Arch Linux users
|
|
141
|
-
pacman -S tesseract-data-eng tesseract-data-deu # Example: Install the English and German language packs
|
|
142
|
-
|
|
143
|
-
```
|
|
144
123
|
|
|
145
124
|
## Usage
|
|
146
125
|
|
|
147
126
|
Basic usage. Process PDF files and write result to CSV.
|
|
127
|
+
Please see the [Command-line Reference] for details.
|
|
148
128
|
|
|
149
129
|
- `invoice2data invoice.pdf`
|
|
150
130
|
- `invoice2data invoice.txt`
|
|
@@ -158,7 +138,7 @@ Choose any of the following input readers:
|
|
|
158
138
|
- pdfminer.six `invoice2data --input-reader pdfminer invoice.pdf`
|
|
159
139
|
- pdfplumber `invoice2data --input-reader pdfplumber invoice.pdf`
|
|
160
140
|
- ocrmypdf `invoice2data --input-reader ocrmypdf invoice.pdf`
|
|
161
|
-
- gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var)
|
|
141
|
+
- gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var and a Google Cloud Bucket name. The bucket name can be set as an argument to the function ``to_text`` or as an Environment variable named ``GOOGLE_CLOUD_BUCKET_NAME`` )
|
|
162
142
|
|
|
163
143
|
Choose any of the following output formats:
|
|
164
144
|
|
|
@@ -208,13 +188,12 @@ Using in-house templates
|
|
|
208
188
|
templates = read_templates('/path/to/your/templates/')
|
|
209
189
|
result = extract_data(filename, templates=templates)
|
|
210
190
|
|
|
211
|
-
|
|
212
191
|
## Template system
|
|
213
192
|
|
|
214
193
|
See `invoice2data/extract/templates` for existing templates. Just extend
|
|
215
194
|
the list to add your own. If deployed by a bigger organisation, there
|
|
216
195
|
should be an interface to edit templates for new suppliers. 80-20 rule.
|
|
217
|
-
For a short tutorial on how to add new templates, see [
|
|
196
|
+
For a short tutorial on how to add new templates, see [tutorial.md](../docs/tutorial.md).
|
|
218
197
|
|
|
219
198
|
Templates are based on Yaml or JSON. They define one or more keywords to find
|
|
220
199
|
the right template, one or more exclude_keywords to further narrow it down
|
|
@@ -228,6 +207,7 @@ processing.
|
|
|
228
207
|
|
|
229
208
|
Example:
|
|
230
209
|
|
|
210
|
+
````yaml
|
|
231
211
|
issuer: Amazon Web Services, Inc.
|
|
232
212
|
keywords:
|
|
233
213
|
- Amazon Web Services
|
|
@@ -251,8 +231,10 @@ Example:
|
|
|
251
231
|
line: (.*)\$(\d+\.\d+)
|
|
252
232
|
skip_line: Note
|
|
253
233
|
last_line: VAT \*\*
|
|
234
|
+
````
|
|
254
235
|
|
|
255
236
|
The lines package has multiple settings:
|
|
237
|
+
|
|
256
238
|
- start > The pattern where the lines begin. This is typically the header row of the table. This row is not included in the line matching.
|
|
257
239
|
- end > The pattern denoting where the lines end. Typically some text at the very end or immediately below the table. Also not included in the line matching.
|
|
258
240
|
- first_line > Optional. This is the primary line item for each entry.
|
|
@@ -265,20 +247,20 @@ As an alternative json templates can be used. Which are natively better supporte
|
|
|
265
247
|
|
|
266
248
|
The performance with yaml templates can be greatly increased **10x** by using [libyaml](https://github.com/yaml/libyaml)
|
|
267
249
|
It can be installed on most distributions by:
|
|
268
|
-
`sudo apt-get libyaml-dev`
|
|
269
|
-
|
|
250
|
+
`sudo apt-get install libyaml-dev`
|
|
270
251
|
|
|
271
252
|
## Development
|
|
272
253
|
|
|
273
254
|
If you are interested in improving this project, have a look at our
|
|
274
|
-
[developer guide](
|
|
255
|
+
[developer guide](../docs/contributing.md) to get you started quickly.
|
|
275
256
|
|
|
276
257
|
## Roadmap and open tasks
|
|
277
258
|
|
|
278
259
|
- integrate with online OCR?
|
|
279
260
|
- try to 'guess' parameters for new invoice formats.
|
|
280
|
-
-
|
|
281
|
-
-
|
|
261
|
+
- apply machine learning to guess new parameters / template creation
|
|
262
|
+
- Data cleanup per field
|
|
263
|
+
- advanced table parsing with [pypdf_table_extraction](https://github.com/py-pdf/pypdf_table_extraction)
|
|
282
264
|
|
|
283
265
|
## Maintainers
|
|
284
266
|
|
|
@@ -286,14 +268,22 @@ If you are interested in improving this project, have a look at our
|
|
|
286
268
|
- [Alexis de Lattre](https://github.com/alexis-via)
|
|
287
269
|
- [bosd](https://github.com/bosd)
|
|
288
270
|
|
|
289
|
-
## Contributors
|
|
271
|
+
## Contributors and Credits
|
|
272
|
+
|
|
273
|
+
- [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
|
|
274
|
+
Code student.
|
|
275
|
+
- [Holger Brunn](https://github.com/hbrunn): Add support for parsing
|
|
276
|
+
invoice items.
|
|
277
|
+
|
|
278
|
+
[pypi]: https://pypi.org/
|
|
279
|
+
[file an issue]: https://github.com/invoice-x/invoice2data/issues
|
|
280
|
+
[pip]: https://pip.pypa.io/
|
|
290
281
|
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
- [Holger Brunn](https://github.com/hbrunn): Add support for parsing
|
|
294
|
-
invoice items.
|
|
282
|
+
Contributions are very welcome.
|
|
283
|
+
To learn more, see the [Contributor Guide].
|
|
295
284
|
|
|
296
285
|
## Used By
|
|
286
|
+
|
|
297
287
|
- Odoo, OCA module [account_invoice_import_invoice2data](https://github.com/OCA/edi)
|
|
298
288
|
|
|
299
289
|
## Related Projects
|
|
@@ -306,3 +296,9 @@ If you are interested in improving this project, have a look at our
|
|
|
306
296
|
(Commercial)
|
|
307
297
|
- [CVision](http://www.cvisiontech.com/library/document-automation/forms-processing/extract-data-from-invoice.html)
|
|
308
298
|
(Commercial)
|
|
299
|
+
|
|
300
|
+
<!-- github-only -->
|
|
301
|
+
|
|
302
|
+
[license]: https://invoice2data.readthedocs.io/latest/license.html
|
|
303
|
+
[contributor guide]: https://invoice2data.readthedocs.io/latest/contributing.html
|
|
304
|
+
[command-line reference]: https://invoice2data.readthedocs.io/latest/usage.html
|
|
@@ -1,11 +1,32 @@
|
|
|
1
1
|
# Data extractor for PDF invoices - invoice2data
|
|
2
2
|
|
|
3
|
+
[][read the docs]
|
|
3
4
|
[](https://github.com/invoice-x/invoice2data/actions)
|
|
4
5
|
[](https://pypi.python.org/pypi/invoice2data)
|
|
5
6
|
[](https://pypi.python.org/pypi/invoice2data)
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
7
|
+
[][license]
|
|
8
|
+
[][tests]
|
|
9
|
+
[][codecov]
|
|
10
|
+
[][pre-commit]
|
|
11
|
+
[](https://github.com/astral-sh/ruff)
|
|
12
|
+
|
|
13
|
+
[pypi status]: https://pypi.org/project/invoice2data/
|
|
14
|
+
[read the docs]: https://invoice2data.readthedocs.io/
|
|
15
|
+
[tests]: https://github.com/invoice-x/invoice2data/actions?workflow=Tests
|
|
16
|
+
[codecov]: https://app.codecov.io/gh/invoice-x/invoice2data
|
|
17
|
+
[pre-commit]: https://github.com/pre-commit/pre-commit
|
|
18
|
+
[ruff badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
|
|
19
|
+
[ruff project]: https://github.com/charliermarsh/ruff
|
|
20
|
+
|
|
21
|
+
A command line tool and Python library that automates the extraction of key information from invoices to support your accounting
|
|
22
|
+
process. The library is very flexible and can be used on other types of business documents as well.
|
|
23
|
+
|
|
24
|
+
In essence, invoice2data simplifies the process of getting data from invoices by:
|
|
25
|
+
|
|
26
|
+
Automating text extraction: No more manual copying and pasting.
|
|
27
|
+
Using templates for structure: Handles different invoice layouts.
|
|
28
|
+
Providing structured output: Makes the data ready for analysis or further processing.
|
|
29
|
+
This makes it a valuable tool for businesses and developers dealing with a large volume of invoices, saving time and reducing errors associated with manual data entry.
|
|
9
30
|
|
|
10
31
|
1. extracts text from PDF files using different techniques, like
|
|
11
32
|
`pdftotext`, `text`, `ocrmypdf`, `pdfminer`, `pdfplumber` or OCR -- `tesseract`, or
|
|
@@ -31,92 +52,11 @@ Go from PDF files to this:
|
|
|
31
52
|
{'date': (2014, 8, 3), 'invoice_number': '42183017', 'amount': 4.11, 'desc': 'Invoice 42183017 from Amazon Web Services'}
|
|
32
53
|
{'date': (2015, 1, 28), 'invoice_number': '12429647', 'amount': 101.0, 'desc': 'Invoice 12429647 from Envato'}
|
|
33
54
|
|
|
34
|
-
```mermaid
|
|
35
|
-
flowchart LR
|
|
36
|
-
|
|
37
|
-
InvoiceFile[fa:fa-file-invoice Invoicefile\n\npdf\nimage\ntext] --> Input-module(Input Module\n\npdftotext\ntext\npdfminer\npdfplumber\ntesseract\ngvision)
|
|
38
|
-
|
|
39
|
-
Input-module --> |Extracted Text| C{keyword\nmatching}
|
|
40
|
-
|
|
41
|
-
Invoice-Templates[(fa:fa-file-lines Invoice Templates)] --> C{keyword\nmatching}
|
|
42
|
-
|
|
43
|
-
C --> |Extracted Text + fa:fa-file-circle-check Template| E(Template Processing\n apply options from template\nremove accents, replaces etc...)
|
|
44
|
-
|
|
45
|
-
E --> |Optimized String|Plugins&Parsers(Call plugins + parsers)
|
|
46
|
-
|
|
47
|
-
subgraph Plugins&Parsers
|
|
48
|
-
|
|
49
|
-
direction BT
|
|
50
|
-
|
|
51
|
-
tables[fa:fa-table tables] ~~~ lines[fa:fa-grip-lines lines]
|
|
52
|
-
|
|
53
|
-
lines ~~~ regex[fa:fa-code regex]
|
|
54
|
-
|
|
55
|
-
regex ~~~ static[fa:fa-check static]
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
end
|
|
60
|
-
|
|
61
|
-
Plugins&Parsers --> |output| result[result\nfa:fa-file-csv,\njson,\nXML]
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
click Invoice-Templates https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md
|
|
66
|
-
|
|
67
|
-
click result https://github.com/invoice-x/invoice2data#usage
|
|
68
|
-
|
|
69
|
-
click Input-module https://github.com/invoice-x/invoice2data#installation-of-input-modules
|
|
70
|
-
|
|
71
|
-
click E https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#options
|
|
72
|
-
|
|
73
|
-
click tables https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#tables
|
|
74
|
-
|
|
75
|
-
click lines https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#lines
|
|
76
|
-
|
|
77
|
-
click regex https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#regex
|
|
78
|
-
|
|
79
|
-
click static https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#parser-static
|
|
80
|
-
|
|
81
|
-
```
|
|
82
|
-
|
|
83
|
-
## Installation
|
|
84
|
-
|
|
85
|
-
1. Install pdftotext
|
|
86
|
-
|
|
87
|
-
If possible get the latest
|
|
88
|
-
[xpdf/poppler-utils](https://poppler.freedesktop.org/) version. It's
|
|
89
|
-
included with macOS Homebrew, Debian and Ubuntu. Without it, `pdftotext`
|
|
90
|
-
won't parse tables in PDF correctly.
|
|
91
|
-
|
|
92
|
-
2. Install `invoice2data` using pip
|
|
93
|
-
|
|
94
|
-
pip install invoice2data
|
|
95
|
-
|
|
96
|
-
### Installation of input modules
|
|
97
|
-
|
|
98
|
-
An [tesseract](https://github.com/tesseract-ocr/tessdoc/blob/main/FAQ.md#how-do-i-get-tesseract) wrapper is included in auto language mode. It will test your input files against the languages installed on your system. To use it tesseract and imagemagick needs to be installed.
|
|
99
|
-
tesseract supports multiple OCR engine modes. By default the available engine installed on the system will be used.
|
|
100
|
-
|
|
101
|
-
Languages:
|
|
102
|
-
tesseract-ocr recognize more than [100 languages](https://github.com/tesseract-ocr/tessdata)
|
|
103
|
-
For Linux users, you can often find packages that provide language packs:
|
|
104
|
-
|
|
105
|
-
```
|
|
106
|
-
# Display a list of all Tesseract language packs
|
|
107
|
-
apt-cache search tesseract-ocr
|
|
108
|
-
|
|
109
|
-
# Debian/Ubuntu users
|
|
110
|
-
apt-get install tesseract-ocr-chi-sim # Example: Install Chinese Simplified language pack
|
|
111
|
-
|
|
112
|
-
# Arch Linux users
|
|
113
|
-
pacman -S tesseract-data-eng tesseract-data-deu # Example: Install the English and German language packs
|
|
114
|
-
|
|
115
|
-
```
|
|
116
55
|
|
|
117
56
|
## Usage
|
|
118
57
|
|
|
119
58
|
Basic usage. Process PDF files and write result to CSV.
|
|
59
|
+
Please see the [Command-line Reference] for details.
|
|
120
60
|
|
|
121
61
|
- `invoice2data invoice.pdf`
|
|
122
62
|
- `invoice2data invoice.txt`
|
|
@@ -130,7 +70,7 @@ Choose any of the following input readers:
|
|
|
130
70
|
- pdfminer.six `invoice2data --input-reader pdfminer invoice.pdf`
|
|
131
71
|
- pdfplumber `invoice2data --input-reader pdfplumber invoice.pdf`
|
|
132
72
|
- ocrmypdf `invoice2data --input-reader ocrmypdf invoice.pdf`
|
|
133
|
-
- gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var)
|
|
73
|
+
- gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var and a Google Cloud Bucket name. The bucket name can be set as an argument to the function ``to_text`` or as an Environment variable named ``GOOGLE_CLOUD_BUCKET_NAME`` )
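The gvision entry above says the bucket name can come either from an argument to `to_text` or from the `GOOGLE_CLOUD_BUCKET_NAME` environment variable. The sketch below shows one way such a lookup could work. The helper `resolve_bucket_name` is hypothetical, and the precedence (explicit argument first, environment variable second) is an assumption; the README text does not state which one wins.

```python
import os


def resolve_bucket_name(bucket_name=None, environ=None):
    """Hypothetical helper, not part of invoice2data.

    Assumed precedence: an explicit argument wins over the
    GOOGLE_CLOUD_BUCKET_NAME environment variable.
    """
    env = os.environ if environ is None else environ
    name = bucket_name or env.get("GOOGLE_CLOUD_BUCKET_NAME")
    if not name:
        raise ValueError(
            "No bucket name given and GOOGLE_CLOUD_BUCKET_NAME is not set"
        )
    return name


# Usage with an explicit mapping instead of the real environment:
print(resolve_bucket_name(environ={"GOOGLE_CLOUD_BUCKET_NAME": "my-invoices"}))
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.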
|
|
134
74
|
|
|
135
75
|
Choose any of the following output formats:
|
|
136
76
|
|
|
@@ -180,13 +120,12 @@ Using in-house templates
|
|
|
180
120
|
templates = read_templates('/path/to/your/templates/')
|
|
181
121
|
result = extract_data(filename, templates=templates)
|
|
182
122
|
|
|
183
|
-
|
|
184
123
|
## Template system
|
|
185
124
|
|
|
186
125
|
See `invoice2data/extract/templates` for existing templates. Just extend
|
|
187
126
|
the list to add your own. If deployed by a bigger organisation, there
|
|
188
127
|
should be an interface to edit templates for new suppliers. 80-20 rule.
|
|
189
|
-
For a short tutorial on how to add new templates, see [
|
|
128
|
+
For a short tutorial on how to add new templates, see [tutorial.md](../docs/tutorial.md).
|
|
190
129
|
|
|
191
130
|
Templates are based on Yaml or JSON. They define one or more keywords to find
|
|
192
131
|
the right template, one or more exclude_keywords to further narrow it down
|
|
@@ -200,6 +139,7 @@ processing.
|
|
|
200
139
|
|
|
201
140
|
Example:
|
|
202
141
|
|
|
142
|
+
````yaml
|
|
203
143
|
issuer: Amazon Web Services, Inc.
|
|
204
144
|
keywords:
|
|
205
145
|
- Amazon Web Services
|
|
@@ -223,8 +163,10 @@ Example:
|
|
|
223
163
|
line: (.*)\$(\d+\.\d+)
|
|
224
164
|
skip_line: Note
|
|
225
165
|
last_line: VAT \*\*
|
|
166
|
+
````
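To get a feel for the `line` and `skip_line` patterns in the template excerpt above, they can be tried in isolation with Python's `re` module. This is only a sketch: the sample text is made up, the helper `match_lines` is hypothetical, and the real lines parser also handles `start`, `end`, and the other settings, which are left out here.

```python
import re

# Patterns copied from the template excerpt above.
LINE = re.compile(r"(.*)\$(\d+\.\d+)")
SKIP_LINE = re.compile(r"Note")

# Hypothetical extracted text; not taken from a real invoice.
SAMPLE = """\
Data transfer out   $1.25
Note: usage is billed monthly
Storage   $2.86
"""


def match_lines(text):
    """Hypothetical helper: return (description, amount) pairs."""
    items = []
    for raw in text.splitlines():
        if SKIP_LINE.search(raw):
            continue  # imitate skip_line: ignore lines matching the pattern
        match = LINE.search(raw)
        if match:
            # Group 1 is the text before the dollar sign, group 2 the amount.
            items.append((match.group(1).strip(), float(match.group(2))))
    return items


items = match_lines(SAMPLE)
print(items)
```

Because `(.*)` is greedy, the description runs up to the last `$` on the line, which matters when a description itself contains a dollar sign.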
|
|
226
167
|
|
|
227
168
|
The lines package has multiple settings:
|
|
169
|
+
|
|
228
170
|
- start > The pattern where the lines begin. This is typically the header row of the table. This row is not included in the line matching.
|
|
229
171
|
- end > The pattern denoting where the lines end. Typically some text at the very end or immediately below the table. Also not included in the line matching.
|
|
230
172
|
- first_line > Optional. This is the primary line item for each entry.
|
|
@@ -237,20 +179,20 @@ As an alternative json templates can be used. Which are natively better supporte
|
|
|
237
179
|
|
|
238
180
|
The performance with yaml templates can be greatly increased **10x** by using [libyaml](https://github.com/yaml/libyaml)
|
|
239
181
|
It can be installed on most distributions by:
|
|
240
|
-
`sudo apt-get libyaml-dev`
|
|
241
|
-
|
|
182
|
+
`sudo apt-get install libyaml-dev`
|
|
242
183
|
|
|
243
184
|
## Development
|
|
244
185
|
|
|
245
186
|
If you are interested in improving this project, have a look at our
|
|
246
|
-
[developer guide](
|
|
187
|
+
[developer guide](../docs/contributing.md) to get you started quickly.
|
|
247
188
|
|
|
248
189
|
## Roadmap and open tasks
|
|
249
190
|
|
|
250
191
|
- integrate with online OCR?
|
|
251
192
|
- try to 'guess' parameters for new invoice formats.
|
|
252
|
-
-
|
|
253
|
-
-
|
|
193
|
+
- apply machine learning to guess new parameters / template creation
|
|
194
|
+
- Data cleanup per field
|
|
195
|
+
- advanced table parsing with [pypdf_table_extraction](https://github.com/py-pdf/pypdf_table_extraction)
|
|
254
196
|
|
|
255
197
|
## Maintainers
|
|
256
198
|
|
|
@@ -258,14 +200,22 @@ If you are interested in improving this project, have a look at our
|
|
|
258
200
|
- [Alexis de Lattre](https://github.com/alexis-via)
|
|
259
201
|
- [bosd](https://github.com/bosd)
|
|
260
202
|
|
|
261
|
-
## Contributors
|
|
203
|
+
## Contributors and Credits
|
|
262
204
|
|
|
263
|
-
-
|
|
264
|
-
|
|
265
|
-
-
|
|
266
|
-
|
|
205
|
+
- [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
|
|
206
|
+
Code student.
|
|
207
|
+
- [Holger Brunn](https://github.com/hbrunn): Add support for parsing
|
|
208
|
+
invoice items.
|
|
209
|
+
|
|
210
|
+
[pypi]: https://pypi.org/
|
|
211
|
+
[file an issue]: https://github.com/invoice-x/invoice2data/issues
|
|
212
|
+
[pip]: https://pip.pypa.io/
|
|
213
|
+
|
|
214
|
+
Contributions are very welcome.
|
|
215
|
+
To learn more, see the [Contributor Guide].
|
|
267
216
|
|
|
268
217
|
## Used By
|
|
218
|
+
|
|
269
219
|
- Odoo, OCA module [account_invoice_import_invoice2data](https://github.com/OCA/edi)
|
|
270
220
|
|
|
271
221
|
## Related Projects
|
|
@@ -278,3 +228,9 @@ If you are interested in improving this project, have a look at our
|
|
|
278
228
|
(Commercial)
|
|
279
229
|
- [CVision](http://www.cvisiontech.com/library/document-automation/forms-processing/extract-data-from-invoice.html)
|
|
280
230
|
(Commercial)
|
|
231
|
+
|
|
232
|
+
<!-- github-only -->
|
|
233
|
+
|
|
234
|
+
[license]: https://invoice2data.readthedocs.io/latest/license.html
|
|
235
|
+
[contributor guide]: https://invoice2data.readthedocs.io/latest/contributing.html
|
|
236
|
+
[command-line reference]: https://invoice2data.readthedocs.io/latest/usage.html
|