invoice2data 0.4.4__tar.gz → 0.4.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (291)
  1. {invoice2data-0.4.4/src/invoice2data.egg-info → invoice2data-0.4.6}/PKG-INFO +110 -114
  2. {invoice2data-0.4.4 → invoice2data-0.4.6}/README.md +54 -98
  3. invoice2data-0.4.6/pyproject.toml +199 -0
  4. invoice2data-0.4.6/setup.cfg +4 -0
  5. invoice2data-0.4.6/src/invoice2data/__init__.py +3 -0
  6. invoice2data-0.4.6/src/invoice2data/__main__.py +353 -0
  7. invoice2data-0.4.6/src/invoice2data/extract/__init__.py +1 -0
  8. invoice2data-0.4.6/src/invoice2data/extract/invoice_template.py +386 -0
  9. invoice2data-0.4.6/src/invoice2data/extract/loader.py +153 -0
  10. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/__init__.py +1 -1
  11. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/__interface__.py +1 -2
  12. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/parsers/lines.py +120 -30
  13. invoice2data-0.4.6/src/invoice2data/extract/parsers/regex.py +119 -0
  14. invoice2data-0.4.6/src/invoice2data/extract/parsers/static.py +27 -0
  15. invoice2data-0.4.6/src/invoice2data/extract/plugins/__init__.py +1 -0
  16. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/plugins/__interface__.py +1 -2
  17. invoice2data-0.4.6/src/invoice2data/extract/plugins/lines.py +31 -0
  18. invoice2data-0.4.6/src/invoice2data/extract/plugins/tables.py +221 -0
  19. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/au/au.com.opal.yml +5 -5
  20. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.accor.invest.ibis.yml +11 -11
  21. invoice2data-0.4.6/src/invoice2data/extract/templates/be/be.accor.invest.novotel.yml +72 -0
  22. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.boucherie.pochet.yml +4 -4
  23. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.cebeo.yml +4 -4
  24. invoice2data-0.4.6/src/invoice2data/extract/templates/be/be.eg_retail.yml +87 -0
  25. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.facture-dacompte.yml +8 -8
  26. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.factuur.yml +6 -6
  27. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.lampiris.regularisation.yml +7 -7
  28. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.melchior-vins.yml +4 -4
  29. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.proximus.yml +3 -3
  30. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.scarlet.yml +8 -8
  31. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/be/be.securex.social.yml +4 -4
  32. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/ch/ch.pcengines.yml +2 -2
  33. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.AzureInterior.yml +12 -12
  34. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.amazon.aws.yml +7 -7
  35. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.apple.yml +3 -3
  36. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.apps4rent.yml +3 -3
  37. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.binarylife.yml +3 -3
  38. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.bloomberg.yml +2 -2
  39. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.cloudflare.yml +80 -0
  40. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.cloudns.yml +2 -2
  41. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.datadoghq.yml +2 -2
  42. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.digitalocean.yml +3 -3
  43. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.envato.yml +1 -1
  44. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.eur.aliexpress.json +44 -0
  45. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.expressvpn.yml +2 -2
  46. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.expressvpn_prio6.yml +3 -3
  47. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.flipkart.WSRetail.json +14 -0
  48. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.ftserussell.yml +2 -2
  49. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.github.yml +2 -2
  50. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.globalsign.yml +2 -2
  51. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.google.adwords.hk.yml +4 -4
  52. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.hetzner.yml +90 -0
  53. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.hobohost.yml +2 -2
  54. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.jamiepro.yml +4 -4
  55. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.linode.yml +2 -2
  56. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.microsoftonline.hk-v2017.yml +3 -3
  57. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.microsoftonline.hk.yml +3 -3
  58. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.mongodb.yml +2 -2
  59. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.namesilo.yml +2 -2
  60. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.newrelic.yml +7 -7
  61. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nl.lenovo.digitalriver.yml +12 -12
  62. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nmmn.yml +4 -4
  63. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nodisto.yml +2 -2
  64. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.nyse.yml +2 -2
  65. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.oyo.invoice.yml +1 -1
  66. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.packtpub.yml +2 -2
  67. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.pixartprinting.yml +2 -2
  68. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.runbox.yml +65 -0
  69. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.sammymaystone.yml +6 -6
  70. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.scaleway.yml +3 -3
  71. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.textmaster.yml +2 -2
  72. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.tmx.yml +2 -2
  73. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.travis-ci.yml +1 -1
  74. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.de.yml +3 -3
  75. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.uk.yml +2 -2
  76. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.twitter.yml +3 -3
  77. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.upwork.yml +1 -1
  78. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.usersnap.yml +2 -2
  79. invoice2data-0.4.6/src/invoice2data/extract/templates/com/com.vultr.yml +81 -0
  80. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.amazon.yml +2 -2
  81. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.bettina-kast.yml +4 -4
  82. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.digikey.com.yml +18 -18
  83. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.hosteurope.yml +4 -4
  84. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.notebooksbilligerBillPay.yml +5 -5
  85. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.ovh.yml +2 -2
  86. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.qualityhosting.yml +5 -4
  87. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/de/de.united-domains.yml +4 -4
  88. invoice2data-0.4.6/src/invoice2data/extract/templates/es/com.mob-barcelona.caterina.yml +19 -0
  89. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/es/com.pepephone.yml +6 -3
  90. invoice2data-0.4.6/src/invoice2data/extract/templates/es/es.amazon.yml +18 -0
  91. invoice2data-0.4.6/src/invoice2data/extract/templates/es/es.digimobile.yml +19 -0
  92. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/es/es.supplies24.yml +2 -2
  93. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/co.mooncard.yml +2 -2
  94. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.adobe.ie.yml +2 -2
  95. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.akretion.fr.yml +2 -2
  96. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.amazon.aws.yml +2 -2
  97. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.ateliercopieservice.yml +2 -2
  98. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.chauffeur-prive.yml +6 -6
  99. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.coriolis.yml +2 -2
  100. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.easyjet.fr.yml +2 -2
  101. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.eaudugrandlyon.yml +2 -2
  102. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.godaddy.yml +2 -2
  103. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.google.ie.yml +2 -2
  104. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.hootsuite.yml +2 -2
  105. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.jeanbesson.yml +2 -2
  106. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.ldlc.yml +2 -2
  107. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.linkedin.yml +2 -2
  108. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.mention.yml +2 -2
  109. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.microsoft.ie.yml +2 -2
  110. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.myflyingbox.yml +2 -2
  111. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.officetimeline.yml +1 -1
  112. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.orange-business.mobile.yml +3 -3
  113. invoice2data-0.4.6/src/invoice2data/extract/templates/fr/com.ovh.fr.yml +17 -0
  114. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.rs-online.fr.yml +2 -2
  115. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.saur.yml +2 -2
  116. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.soyoustart.yml +2 -2
  117. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/com.vinci-autoroutes.yml +2 -2
  118. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/dolibarr.generique.yml +7 -7
  119. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/eu.trainline.yml +2 -2
  120. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.actn.yml +4 -4
  121. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.airfrance.yml +2 -2
  122. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.also.yml +5 -5
  123. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.amazon.yml +4 -6
  124. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.assurance-epargne-pension.yml +2 -2
  125. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.bouyguestelecom.adsl-fiber.yml +2 -2
  126. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.bouyguestelecom.mobile.yml +2 -2
  127. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.butagaz.yml +4 -4
  128. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.chronopost.yml +2 -2
  129. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.dirafi.yml +2 -2
  130. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.domaine-achat.yml +2 -2
  131. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.easytrip.yml +3 -3
  132. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.edf.entreprises.yml +2 -2
  133. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.edf.pme.yml +3 -3
  134. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.finagaz.yml +2 -2
  135. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.fountain.yml +2 -2
  136. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.adsl-fiber.yml +6 -3
  137. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.mobile.yml +3 -3
  138. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.free.mobile2.yml +3 -3
  139. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.futur.yml +2 -2
  140. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.ge-iroise.yml +2 -2
  141. invoice2data-0.4.6/src/invoice2data/extract/templates/fr/fr.google.yml +19 -0
  142. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.greffe-tc-lyon.yml +2 -2
  143. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.hiscox.yml +2 -2
  144. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.internetsatellite.yml +2 -2
  145. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.jpg.yml +2 -2
  146. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.kubii.yml +3 -3
  147. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.laposte.boutique.yml +2 -2
  148. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.laposte.coliposte.yml +2 -2
  149. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.lecab.yml +2 -2
  150. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.leroymerlin.yml +2 -2
  151. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.maaf.yml +2 -2
  152. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mediapart.yml +2 -2
  153. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.moneo-resto.yml +2 -2
  154. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mouser.yml +2 -2
  155. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.mycelium-roulement.yml +2 -3
  156. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.napsis.yml +2 -2
  157. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.nexity.yml +4 -4
  158. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.orange.fibre.yml +3 -3
  159. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.orange.fixedline.yml +4 -4
  160. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.prestaclic.yml +3 -3
  161. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.publicationannoncelegale.yml +6 -6
  162. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sfr.adsl-fiber.yml +2 -2
  163. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sfr.mobile.yml +2 -2
  164. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.sosh.yml +2 -2
  165. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.teledec.yml +2 -2
  166. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/fr.topoffice.yml +2 -2
  167. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/net.online.yml +2 -2
  168. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/fr/net.scaleway.yml +2 -2
  169. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.accor.rhine.opco hotels.json +171 -0
  170. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.action.yml +16 -16
  171. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.agrisneltank.json +66 -0
  172. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.albron.yml +16 -16
  173. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.anwb.yml +48 -0
  174. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.argos.json +65 -0
  175. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.be.coolblue.yml +7 -7
  176. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.begra.yml +10 -10
  177. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.blokker.yml +13 -14
  178. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bouwmans.yml +85 -0
  179. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bp.yml +59 -0
  180. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.buijtendijk.yml +104 -0
  181. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.bunq.yml +26 -0
  182. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.cpe.yml +2 -2
  183. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.esso_eg_services.yml +72 -0
  184. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.esso_eg_services_v2.yml +81 -0
  185. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.fedex.json +51 -0
  186. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.ferbox.yml +9 -10
  187. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.fletcher.yml +103 -0
  188. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.gamma.yml +16 -16
  189. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.goos.yml +8 -9
  190. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.gulf.yml +119 -0
  191. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ipparking.paleiskwartier.yml +75 -0
  192. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.karwei.yml +15 -15
  193. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.kav.yml +10 -11
  194. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.koffiehenk.yml +9 -9
  195. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.kuwait-q8.json +84 -0
  196. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.makro.json +143 -0
  197. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.marktplaats.json +106 -0
  198. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.megekko.json +143 -0
  199. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.momentsenmore.yml +74 -0
  200. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ns.invoice.yml +79 -0
  201. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.odido.json +116 -0
  202. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.ok.yml +87 -0
  203. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.parkmobile.yml +47 -0
  204. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.praxis.yml +14 -14
  205. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.reclameland.yml +19 -20
  206. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.saeco.philips.eluscious.yml +15 -15
  207. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.shell_nederland.yml +74 -0
  208. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.shell_schellenkens.yml +76 -0
  209. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.simpel.yml +1 -1
  210. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.tango.json +67 -0
  211. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_express.yml +91 -0
  212. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_ototol.yml +109 -0
  213. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.total_servauto_ned.json +84 -0
  214. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.transip.yml +8 -8
  215. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.tuynder.yml +11 -12
  216. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.valk.exclusief.hotel.json +198 -0
  217. invoice2data-0.4.6/src/invoice2data/extract/templates/nl/nl.valk.exclusief.restaurant.json +166 -0
  218. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.vistaprint.yml +2 -2
  219. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.vodafone.yml +1 -1
  220. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.wasco.yml +2 -2
  221. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.weid.yml +10 -11
  222. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.yezzer.yml +7 -8
  223. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.zinkunie.yml +2 -2
  224. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.bmw-fs.yml +3 -3
  225. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.insert.subiekt-gt.yml +10 -10
  226. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.insert.subiekt-nexo.yml +3 -3
  227. invoice2data-0.4.6/src/invoice2data/extract/templates/pl/pl.ksef.yml +29 -0
  228. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.orlen.yml +3 -3
  229. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.p4.yml +3 -3
  230. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/pl/pl.paypro.yml +4 -4
  231. invoice2data-0.4.6/src/invoice2data/extract/utils.py +33 -0
  232. invoice2data-0.4.6/src/invoice2data/input/__init__.py +1 -0
  233. invoice2data-0.4.6/src/invoice2data/input/gvision.py +110 -0
  234. invoice2data-0.4.6/src/invoice2data/input/ocrmypdf.py +126 -0
  235. invoice2data-0.4.6/src/invoice2data/input/pdfminer_wrapper.py +49 -0
  236. invoice2data-0.4.6/src/invoice2data/input/pdfplumber.py +66 -0
  237. invoice2data-0.4.6/src/invoice2data/input/pdftotext.py +76 -0
  238. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/input/tesseract.py +68 -53
  239. invoice2data-0.4.6/src/invoice2data/input/text.py +16 -0
  240. invoice2data-0.4.6/src/invoice2data/output/__init__.py +1 -0
  241. invoice2data-0.4.6/src/invoice2data/output/to_csv.py +55 -0
  242. invoice2data-0.4.6/src/invoice2data/output/to_json.py +61 -0
  243. invoice2data-0.4.6/src/invoice2data/output/to_xml.py +101 -0
  244. {invoice2data-0.4.4 → invoice2data-0.4.6/src/invoice2data.egg-info}/PKG-INFO +110 -114
  245. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/SOURCES.txt +51 -6
  246. invoice2data-0.4.6/src/invoice2data.egg-info/entry_points.txt +2 -0
  247. invoice2data-0.4.6/src/invoice2data.egg-info/requires.txt +46 -0
  248. invoice2data-0.4.6/tests/test_cli.py +444 -0
  249. invoice2data-0.4.6/tests/test_extraction.py +72 -0
  250. invoice2data-0.4.6/tests/test_gvision.py +89 -0
  251. {invoice2data-0.4.4 → invoice2data-0.4.6}/tests/test_invoice_template.py +55 -52
  252. {invoice2data-0.4.4 → invoice2data-0.4.6}/tests/test_lib.py +61 -43
  253. invoice2data-0.4.6/tests/test_loader.py +182 -0
  254. invoice2data-0.4.6/tests/test_main.py +18 -0
  255. invoice2data-0.4.4/MANIFEST.in +0 -4
  256. invoice2data-0.4.4/setup.cfg +0 -78
  257. invoice2data-0.4.4/setup.py +0 -17
  258. invoice2data-0.4.4/src/invoice2data/__init__.py +0 -1
  259. invoice2data-0.4.4/src/invoice2data/extract/invoice_template.py +0 -285
  260. invoice2data-0.4.4/src/invoice2data/extract/loader.py +0 -109
  261. invoice2data-0.4.4/src/invoice2data/extract/parsers/regex.py +0 -84
  262. invoice2data-0.4.4/src/invoice2data/extract/parsers/static.py +0 -19
  263. invoice2data-0.4.4/src/invoice2data/extract/plugins/__init__.py +0 -0
  264. invoice2data-0.4.4/src/invoice2data/extract/plugins/lines.py +0 -15
  265. invoice2data-0.4.4/src/invoice2data/extract/plugins/tables.py +0 -99
  266. invoice2data-0.4.4/src/invoice2data/extract/templates/fr/com.ovh.fr.yml +0 -18
  267. invoice2data-0.4.4/src/invoice2data/extract/templates/nl/nl.bunq.yml +0 -26
  268. invoice2data-0.4.4/src/invoice2data/input/__init__.py +0 -0
  269. invoice2data-0.4.4/src/invoice2data/input/gvision.py +0 -87
  270. invoice2data-0.4.4/src/invoice2data/input/ocrmypdf.py +0 -146
  271. invoice2data-0.4.4/src/invoice2data/input/pdfminer_wrapper.py +0 -56
  272. invoice2data-0.4.4/src/invoice2data/input/pdfplumber.py +0 -48
  273. invoice2data-0.4.4/src/invoice2data/input/pdftotext.py +0 -57
  274. invoice2data-0.4.4/src/invoice2data/input/text.py +0 -5
  275. invoice2data-0.4.4/src/invoice2data/main.py +0 -327
  276. invoice2data-0.4.4/src/invoice2data/output/__init__.py +0 -0
  277. invoice2data-0.4.4/src/invoice2data/output/to_csv.py +0 -60
  278. invoice2data-0.4.4/src/invoice2data/output/to_json.py +0 -61
  279. invoice2data-0.4.4/src/invoice2data/output/to_xml.py +0 -69
  280. invoice2data-0.4.4/src/invoice2data.egg-info/entry_points.txt +0 -2
  281. invoice2data-0.4.4/src/invoice2data.egg-info/requires.txt +0 -11
  282. invoice2data-0.4.4/tests/test_cli.py +0 -334
  283. invoice2data-0.4.4/tests/test_extraction.py +0 -63
  284. invoice2data-0.4.4/tests/test_loader.py +0 -117
  285. /invoice2data-0.4.4/LICENSE.txt → /invoice2data-0.4.6/LICENSE.md +0 -0
  286. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/au/au.com.telstra.yml +0 -0
  287. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/com/com.namecheap.yml +0 -0
  288. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data/extract/templates/nl/nl.farnell.yml +0 -0
  289. /invoice2data-0.4.4/src/invoice2data/extract/__init__.py → /invoice2data-0.4.6/src/invoice2data/py.typed +0 -0
  290. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/dependency_links.txt +0 -0
  291. {invoice2data-0.4.4 → invoice2data-0.4.6}/src/invoice2data.egg-info/top_level.txt +0 -0
@@ -1,39 +1,100 @@
- Metadata-Version: 2.1
+ Metadata-Version: 2.4
  Name: invoice2data
- Version: 0.4.4
+ Version: 0.4.6
  Summary: Python parser to extract data from pdf invoice
- Home-page: https://github.com/invoice-x/invoice2data
  Author: Manuel Riel
- License: MIT License
- Keywords: pdf,invoicing
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Environment :: MacOS X
- Classifier: Environment :: Console
- Classifier: Environment :: Win32 (MS Windows)
- Classifier: Operating System :: MacOS
- Classifier: Operating System :: POSIX
- Classifier: Operating System :: Unix
- Classifier: Operating System :: Microsoft :: Windows
+ License: MIT
+ Project-URL: homepage, https://github.com/invoice-x/invoice2data
+ Project-URL: repository, https://github.com/invoice-x/invoice2data
+ Project-URL: documentation, https://invoice2data.readthedocs.io
+ Project-URL: Changelog, https://github.com/invoice-x/invoice2data/releases
+ Keywords: python,data-mining,accounting,invoice,pdf,parcing
+ Classifier: Programming Language :: Python :: 3
  Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3.7
+ Classifier: Operating System :: OS Independent
  Classifier: Programming Language :: Python :: 3.8
  Classifier: Programming Language :: Python :: 3.9
  Classifier: Programming Language :: Python :: 3.10
  Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
  Classifier: Topic :: Office/Business :: Financial
  Classifier: Topic :: Office/Business :: Financial :: Accounting
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Financial and Insurance Industry
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Office/Business :: Financial :: Accounting
+ Classifier: Topic :: Office/Business :: Financial
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Development Status :: 5 - Production/Stable
+ Requires-Python: >=3.8
  Description-Content-Type: text/markdown
- Provides-Extra: test
- License-File: LICENSE.txt
+ License-File: LICENSE.md
+ Requires-Dist: click>=8.0.1
+ Requires-Dist: dateparser>=1.2.0
+ Requires-Dist: PyYAML>=6.0
+ Requires-Dist: regex>=2025.2.10; python_version >= "3.12"
+ Requires-Dist: regex>=2024.4.16; python_version < "3.12"
+ Provides-Extra: defusedxml
+ Requires-Dist: defusedxml==0.7.1; extra == "defusedxml"
+ Provides-Extra: dev
+ Requires-Dist: pygments==2.18.0; extra == "dev"
+ Requires-Dist: cffi==1.17.1; extra == "dev"
+ Requires-Dist: furo==2024.5.6; extra == "dev"
+ Requires-Dist: mypy==1.10.1; extra == "dev"
+ Requires-Dist: myst-parser==3.0.1; extra == "dev"
+ Requires-Dist: pre-commit==3.5.0; extra == "dev"
+ Requires-Dist: pre-commit-hooks==4.6.0; extra == "dev"
+ Requires-Dist: safety==3.2.4; extra == "dev"
+ Requires-Dist: sphinx==7.1.2; extra == "dev"
+ Requires-Dist: sphinx-autobuild==2021.3.14; extra == "dev"
+ Requires-Dist: sphinx-click==6.0.0; extra == "dev"
+ Requires-Dist: typeguard==4.3.0; extra == "dev"
+ Requires-Dist: xdoctest>=0.15.10; extra == "dev"
+ Provides-Extra: googlevision
+ Requires-Dist: google-cloud-storage==2.18.2; extra == "googlevision"
+ Requires-Dist: google-cloud-vision==3.8.1; extra == "googlevision"
+ Provides-Extra: ocr
+ Requires-Dist: ghostscript==0.7; extra == "ocr"
+ Provides-Extra: ocrmypdf
+ Requires-Dist: ocrmypdf>=14.4.0; extra == "ocrmypdf"
+ Provides-Extra: pdfminer-six
+ Requires-Dist: pdfminer-six==20231228; extra == "pdfminer-six"
+ Provides-Extra: pdfplumber
+ Requires-Dist: pdfplumber==0.11.4; extra == "pdfplumber"
+ Provides-Extra: pyyaml
+ Requires-Dist: pyyaml==6.0.2; extra == "pyyaml"
+ Dynamic: license-file

  # Data extractor for PDF invoices - invoice2data

+ [![Read the documentation at https://invoice2data.readthedocs.io/](https://img.shields.io/readthedocs/invoice2data/latest.svg?label=Read%20the%20Docs)][read the docs]
  [![invoice2data build status on GitHub Actions](https://github.com/invoice-x/invoice2data/workflows/Test/badge.svg)](https://github.com/invoice-x/invoice2data/actions)
  [![Version](https://img.shields.io/pypi/v/invoice2data.svg)](https://pypi.python.org/pypi/invoice2data)
  [![Support Python versions](https://img.shields.io/pypi/pyversions/invoice2data.svg)](https://pypi.python.org/pypi/invoice2data)
-
- A command line tool and Python library to support your accounting
- process.
+ [![License](https://img.shields.io/pypi/l/invoice2data)][license]
+ [![Tests](https://github.com/invoice-x/invoice2data/workflows/Tests/badge.svg)][tests]
+ [![Codecov](https://codecov.io/gh/invoice-x/invoice2data/branch/main/graph/badge.svg)][codecov]
+ [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)][pre-commit]
79
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
80
+
81
+ [pypi status]: https://pypi.org/project/invoice2data/
82
+ [read the docs]: https://invoice2data.readthedocs.io/
83
+ [tests]: https://github.com/invoice-x/invoice2data/actions?workflow=Tests
84
+ [codecov]: https://app.codecov.io/gh/invoice-x/invoice2data
85
+ [pre-commit]: https://github.com/pre-commit/pre-commit
86
+ [ruff badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
87
+ [ruff project]: https://github.com/charliermarsh/ruff
88
+
89
+ A command line tool and Python library that automates the extraction of key information from invoices to support your accounting
90
+ process. The library is very flexible and can be used on other types of business documents as well.
91
+
92
+ In essence, invoice2data simplifies the process of getting data from invoices by:
93
+
94
+ Automating text extraction: No more manual copying and pasting.
95
+ Using templates for structure: Handles different invoice layouts.
96
+ Providing structured output: Makes the data ready for analysis or further processing.
97
+ This makes it a valuable tool for businesses and developers dealing with a large volume of invoices, saving time and reducing errors associated with manual data entry.
37
98
 
38
99
  1. extracts text from PDF files using different techniques, like
39
100
  `pdftotext`, `text`, `ocrmypdf`, `pdfminer`, `pdfplumber` or OCR -- `tesseract`, or
@@ -59,92 +120,11 @@ Go from PDF files to this:
59
120
  {'date': (2014, 8, 3), 'invoice_number': '42183017', 'amount': 4.11, 'desc': 'Invoice 42183017 from Amazon Web Services'}
60
121
  {'date': (2015, 1, 28), 'invoice_number': '12429647', 'amount': 101.0, 'desc': 'Invoice 12429647 from Envato'}
61
122
 
62
- ```mermaid
63
- flowchart LR
64
-
65
- InvoiceFile[fa:fa-file-invoice Invoicefile\n\npdf\nimage\ntext] --> Input-module(Input Module\n\npdftotext\ntext\npdfminer\npdfplumber\ntesseract\ngvision)
66
-
67
- Input-module --> |Extracted Text| C{keyword\nmatching}
68
-
69
- Invoice-Templates[(fa:fa-file-lines Invoice Templates)] --> C{keyword\nmatching}
70
-
71
- C --> |Extracted Text + fa:fa-file-circle-check Template| E(Template Processing\n apply options from template\nremove accents, replaces etc...)
72
-
73
- E --> |Optimized String|Plugins&Parsers(Call plugins + parsers)
74
-
75
- subgraph Plugins&Parsers
76
-
77
- direction BT
78
-
79
- tables[fa:fa-table tables] ~~~ lines[fa:fa-grip-lines lines]
80
-
81
- lines ~~~ regex[fa:fa-code regex]
82
-
83
- regex ~~~ static[fa:fa-check static]
84
-
85
-
86
-
87
- end
88
-
89
- Plugins&Parsers --> |output| result[result\nfa:fa-file-csv,\njson,\nXML]
90
-
91
-
92
-
93
- click Invoice-Templates https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md
94
-
95
- click result https://github.com/invoice-x/invoice2data#usage
96
-
97
- click Input-module https://github.com/invoice-x/invoice2data#installation-of-input-modules
98
-
99
- click E https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#options
100
-
101
- click tables https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#tables
102
-
103
- click lines https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#lines
104
-
105
- click regex https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#regex
106
-
107
- click static https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#parser-static
108
-
109
- ```
110
-
111
- ## Installation
112
-
113
- 1. Install pdftotext
114
-
115
- If possible get the latest
116
- [xpdf/poppler-utils](https://poppler.freedesktop.org/) version. It's
117
- included with macOS Homebrew, Debian and Ubuntu. Without it, `pdftotext`
118
- won't parse tables in PDF correctly.
119
-
120
- 2. Install `invoice2data` using pip
121
-
122
- pip install invoice2data
123
-
124
- ### Installation of input modules
125
-
126
- An [tesseract](https://github.com/tesseract-ocr/tessdoc/blob/main/FAQ.md#how-do-i-get-tesseract) wrapper is included in auto language mode. It will test your input files against the languages installed on your system. To use it tesseract and imagemagick needs to be installed.
127
- tesseract supports multiple OCR engine modes. By default the available engine installed on the system will be used.
128
-
129
- Languages:
130
- tesseract-ocr recognize more than [100 languages](https://github.com/tesseract-ocr/tessdata)
131
- For Linux users, you can often find packages that provide language packs:
132
-
133
- ```
134
- # Display a list of all Tesseract language packs
135
- apt-cache search tesseract-ocr
136
-
137
- # Debian/Ubuntu users
138
- apt-get install tesseract-ocr-chi-sim # Example: Install Chinese Simplified language pack
139
-
140
- # Arch Linux users
141
- pacman -S tesseract-data-eng tesseract-data-deu # Example: Install the English and German language packs
142
-
143
- ```
144
123
 
145
124
  ## Usage
146
125
 
147
126
  Basic usage. Process PDF files and write result to CSV.
127
+ Please see the [Command-line Reference] for details.
148
128
 
149
129
  - `invoice2data invoice.pdf`
150
130
  - `invoice2data invoice.txt`
@@ -158,7 +138,7 @@ Choose any of the following input readers:
158
138
  - pdfminer.six `invoice2data --input-reader pdfminer invoice.pdf`
159
139
  - pdfplumber `invoice2data --input-reader pdfplumber invoice.pdf`
160
140
  - ocrmypdf `invoice2data --input-reader ocrmypdf invoice.pdf`
161
- - gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var)
141
+ - gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var and a Google Cloud Bucket name. The bucket name can be set as an argument to the function ``to_text`` or as an Environment variable named ``GOOGLE_CLOUD_BUCKET_NAME`` )
162
142
 
163
143
  Choose any of the following output formats:
164
144
 
@@ -208,13 +188,12 @@ Using in-house templates
208
188
  templates = read_templates('/path/to/your/templates/')
209
189
  result = extract_data(filename, templates=templates)
210
190
 
211
-
212
191
  ## Template system
213
192
 
214
193
  See `invoice2data/extract/templates` for existing templates. Just extend
215
194
  the list to add your own. If deployed by a bigger organisation, there
216
195
  should be an interface to edit templates for new suppliers. 80-20 rule.
217
- For a short tutorial on how to add new templates, see [TUTORIAL.md](TUTORIAL.md).
196
+ For a short tutorial on how to add new templates, see [tutorial.md](../docs/tutorial.md).
218
197
 
219
198
  Templates are based on Yaml or JSON. They define one or more keywords to find
220
199
  the right template, one or more exclude_keywords to further narrow it down
@@ -228,6 +207,7 @@ processing.
228
207
 
229
208
  Example:
230
209
 
210
+ ````yaml
231
211
  issuer: Amazon Web Services, Inc.
232
212
  keywords:
233
213
  - Amazon Web Services
@@ -251,8 +231,10 @@ Example:
251
231
  line: (.*)\$(\d+\.\d+)
252
232
  skip_line: Note
253
233
  last_line: VAT \*\*
234
+ ````
254
235
 
255
236
  The lines package has multiple settings:
237
+
256
238
  - start > The pattern where the lines begin. This is typically the header row of the table. This row is not included in the line matching.
257
239
  - end > The pattern denoting where the lines end. Typically some text at the very end or immediately below the table. Also not included in the line matching.
258
240
  - first_line > Optional. This is the primary line item for each entry.
@@ -265,20 +247,20 @@ As an alternative json templates can be used. Which are natively better supporte
265
247
 
266
248
  The performance with yaml templates can be greatly increased **10x** by using [libyaml](https://github.com/yaml/libyaml)
267
249
  It can be installed on most distributions by:
268
- `sudo apt-get libyaml-dev`
269
-
250
+ `sudo apt-get install libyaml-dev`
270
251
 
271
252
  ## Development
272
253
 
273
254
  If you are interested in improving this project, have a look at our
274
- [developer guide](DEVELOP.md) to get you started quickly.
255
+ [developer guide](../docs/contributing.md) to get you started quickly.
275
256
 
276
257
  ## Roadmap and open tasks
277
258
 
278
259
  - integrate with online OCR?
279
260
  - try to 'guess' parameters for new invoice formats.
280
- - can apply machine learning to guess new parameters?
281
- - advanced table parsing with [camelot](https://github.com/camelot-dev/camelot)
261
+ - apply machine learning to guess new parameters / template creation
262
+ - Data cleanup per field
263
+ - advanced table parsing with [pypdf_table_extraction](https://github.com/py-pdf/pypdf_table_extraction)
282
264
 
283
265
  ## Maintainers
284
266
 
@@ -286,14 +268,22 @@ If you are interested in improving this project, have a look at our
286
268
  - [Alexis de Lattre](https://github.com/alexis-via)
287
269
  - [bosd](https://github.com/bosd)
288
270
 
289
- ## Contributors
271
+ ## Contributors and Credits
272
+
273
+ - [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
274
+ Code student.
275
+ - [Holger Brunn](https://github.com/hbrunn): Add support for parsing
276
+ invoice items.
277
+
278
+ [pypi]: https://pypi.org/
279
+ [file an issue]: https://github.com/invoice-x/invoice2data/issues
280
+ [pip]: https://pip.pypa.io/
290
281
 
291
- - [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
292
- Code student.
293
- - [Holger Brunn](https://github.com/hbrunn): Add support for parsing
294
- invoice items.
282
+ Contributions are very welcome.
283
+ To learn more, see the [Contributor Guide].
295
284
 
296
285
  ## Used By
286
+
297
287
  - Odoo, OCA module [account_invoice_import_invoice2data](https://github.com/OCA/edi)
298
288
 
299
289
  ## Related Projects
@@ -306,3 +296,9 @@ If you are interested in improving this project, have a look at our
306
296
  (Commercial)
307
297
  - [CVision](http://www.cvisiontech.com/library/document-automation/forms-processing/extract-data-from-invoice.html)
308
298
  (Commercial)
299
+
300
+ <!-- github-only -->
301
+
302
+ [license]: https://invoice2data.readthedocs.io/latest/license.html
303
+ [contributor guide]: https://invoice2data.readthedocs.io/latest/contributing.html
304
+ [command-line reference]: https://invoice2data.readthedocs.io/latest/usage.html
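
Note on the new `Requires-Dist` entries in PKG-INFO: the two `regex` lines carry environment markers, so an installer applies only one of them depending on the running interpreter. The sketch below mirrors that selection with the standard library only. `regex_floor` is a hypothetical helper written for illustration; it is not part of invoice2data or pip, and real installers evaluate markers with their own logic.

```python
import sys


def regex_floor(version_info=sys.version_info):
    """Return the minimum `regex` version that the PKG-INFO markers select.

    Hypothetical helper mirroring these two metadata lines:
      Requires-Dist: regex>=2025.2.10; python_version >= "3.12"
      Requires-Dist: regex>=2024.4.16; python_version < "3.12"
    """
    # python_version compares on major.minor only, so the patch level is ignored.
    major_minor = tuple(version_info[:2])
    if major_minor >= (3, 12):
        return "2025.2.10"
    return "2024.4.16"


if __name__ == "__main__":
    print(f"regex>={regex_floor()} applies to this interpreter")
```

The two markers are mutually exclusive and together cover every supported interpreter (`Requires-Python: >=3.8`), so exactly one floor applies to any given install.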
@@ -1,11 +1,32 @@
1
1
  # Data extractor for PDF invoices - invoice2data
2
2
 
3
+ [![Read the documentation at https://invoice2data.readthedocs.io/](https://img.shields.io/readthedocs/invoice2data/latest.svg?label=Read%20the%20Docs)][read the docs]
3
4
  [![invoice2data build status on GitHub Actions](https://github.com/invoice-x/invoice2data/workflows/Test/badge.svg)](https://github.com/invoice-x/invoice2data/actions)
4
5
  [![Version](https://img.shields.io/pypi/v/invoice2data.svg)](https://pypi.python.org/pypi/invoice2data)
5
6
  [![Support Python versions](https://img.shields.io/pypi/pyversions/invoice2data.svg)](https://pypi.python.org/pypi/invoice2data)
6
-
7
- A command line tool and Python library to support your accounting
8
- process.
7
+ [![License](https://img.shields.io/pypi/l/invoice2data)][license]
8
+ [![Tests](https://github.com/invoice-x/invoice2data/workflows/Tests/badge.svg)][tests]
9
+ [![Codecov](https://codecov.io/gh/invoice-x/invoice2data/branch/main/graph/badge.svg)][codecov]
10
+ [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)][pre-commit]
11
+ [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
12
+
13
+ [pypi status]: https://pypi.org/project/invoice2data/
14
+ [read the docs]: https://invoice2data.readthedocs.io/
15
+ [tests]: https://github.com/invoice-x/invoice2data/actions?workflow=Tests
16
+ [codecov]: https://app.codecov.io/gh/invoice-x/invoice2data
17
+ [pre-commit]: https://github.com/pre-commit/pre-commit
18
+ [ruff badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
19
+ [ruff project]: https://github.com/charliermarsh/ruff
20
+
21
+ A command line tool and Python library that automates the extraction of key information from invoices to support your accounting
22
+ process. The library is very flexible and can be used on other types of business documents as well.
23
+
24
+ In essence, invoice2data simplifies the process of getting data from invoices by:
25
+
26
+ Automating text extraction: No more manual copying and pasting.
27
+ Using templates for structure: Handles different invoice layouts.
28
+ Providing structured output: Makes the data ready for analysis or further processing.
29
+ This makes it a valuable tool for businesses and developers dealing with a large volume of invoices, saving time and reducing errors associated with manual data entry.
9
30
 
10
31
  1. extracts text from PDF files using different techniques, like
11
32
  `pdftotext`, `text`, `ocrmypdf`, `pdfminer`, `pdfplumber` or OCR -- `tesseract`, or
@@ -31,92 +52,11 @@ Go from PDF files to this:
31
52
  {'date': (2014, 8, 3), 'invoice_number': '42183017', 'amount': 4.11, 'desc': 'Invoice 42183017 from Amazon Web Services'}
32
53
  {'date': (2015, 1, 28), 'invoice_number': '12429647', 'amount': 101.0, 'desc': 'Invoice 12429647 from Envato'}
33
54
 
34
- ```mermaid
35
- flowchart LR
36
-
37
- InvoiceFile[fa:fa-file-invoice Invoicefile\n\npdf\nimage\ntext] --> Input-module(Input Module\n\npdftotext\ntext\npdfminer\npdfplumber\ntesseract\ngvision)
38
-
39
- Input-module --> |Extracted Text| C{keyword\nmatching}
40
-
41
- Invoice-Templates[(fa:fa-file-lines Invoice Templates)] --> C{keyword\nmatching}
42
-
43
- C --> |Extracted Text + fa:fa-file-circle-check Template| E(Template Processing\n apply options from template\nremove accents, replaces etc...)
44
-
45
- E --> |Optimized String|Plugins&Parsers(Call plugins + parsers)
46
-
47
- subgraph Plugins&Parsers
48
-
49
- direction BT
50
-
51
- tables[fa:fa-table tables] ~~~ lines[fa:fa-grip-lines lines]
52
-
53
- lines ~~~ regex[fa:fa-code regex]
54
-
55
- regex ~~~ static[fa:fa-check static]
56
-
57
-
58
-
59
- end
60
-
61
- Plugins&Parsers --> |output| result[result\nfa:fa-file-csv,\njson,\nXML]
62
-
63
-
64
-
65
- click Invoice-Templates https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md
66
-
67
- click result https://github.com/invoice-x/invoice2data#usage
68
-
69
- click Input-module https://github.com/invoice-x/invoice2data#installation-of-input-modules
70
-
71
- click E https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#options
72
-
73
- click tables https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#tables
74
-
75
- click lines https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#lines
76
-
77
- click regex https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#regex
78
-
79
- click static https://github.com/invoice-x/invoice2data/blob/master/TUTORIAL.md#parser-static
80
-
81
- ```
82
-
83
- ## Installation
84
-
85
- 1. Install pdftotext
86
-
87
- If possible get the latest
88
- [xpdf/poppler-utils](https://poppler.freedesktop.org/) version. It's
89
- included with macOS Homebrew, Debian and Ubuntu. Without it, `pdftotext`
90
- won't parse tables in PDF correctly.
91
-
92
- 2. Install `invoice2data` using pip
93
-
94
- pip install invoice2data
95
-
96
- ### Installation of input modules
97
-
98
- An [tesseract](https://github.com/tesseract-ocr/tessdoc/blob/main/FAQ.md#how-do-i-get-tesseract) wrapper is included in auto language mode. It will test your input files against the languages installed on your system. To use it tesseract and imagemagick needs to be installed.
99
- tesseract supports multiple OCR engine modes. By default the available engine installed on the system will be used.
100
-
101
- Languages:
102
- tesseract-ocr recognize more than [100 languages](https://github.com/tesseract-ocr/tessdata)
103
- For Linux users, you can often find packages that provide language packs:
104
-
105
- ```
106
- # Display a list of all Tesseract language packs
107
- apt-cache search tesseract-ocr
108
-
109
- # Debian/Ubuntu users
110
- apt-get install tesseract-ocr-chi-sim # Example: Install Chinese Simplified language pack
111
-
112
- # Arch Linux users
113
- pacman -S tesseract-data-eng tesseract-data-deu # Example: Install the English and German language packs
114
-
115
- ```
116
55
 
117
56
  ## Usage
118
57
 
119
58
  Basic usage. Process PDF files and write result to CSV.
59
+ Please see the [Command-line Reference] for details.
120
60
 
121
61
  - `invoice2data invoice.pdf`
122
62
  - `invoice2data invoice.txt`
@@ -130,7 +70,7 @@ Choose any of the following input readers:
130
70
  - pdfminer.six `invoice2data --input-reader pdfminer invoice.pdf`
131
71
  - pdfplumber `invoice2data --input-reader pdfplumber invoice.pdf`
132
72
  - ocrmypdf `invoice2data --input-reader ocrmypdf invoice.pdf`
133
- - gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var)
73
+ - gvision `invoice2data --input-reader gvision invoice.pdf` (needs `GOOGLE_APPLICATION_CREDENTIALS` env var and a Google Cloud Bucket name. The bucket name can be set as an argument to the function ``to_text`` or as an Environment variable named ``GOOGLE_CLOUD_BUCKET_NAME`` )
134
74
 
135
75
  Choose any of the following output formats:
136
76
 
@@ -180,13 +120,12 @@ Using in-house templates
180
120
  templates = read_templates('/path/to/your/templates/')
181
121
  result = extract_data(filename, templates=templates)
182
122
 
183
-
184
123
  ## Template system
185
124
 
186
125
  See `invoice2data/extract/templates` for existing templates. Just extend
187
126
  the list to add your own. If deployed by a bigger organisation, there
188
127
  should be an interface to edit templates for new suppliers. 80-20 rule.
189
- For a short tutorial on how to add new templates, see [TUTORIAL.md](TUTORIAL.md).
128
+ For a short tutorial on how to add new templates, see [tutorial.md](../docs/tutorial.md).
190
129
 
191
130
  Templates are based on Yaml or JSON. They define one or more keywords to find
192
131
  the right template, one or more exclude_keywords to further narrow it down
@@ -200,6 +139,7 @@ processing.
200
139
 
201
140
  Example:
202
141
 
142
+ ````yaml
203
143
  issuer: Amazon Web Services, Inc.
204
144
  keywords:
205
145
  - Amazon Web Services
@@ -223,8 +163,10 @@ Example:
223
163
  line: (.*)\$(\d+\.\d+)
224
164
  skip_line: Note
225
165
  last_line: VAT \*\*
166
+ ````
226
167
 
227
168
  The lines package has multiple settings:
169
+
228
170
  - start > The pattern where the lines begin. This is typically the header row of the table. This row is not included in the line matching.
229
171
  - end > The pattern denoting where the lines end. Typically some text at the very end or immediately below the table. Also not included in the line matching.
230
172
  - first_line > Optional. This is the primary line item for each entry.
@@ -237,20 +179,20 @@ As an alternative json templates can be used. Which are natively better supporte
237
179
 
238
180
  The performance with yaml templates can be greatly increased **10x** by using [libyaml](https://github.com/yaml/libyaml)
239
181
  It can be installed on most distributions by:
240
- `sudo apt-get libyaml-dev`
241
-
182
+ `sudo apt-get install libyaml-dev`
242
183
 
243
184
  ## Development
244
185
 
245
186
  If you are interested in improving this project, have a look at our
246
- [developer guide](DEVELOP.md) to get you started quickly.
187
+ [developer guide](../docs/contributing.md) to get you started quickly.
247
188
 
248
189
  ## Roadmap and open tasks
249
190
 
250
191
  - integrate with online OCR?
251
192
  - try to 'guess' parameters for new invoice formats.
252
- - can apply machine learning to guess new parameters?
253
- - advanced table parsing with [camelot](https://github.com/camelot-dev/camelot)
193
+ - apply machine learning to guess new parameters / template creation
194
+ - Data cleanup per field
195
+ - advanced table parsing with [pypdf_table_extraction](https://github.com/py-pdf/pypdf_table_extraction)
254
196
 
255
197
  ## Maintainers
256
198
 
@@ -258,14 +200,22 @@ If you are interested in improving this project, have a look at our
258
200
  - [Alexis de Lattre](https://github.com/alexis-via)
259
201
  - [bosd](https://github.com/bosd)
260
202
 
261
- ## Contributors
203
+ ## Contributors and Credits
262
204
 
263
- - [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
264
- Code student.
265
- - [Holger Brunn](https://github.com/hbrunn): Add support for parsing
266
- invoice items.
205
+ - [Harshit Joshi](https://github.com/duskybomb): As Google Summer of
206
+ Code student.
207
+ - [Holger Brunn](https://github.com/hbrunn): Add support for parsing
208
+ invoice items.
209
+
210
+ [pypi]: https://pypi.org/
211
+ [file an issue]: https://github.com/invoice-x/invoice2data/issues
212
+ [pip]: https://pip.pypa.io/
213
+
214
+ Contributions are very welcome.
215
+ To learn more, see the [Contributor Guide].
267
216
 
268
217
  ## Used By
218
+
269
219
  - Odoo, OCA module [account_invoice_import_invoice2data](https://github.com/OCA/edi)
270
220
 
271
221
  ## Related Projects
@@ -278,3 +228,9 @@ If you are interested in improving this project, have a look at our
278
228
  (Commercial)
279
229
  - [CVision](http://www.cvisiontech.com/library/document-automation/forms-processing/extract-data-from-invoice.html)
280
230
  (Commercial)
231
+
232
+ <!-- github-only -->
233
+
234
+ [license]: https://invoice2data.readthedocs.io/latest/license.html
235
+ [contributor guide]: https://invoice2data.readthedocs.io/latest/contributing.html
236
+ [command-line reference]: https://invoice2data.readthedocs.io/latest/usage.html
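
Note on the `lines` settings described in the README hunks above: the interplay of `start`, `end`, `line` and `skip_line` can be sketched with the standard `re` module. This is a simplified illustration, not the code in `extract/parsers/lines.py`. `parse_lines`, the sample text, and the `start` and `end` patterns are assumptions made for the example; only the `line` and `skip_line` patterns are taken from the template shown in the diff.

```python
import re


def parse_lines(text, start, end, line, skip_line=None):
    """Collect `line` matches for rows found between `start` and `end`."""
    rows = []
    inside = False
    for raw in text.splitlines():
        if not inside:
            # The start row (typically the table header) is not matched itself.
            inside = re.search(start, raw) is not None
            continue
        if re.search(end, raw):
            break  # the end row is not matched either
        if skip_line and re.search(skip_line, raw):
            continue
        match = re.search(line, raw)
        if match:
            rows.append(tuple(part.strip() for part in match.groups()))
    return rows


# Hypothetical extracted invoice text, for illustration only.
SAMPLE = """\
Invoice 42183017
Description          Amount
EC2 compute          $3.10
Note: usage is billed hourly
S3 storage           $1.01
Total                $4.11
"""

items = parse_lines(
    SAMPLE,
    start=r"Description\s+Amount",  # hypothetical header pattern
    end=r"Total",                   # hypothetical end pattern
    line=r"(.*)\$(\d+\.\d+)",       # pattern from the template in the diff
    skip_line=r"Note",              # pattern from the template in the diff
)
print(items)
```

The sketch leaves out `first_line` and `last_line`, which the README describes as optional settings for entries that span several rows.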