PyPI - mergeron - Versions diffs - 2024.738973.0__tar.gz → 2024.739079.10__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of mergeron might be problematic.

Files changed (42)

mergeron-2024.739079.10/PKG-INFO ADDED

@@ -0,0 +1,109 @@
+Metadata-Version: 2.1
+Name: mergeron
+Version: 2024.739079.10
+Summary: Merger Policy Analysis using Python
+License: MIT
+Keywords: merger policy analysis,merger guidelines,merger screening,policy presumptions,concentration standards,upward pricing pressure,GUPPI
+Author: Murthy Kambhampaty
+Author-email: smk@capeconomics.com
+Requires-Python: >=3.12,<4.0
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: End Users/Desktop
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Requires-Dist: aenum (>=3.1.15,<4.0.0)
+Requires-Dist: attrs (>=23.2)
+Requires-Dist: bs4 (>=0.0.1)
+Requires-Dist: certifi (>=2023.11.17)
+Requires-Dist: google-re2 (>=1.1)
+Requires-Dist: icecream (>=2.1.0)
+Requires-Dist: jinja2 (>=3.1)
+Requires-Dist: joblib (>=1.3)
+Requires-Dist: lxml (>=5.0)
+Requires-Dist: matplotlib (>=3.8)
+Requires-Dist: mpmath (>=1.3)
+Requires-Dist: msgpack (>=1.0)
+Requires-Dist: msgpack-numpy (>=0.4)
+Requires-Dist: numpy (>=1.26,<2.0)
+Requires-Dist: openpyxl (>=3.1.2)
+Requires-Dist: pendulum (>=3.0.0)
+Requires-Dist: requests (>=2.31)
+Requires-Dist: requests-toolbelt (>=1.0.0)
+Requires-Dist: scipy (>=1.12)
+Requires-Dist: sympy (>=1.12)
+Requires-Dist: tables (>=3.8)
+Requires-Dist: types-beautifulsoup4 (>=4.11.2)
+Requires-Dist: types-requests (>=2.31.0)
+Requires-Dist: xlrd (>=2.0.1,<3.0.0)
+Requires-Dist: xlsxwriter (>=3.1)
+Description-Content-Type: text/x-rst
+mergeron: Merger Policy Analysis using Python
+=============================================
+Download and analyze merger investigations data published by the U.S. Federal Trade Commission in various reports on extended merger investigations from 1996 to 2011. Model the sets of mergers conforming to various U.S. Horizontal Merger Guidelines standards. Analyze intrinsic clearance rates and intrinsic enforcement rates under Guidelines standards using generated data with specified distributions of market shares, price-cost margins, firm counts, and prices, optionally imposing restrictions implied by statutory filing thresholds and/or Bertrand-Nash oligopoly with MNL demand.
+Intrinsic clearance and enforcement rates are distinguished from *observed* clearance and enforcement rates in that the former do not reflect the effects of screening and deterrence as do the latter.
+Introduction
+------------
+Classes for specifying concentration standards (`mergeron.core.guidelines_boundaries.ConcentrationBoundary`) and diversion-ratio standards (`mergeron.core.guidelines_boundaries.DiversionRatioBoundary`), with automatic generation of the boundary (as an array of share pairs) and its area, are provided in `mergeron.core.guidelines_boundaries`. This module also includes a function for plotting concentration and diversion-ratio boundaries, and functions for mapping GUPPI standards to concentration (ΔHHI) standards and vice versa.
+Methods for generating industry data under various distributions of shares, margins, and prices are included in `mergeron.gen.data_generation`. Shares are drawn from the uniform distribution with :math:`s_1 + s_2 \leqslant 1` and an unspecified number of firms. Alternatively, shares may be drawn from the Dirichlet distribution. When drawing shares from the Dirichlet distribution, the user can specify a fixed number of firms or provide a vector of weights specifying the frequency distribution over sequential firm counts, e.g., :code:`[133, 184, 134, 52, 32, 10, 12, 4, 3]` to specify shares drawn from Dirichlet distributions with 2 to 10 pre-merger firms distributed as in data for FTC merger investigations during 1996--2003 (see, for example, Table 4.1 of `FTC, Horizontal Merger Investigations Data, Fiscal Years 1996--2003 (Revised: August 31, 2004) <https://www.ftc.gov/sites/default/files/documents/reports/horizontal-merger-investigation-data-fiscal-years-1996-2003/040831horizmergersdata96-03.pdf>`_). The user can specify recapture rates as "proportional", "inside-out" (i.e., consistent with the merging firms' in-market shares and a default recapture rate), or "outside-in" (i.e., purchase probabilities are drawn at random for :math:`N+1` goods, from which market shares and recapture rates are derived for the :math:`N` goods in the putative market). Documentation on specifying the sampling strategy for market shares is at `mergeron.gen.ShareSpec`. Price-cost margins may be specified as symmetric, i.i.d., or subject to equilibrium conditions for (profit maximization in) Bertrand-Nash oligopoly with MNL demand (see `mergeron.gen.PCMSpec`). Prices may be specified as symmetric or asymmetric, and in the latter case, the direction of correlation between merging-firm prices, if any, can also be specified (see `mergeron.gen.PriceSpec`).
+Two alternative approaches for modeling statutory filing requirements (HSR filing thresholds) are implemented (see `mergeron.gen.SSZConstants`). The full specification of a market sample is given in a `mergeron.gen.market_sample.MarketSample` object. Data are drawn by invoking `mergeron.gen.market_sample.MarketSample.generate_sample`, which adds a `data` property of class `mergeron.gen.MarketDataSample`. Enforcement or clearance counts are computed by invoking `mergeron.gen.market_sample.MarketSample.estimate_invres_counts`, which adds an `invres_counts` property of class `mergeron.gen.UPPTestsCounts`. For fast, parallel generation of enforcement or clearance counts over market data samples large enough to exceed available machine memory, the user can invoke `estimate_invres_counts` on a `mergeron.gen.market_sample.MarketSample` object without first invoking `generate_sample`. Note, however, that this strategy discards the market sample in order to conserve memory and maintain high performance.
+Enforcement statistics based on FTC investigations data and test data can be printed to screen or rendered to LaTeX files (for processing into publication-quality tables) using methods provided in `mergeron.gen.enforcement_stats`.
+Programs demonstrating the analysis and reporting facilities are provided in the sub-package `mergeron.demo`.
+This package exposes methods employed for generating random numbers with selected continuous distributions over specified parameters, with CPU multithreading on machines with multiple virtual, logical, or physical CPU cores. To access these directly:
+.. code-block:: python
+    import mergeron.core.pseudorandom_numbers as prng
+Also included are methods for estimating confidence intervals for proportions and for contrasts (differences) in proportions. (Although coded from scratch using the source literature, the APIs implemented in this module are designed for consistency with the APIs in `statsmodels.stats.proportion` from the `statsmodels` package (https://pypi.org/project/statsmodels/).) To access these directly:
+.. code-block:: python
+    import mergeron.core.proportions_tests as prci
+A recent version of Paul Tol's Python module, `tol_colors.py`, is redistributed within this package. Other than re-formatting and type annotation, the `mergeron.ext.tol_colors` module is redistributed as downloaded from https://personal.sron.nl/~pault/data/tol_colors.py. The `tol_colors.py` module is distributed under the standard 3-clause BSD license. To access the `mergeron.ext.tol_colors` module directly:
+.. code-block:: python
+    import mergeron.ext.tol_colors as ptc
+Documentation for this package is in the form of the API Reference. Documentation for individual functions and classes is accessible within a python shell. For example:
+.. code-block:: python
+    import mergeron.gen.market_sample as market_sample
+    help(market_sample.MarketSample)
+.. image:: https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json
+   :alt: Poetry
+   :target: https://python-poetry.org/
+.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
+   :alt: Ruff
+   :target: https://github.com/astral-sh/ruff
+.. image:: https://www.mypy-lang.org/static/mypy_badge.svg
+   :alt: Checked with mypy
+   :target: https://mypy-lang.org/
+.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
+   :alt: License: MIT
+   :target: https://opensource.org/licenses/MIT
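The firm-count weighting described in this README can be illustrated with plain NumPy. This is a minimal sketch, not mergeron's implementation (which is configured through `mergeron.gen.ShareSpec`, whose fields are not shown in this diff). It assumes flat Dirichlet parameters (all ones) and treats the first two firms as the merging parties; the function name `draw_merging_shares` is hypothetical. Only the weight vector and the 2-to-10 firm range come from the description.

```python
import numpy as np
from numpy.random import PCG64DXSM, Generator, SeedSequence

# Frequency weights over firm counts 2, 3, ..., 10, as quoted in the README
FIRM_COUNT_WTS = np.array([133, 184, 134, 52, 32, 10, 12, 4, 3], dtype=np.float64)


def draw_merging_shares(
    sample_size: int, wts: np.ndarray = FIRM_COUNT_WTS, seed: int = 42
) -> tuple[np.ndarray, np.ndarray]:
    """Draw (s1, s2) share pairs from flat Dirichlet distributions with random firm counts.

    Returns the share pairs and the firm count used for each draw.
    """
    rng = Generator(PCG64DXSM(SeedSequence(seed)))
    firm_counts = np.arange(2, 2 + len(wts))
    # Pick a firm count for each draw, with probability proportional to its weight
    n_firms = rng.choice(firm_counts, size=sample_size, p=wts / wts.sum())
    shares = np.empty((sample_size, 2), dtype=np.float64)
    for n in np.unique(n_firms):
        rows = n_firms == n
        # Flat Dirichlet over n firms (assumed); keep the first two columns as merging firms
        shares[rows] = rng.dirichlet(np.ones(n), size=int(rows.sum()))[:, :2]
    return shares, n_firms
```

Because each row comes from a full share vector over all firms, every pair automatically satisfies s1 + s2 ≤ 1.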

mergeron-2024.739079.10/README.rst ADDED

@@ -0,0 +1,61 @@
+mergeron: Merger Policy Analysis using Python
+=============================================
+Download and analyze merger investigations data published by the U.S. Federal Trade Commission in various reports on extended merger investigations from 1996 to 2011. Model the sets of mergers conforming to various U.S. Horizontal Merger Guidelines standards. Analyze intrinsic clearance rates and intrinsic enforcement rates under Guidelines standards using generated data with specified distributions of market shares, price-cost margins, firm counts, and prices, optionally imposing restrictions implied by statutory filing thresholds and/or Bertrand-Nash oligopoly with MNL demand.
+Intrinsic clearance and enforcement rates are distinguished from *observed* clearance and enforcement rates in that the former do not reflect the effects of screening and deterrence as do the latter.
+Introduction
+------------
+Classes for specifying concentration standards (`mergeron.core.guidelines_boundaries.ConcentrationBoundary`) and diversion-ratio standards (`mergeron.core.guidelines_boundaries.DiversionRatioBoundary`), with automatic generation of the boundary (as an array of share pairs) and its area, are provided in `mergeron.core.guidelines_boundaries`. This module also includes a function for plotting concentration and diversion-ratio boundaries, and functions for mapping GUPPI standards to concentration (ΔHHI) standards and vice versa.
+Methods for generating industry data under various distributions of shares, margins, and prices are included in `mergeron.gen.data_generation`. Shares are drawn from the uniform distribution with :math:`s_1 + s_2 \leqslant 1` and an unspecified number of firms. Alternatively, shares may be drawn from the Dirichlet distribution. When drawing shares from the Dirichlet distribution, the user can specify a fixed number of firms or provide a vector of weights specifying the frequency distribution over sequential firm counts, e.g., :code:`[133, 184, 134, 52, 32, 10, 12, 4, 3]` to specify shares drawn from Dirichlet distributions with 2 to 10 pre-merger firms distributed as in data for FTC merger investigations during 1996--2003 (see, for example, Table 4.1 of `FTC, Horizontal Merger Investigations Data, Fiscal Years 1996--2003 (Revised: August 31, 2004) <https://www.ftc.gov/sites/default/files/documents/reports/horizontal-merger-investigation-data-fiscal-years-1996-2003/040831horizmergersdata96-03.pdf>`_). The user can specify recapture rates as "proportional", "inside-out" (i.e., consistent with the merging firms' in-market shares and a default recapture rate), or "outside-in" (i.e., purchase probabilities are drawn at random for :math:`N+1` goods, from which market shares and recapture rates are derived for the :math:`N` goods in the putative market). Documentation on specifying the sampling strategy for market shares is at `mergeron.gen.ShareSpec`. Price-cost margins may be specified as symmetric, i.i.d., or subject to equilibrium conditions for (profit maximization in) Bertrand-Nash oligopoly with MNL demand (see `mergeron.gen.PCMSpec`). Prices may be specified as symmetric or asymmetric, and in the latter case, the direction of correlation between merging-firm prices, if any, can also be specified (see `mergeron.gen.PriceSpec`).
+Two alternative approaches for modeling statutory filing requirements (HSR filing thresholds) are implemented (see `mergeron.gen.SSZConstants`). The full specification of a market sample is given in a `mergeron.gen.market_sample.MarketSample` object. Data are drawn by invoking `mergeron.gen.market_sample.MarketSample.generate_sample`, which adds a `data` property of class `mergeron.gen.MarketDataSample`. Enforcement or clearance counts are computed by invoking `mergeron.gen.market_sample.MarketSample.estimate_invres_counts`, which adds an `invres_counts` property of class `mergeron.gen.UPPTestsCounts`. For fast, parallel generation of enforcement or clearance counts over market data samples large enough to exceed available machine memory, the user can invoke `estimate_invres_counts` on a `mergeron.gen.market_sample.MarketSample` object without first invoking `generate_sample`. Note, however, that this strategy discards the market sample in order to conserve memory and maintain high performance.
+Enforcement statistics based on FTC investigations data and test data can be printed to screen or rendered to LaTeX files (for processing into publication-quality tables) using methods provided in `mergeron.gen.enforcement_stats`.
+Programs demonstrating the analysis and reporting facilities are provided in the sub-package `mergeron.demo`.
+This package exposes methods employed for generating random numbers with selected continuous distributions over specified parameters, with CPU multithreading on machines with multiple virtual, logical, or physical CPU cores. To access these directly:
+.. code-block:: python
+    import mergeron.core.pseudorandom_numbers as prng
+Also included are methods for estimating confidence intervals for proportions and for contrasts (differences) in proportions. (Although coded from scratch using the source literature, the APIs implemented in this module are designed for consistency with the APIs in `statsmodels.stats.proportion` from the `statsmodels` package (https://pypi.org/project/statsmodels/).) To access these directly:
+.. code-block:: python
+    import mergeron.core.proportions_tests as prci
+A recent version of Paul Tol's Python module, `tol_colors.py`, is redistributed within this package. Other than re-formatting and type annotation, the `mergeron.ext.tol_colors` module is redistributed as downloaded from https://personal.sron.nl/~pault/data/tol_colors.py. The `tol_colors.py` module is distributed under the standard 3-clause BSD license. To access the `mergeron.ext.tol_colors` module directly:
+.. code-block:: python
+    import mergeron.ext.tol_colors as ptc
+Documentation for this package is in the form of the API Reference. Documentation for individual functions and classes is accessible within a python shell. For example:
+.. code-block:: python
+    import mergeron.gen.market_sample as market_sample
+    help(market_sample.MarketSample)
+.. image:: https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json
+   :alt: Poetry
+   :target: https://python-poetry.org/
+.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
+   :alt: Ruff
+   :target: https://github.com/astral-sh/ruff
+.. image:: https://www.mypy-lang.org/static/mypy_badge.svg
+   :alt: Checked with mypy
+   :target: https://mypy-lang.org/
+.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
+   :alt: License: MIT
+   :target: https://opensource.org/licenses/MIT
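The README mentions functions for mapping GUPPI standards to ΔHHI standards and back. One textbook version of that mapping is sketched below under stated assumptions: symmetric merging firms with share s, proportional recapture so that the diversion ratio is d12 = r * s2 / (1 - s1), equal prices, and a common margin m. Under those assumptions GUPPI = r * m * s / (1 - s) and ΔHHI = 2 * s^2 * 10,000. The function names are hypothetical and need not match anything in `mergeron.core.guidelines_boundaries`.

```python
import math


def guppi_symmetric(s: float, r: float, m: float) -> float:
    """GUPPI for a symmetric merger: d12 * m * (p2 / p1), with d12 = r * s / (1 - s), p2 = p1."""
    return r * m * s / (1.0 - s)


def guppi_to_delta_hhi(g: float, r: float, m: float) -> float:
    """ΔHHI (in points) at which the symmetric-merger GUPPI equals g."""
    s = g / (r * m + g)  # solves r * m * s / (1 - s) = g for s
    return 2.0 * s * s * 10_000


def delta_hhi_to_guppi(dhhi: float, r: float, m: float) -> float:
    """Symmetric-merger GUPPI implied by a ΔHHI threshold (in points)."""
    s = math.sqrt(dhhi / 20_000)  # inverts ΔHHI = 2 * s**2 * 10,000
    return guppi_symmetric(s, r, m)
```

With r = 0.8 and m = 0.25, a 5% GUPPI threshold corresponds to s = 0.2 and hence a ΔHHI of about 800 points; asymmetric shares, unequal prices, or other recapture assumptions change the mapping.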

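For the confidence intervals on proportions mentioned in the README, the Wilson score interval is one standard method; statsmodels exposes it as `proportion_confint(count, nobs, alpha=0.05, method="wilson")`. The sketch below computes it with the standard library only. Which methods `mergeron.core.proportions_tests` implements, and under what names, is not shown in this diff, so `wilson_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist


def wilson_interval(count: int, nobs: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion count / nobs."""
    if nobs <= 0 or not 0 <= count <= nobs:
        raise ValueError("need nobs > 0 and 0 <= count <= nobs")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_hat = count / nobs
    denom = 1 + z**2 / nobs
    center = (p_hat + z**2 / (2 * nobs)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / nobs + z**2 / (4 * nobs**2)) / denom
    # Clip to [0, 1] to absorb floating-point noise at the boundaries
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

For 45 successes in 100 trials, the 95% interval is roughly (0.356, 0.548). Unlike the simple Wald interval, the Wilson interval stays inside [0, 1] even when the count is 0 or equals nobs.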
{mergeron-2024.738973.0 → mergeron-2024.739079.10}/pyproject.toml RENAMED

@@ -1,8 +1,9 @@
 [tool.poetry]
 name = "mergeron"
-# See ./get_version_str.py
-version = "2024.738973.0"
-description = "Analysis of standards defined in Horizontal Merger Guidelines"
+authors = ["Murthy Kambhampaty <smk@capeconomics.com>"]
+description = "Merger Policy Analysis using Python"
+readme = "README.rst"
+license = "MIT"
 keywords = [
     "merger policy analysis",
     "merger guidelines",
@@ -12,9 +13,7 @@ keywords = [
     "upward pricing pressure",
     "GUPPI",
 ]
-authors = ["Murthy Kambhampaty <smk@capeconomics.com>"]
-license = "MIT"
-readme = "README.rst"
+version = "2024.739079.10"
 # Classifiers list: https://pypi.org/classifiers/
 classifiers = [
@@ -34,9 +33,11 @@ classifiers = [
 [tool.poetry.dependencies]
 # You may need to apply the fixes in, https://github.com/python-poetry/poetry/issues/3365
 # if poetry dependency resolution appears to hang (read the page at link to the end)
+aenum = "^3.1.15"
 attrs = ">=23.2"
 bs4 = ">=0.0.1"
 google-re2 = ">=1.1"
+icecream = ">=2.1.0"
 jinja2 = ">=3.1"
 joblib = ">=1.3"
 lxml = ">=5.0"
@@ -44,31 +45,33 @@ matplotlib = ">=3.8"
 mpmath = ">=1.3"
 msgpack = ">=1.0"
 msgpack-numpy = ">=0.4"
-numpy = ">=1.26"
+numpy = ">=1.26, <2.0"
 openpyxl = ">=3.1.2"
+pendulum = ">=3.0.0"
 python = "^3.12"
 requests = ">=2.31"
 scipy = ">=1.12"
 sympy = ">=1.12"
 tables = ">=3.8"
-xlrd = ">=2.0"
 xlsxwriter = ">=3.1"
 certifi = ">=2023.11.17"
 requests-toolbelt = ">=1.0.0"
-importlib-metadata = ">=7.0.1"
+types-requests = ">=2.31.0"
+types-beautifulsoup4 = ">=4.11.2"
+xlrd = "^2.0.1"                   # Needed to read margin data
 [tool.poetry.group.dev.dependencies]
 semver = ">=3.0"
-pytest = ">=8.0"
 mypy = ">=1.8"
-ruff = ">=0.2"
+ruff = ">=0.5"
+pytest = ">=8.0"
 sphinx = ">=7.2"
 sphinx-autodoc-typehints = ">=2.0.0"
 sphinx-autoapi = ">=3.0"
 sphinx-immaterial = ">=0.11"
 pipdeptree = ">=2.15.1"
-uv = ">=0.1.11"
+types-openpyxl = ">=3.0.0"
 [build-system]
 requires = ["poetry-core"]
@@ -118,19 +121,20 @@ select = [
     "I", # isort
     "W", # pycodestyle
     # plugins:
-    "B",   # flake8-bugbear
-    "C4",  # flake8-comprehensions
-    "ICN", # flake8-import-conventions
-    "NPY", # NumPy-specific rules
-    "PIE", # flake8-pie
-    "PL",  # pylint
-    "PTH", # flake8-use-pathlib
-    "S",   # flake8-bandit
-    "SIM", # flake8-simplify
-    "TID", # flake8-tidy-imports
-    "TCH", # flake8-type-checking
-    "UP",  # pyupgrade
-    "RUF", # ruff-specific
+    "B",    # flake8-bugbear
+    "C4",   # flake8-comprehensions
+    "FURB", # refurb
+    "ICN",  # flake8-import-conventions
+    "NPY",  # NumPy-specific rules
+    "PIE",  # flake8-pie
+    "PL",   # pylint
+    "PTH",  # flake8-use-pathlib
+    "S",    # flake8-bandit
+    "SIM",  # flake8-simplify
+    "TID",  # flake8-tidy-imports
+    "TCH",  # flake8-type-checking
+    "UP",   # pyupgrade
+    "RUF",  # ruff-specific
 ]
 ignore = [
@@ -144,16 +148,7 @@ ignore = [
     # flake8-bugbear opinionated (disabled by default in flake8)
     'B904',
     'B905',
-    # flake8-executable
-    "EXE002", # file executable but no shebang present
-    # pygrep-hooks
-    "PGH003",
-    # flake8-pie
-    "PIE790", # unnecessary 'pass' statement
-    # pylint
     "PLR2004", # avoid magic values
-    # flake8-simplify
-    "SIM102", # nested 'if' statements
     # flake8-type-checking
     "TCH001", # move application import into a type-checking block
     "TCH002", # move third-party import into a type-checking block
@@ -198,7 +193,7 @@ filterwarnings = [
     "ignore::DeprecationWarning:jinja2.lexer",
     "ignore::DeprecationWarning:joblib._utils",
     "ignore::DeprecationWarning:openpyxl.packaging.core",
-    "ignore::RuntimeWarning:mergeron.gen.investigations_stats",
+    "ignore::RuntimeWarning:mergeron.gen.enforcement_stats",
     "ignore::RuntimeWarning:mergeron.core.proportions_tests",
 ]
 tmp_path_retention_policy = "failed"
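The caret constraints added in this `pyproject.toml` show up as explicit ranges in the generated PKG-INFO above: `aenum = "^3.1.15"` becomes `aenum (>=3.1.15,<4.0.0)`, `xlrd = "^2.0.1"` becomes `xlrd (>=2.0.1,<3.0.0)`, and `python = "^3.12"` becomes `Requires-Python: >=3.12,<4.0`. The sketch below expands Poetry's caret rule (updates may not change the left-most non-zero component). It is a simplified illustration, not Poetry's own code: it ignores pre-release tags, wildcards, and other constraint forms, and `caret_to_range` is a hypothetical name.

```python
def caret_to_range(constraint: str) -> str:
    """Expand a Poetry caret constraint such as "^3.1.15" into explicit bounds."""
    version = constraint.strip().lstrip("^")
    parts = [int(p) for p in version.split(".")]
    # Bump the left-most non-zero component; if every component is zero,
    # bump the last component given (so "^0.0" -> "<0.1", "^0" -> "<1").
    idx = next((i for i, v in enumerate(parts) if v != 0), len(parts) - 1)
    upper = [*parts[:idx], parts[idx] + 1] + [0] * (len(parts) - idx - 1)
    return f">={version},<{'.'.join(str(p) for p in upper)}"
```

The upper bound keeps the same number of components as the input, which is why `^3.12` renders as `<4.0` while `^3.1.15` renders as `<4.0.0`.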

{mergeron-2024.738973.0 → mergeron-2024.739079.10}/src/mergeron/__init__.py RENAMED

@@ -1,12 +1,19 @@
 from __future__ import annotations
 import enum
-from importlib.metadata import version
 from pathlib import Path
+from typing import Any
+import numpy as np
+import pendulum  # type: ignore
+from icecream import argumentToString, ic, install  # type: ignore
+from numpy.typing import NDArray
 _PKG_NAME: str = Path(__file__).parent.stem
-__version__ = version(_PKG_NAME)
+VERSION = "2024.739079.10"
+__version__ = VERSION
 DATA_DIR: Path = Path.home() / _PKG_NAME
 """
@@ -14,11 +21,26 @@ Defines a subdirectory named for this package in the user's home path.
 If the subdirectory doesn't exist, it is created on package invocation.
 """
 if not DATA_DIR.is_dir():
     DATA_DIR.mkdir(parents=False)
+np.set_printoptions(precision=18)
+def _timestamper() -> str:
+    return f"{pendulum.now().strftime("%F %T.%f")} |>  "
+@argumentToString.register(np.ndarray)  # type: ignore
+def _(_obj: NDArray[Any]) -> str:
+    return f"ndarray, shape={_obj.shape}, dtype={_obj.dtype}"
+ic.configureOutput(prefix=_timestamper, includeContext=True)
+install()
 @enum.unique
 class RECConstants(enum.StrEnum):
     """Recapture rate - derivation methods."""
@@ -38,8 +60,11 @@ class UPPAggrSelector(enum.StrEnum):
     AVG = "average"
     CPA = "cross-product-share weighted average"
     CPD = "cross-product-share weighted distance"
+    CPG = "cross-product-share weighted geometric mean"
     DIS = "symmetrically-weighted distance"
+    GMN = "geometric mean"
     MAX = "max"
     MIN = "min"
     OSA = "own-share weighted average"
     OSD = "own-share weighted distance"
+    OSG = "own-share weighted geometric mean"

mergeron-2024.739079.10/src/mergeron/core/__init__.py ADDED

@@ -0,0 +1,3 @@
+from .. import VERSION  # noqa: TID252
+__version__ = VERSION
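The new `src/mergeron/__init__.py` above registers a NumPy-array formatter with `@argumentToString.register(np.ndarray)`. That `.register(type)` call follows the `functools.singledispatch` registration pattern: a generic fallback handles every type, and a type-specific implementation overrides it for arrays, so `ic()` output shows shape and dtype instead of every element. The stand-alone sketch below reproduces the pattern without icecream; `arg_to_string` is a hypothetical name.

```python
from functools import singledispatch
from typing import Any

import numpy as np


@singledispatch
def arg_to_string(obj: Any) -> str:
    # Fallback used for every type without a registered handler
    return repr(obj)


@arg_to_string.register(np.ndarray)
def _(obj: np.ndarray) -> str:
    # Summarize arrays instead of printing every element
    return f"ndarray, shape={obj.shape}, dtype={obj.dtype}"


print(arg_to_string(np.zeros((3, 2))))  # ndarray, shape=(3, 2), dtype=float64
print(arg_to_string(5))  # 5
```

Dispatch is on the runtime type of the first argument, so subclasses of `np.ndarray` also use the array handler.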

{mergeron-2024.738973.0 → mergeron-2024.739079.10}/src/mergeron/core/damodaran_margin_data.py RENAMED

@@ -7,7 +7,8 @@ Data are downloaded or reused from a local copy, on demand.
 For terms of use of Prof. Damodaran's data, please see:
 https://pages.stern.nyu.edu/~adamodar/New_Home_Page/datahistory.html
-Important caveats:
+NOTES
+-----
 Prof. Damodaran notes that the data construction may not be
 consistent from iteration to iteration. He also notes that,
@@ -32,29 +33,30 @@ price-cost margins fall in the interval :math:`[0, 1]`.
 """
+import shutil
 from collections.abc import Mapping
-from importlib.metadata import version
+from importlib import resources
 from pathlib import Path
 from types import MappingProxyType
 import msgpack  # type:ignore
 import numpy as np
-import requests
+import urllib3
 from numpy.random import PCG64DXSM, Generator, SeedSequence
 from numpy.typing import NDArray
-from requests_toolbelt.downloadutils import stream  # type: ignore
 from scipy import stats  # type: ignore
 from xlrd import open_workbook  # type: ignore
-from .. import _PKG_NAME, DATA_DIR  # noqa: TID252
-__version__ = version(_PKG_NAME)
+from .. import _PKG_NAME, DATA_DIR, VERSION  # noqa: TID252
+__version__ = VERSION
 MGNDATA_ARCHIVE_PATH = DATA_DIR / "damodaran_margin_data_dict.msgpack"
+u3pm = urllib3.PoolManager()
-def scrape_data_table(
+def mgn_data_getter(
     _table_name: str = "margin",
     *,
     data_archive_path: Path | None = None,
@@ -68,32 +70,46 @@ def scrape_data_table(
     _data_archive_path = data_archive_path or MGNDATA_ARCHIVE_PATH
     _mgn_urlstr = f"https://pages.stern.nyu.edu/~adamodar/pc/datasets/{_table_name}.xls"
-    _mgn_path = _data_archive_path.parent.joinpath(f"damodaran_{_table_name}_data.xls")
+    _mgn_path = _data_archive_path.parent / f"damodaran_{_table_name}_data.xls"
     if _data_archive_path.is_file() and not data_download_flag:
         return MappingProxyType(msgpack.unpackb(_data_archive_path.read_bytes()))
     elif _mgn_path.is_file():
         _mgn_path.unlink()
-        _data_archive_path.unlink()
-    _REQ_TIMEOUT = (9.05, 27)
-    # NYU will eventually updates its server certificate, to one signed with
-    #   "InCommon RSA Server CA 2.pem", the step below will be obsolete. In
-    #   the interim, it is necessary to provide the certificate chain to the
-    #   root CA, so that the obsolete CA certificate is validated.
-    _INCOMMON_2014_CERT_CHAIN_PATH = (
-        Path(__file__).parent / "InCommon RSA Server CA cert chain.pem"
-    )
-    try:
-        _urlopen_handle = requests.get(_mgn_urlstr, timeout=_REQ_TIMEOUT, stream=True)
-    except requests.exceptions.SSLError:
-        _urlopen_handle = requests.get(
-            _mgn_urlstr,
-            timeout=_REQ_TIMEOUT,
-            stream=True,
-            verify=str(_INCOMMON_2014_CERT_CHAIN_PATH),
-        )
+        if _data_archive_path.is_file():
+            _data_archive_path.unlink()
-    _mgn_filename = stream.stream_response_to_file(_urlopen_handle, path=_mgn_path)
+    try:
+        _chunk_size = 1024 * 1024
+        with (
+            u3pm.request("GET", _mgn_urlstr, preload_content=False) as _urlopen_handle,
+            _mgn_path.open("wb") as _mgn_file,
+        ):
+            while True:
+                _data = _urlopen_handle.read(_chunk_size)
+                if not _data:
+                    break
+                _mgn_file.write(_data)
+        print(f"Downloaded {_mgn_urlstr} to {_mgn_path}.")
+    except urllib3.exceptions.MaxRetryError as _err:
+        if isinstance(_err.__cause__, urllib3.exceptions.SSLError):
+            # Works fine with other sites secured with certificates
+            # from the Internet2 CA, such as,
+            # https://snap.stanford.edu/data/web-Stanford.txt.gz
+            print(
+                f"WARNING: Could not establish secure connection to {_mgn_urlstr}. "
+                "Using bundled copy."
+            )
+            if not _mgn_path.is_file():
+                with resources.as_file(
+                    resources.files(f"{_PKG_NAME}.data").joinpath(
+                        "damodaran_margin_data.xls"
+                    )
+                ) as _mgn_data_archive_path:
+                    shutil.copy2(_mgn_data_archive_path, _mgn_path)
+        else:
+            raise _err
     _xl_book = open_workbook(_mgn_path, ragged_rows=True, on_demand=True)
     _xl_sheet = _xl_book.sheet_by_name("Industry Averages")
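The new download code in `mgn_data_getter` streams the HTTP response to disk in 1 MiB chunks rather than loading the whole file into memory. The loop pattern is sketched below against any readable binary stream; `io.BytesIO` stands in for the urllib3 response (which the diff opens with `preload_content=False`), and `copy_in_chunks` is a hypothetical name.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB, matching _chunk_size in the diff


def copy_in_chunks(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst in fixed-size chunks; return the number of bytes copied."""
    copied = 0
    # read() returns b"" at end of stream, which ends the loop
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        copied += len(chunk)
    return copied


buffer = io.BytesIO()
copy_in_chunks(io.BytesIO(b"margin data" * 100), buffer, chunk_size=64)
```

The standard library's `shutil.copyfileobj(fsrc, fdst, length)` implements the same loop and could replace the hand-written `while True` / `break` version.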
@@ -123,7 +139,7 @@ def mgn_data_builder(
     _mgn_tbl_dict: Mapping[str, Mapping[str, float | int]] | None = None, /
 ) -> tuple[NDArray[np.float64], NDArray[np.float64], NDArray[np.float64]]:
     if _mgn_tbl_dict is None:
-        _mgn_tbl_dict = scrape_data_table()
+        _mgn_tbl_dict = mgn_data_getter()
     _mgn_data_wts, _mgn_data_obs = (
         _f.flatten()
@@ -169,17 +185,19 @@ def mgn_data_builder(
     )
-def resample_mgn_data(
+def mgn_data_resampler(
     _sample_size: int | tuple[int, int] = (10**6, 2),
     /,
     *,
     seed_sequence: SeedSequence | None = None,
 ) -> NDArray[np.float64]:
     """
-    Generate the specified number of draws from the empirical distribution
-    for Prof. Damodaran's margin data using the estimated Gaussian KDE.
-    Margins for firms in finance, investment, insurance, reinsurance, and REITs
-    are excluded from the sample used to estimate the Gaussian KDE.
+    Generate draws from the empirical distribution based on Prof. Damodaran's margin data.
+    The empirical distribution is estimated using a Gaussian KDE; the bandwidth
+    selected using Silverman's rule is narrowed to reflect that the margin data
+    are multimodal. Margins for firms in finance, investment, insurance, reinsurance, and
+    REITs are excluded from the sample used to estimate the empirical distribution.
     Parameters
     ----------
@@ -198,28 +216,24 @@ def resample_mgn_data(
     _seed_sequence = seed_sequence or SeedSequence(pool_size=8)
-    _x, _w, _ = mgn_data_builder(scrape_data_table())
+    _x, _w, _ = mgn_data_builder(mgn_data_getter())
-    _mgn_kde = stats.gaussian_kde(_x, weights=_w)
+    _mgn_kde = stats.gaussian_kde(_x, weights=_w, bw_method="silverman")
+    _mgn_kde.set_bandwidth(bw_method=_mgn_kde.factor / 3.0)
-    def _generate_draws(
-        _mgn_kde: stats.gaussian_kde, _ssz: int, _seed_seq: SeedSequence
-    ) -> NDArray[np.float64]:
-        _seed = Generator(PCG64DXSM(_seed_sequence))
-        # We enlarge the sample, then truncate to
-        # the range between [0.0, 1.0)
-        ssz_up = int(_ssz / (_mgn_kde.integrate_box_1d(0.0, 1.0) ** 2))
-        sample_1 = _mgn_kde.resample(ssz_up, seed=_seed)[0]
+    if isinstance(_sample_size, int):
         return np.array(
-            sample_1[(sample_1 >= 0.0) & (sample_1 <= 1)][:_ssz], np.float64
+            _mgn_kde.resample(_sample_size, seed=Generator(PCG64DXSM(_seed_sequence)))[
+                0
+            ]
         )
-    if isinstance(_sample_size, int):
-        return _generate_draws(_mgn_kde, _sample_size, _seed_sequence)
-    else:
+    elif isinstance(_sample_size, tuple) and len(_sample_size) == 2:
         _ssz, _num_cols = _sample_size
         _ret_array = np.empty(_sample_size, np.float64)
         for _idx, _seed_seq in enumerate(_seed_sequence.spawn(_num_cols)):
-            _ret_array[:, _idx] = _generate_draws(_mgn_kde, _ssz, _seed_seq)
+            _ret_array[:, _idx] = _mgn_kde.resample(
+                _ssz, seed=Generator(PCG64DXSM(_seed_seq))
+            )[0]
         return _ret_array
+    else:
+        raise ValueError(f"Invalid sample size: {_sample_size!r}")
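The new `mgn_data_resampler` fits a weighted Gaussian KDE with Silverman's bandwidth, divides the bandwidth factor by 3 to respect the multimodal margin data, and draws each output column from its own spawned `SeedSequence`. The sketch below reproduces those steps with SciPy and NumPy on synthetic bimodal data: the `MARGINS` and `WEIGHTS` arrays are made up, not Prof. Damodaran's figures, and `narrowed_kde` and `resample_columns` are hypothetical names. Like the new code, and unlike the removed `_generate_draws` helper, it does not truncate draws to [0, 1].

```python
import numpy as np
from numpy.random import PCG64DXSM, Generator, SeedSequence
from numpy.typing import NDArray
from scipy import stats

# Synthetic, bimodal stand-in for the margin data (NOT Prof. Damodaran's figures)
_demo = np.random.default_rng(0)
MARGINS = np.concatenate([_demo.normal(0.15, 0.05, 300), _demo.normal(0.55, 0.08, 200)])
WEIGHTS = _demo.integers(1, 50, MARGINS.size).astype(np.float64)


def narrowed_kde(
    x: NDArray[np.float64], w: NDArray[np.float64], shrink: float = 3.0
) -> stats.gaussian_kde:
    """Weighted Gaussian KDE with Silverman's bandwidth factor divided by `shrink`."""
    kde = stats.gaussian_kde(x, weights=w, bw_method="silverman")
    # A scalar bw_method sets kde.factor directly, narrowing every kernel
    kde.set_bandwidth(bw_method=kde.factor / shrink)
    return kde


def resample_columns(
    kde: stats.gaussian_kde, n_draws: int, n_cols: int, seed_seq: SeedSequence
) -> NDArray[np.float64]:
    """Draw n_draws values per column, each column from its own spawned stream."""
    out = np.empty((n_draws, n_cols), np.float64)
    for j, child in enumerate(seed_seq.spawn(n_cols)):
        # resample returns an array of shape (1, n_draws) for 1-D data
        out[:, j] = kde.resample(n_draws, seed=Generator(PCG64DXSM(child)))[0]
    return out
```

Spawning child seed sequences keeps the columns statistically independent while making the whole array reproducible from one root seed.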
