
renard-pipeline 0.4.0.tar.gz → 0.5.0.tar.gz


Potentially problematic release.


Files changed (39)

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/PKG-INFO RENAMED

@@ -1,7 +1,8 @@
 Metadata-Version: 2.1
 Name: renard-pipeline
-Version: 0.4.0
+Version: 0.5.0
 Summary: Relationships Extraction from NARrative Documents
+Home-page: https://github.com/CompNet/Renard
 License: GPL-3.0-only
 Author: Arthur Amalvy
 Author-email: arthur.amalvy@univ-avignon.fr
@@ -27,17 +28,21 @@ Requires-Dist: seqeval (==1.2.2)
 Requires-Dist: spacy (>=3.5.0,<4.0.0) ; extra == "spacy"
 Requires-Dist: spacy-transformers (>=1.2.1,<2.0.0) ; extra == "spacy"
 Requires-Dist: stanza (>=1.3.0,<2.0.0) ; extra == "stanza"
-Requires-Dist: tibert (>=0.3.0,<0.4.0)
+Requires-Dist: tibert (>=0.4.0,<0.5.0)
 Requires-Dist: torch (>=2.0.0,!=2.0.1)
 Requires-Dist: tqdm (>=4.62.3,<5.0.0)
 Requires-Dist: transformers (>=4.36.0,<5.0.0)
+Project-URL: Documentation, https://compnet.github.io/Renard/
+Project-URL: Repository, https://github.com/CompNet/Renard
 Description-Content-Type: text/markdown
 # Renard
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.06574/status.svg)](https://doi.org/10.21105/joss.06574)
 Renard (Relationships Extraction from NARrative Documents) is a library for creating and using custom character networks extraction pipelines. Renard can extract dynamic as well as static character networks.
-![Character network extracted from "Pride and Prejudice"](./docs/pp_white_bg.svg)
+![The Renard logo](./docs/renard.svg)
 # Installation
@@ -46,6 +51,8 @@ You can install the latest version using pip:
 > pip install renard-pipeline
+Currently, Renard supports Python 3.8, 3.9 and 3.10.
 # Documentation
@@ -56,7 +63,32 @@ If you need local documentation, it can be generated using `Sphinx`. From the `d
 # Tutorial
-`renard_tutorial.py` is a tutorial in the `jupytext` format. You can open it as a notebook in Jupyter Notebook (or export it as a notebook with `jupytext --to ipynb renard-tutorial.py`).
+Renard's central concept is the `Pipeline`. A `Pipeline` is a list of `PipelineStep`s that are run sequentially in order to extract a character graph from a document. Here is a simple example:
+```python
+from renard.pipeline import Pipeline
+from renard.pipeline.tokenization import NLTKTokenizer
+from renard.pipeline.ner import NLTKNamedEntityRecognizer
+from renard.pipeline.character_unification import GraphRulesCharacterUnifier
+from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor
+with open("./my_doc.txt") as f:
+	text = f.read()
+pipeline = Pipeline(
+	[
+		NLTKTokenizer(),
+		NLTKNamedEntityRecognizer(),
+		GraphRulesCharacterUnifier(min_appearances=10),
+		CoOccurrencesGraphExtractor(co_occurrences_dist=25)
+	]
+)
+out = pipeline(text)
+```
+For more information, see `renard_tutorial.py`, which is a tutorial in the `jupytext` format. You can open it as a notebook in Jupyter Notebook (or export it as a notebook with `jupytext --to ipynb renard_tutorial.py`).
 # Running tests
@@ -72,3 +104,25 @@ Expensive tests are disabled by default. These can be run by setting the environ
 see [the "Contributing" section of the documentation](https://compnet.github.io/Renard/contributing.html).
+# How to cite
+If you use Renard in your research project, please cite it as follows:
+```bibtex
+@Article{Amalvy2024,
+  doi	       = {10.21105/joss.06574},
+  year	       = {2024},
+  publisher    = {The Open Journal},
+  volume       = {9},
+  number       = {98},
+  pages	       = {6574},
+  author       = {Amalvy, A. and Labatut, V. and Dufour, R.},
+  title	       = {Renard: A Modular Pipeline for Extracting Character
+                  Networks from Narrative Texts},
+  journal      = {Journal of Open Source Software},
+}
+```
+We would be happy to hear about your usage of Renard, so don't hesitate to reach out!

renard_pipeline-0.5.0/README.md ADDED

@@ -0,0 +1,89 @@
+# Renard
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.06574/status.svg)](https://doi.org/10.21105/joss.06574)
+Renard (Relationships Extraction from NARrative Documents) is a library for creating and using custom character networks extraction pipelines. Renard can extract dynamic as well as static character networks.
+![The Renard logo](./docs/renard.svg)
+# Installation
+You can install the latest version using pip:
+> pip install renard-pipeline
+Currently, Renard supports Python 3.8, 3.9 and 3.10.
+# Documentation
+Documentation, including installation instructions, can be found at https://compnet.github.io/Renard/
+If you need local documentation, it can be generated using `Sphinx`. From the `docs` directory, `make html` should create documentation under `docs/_build/html`.
+# Tutorial
+Renard's central concept is the `Pipeline`. A `Pipeline` is a list of `PipelineStep`s that are run sequentially in order to extract a character graph from a document. Here is a simple example:
+```python
+from renard.pipeline import Pipeline
+from renard.pipeline.tokenization import NLTKTokenizer
+from renard.pipeline.ner import NLTKNamedEntityRecognizer
+from renard.pipeline.character_unification import GraphRulesCharacterUnifier
+from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor
+with open("./my_doc.txt") as f:
+	text = f.read()
+pipeline = Pipeline(
+	[
+		NLTKTokenizer(),
+		NLTKNamedEntityRecognizer(),
+		GraphRulesCharacterUnifier(min_appearances=10),
+		CoOccurrencesGraphExtractor(co_occurrences_dist=25)
+	]
+)
+out = pipeline(text)
+```
+For more information, see `renard_tutorial.py`, which is a tutorial in the `jupytext` format. You can open it as a notebook in Jupyter Notebook (or export it as a notebook with `jupytext --to ipynb renard_tutorial.py`).
+# Running tests
+`Renard` uses `pytest` for testing. To launch tests, use the following command :
+> poetry run python -m pytest tests
+Expensive tests are disabled by default. These can be run by setting the environment variable `RENARD_TEST_ALL` to `1`.
+# Contributing
+see [the "Contributing" section of the documentation](https://compnet.github.io/Renard/contributing.html).
+# How to cite
+If you use Renard in your research project, please cite it as follows:
+```bibtex
+@Article{Amalvy2024,
+  doi	       = {10.21105/joss.06574},
+  year	       = {2024},
+  publisher    = {The Open Journal},
+  volume       = {9},
+  number       = {98},
+  pages	       = {6574},
+  author       = {Amalvy, A. and Labatut, V. and Dufour, R.},
+  title	       = {Renard: A Modular Pipeline for Extracting Character
+                  Networks from Narrative Texts},
+  journal      = {Journal of Open Source Software},
+}
+```
+We would be happy to hear about your usage of Renard, so don't hesitate to reach out!

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "renard-pipeline"
-version = "0.4.0"
+version = "0.5.0"
 description = "Relationships Extraction from NARrative Documents"
 authors = ["Arthur Amalvy <arthur.amalvy@univ-avignon.fr>"]
 license = "GPL-3.0-only"
@@ -8,6 +8,9 @@ readme = "README.md"
 packages = [
     { include = "renard" }
 ]
+homepage = "https://github.com/CompNet/Renard"
+repository = "https://github.com/CompNet/Renard"
+documentation = "https://compnet.github.io/Renard/"
 [tool.poetry.dependencies]
 # optional dependencies
@@ -28,7 +31,7 @@ matplotlib = "^3.5.3"
 seqeval = "1.2.2"
 pandas = "^2.0.0"
 pytest = "^7.2.1"
-tibert = "^0.3.0"
+tibert = "^0.4.0"
 grimbert = "^0.1.0"
 datasets = "^2.16.1"

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/renard/graph_utils.py RENAMED

@@ -70,10 +70,17 @@ def graph_with_names(
     else:
         name_style_fn = name_style
-    return nx.relabel_nodes(
-        G,
-        {character: name_style_fn(character) for character in G.nodes()},  # type: ignore
-    )
+    mapping = {}
+    for character in G.nodes():
+        # NOTE: it is *possible* to have a graph where nodes are not
+        # characters (for example, simple strings). Therefore, we are
+        # lenient here
+        try:
+            mapping[character] = name_style_fn(character)
+        except AttributeError:
+            mapping[character] = character
+    return nx.relabel_nodes(G, mapping)
 def layout_with_names(
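
Reviewer note: the new `graph_with_names` body builds the relabeling map node by node and falls back to the node itself when the naming function raises `AttributeError`. The sketch below isolates that pattern without networkx. The `Character` class, `longest_name` and `build_name_mapping` are hypothetical stand-ins for illustration, not Renard's own definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Character:
    """Hypothetical stand-in for a character node with several names."""

    names: Tuple[str, ...]


def longest_name(character: Any) -> str:
    # Raises AttributeError when `character` has no `names` attribute,
    # for example when the node is a plain string.
    return max(character.names, key=len)


def build_name_mapping(
    nodes: Iterable[Any], name_style_fn: Callable[[Any], str]
) -> Dict[Any, Any]:
    """Map each node to a display name, leaving non-character nodes as-is."""
    mapping = {}
    for node in nodes:
        try:
            mapping[node] = name_style_fn(node)
        except AttributeError:
            # Lenient fallback: the node is not character-like.
            mapping[node] = node
    return mapping


elizabeth = Character(names=("Lizzy", "Elizabeth Bennet"))
mapping = build_name_mapping([elizabeth, "Darcy"], longest_name)
print(mapping[elizabeth], "|", mapping["Darcy"])
```

A mapping built this way can then be handed to `nx.relabel_nodes(G, mapping)`, as the hunk above does.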

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/renard/ner_utils.py RENAMED

@@ -110,6 +110,10 @@ class NERDataset(Dataset):
         elt_context_mask = self._context_mask[index]
         for i in range(len(element)):
             w2t = batch.word_to_tokens(0, i)
+            # w2t can be None in case of truncation, which can happen
+            # if `element' is too long
+            if w2t is None:
+                continue
             mask_value = elt_context_mask[i]
             tokens_mask = [mask_value] * (w2t.end - w2t.start)
             batch["context_mask"][w2t.start : w2t.end] = tokens_mask
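
Reviewer note: the added guard skips words whose token span is `None`, which the comment attributes to truncation. The sketch below simulates that situation with plain lists instead of a real tokenizer output. `expand_word_mask` and the `(start, end)` span tuples are illustrative assumptions, not part of Renard or of the tokenizer API.

```python
from typing import List, Optional, Tuple


def expand_word_mask(
    word_mask: List[int],
    word_spans: List[Optional[Tuple[int, int]]],
    n_tokens: int,
) -> List[int]:
    """Expand a per-word mask to a per-token mask.

    `word_spans[i]` is the (start, end) token span of word i, or None
    when the word was cut off by truncation.
    """
    token_mask = [0] * n_tokens
    for mask_value, span in zip(word_mask, word_spans):
        if span is None:
            # The word has no tokens left after truncation: skip it.
            continue
        start, end = span
        token_mask[start:end] = [mask_value] * (end - start)
    return token_mask


# Three words, but only three tokens survive: the last word is truncated.
token_mask = expand_word_mask([1, 0, 1], [(0, 2), (2, 3), None], n_tokens=3)
print(token_mask)
```

Without the `None` check, accessing `.start` or unpacking the span of the truncated word would fail, which is the failure mode the hunk guards against.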

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/renard/pipeline/character_unification.py RENAMED

@@ -61,6 +61,8 @@ def _assign_coreference_mentions(
     # we assign each chain to the character with highest name
     # occurence in it
     for chain in corefs:
+        if len(char_mentions) == 0:
+            break
         # determine the characters with the highest number of
         # occurences
         occ_counter = {}
@@ -98,8 +100,13 @@ class NaiveCharacterUnifier(PipelineStep):
             character for it to be valid
         """
         self.min_appearances = min_appearances
+        # a default value, will be set by _pipeline_init_
+        self.character_ner_tag = "PER"
         super().__init__()
+    def _pipeline_init_(self, lang: str, character_ner_tag: str, **kwargs):
+        self.character_ner_tag = character_ner_tag
     def __call__(
         self,
         text: str,
@@ -112,7 +119,7 @@ class NaiveCharacterUnifier(PipelineStep):
         :param tokens:
         :param entities:
         """
-        persons = [e for e in entities if e.tag == "PER"]
+        persons = [e for e in entities if e.tag == self.character_ner_tag]
         characters = defaultdict(list)
         for entity in persons:
@@ -159,6 +166,7 @@ class GraphRulesCharacterUnifier(PipelineStep):
         min_appearances: int = 0,
         additional_hypocorisms: Optional[List[Tuple[str, List[str]]]] = None,
         link_corefs_mentions: bool = False,
+        ignore_lone_titles: Optional[Set[str]] = None,
     ) -> None:
         """
         :param min_appearances: minimum number of appearances of a
@@ -173,20 +181,27 @@ class GraphRulesCharacterUnifier(PipelineStep):
             extract a lot of spurious links.  However, linking by
             coref is sometimes the only way to resolve a character
             alias.
+        :param ignore_lone_titles: a set of titles to ignore when
+            they stand on their own.  This avoids extracting false
+            positives characters such as 'Mr.' or 'Miss'.
         """
         self.min_appearances = min_appearances
         self.additional_hypocorisms = additional_hypocorisms
         self.link_corefs_mentions = link_corefs_mentions
+        self.ignore_lone_titles = ignore_lone_titles or set()
+        self.character_ner_tag = "PER"  # a default value, will be set by _pipeline_init
         super().__init__()
-    def _pipeline_init_(self, lang: str, progress_reporter: ProgressReporter):
+    def _pipeline_init_(self, lang: str, character_ner_tag: str, **kwargs):
         self.hypocorism_gazetteer = HypocorismGazetteer(lang=lang)
         if not self.additional_hypocorisms is None:
             for name, nicknames in self.additional_hypocorisms:
                 self.hypocorism_gazetteer._add_hypocorism_(name, nicknames)
-        return super()._pipeline_init_(lang, progress_reporter)
+        self.character_ner_tag = character_ner_tag
+        return super()._pipeline_init_(lang, **kwargs)
     def __call__(
         self,
@@ -196,12 +211,17 @@ class GraphRulesCharacterUnifier(PipelineStep):
     ) -> Dict[str, Any]:
         import networkx as nx
-        mentions = [m for m in entities if m.tag == "PER"]
-        mentions_str = [" ".join(m.tokens) for m in mentions]
+        mentions = [m for m in entities if m.tag == self.character_ner_tag]
+        mentions_str = set(
+            filter(
+                lambda m: not m in self.ignore_lone_titles,
+                map(lambda m: " ".join(m.tokens), mentions),
+            )
+        )
         # * create a graph where each node is a mention detected by NER
         G = nx.Graph()
-        for mention_str in set(mentions_str):
+        for mention_str in mentions_str:
             G.add_node(mention_str)
         # * HumanName local configuration - dependant on language
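
Reviewer note: the hunk above replaces the hard-coded `"PER"` tag with `self.character_ner_tag` and drops mentions that consist only of an ignored title. A minimal sketch of that filtering follows, written with a set comprehension rather than `filter`/`map`. The `Mention` dataclass here is a simplified stand-in and `mention_strings` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Mention:
    """Simplified stand-in for an NER mention."""

    tokens: List[str]
    tag: str


def mention_strings(
    entities: Iterable[Mention],
    character_ner_tag: str = "PER",
    ignore_lone_titles: Optional[Set[str]] = None,
) -> Set[str]:
    """Return the unique character mention strings, minus lone titles."""
    ignore_lone_titles = ignore_lone_titles or set()
    joined = (" ".join(m.tokens) for m in entities if m.tag == character_ner_tag)
    return {s for s in joined if s not in ignore_lone_titles}


entities = [
    Mention(["Mr.", "Darcy"], "PER"),
    Mention(["Mr."], "PER"),  # lone title: dropped
    Mention(["Pemberley"], "LOC"),  # not a character tag: dropped
    Mention(["Miss"], "PER"),  # lone title: dropped
]
names = mention_strings(entities, ignore_lone_titles={"Mr.", "Miss"})
print(names)
```

Only exact matches are ignored, so "Mr. Darcy" survives even though "Mr." is in the ignore set.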

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/renard/pipeline/characters_extraction.py RENAMED

@@ -1,7 +1,9 @@
+import sys
 import renard.pipeline.character_unification as cu
 print(
-    "[warning] the characters_extraction module is deprecated. Use character_unification instead."
+    "[warning] the characters_extraction module is deprecated. Use character_unification instead.",
+    file=sys.stderr,
 )
 Character = cu.Character
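
Reviewer note: the deprecation message now goes to standard error, so it no longer mixes with a program's regular output. The sketch below checks that behavior by capturing both streams. `warn_deprecated` is a hypothetical helper written for this illustration.

```python
import contextlib
import io
import sys


def warn_deprecated(old_module: str, new_module: str) -> None:
    # Diagnostics go to stderr so that stdout stays clean for real output.
    print(
        f"[warning] the {old_module} module is deprecated. Use {new_module} instead.",
        file=sys.stderr,
    )


stdout_buf, stderr_buf = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(stdout_buf), contextlib.redirect_stderr(stderr_buf):
    warn_deprecated("characters_extraction", "character_unification")

captured_out = stdout_buf.getvalue()
captured_err = stderr_buf.getvalue()
```

An alternative design, not used in this release, would be `warnings.warn(..., DeprecationWarning)`, which lets callers filter or escalate the message.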

{renard_pipeline-0.4.0 → renard_pipeline-0.5.0}/renard/pipeline/core.py RENAMED

@@ -50,6 +50,13 @@ class Mention:
         self_dict["end_idx"] = self.end_idx + shift
         return self.__class__(**self_dict)
+    def __eq__(self, other: Mention) -> bool:
+        return (
+            self.tokens == other.tokens
+            and self.start_idx == other.start_idx
+            and self.end_idx == other.end_idx
+        )
     def __hash__(self) -> int:
         return hash(tuple(self.tokens) + (self.start_idx, self.end_idx))
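
Reviewer note: the new `__eq__` compares the same three fields that `__hash__` already combined, so equal mentions now hash equally and deduplicate in sets and dicts. A reduced sketch of that contract follows. This `Mention` class keeps only those three fields, and the `isinstance` guard is an addition for the sketch that is not in the diff.

```python
from typing import List


class Mention:
    """Reduced stand-in with only the fields used for equality."""

    def __init__(self, tokens: List[str], start_idx: int, end_idx: int):
        self.tokens = tokens
        self.start_idx = start_idx
        self.end_idx = end_idx

    def __eq__(self, other: object) -> bool:
        # Guard added for this sketch; the diff compares fields directly.
        if not isinstance(other, Mention):
            return NotImplemented
        return (
            self.tokens == other.tokens
            and self.start_idx == other.start_idx
            and self.end_idx == other.end_idx
        )

    def __hash__(self) -> int:
        return hash(tuple(self.tokens) + (self.start_idx, self.end_idx))


a = Mention(["Elizabeth"], 0, 1)
b = Mention(["Elizabeth"], 0, 1)  # same content as `a`
c = Mention(["Elizabeth"], 10, 11)  # same tokens, different position
unique = {a, b, c}
print(len(unique))
```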
@@ -72,11 +79,18 @@ class PipelineStep:
         """Initialize the :class:`PipelineStep` with a given configuration."""
         pass
-    def _pipeline_init_(self, lang: str, progress_reporter: ProgressReporter):
-        """Set the step configuration that is common to the whole pipeline.
+    def _pipeline_init_(
+        self, lang: str, progress_reporter: ProgressReporter, **kwargs
+    ) -> Optional[Dict[Pipeline.PipelineParameter, Any]]:
+        """Set the step configuration that is common to the whole
+        pipeline.
-        :param lang: ISO 639-3 language string
-        :param progress_report:
+        :param lang: the lang of the whole pipeline
+        :param progress_reporter:
+        :param kwargs: additional pipeline parameters.
+        :return: a step can return a dictionary of pipeline params if
+                 it wishes to modify some of these.
         """
         supported_langs = self.supported_langs()
         if not supported_langs == "any" and not lang in supported_langs:
@@ -143,13 +157,14 @@ class PipelineState:
     #: input text
     text: Optional[str]
-    #: text split into chapters
-    chapters: Optional[List[str]] = None
+    #: text split into blocks of texts. When dynamic blocks are given,
+    #: the final network is dynamic, and split according to blocks.
+    dynamic_blocks: Optional[List[Tuple[int, int]]] = None
     #: text splitted in tokens
     tokens: Optional[List[str]] = None
-    #: text splitted in tokens, by chapter
-    chapter_tokens: Optional[List[List[str]]] = None
+    #: mapping from a character to its corresponding token
+    char2token: Optional[List[int]] = None
     #: text splitted into sentences, each sentence being a list of
     #: tokens
     sentences: Optional[List[List[str]]] = None
@@ -175,14 +190,12 @@ class PipelineState:
     #: network)
     character_network: Optional[Union[List[nx.Graph], nx.Graph]] = None
+    # aliases of self.character_network
     def get_characters_graph(self) -> Optional[Union[List[nx.Graph], nx.Graph]]:
-        print(
-            "[warning] the characters_graph attribute is deprecated, use character_network instead",
-            file=sys.stderr,
-        )
         return self.character_network
     characters_graph = property(get_characters_graph)
+    character_graph = property(get_characters_graph)
     def get_character(
         self, name: str, partial_match: bool = True
@@ -273,6 +286,9 @@ class PipelineState:
         cumulative: bool = False,
         stable_layout: bool = False,
         layout: Optional[CharactersGraphLayout] = None,
+        node_kwargs: Optional[List[Dict[str, Any]]] = None,
+        edge_kwargs: Optional[List[Dict[str, Any]]] = None,
+        label_kwargs: Optional[List[Dict[str, Any]]] = None,
     ):
         """Plot ``self.character_graph`` using reasonable default
         parameters, and save the produced figures in the specified
@@ -287,6 +303,9 @@ class PipelineState:
             timestep.  Characters' positions are based on the final
             cumulative graph layout.
         :param layout: pre-computed graph layout
+        :param node_kwargs: passed to :func:`nx.draw_networkx_nodes`
+        :param edge_kwargs: passed to :func:`nx.draw_networkx_edges`
+        :param label_kwargs: passed to :func:`nx.draw_networkx_labels`
         """
         import matplotlib.pyplot as plt
@@ -310,13 +329,24 @@ class PipelineState:
             )
             layout = layout_nx_graph_reasonably(layout_graph)
+        node_kwargs = node_kwargs or [{} for _ in range(len(self.character_network))]
+        edge_kwargs = edge_kwargs or [{} for _ in range(len(self.character_network))]
+        label_kwargs = label_kwargs or [{} for _ in range(len(self.character_network))]
         for i, G in enumerate(graphs):
             _, ax = plt.subplots()
             local_layout = layout
             if not local_layout is None:
                 local_layout = layout_with_names(G, local_layout, name_style)
             G = graph_with_names(G, name_style=name_style)
-            plot_nx_graph_reasonably(G, ax=ax, layout=local_layout)
+            plot_nx_graph_reasonably(
+                G,
+                ax=ax,
+                layout=local_layout,
+                node_kwargs=node_kwargs[i],
+                edge_kwargs=edge_kwargs[i],
+                label_kwargs=label_kwargs[i],
+            )
             plt.savefig(f"{directory}/{i}.png")
             plt.close()
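
Reviewer note: for dynamic networks, the plotting methods expect one keyword dictionary per graph and fall back to a list of empty dictionaries when none is given. The sketch below isolates that defaulting step. `per_graph_kwargs` is a hypothetical helper, and its length check is an extra safeguard that the diff does not perform.

```python
from typing import Any, Dict, List, Optional


def per_graph_kwargs(
    kwargs: Optional[List[Dict[str, Any]]], n_graphs: int
) -> List[Dict[str, Any]]:
    """Return one kwargs dict per graph, defaulting to empty dicts."""
    # A comprehension creates distinct dicts; `[{}] * n_graphs` would
    # repeat the same dict object n_graphs times.
    kwargs = kwargs or [{} for _ in range(n_graphs)]
    if len(kwargs) != n_graphs:
        # Extra check for the sketch: indexing kwargs[i] would otherwise
        # fail later, or silently ignore trailing entries.
        raise ValueError(f"expected {n_graphs} kwargs dicts, got {len(kwargs)}")
    return kwargs


defaults = per_graph_kwargs(None, 3)
custom = per_graph_kwargs([{"node_color": "red"}, {"node_color": "blue"}], 2)
print(defaults, custom)
```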
@@ -328,6 +358,9 @@ class PipelineState:
         ] = "most_frequent",
         layout: Optional[CharactersGraphLayout] = None,
         fig: Optional[plt.Figure] = None,
+        node_kwargs: Optional[Dict[str, Any]] = None,
+        edge_kwargs: Optional[Dict[str, Any]] = None,
+        label_kwargs: Optional[Dict[str, Any]] = None,
     ):
         """Plot ``self.character_graph`` using reasonable parameters,
         and save the produced figure to a file
@@ -337,6 +370,9 @@ class PipelineState:
         :param layout: pre-computed graph layout
         :param fig: if specified, this matplotlib figure will be used
             for plotting
+        :param node_kwargs: passed to :func:`nx.draw_networkx_nodes`
+        :param edge_kwargs: passed to :func:`nx.draw_networkx_edges`
+        :param label_kwargs: passed to :func:`nx.draw_networkx_labels`
         """
         import matplotlib.pyplot as plt
@@ -354,7 +390,14 @@ class PipelineState:
             fig.set_dpi(300)
             fig.set_size_inches(24, 24)
         ax = fig.add_subplot(111)
-        plot_nx_graph_reasonably(G, ax=ax, layout=layout)
+        plot_nx_graph_reasonably(
+            G,
+            ax=ax,
+            layout=layout,
+            node_kwargs=node_kwargs,
+            edge_kwargs=edge_kwargs,
+            label_kwargs=label_kwargs,
+        )
         plt.savefig(path)
         plt.close()
@@ -368,6 +411,9 @@ class PipelineState:
         graph_start_idx: int = 1,
         stable_layout: bool = False,
         layout: Optional[CharactersGraphLayout] = None,
+        node_kwargs: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
+        edge_kwargs: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
+        label_kwargs: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
     ):
         """Plot ``self.character_network`` using reasonable default
         parameters
@@ -393,6 +439,9 @@ class PipelineState:
             same position in space at each timestep.  Characters'
             positions are based on the final cumulative graph layout.
         :param layout: pre-computed graph layout
+        :param node_kwargs: passed to :func:`nx.draw_networkx_nodes`
+        :param edge_kwargs: passed to :func:`nx.draw_networkx_edges`
+        :param label_kwargs: passed to :func:`nx.draw_networkx_labels`
         """
         import matplotlib.pyplot as plt
         from matplotlib.widgets import Slider
@@ -411,13 +460,30 @@ class PipelineState:
                 fig.set_dpi(300)
                 fig.set_size_inches(24, 24)
             ax = fig.add_subplot(111)
-            plot_nx_graph_reasonably(G, ax=ax, layout=layout)
+            assert not isinstance(node_kwargs, list)
+            assert not isinstance(edge_kwargs, list)
+            assert not isinstance(label_kwargs, list)
+            plot_nx_graph_reasonably(
+                G,
+                ax=ax,
+                layout=layout,
+                node_kwargs=node_kwargs,
+                edge_kwargs=edge_kwargs,
+                label_kwargs=label_kwargs,
+            )
             return
         if not isinstance(self.character_network, list):
             raise TypeError
         # self.character_network is a list: plot a dynamic graph
+        node_kwargs = node_kwargs or [{} for _ in range(len(self.character_network))]
+        assert isinstance(node_kwargs, list)
+        edge_kwargs = edge_kwargs or [{} for _ in range(len(self.character_network))]
+        assert isinstance(edge_kwargs, list)
+        label_kwargs = label_kwargs or [{} for _ in range(len(self.character_network))]
+        assert isinstance(label_kwargs, list)
         if fig is None:
             fig, ax = plt.subplots()
             assert not fig is None
@@ -433,12 +499,13 @@ class PipelineState:
         def update(slider_value):
             assert isinstance(self.character_network, list)
+            slider_i = int(slider_value) - 1
             character_networks = self.character_network
             if cumulative:
                 character_networks = cumulative_character_networks
-            G = character_networks[int(slider_value) - 1]
+            G = character_networks[slider_i]
             local_layout = layout
             if not local_layout is None:
@@ -446,7 +513,14 @@ class PipelineState:
             G = graph_with_names(G, name_style)
             ax.clear()
-            plot_nx_graph_reasonably(G, ax=ax, layout=local_layout)
+            plot_nx_graph_reasonably(
+                G,
+                ax=ax,
+                layout=local_layout,
+                node_kwargs=node_kwargs[slider_i],
+                edge_kwargs=edge_kwargs[slider_i],
+                label_kwargs=label_kwargs[slider_i],
+            )
             ax.set_xlim(-1.2, 1.2)
             ax.set_ylim(-1.2, 1.2)
@@ -467,6 +541,10 @@ class PipelineState:
 class Pipeline:
     """A flexible NLP pipeline"""
+    #: all the possible parameters of the whole pipeline, that are
+    #: shared between steps
+    PipelineParameter = Literal["lang", "progress_reporter", "character_ner_tag"]
     def __init__(
         self,
         steps: List[PipelineStep],
@@ -489,17 +567,27 @@ class Pipeline:
         self.progress_reporter = get_progress_reporter(progress_report)
         self.lang = lang
+        self.character_ner_tag = "PER"
         self.warn = warn
-    def _pipeline_init_steps(self, ignored_steps: Optional[List[str]] = None):
-        """
+    def _pipeline_init_steps_(self, ignored_steps: Optional[List[str]] = None):
+        """Initialise steps with global pipeline parameters.
         :param ignored_steps: a list of steps production.  All steps
             with a production in ``ignored_steps`` will be ignored.
         """
-        steps_progress_reporter = get_progress_reporter(self.progress_report)
+        steps_progress_reporter = self.progress_reporter.get_subreporter()
         steps = self._non_ignored_steps(ignored_steps)
+        pipeline_params = {
+            "progress_reporter": steps_progress_reporter,
+            "character_ner_tag": self.character_ner_tag,
+        }
         for step in steps:
-            step._pipeline_init_(self.lang, steps_progress_reporter)
+            step_additional_params = step._pipeline_init_(self.lang, **pipeline_params)
+            if not step_additional_params is None:
+                for key, value in step_additional_params.items():
+                    setattr(self, key, value)
+                    pipeline_params[key] = value
     def _non_ignored_steps(
         self, ignored_steps: Optional[List[str]]
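
Reviewer note: `_pipeline_init_steps_` now passes shared parameters to each step as keyword arguments, and a step may return a dictionary that overrides them for the steps that follow. The sketch below reproduces that flow with toy classes. `Step`, `TaggingStep`, `UnifierStep` and `init_steps` are hypothetical names, and the `"PERSON"` tag value is an assumption chosen for illustration.

```python
from typing import Any, Dict, List, Optional


class Step:
    def pipeline_init(self, lang: str, **kwargs) -> Optional[Dict[str, Any]]:
        # Default: accept any shared parameters and change nothing.
        return None


class TaggingStep(Step):
    """Hypothetical NER-like step that uses a different character tag."""

    def pipeline_init(self, lang: str, **kwargs) -> Optional[Dict[str, Any]]:
        return {"character_ner_tag": "PERSON"}


class UnifierStep(Step):
    def __init__(self) -> None:
        self.character_ner_tag = "PER"  # default until initialised

    def pipeline_init(
        self, lang: str, character_ner_tag: str, **kwargs
    ) -> Optional[Dict[str, Any]]:
        self.character_ner_tag = character_ner_tag
        return None


def init_steps(steps: List[Step], lang: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Initialise steps in order, letting each one update shared params."""
    params = dict(params)  # do not mutate the caller's dict
    for step in steps:
        updates = step.pipeline_init(lang, **params)
        if updates is not None:
            params.update(updates)
    return params


unifier = UnifierStep()
final_params = init_steps(
    [TaggingStep(), unifier], "eng", {"character_ner_tag": "PER"}
)
print(unifier.character_ner_tag, final_params)
```

Order matters in this scheme: a step only sees overrides returned by steps that come before it.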
@@ -542,13 +630,27 @@ class Pipeline:
                 return (
                     False,
                     [
-                        f"step {i + 1} ({step.__class__.__name__}) has unsatisfied needs (needs : {step.needs()}, available : {pipeline_state})"
+                        "".join(
+                            [
+                                f"step {i + 1} ({step.__class__.__name__}) has unsatisfied needs. "
+                                + f"needs: {step.needs()}. "
+                                + f"available: {pipeline_state}). "
+                                + f"missing: {step.needs() - pipeline_state}."
+                            ]
+                        ),
                     ],
                 )
             if not step.optional_needs().issubset(pipeline_state):
                 warnings.append(
-                    f"step {i + 1} ({step.__class__.__name__}) has unsatisfied optional needs : (optional needs : {step.optional_needs()}, available : {pipeline_state})"
+                    "".join(
+                        [
+                            f"step {i + 1} ({step.__class__.__name__}) has unsatisfied optional needs. "
+                            + f"needs: {step.optional_needs()}. "
+                            + f"available: {pipeline_state}). "
+                            + f"missing: {step.optional_needs() - pipeline_state}."
+                        ]
+                    )
                 )
             pipeline_state = pipeline_state.union(step.production())
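
Reviewer note: the reworked messages add a `missing:` field computed as a set difference between what a step needs and what earlier steps have produced. A minimal sketch of that check follows. `describe_unsatisfied_needs` is a hypothetical helper, and it sorts the sets so the message is deterministic, which the diff does not do.

```python
from typing import Optional, Set


def describe_unsatisfied_needs(
    step_index: int, step_name: str, needs: Set[str], available: Set[str]
) -> Optional[str]:
    """Return an error message if some needs are missing, else None."""
    missing = needs - available  # set difference: needed but not produced
    if not missing:
        return None
    return (
        f"step {step_index + 1} ({step_name}) has unsatisfied needs. "
        f"needs: {sorted(needs)}. "
        f"available: {sorted(available)}. "
        f"missing: {sorted(missing)}."
    )


message = describe_unsatisfied_needs(
    1, "GraphRulesCharacterUnifier", {"tokens", "entities"}, {"text", "tokens"}
)
print(message)
```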
@@ -575,9 +677,9 @@ class Pipeline:
             raise ValueError(warnings_or_errors)
         if self.warn:
             for warning in warnings_or_errors:
-                print(f"[warning] : {warning}")
+                print(f"[warning] : {warning}", file=sys.stderr)
-        self._pipeline_init_steps(ignored_steps)
+        self._pipeline_init_steps_(ignored_steps)
         state = PipelineState(text)
         # sets attributes to PipelineState dynamically. This ensures
