pybibx 4.0.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. pybibx-4.0.4/LICENSE +14 -0
  2. pybibx-4.0.4/MANIFEST.in +1 -0
  3. pybibx-4.0.4/PKG-INFO +153 -0
  4. pybibx-4.0.4/README.md +142 -0
  5. pybibx-4.0.4/pybibx/__init__.py +1 -0
  6. pybibx-4.0.4/pybibx/base/__init__.py +1 -0
  7. pybibx-4.0.4/pybibx/base/pbx.py +5134 -0
  8. pybibx-4.0.4/pybibx/base/stws/Stopwords-Arabic.txt +751 -0
  9. pybibx-4.0.4/pybibx/base/stws/Stopwords-Bengali.txt +119 -0
  10. pybibx-4.0.4/pybibx/base/stws/Stopwords-Bulgarian.txt +259 -0
  11. pybibx-4.0.4/pybibx/base/stws/Stopwords-Chinese.txt +542 -0
  12. pybibx-4.0.4/pybibx/base/stws/Stopwords-Czech.txt +338 -0
  13. pybibx-4.0.4/pybibx/base/stws/Stopwords-English.txt +590 -0
  14. pybibx-4.0.4/pybibx/base/stws/Stopwords-Finnish.txt +747 -0
  15. pybibx-4.0.4/pybibx/base/stws/Stopwords-French.txt +497 -0
  16. pybibx-4.0.4/pybibx/base/stws/Stopwords-German.txt +592 -0
  17. pybibx-4.0.4/pybibx/base/stws/Stopwords-Greek.txt +664 -0
  18. pybibx-4.0.4/pybibx/base/stws/Stopwords-Hebrew.txt +194 -0
  19. pybibx-4.0.4/pybibx/base/stws/Stopwords-Hindi.txt +163 -0
  20. pybibx-4.0.4/pybibx/base/stws/Stopwords-Hungarian.txt +737 -0
  21. pybibx-4.0.4/pybibx/base/stws/Stopwords-Italian.txt +629 -0
  22. pybibx-4.0.4/pybibx/base/stws/Stopwords-Japanese.txt +330 -0
  23. pybibx-4.0.4/pybibx/base/stws/Stopwords-Korean.txt +722 -0
  24. pybibx-4.0.4/pybibx/base/stws/Stopwords-Marathi.txt +99 -0
  25. pybibx-4.0.4/pybibx/base/stws/Stopwords-Persian.txt +332 -0
  26. pybibx-4.0.4/pybibx/base/stws/Stopwords-Polish.txt +138 -0
  27. pybibx-4.0.4/pybibx/base/stws/Stopwords-Portuguese-br.txt +532 -0
  28. pybibx-4.0.4/pybibx/base/stws/Stopwords-Romanian.txt +282 -0
  29. pybibx-4.0.4/pybibx/base/stws/Stopwords-Russian.txt +422 -0
  30. pybibx-4.0.4/pybibx/base/stws/Stopwords-Slovak.txt +180 -0
  31. pybibx-4.0.4/pybibx/base/stws/Stopwords-Spanish.txt +452 -0
  32. pybibx-4.0.4/pybibx/base/stws/Stopwords-Swedish.txt +386 -0
  33. pybibx-4.0.4/pybibx/base/stws/Stopwords-Thai.txt +115 -0
  34. pybibx-4.0.4/pybibx/base/stws/Stopwords-Ukrainian.txt +77 -0
  35. pybibx-4.0.4/pybibx/base/stws/__init__.py +1 -0
  36. pybibx-4.0.4/pybibx.egg-info/PKG-INFO +153 -0
  37. pybibx-4.0.4/pybibx.egg-info/SOURCES.txt +41 -0
  38. pybibx-4.0.4/pybibx.egg-info/dependency_links.txt +1 -0
  39. pybibx-4.0.4/pybibx.egg-info/requires.txt +23 -0
  40. pybibx-4.0.4/pybibx.egg-info/top_level.txt +1 -0
  41. pybibx-4.0.4/pybibx.egg-info/zip-safe +1 -0
  42. pybibx-4.0.4/setup.cfg +4 -0
  43. pybibx-4.0.4/setup.py +45 -0
pybibx-4.0.4/LICENSE ADDED
@@ -0,0 +1,14 @@
1
+ Copyright © 2022 by Valdecy Pereira
2
+
3
+ pybibx is free software: you can redistribute it and/or modify
4
+ it under the terms of the GNU General Public License as published by
5
+ the Free Software Foundation, either version 3 of the License, or
6
+ (at your option) any later version.
7
+
8
+ pybibx is distributed in the hope that it will be useful,
9
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
10
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11
+ GNU General Public License for more details.
12
+
13
+ You should have received a copy of the GNU General Public License
14
+ along with pybibx. If not, see <http://www.gnu.org/licenses/>.
pybibx-4.0.4/MANIFEST.in ADDED
@@ -0,0 +1 @@
1
+ recursive-include pybibx/base/stws *.txt
pybibx-4.0.4/PKG-INFO ADDED
@@ -0,0 +1,153 @@
1
+ Metadata-Version: 2.1
2
+ Name: pybibx
3
+ Version: 4.0.4
4
+ Summary: A Bibliometric and Scientometric Library Powered with Artificial Intelligence Tools
5
+ Home-page: https://github.com/Valdecy/pybibx
6
+ Author: Valdecy Pereira
7
+ Author-email: valdecy.pereira@gmail.com
8
+ License: GNU
9
+ Description-Content-Type: text/markdown
10
+ License-File: LICENSE
11
+
12
+ # pybibx
13
+
14
+ ## Introduction
15
+
16
+ A Bibliometric and Scientometric Python library that uses the raw files generated by the **Scopus** (.bib files or .csv files), **WOS (Web of Science)** (.bib files), and **PubMed** (.txt files) scientific databases. It is also powered by advanced AI technologies for analyzing bibliometric and scientometric outcomes and textual data.
17
+
18
+ To export the correct file formats from Scopus, Web of Science, and PubMed, follow these steps:
19
+
20
+ - a) **Scopus**: Search, select articles, click "Export", choose "BibTeX" or "CSV", select all fields, click "Export" again.
21
+ - b) **WoS**: Search, select articles, click "Export", choose "Save to Other File Formats", select "BibTeX", select all fields, click "Send".
22
+ - c) **PubMed**: Search, select articles, click "Save", choose "PubMed" format, click "Save" to download a .txt file.
23
+
24
+ General Capabilities:
25
+ - a) Works with **Scopus** (.bib files or .csv files), **WOS** (.bib files) and **PubMed** (.txt files) databases
26
+ - b) Identification and Removal of duplicates
27
+ - c) Identification of documents per type
28
+ - d) Generates a Health Report to evaluate the quality of the .bib/.csv file
29
+ - e) Generates an **EDA (Exploratory Data Analysis)** Report: Publications Timespan, Total Number of Countries, Total Number of Institutions, Total Number of Sources, Total Number of References, Total Number of Languages (and also the number of docs for each language), Total Number of Documents, Average Documents per Author, Average Documents per Institution, Average Documents per Source, Average Documents per Year, Total Number of Authors, Total Number of Authors Keywords, Total Number of Authors Keywords Plus, Total Single-Authored Documents, Total Multi-Authored Documents, Average Collaboration Index, Max H-Index, Total Number of Citations, Average Citations per Author, Average Citations per Institution, Average Citations per Document, Average Citations per Source
30
+ - f) Creates an **ID (Identification)** for each Document, Authors, Sources, Institutions, Countries, Authors' Keywords, Keywords Plus. The IDs can be used in graphs/plots to obtain a cleaner visualization
31
+ - g) Creates a **WordCloud** from the Abstracts, Titles, Authors Keywords or Keywords Plus
32
+ - h) Creates an **N-Gram Bar Plot (interactive plot)** from the Abstracts, Titles, Authors Keywords or Keywords Plus
33
+ - i) Creates a **Projection (interactive plot)** of the documents based on the Abstracts, Titles, Authors Keywords or Keywords Plus
34
+ - j) Creates an **Evolution Plot (interactive plot)** based on Abstracts, Titles, Sources, Authors Keywords or Keywords Plus
35
+ - k) Creates an **Evolution Plot Complement (interactive plot)** based on Abstracts, Titles, Sources, Authors Keywords or Keywords Plus
36
+ - l) Creates a **Sankey Diagram (interactive plot)** with any combination of the following keys: Authors, Countries, Institutions, Journals, Authors_Keywords, Keywords_Plus, and/or Languages
37
+ - m) Creates a **TreeMap** from the Authors, Countries, Institutions, Journals, Authors_Keywords, or Keywords_Plus
38
+ - n) Creates an **Authors Productivity Plot (interactive plot)** that shows, for each year, the documents (IDs) published by each author
39
+ - o) Creates a **Countries Productivity Plot (interactive plot)** that shows, for each year, the documents (IDs) published by each country (each author's country)
40
+ - p) Creates a **Bar Plot** for the following statistics: Documents per Year, Citations per Year, Past Citations per Year, Lotka's Law, Sources per Documents, Sources per Citations, Authors per Documents, Authors per Citations, Authors per H-Index, Bradford's Law (Core Sources 1, 2 or 3), Institutions per Documents, Institutions per Citations, Countries per Documents, Countries per Citations, Language per Documents, Keywords Plus per Documents and Authors' Keywords per Documents
41
+
42
+ Network Capabilities:
43
+ - a) **Collaboration Plot** between Authors, Countries, Institutions, Authors' Keywords or Keywords Plus
44
+ - b) **Citation Analysis (interactive plot)** between Documents (Blue Nodes) and Citations (Red Nodes). Documents and Citations can be highlighted for better visualization
45
+ - c) **Collaboration Analysis (interactive plot)** between Authors, Countries, Institutions or **Adjacency Analysis (interactive plot)** between Authors' Keywords or Keywords Plus. Collaboration and Adjacency can be highlighted for better visualization
46
+ - d) **Similarity Analysis (interactive plot)** can be performed using coupling or cocitation methods
47
+ - e) **World Map Collaboration Analysis (interactive plot)** between Countries in a Map
48
+
49
+ Artificial Intelligence Capabilities:
50
+ - a) **Topic Modelling** using BERTopic to cluster documents by topic
51
+ - b) Visualize topics distribution
52
+ - c) Visualize topics by the most representative words
53
+ - d) Visualize documents projection and clustering by topic
54
+ - e) Visualize topics heatmap
55
+ - f) Find the most representative documents from each topic
56
+ - g) Find the most representative topics according to a word
57
+ - h) Creates **W2V Embeddings** from Abstracts
58
+ - i) Find Documents based on words
59
+ - j) Calculates the cosine similarity between two words.
60
+ - k) Performs operations between **W2V Embeddings**
61
+ - l) Visualize **W2V Embeddings** operations
62
+ - m) Creates **Sentence Embeddings** from Abstracts, Titles, Authors Keywords or Keywords Plus
63
+ - n) **Abstractive Text Summarization** using **PEGASUS** on a set of selected documents or all documents
64
+ - o) **Abstractive Text Summarization** using **chatGPT** on a set of selected documents or all documents. Requires the user to have an **API key** (https://platform.openai.com/account/api-keys)
65
+ - p) **Abstractive Text Summarization** using **Gemini** on a set of selected documents or all documents. Requires the user to have an **API key** (https://ai.google.dev/gemini-api/)
66
+ - q) **Extractive Text Summarization** using **BERT** on a set of selected documents or all documents
67
+ - r) **Ask chatGPT** to analyze the following results: EDA Report, WordCloud, N-Grams, Evolution Plot, Sankey Diagram, Authors Productivity Plot, Bar Plots, Citation Analysis, Collaboration Analysis, Similarity Analysis, and World Map Collaboration Analysis (consult **Example 08**). Requires the user to have an **API key** (https://platform.openai.com/account/api-keys)
68
+ - s) **Ask Gemini** to analyze the following results: EDA Report, WordCloud, N-Grams, Evolution Plot, Sankey Diagram, Authors Productivity Plot, Bar Plots, Citation Analysis, Collaboration Analysis, Similarity Analysis, and World Map Collaboration Analysis (consult **Example 09**). Requires the user to have an **API key** (https://ai.google.dev/gemini-api/)
69
+
70
+ Correction and Manipulation Capabilities:
71
+ - a) Filter the .bib, .csv or .txt file by Year, Sources, Bradford Law Cores, Countries, Languages and/or Abstracts (Documents with Abstracts)
72
+ - b) Merge Authors, Institutions, Countries, Languages and/or Sources that have multiple entries
73
+ - c) Merge different or the same database files one at a time. The preference for information preservation is given to the old database, so the order of merging matters (consult **Examples 04 and 05**)
74
+
75
+ ## Usage
76
+
77
+ 1. Install
78
+ ```bash
79
+ pip install pybibx
80
+ ```
81
+
82
+ 2. Try it in **Colab**:
83
+
84
+ - Example 01: Scopus ([ Colab Demo ](https://colab.research.google.com/drive/1yHiMMZIKa-RrarXbPB9ca0gLN9YvvtPU?usp=sharing))
85
+ - Example 02: WOS ([ Colab Demo ](https://colab.research.google.com/drive/13HLjC4myTvYcjLk2XBTZKbWJ2aqZUST1?usp=sharing))
86
+ - Example 03: PubMed ([ Colab Demo ](https://colab.research.google.com/drive/13CU-KvZMnazga1BmQf2J8wYM9mhHL2e1?usp=sharing))
87
+ - Example 04: Scopus + WOS ([ Colab Demo ](https://colab.research.google.com/drive/1DqEk0_IakJPfIZDVcnTWBE_nxyhW9p-W?usp=sharing))
88
+ - Example 05: WOS + Scopus ([ Colab Demo ](https://colab.research.google.com/drive/12k_IOcSDwumbEtPqqSMbCIE6ZypgKAJn?usp=sharing))
89
+ - Example 06: Scopus + WOS + Pubmed ([ Colab Demo ](https://colab.research.google.com/drive/1Ko6AibkXtB_Kwg3Eu0fhzNMVEIXPkbez?usp=sharing))
90
+ - Example 07: Your Own ([ Colab Demo ](https://colab.research.google.com/drive/19EYjgal9V1kemmzpHnyp6MSlk9S-kGHT?usp=sharing))
91
+ - Example 08: **Ask chatGPT** Analysis ([ Colab Demo ](https://colab.research.google.com/drive/1LMrR49F54MuX-stlrQbrrjX_dEU3kZ8Y?usp=sharing))
92
+ - Example 09: **Ask Gemini** Analysis ([ Colab Demo ](https://colab.research.google.com/drive/1oEJBfCml_OMgmSTicMOB-FKMaR2FtoG3?usp=sharing))
93
+
94
+ # Acknowledgement
95
+ This section indicates the libraries that inspired pybibx
96
+
97
+ - **BERT (https://smrzr.io/)**:
98
+ <!-- -->
99
+ a) Github: https://github.com/dmmiller612/bert-extractive-summarizer
100
+ <!-- -->
101
+ b) Paper: MILLER, D. (2019). Leveraging BERT for Extractive Text Summarization on Lectures. arXiv. doi: https://doi.org/10.48550/arXiv.1906.04165
102
+
103
+ - **BERTopic (https://maartengr.github.io/BERTopic/index.html)**:
104
+ <!-- -->
105
+ a) Github: https://github.com/MaartenGr/BERTopic
106
+ <!-- -->
107
+ b) Paper: GROOTENDORST, M. (2022). BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure. arXiv. doi: https://doi.org/10.48550/arXiv.2203.05794
108
+
109
+ - **Bibliometrix (https://www.bibliometrix.org/home/)**:
110
+ <!-- -->
111
+ a) Github: https://github.com/massimoaria/bibliometrix
112
+ <!-- -->
113
+ b) Paper: ARIA, M.; CUCCURULLO, C. (2017). Bibliometrix: An R-tool for Comprehensive Science Mapping Analysis. Journal of Informetrics, 11(4), 959-975. doi: https://doi.org/10.1016/j.joi.2017.08.007
114
+
115
+ - **Gemini (https://gemini.google.com/app)**:
116
+ <!-- -->
117
+ a) Github: https://github.com/google-gemini
118
+ <!-- -->
119
+ b) Paper: Gemini Team Google (2024). Gemini: A Family of Highly Capable Multimodal Models. arXiv. doi: https://doi.org/10.48550/arXiv.2312.11805
120
+
121
+ - **Gensim (https://radimrehurek.com/gensim/)**:
122
+ <!-- -->
123
+ a) Github: https://github.com/piskvorky/gensim
124
+ <!-- -->
125
+ b) Paper: REHUREK, R.; SOJKA, P. (2010). Software Framework for Topic Modelling with Large Corpora. LREC 2010. doi: https://doi.org/10.13140/2.1.2393.1847
126
+
127
+ - **chatGPT (https://chat.openai.com/chat)**:
128
+ <!-- -->
129
+ a) Github: https://github.com/openai
130
+ <!-- -->
131
+ b) Paper: OPENAI. (2023). GPT-4 Technical Report. arXiv. doi: https://doi.org/10.48550/arXiv.2303.08774
132
+
133
+ - **Metaknowledge (http://www.networkslab.org/metaknowledge)**:
134
+ <!-- -->
135
+ a) Github: https://github.com/UWNETLAB/metaknowledge
136
+ <!-- -->
137
+ b) Paper: McILROY-YOUNG, R.; McLEVEY, J.; ANDERSON, J. (2015). Metaknowledge: Open Source Software for Social Networks, Bibliometrics, and Sociology of Knowledge Research.
138
+
139
+ - **SentenceTransformers (https://www.sbert.net/)**:
140
+ <!-- -->
141
+ a) Github: https://github.com/UKPLab/sentence-transformers
142
+ <!-- -->
143
+ b) Paper: REIMERS, N.; GUREVYCH, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv. doi: https://doi.org/10.48550/arXiv.1908.10084
144
+
145
+ - **PEGASUS (https://ai.googleblog.com/2020/06/pegasus-state-of-art-model-for.html?m=1)**:
146
+ <!-- -->
147
+ a) Github: https://github.com/huggingface/transformers
148
+ <!-- -->
149
+ b) Paper: ZHANG, J.; ZHAO, Y.; SALEH, M.; LIU, P.J. (2019). PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. arXiv. doi: https://doi.org/10.48550/arXiv.1912.08777
150
+
151
+ And to all the people who helped to improve or correct the code. Thank you very much!
152
+
153
+ * Fabio Ribeiro von Glehn (29.DECEMBER.2022) - UFG - Federal University of Goias (Brazil)
pybibx-4.0.4/README.md ADDED
@@ -0,0 +1,142 @@
1
+ # pybibx
2
+
3
+ ## Introduction
4
+
5
+ A Bibliometric and Scientometric Python library that uses the raw files generated by the **Scopus** (.bib files or .csv files), **WOS (Web of Science)** (.bib files), and **PubMed** (.txt files) scientific databases. It is also powered by advanced AI technologies for analyzing bibliometric and scientometric outcomes and textual data.
6
+
7
+ To export the correct file formats from Scopus, Web of Science, and PubMed, follow these steps:
8
+
9
+ - a) **Scopus**: Search, select articles, click "Export", choose "BibTeX" or "CSV", select all fields, click "Export" again.
10
+ - b) **WoS**: Search, select articles, click "Export", choose "Save to Other File Formats", select "BibTeX", select all fields, click "Send".
11
+ - c) **PubMed**: Search, select articles, click "Save", choose "PubMed" format, click "Save" to download a .txt file.
12
+
13
+ General Capabilities:
14
+ - a) Works with **Scopus** (.bib files or .csv files), **WOS** (.bib files) and **PubMed** (.txt files) databases
15
+ - b) Identification and Removal of duplicates
16
+ - c) Identification of documents per type
17
+ - d) Generates a Health Report to evaluate the quality of the .bib/.csv file
18
+ - e) Generates an **EDA (Exploratory Data Analysis)** Report: Publications Timespan, Total Number of Countries, Total Number of Institutions, Total Number of Sources, Total Number of References, Total Number of Languages (and also the number of docs for each language), Total Number of Documents, Average Documents per Author, Average Documents per Institution, Average Documents per Source, Average Documents per Year, Total Number of Authors, Total Number of Authors Keywords, Total Number of Authors Keywords Plus, Total Single-Authored Documents, Total Multi-Authored Documents, Average Collaboration Index, Max H-Index, Total Number of Citations, Average Citations per Author, Average Citations per Institution, Average Citations per Document, Average Citations per Source
19
+ - f) Creates an **ID (Identification)** for each Document, Authors, Sources, Institutions, Countries, Authors' Keywords, Keywords Plus. The IDs can be used in graphs/plots to obtain a cleaner visualization
20
+ - g) Creates a **WordCloud** from the Abstracts, Titles, Authors Keywords or Keywords Plus
21
+ - h) Creates an **N-Gram Bar Plot (interactive plot)** from the Abstracts, Titles, Authors Keywords or Keywords Plus
22
+ - i) Creates a **Projection (interactive plot)** of the documents based on the Abstracts, Titles, Authors Keywords or Keywords Plus
23
+ - j) Creates an **Evolution Plot (interactive plot)** based on Abstracts, Titles, Sources, Authors Keywords or Keywords Plus
24
+ - k) Creates an **Evolution Plot Complement (interactive plot)** based on Abstracts, Titles, Sources, Authors Keywords or Keywords Plus
25
+ - l) Creates a **Sankey Diagram (interactive plot)** with any combination of the following keys: Authors, Countries, Institutions, Journals, Authors_Keywords, Keywords_Plus, and/or Languages
26
+ - m) Creates a **TreeMap** from the Authors, Countries, Institutions, Journals, Authors_Keywords, or Keywords_Plus
27
+ - n) Creates an **Authors Productivity Plot (interactive plot)** that shows, for each year, the documents (IDs) published by each author
28
+ - o) Creates a **Countries Productivity Plot (interactive plot)** that shows, for each year, the documents (IDs) published by each country (each author's country)
29
+ - p) Creates a **Bar Plot** for the following statistics: Documents per Year, Citations per Year, Past Citations per Year, Lotka's Law, Sources per Documents, Sources per Citations, Authors per Documents, Authors per Citations, Authors per H-Index, Bradford's Law (Core Sources 1, 2 or 3), Institutions per Documents, Institutions per Citations, Countries per Documents, Countries per Citations, Language per Documents, Keywords Plus per Documents and Authors' Keywords per Documents
30
+
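The EDA report and bar plots listed above include H-Index statistics. As a reminder of what that number measures, here is a minimal sketch of the standard H-index definition. It is not taken from pybibx's source; the function name `h_index` and the sample citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Return the largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` documents all have >= rank citations
        else:
            break         # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts per author (not real data).
citations_by_author = {
    "author_a": [10, 8, 5, 4, 3],
    "author_b": [25, 8, 5, 3, 3],
    "author_c": [1, 0],
}

h_by_author = {name: h_index(counts) for name, counts in citations_by_author.items()}
max_h = max(h_by_author.values())  # the "Max H-Index" style statistic
print(h_by_author, max_h)
```

Sorting once and scanning keeps the computation at O(n log n) per author, which is adequate for typical bibliographic exports.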
31
+ Network Capabilities:
32
+ - a) **Collaboration Plot** between Authors, Countries, Institutions, Authors' Keywords or Keywords Plus
33
+ - b) **Citation Analysis (interactive plot)** between Documents (Blue Nodes) and Citations (Red Nodes). Documents and Citations can be highlighted for better visualization
34
+ - c) **Collaboration Analysis (interactive plot)** between Authors, Countries, Institutions or **Adjacency Analysis (interactive plot)** between Authors' Keywords or Keywords Plus. Collaboration and Adjacency can be highlighted for better visualization
35
+ - d) **Similarity Analysis (interactive plot)** can be performed using coupling or cocitation methods
36
+ - e) **World Map Collaboration Analysis (interactive plot)** between Countries in a Map
37
+
38
+ Artificial Intelligence Capabilities:
39
+ - a) **Topic Modelling** using BERTopic to cluster documents by topic
40
+ - b) Visualize topics distribution
41
+ - c) Visualize topics by the most representative words
42
+ - d) Visualize documents projection and clustering by topic
43
+ - e) Visualize topics heatmap
44
+ - f) Find the most representative documents from each topic
45
+ - g) Find the most representative topics according to a word
46
+ - h) Creates **W2V Embeddings** from Abstracts
47
+ - i) Find Documents based on words
48
+ - j) Calculates the cosine similarity between two words.
49
+ - k) Performs operations between **W2V Embeddings**
50
+ - l) Visualize **W2V Embeddings** operations
51
+ - m) Creates **Sentence Embeddings** from Abstracts, Titles, Authors Keywords or Keywords Plus
52
+ - n) **Abstractive Text Summarization** using **PEGASUS** on a set of selected documents or all documents
53
+ - o) **Abstractive Text Summarization** using **chatGPT** on a set of selected documents or all documents. Requires the user to have an **API key** (https://platform.openai.com/account/api-keys)
54
+ - p) **Abstractive Text Summarization** using **Gemini** on a set of selected documents or all documents. Requires the user to have an **API key** (https://ai.google.dev/gemini-api/)
55
+ - q) **Extractive Text Summarization** using **BERT** on a set of selected documents or all documents
56
+ - r) **Ask chatGPT** to analyze the following results: EDA Report, WordCloud, N-Grams, Evolution Plot, Sankey Diagram, Authors Productivity Plot, Bar Plots, Citation Analysis, Collaboration Analysis, Similarity Analysis, and World Map Collaboration Analysis (consult **Example 08**). Requires the user to have an **API key** (https://platform.openai.com/account/api-keys)
57
+ - s) **Ask Gemini** to analyze the following results: EDA Report, WordCloud, N-Grams, Evolution Plot, Sankey Diagram, Authors Productivity Plot, Bar Plots, Citation Analysis, Collaboration Analysis, Similarity Analysis, and World Map Collaboration Analysis (consult **Example 09**). Requires the user to have an **API key** (https://ai.google.dev/gemini-api/)
58
+
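Item j) above refers to the cosine similarity between two words, which in practice means the cosine of the angle between their embedding vectors. The sketch below shows that formula in plain Python. It does not use pybibx or Gensim; the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made for illustration (real W2V embeddings usually have far more dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy "embeddings" (hypothetical values, not produced by any model).
vec_citation = [1.0, 2.0, 3.0]
vec_reference = [2.0, 4.0, 6.0]   # same direction as vec_citation
vec_unrelated = [3.0, 0.0, -1.0]  # orthogonal to vec_citation

print(cosine_similarity(vec_citation, vec_reference))
print(cosine_similarity(vec_citation, vec_unrelated))
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and negative values indicate opposing directions.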
59
+ Correction and Manipulation Capabilities:
60
+ - a) Filter the .bib, .csv or .txt file by Year, Sources, Bradford Law Cores, Countries, Languages and/or Abstracts (Documents with Abstracts)
61
+ - b) Merge Authors, Institutions, Countries, Languages and/or Sources that have multiple entries
62
+ - c) Merge different or the same database files one at a time. The preference for information preservation is given to the old database, so the order of merging matters (consult **Examples 04 and 05**)
63
+
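Item c) above states that the older database takes precedence when files are merged, so the merge order matters. The sketch below illustrates that idea on plain dictionaries. It is an assumption-laden illustration, not pybibx's merge logic: matching records by a `doi` field, the helper name `merge_records`, and the sample records are all hypothetical.

```python
def merge_records(old, new):
    """Merge two record lists keyed by 'doi'; non-empty fields in `old` win."""
    merged = {rec["doi"]: dict(rec) for rec in old}
    for rec in new:
        target = merged.setdefault(rec["doi"], {})
        for field, value in rec.items():
            # Only fill fields that the older record left missing or empty.
            if not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical exports from two databases (not real data).
scopus = [{"doi": "10.1/a", "title": "Paper A", "abstract": ""}]
wos = [
    {"doi": "10.1/a", "title": "PAPER A (WOS)", "abstract": "Abstract from WoS"},
    {"doi": "10.1/b", "title": "Paper B", "abstract": "Abstract B"},
]

scopus_first = merge_records(scopus, wos)
wos_first = merge_records(wos, scopus)
print(scopus_first[0]["title"])  # title kept from the older (Scopus) record
print(wos_first[0]["title"])     # title kept from the older (WoS) record
```

In both orders the merged set contains the same two documents, but the surviving field values differ, which is why swapping the order of the inputs can change the result.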
64
+ ## Usage
65
+
66
+ 1. Install
67
+ ```bash
68
+ pip install pybibx
69
+ ```
70
+
71
+ 2. Try it in **Colab**:
72
+
73
+ - Example 01: Scopus ([ Colab Demo ](https://colab.research.google.com/drive/1yHiMMZIKa-RrarXbPB9ca0gLN9YvvtPU?usp=sharing))
74
+ - Example 02: WOS ([ Colab Demo ](https://colab.research.google.com/drive/13HLjC4myTvYcjLk2XBTZKbWJ2aqZUST1?usp=sharing))
75
+ - Example 03: PubMed ([ Colab Demo ](https://colab.research.google.com/drive/13CU-KvZMnazga1BmQf2J8wYM9mhHL2e1?usp=sharing))
76
+ - Example 04: Scopus + WOS ([ Colab Demo ](https://colab.research.google.com/drive/1DqEk0_IakJPfIZDVcnTWBE_nxyhW9p-W?usp=sharing))
77
+ - Example 05: WOS + Scopus ([ Colab Demo ](https://colab.research.google.com/drive/12k_IOcSDwumbEtPqqSMbCIE6ZypgKAJn?usp=sharing))
78
+ - Example 06: Scopus + WOS + Pubmed ([ Colab Demo ](https://colab.research.google.com/drive/1Ko6AibkXtB_Kwg3Eu0fhzNMVEIXPkbez?usp=sharing))
79
+ - Example 07: Your Own ([ Colab Demo ](https://colab.research.google.com/drive/19EYjgal9V1kemmzpHnyp6MSlk9S-kGHT?usp=sharing))
80
+ - Example 08: **Ask chatGPT** Analysis ([ Colab Demo ](https://colab.research.google.com/drive/1LMrR49F54MuX-stlrQbrrjX_dEU3kZ8Y?usp=sharing))
81
+ - Example 09: **Ask Gemini** Analysis ([ Colab Demo ](https://colab.research.google.com/drive/1oEJBfCml_OMgmSTicMOB-FKMaR2FtoG3?usp=sharing))
82
+
83
+ # Acknowledgement
84
+ This section indicates the libraries that inspired pybibx
85
+
86
+ - **BERT (https://smrzr.io/)**:
87
+ <!-- -->
88
+ a) Github: https://github.com/dmmiller612/bert-extractive-summarizer
89
+ <!-- -->
90
+ b) Paper: MILLER, D. (2019). Leveraging BERT for Extractive Text Summarization on Lectures. arXiv. doi: https://doi.org/10.48550/arXiv.1906.04165
91
+
92
+ - **BERTopic (https://maartengr.github.io/BERTopic/index.html)**:
93
+ <!-- -->
94
+ a) Github: https://github.com/MaartenGr/BERTopic
95
+ <!-- -->
96
+ b) Paper: GROOTENDORST, M. (2022). BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure. arXiv. doi: https://doi.org/10.48550/arXiv.2203.05794
97
+
98
+ - **Bibliometrix (https://www.bibliometrix.org/home/)**:
99
+ <!-- -->
100
+ a) Github: https://github.com/massimoaria/bibliometrix
101
+ <!-- -->
102
+ b) Paper: ARIA, M.; CUCCURULLO, C. (2017). Bibliometrix: An R-tool for Comprehensive Science Mapping Analysis. Journal of Informetrics, 11(4), 959-975. doi: https://doi.org/10.1016/j.joi.2017.08.007
103
+
104
+ - **Gemini (https://gemini.google.com/app)**:
105
+ <!-- -->
106
+ a) Github: https://github.com/google-gemini
107
+ <!-- -->
108
+ b) Paper: Gemini Team Google (2024). Gemini: A Family of Highly Capable Multimodal Models. arXiv. doi: https://doi.org/10.48550/arXiv.2312.11805
109
+
110
+ - **Gensim (https://radimrehurek.com/gensim/)**:
111
+ <!-- -->
112
+ a) Github: https://github.com/piskvorky/gensim
113
+ <!-- -->
114
+ b) Paper: REHUREK, R.; SOJKA, P. (2010). Software Framework for Topic Modelling with Large Corpora. LREC 2010. doi: https://doi.org/10.13140/2.1.2393.1847
115
+
116
+ - **chatGPT (https://chat.openai.com/chat)**:
117
+ <!-- -->
118
+ a) Github: https://github.com/openai
119
+ <!-- -->
120
+ b) Paper: OPENAI. (2023). GPT-4 Technical Report. arXiv. doi: https://doi.org/10.48550/arXiv.2303.08774
121
+
122
+ - **Metaknowledge (http://www.networkslab.org/metaknowledge)**:
123
+ <!-- -->
124
+ a) Github: https://github.com/UWNETLAB/metaknowledge
125
+ <!-- -->
126
+ b) Paper: McILROY-YOUNG, R.; McLEVEY, J.; ANDERSON, J. (2015). Metaknowledge: Open Source Software for Social Networks, Bibliometrics, and Sociology of Knowledge Research.
127
+
128
+ - **SentenceTransformers (https://www.sbert.net/)**:
129
+ <!-- -->
130
+ a) Github: https://github.com/UKPLab/sentence-transformers
131
+ <!-- -->
132
+ b) Paper: REIMERS, N.; GUREVYCH, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv. doi: https://doi.org/10.48550/arXiv.1908.10084
133
+
134
+ - **PEGASUS (https://ai.googleblog.com/2020/06/pegasus-state-of-art-model-for.html?m=1)**:
135
+ <!-- -->
136
+ a) Github: https://github.com/huggingface/transformers
137
+ <!-- -->
138
+ b) Paper: ZHANG, J.; ZHAO, Y.; SALEH, M.; LIU, P.J. (2019). PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. arXiv. doi: https://doi.org/10.48550/arXiv.1912.08777
139
+
140
+ And to all the people who helped to improve or correct the code. Thank you very much!
141
+
142
+ * Fabio Ribeiro von Glehn (29.DECEMBER.2022) - UFG - Federal University of Goias (Brazil)
pybibx-4.0.4/pybibx/__init__.py ADDED
@@ -0,0 +1 @@
1
+
pybibx-4.0.4/pybibx/base/__init__.py ADDED
@@ -0,0 +1 @@
1
+ from .pbx import pbx_probe