arxiv-to-prompt 0.1.1__tar.gz → 0.2.1__tar.gz

@@ -1,6 +1,6 @@
- Metadata-Version: 2.2
+ Metadata-Version: 2.4
  Name: arxiv-to-prompt
- Version: 0.1.1
+ Version: 0.2.1
  Summary: transform arXiv papers into a single latex prompt for LLMs
  Author: Takashi Ishida
  License: MIT
@@ -15,15 +15,16 @@ Requires-Dist: requests>=2.25.0
  Provides-Extra: test
  Requires-Dist: pytest>=7.0.0; extra == "test"
  Requires-Dist: pytest-cov>=4.0.0; extra == "test"
+ Dynamic: license-file
 
  # arxiv-to-prompt
 
- [![PyPI version](https://badge.fury.io/py/arxiv-to-prompt.svg?update=20250202)](https://pypi.org/project/arxiv-to-prompt/)
+ [![PyPI version](https://badge.fury.io/py/arxiv-to-prompt.svg?update=20250307)](https://pypi.org/project/arxiv-to-prompt/)
  [![Tests](https://github.com/takashiishida/arxiv-to-prompt/actions/workflows/tests.yml/badge.svg)](https://github.com/takashiishida/arxiv-to-prompt/actions)
  [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![Changelog](https://img.shields.io/github/v/release/takashiishida/arxiv-to-prompt?label=changelog)](https://github.com/takashiishida/arxiv-to-prompt/releases)
 
- A command-line tool to transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper. It downloads the source files, automatically finds the main tex file containing `\documentclass`, and flattens multiple files into a single coherent source by resolving `\input` and `\include` commands. The tool also provides an option to remove LaTeX comments from the output (which can be useful to shorten the prompt).
+ A command-line tool to transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper. It downloads the source files, automatically finds the main tex file containing `\documentclass`, and flattens multiple files into a single coherent source by resolving `\input` and `\include` commands. The tool also provides options to remove LaTeX comments and appendix sections from the output (which can be useful to shorten the prompt).
 
  ### Installation
 
@@ -41,6 +42,12 @@ arxiv-to-prompt 2303.08774
  # Display LaTeX source without comments
  arxiv-to-prompt 2303.08774 --no-comments
 
+ # Display LaTeX source without appendix sections
+ arxiv-to-prompt 2303.08774 --no-appendix
+
+ # Combine options (no comments and no appendix)
+ arxiv-to-prompt 2303.08774 --no-comments --no-appendix
+
  # Copy to clipboard
  arxiv-to-prompt 2303.08774 | pbcopy
 
@@ -62,8 +69,23 @@ latex_source = process_latex_source("2303.08774")
 
  # Get LaTeX source without comments
  latex_source = process_latex_source("2303.08774", keep_comments=False)
+
+ # Get LaTeX source without appendix sections
+ latex_source = process_latex_source("2303.08774", remove_appendix_section=True)
+
+ # Combine options (no comments and no appendix)
+ latex_source = process_latex_source("2303.08774", keep_comments=False, remove_appendix_section=True)
  ```
 
+ ### Projects Using arxiv-to-prompt
+
+ Here are some projects and use cases that leverage arxiv-to-prompt:
+
+ - [arxiv-latex-mcp](https://github.com/takashiishida/arxiv-latex-mcp): MCP server that uses arxiv-to-prompt to fetch and process arXiv LaTeX sources for precise interpretation of mathematical expressions in scientific papers.
+ - [arxiv-tex-ui](https://github.com/takashiishida/arxiv-tex-ui): chat with an llm about an arxiv paper by using the latex source.
+
+ If you're using arxiv-to-prompt in your project, please submit a pull request to add it to this list!
+
  ### References
 
  - Inspired by [files-to-prompt](https://github.com/simonw/files-to-prompt).
@@ -1,11 +1,11 @@
  # arxiv-to-prompt
 
- [![PyPI version](https://badge.fury.io/py/arxiv-to-prompt.svg?update=20250202)](https://pypi.org/project/arxiv-to-prompt/)
+ [![PyPI version](https://badge.fury.io/py/arxiv-to-prompt.svg?update=20250307)](https://pypi.org/project/arxiv-to-prompt/)
  [![Tests](https://github.com/takashiishida/arxiv-to-prompt/actions/workflows/tests.yml/badge.svg)](https://github.com/takashiishida/arxiv-to-prompt/actions)
  [![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  [![Changelog](https://img.shields.io/github/v/release/takashiishida/arxiv-to-prompt?label=changelog)](https://github.com/takashiishida/arxiv-to-prompt/releases)
 
- A command-line tool to transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper. It downloads the source files, automatically finds the main tex file containing `\documentclass`, and flattens multiple files into a single coherent source by resolving `\input` and `\include` commands. The tool also provides an option to remove LaTeX comments from the output (which can be useful to shorten the prompt).
+ A command-line tool to transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper. It downloads the source files, automatically finds the main tex file containing `\documentclass`, and flattens multiple files into a single coherent source by resolving `\input` and `\include` commands. The tool also provides options to remove LaTeX comments and appendix sections from the output (which can be useful to shorten the prompt).
 
  ### Installation
 
@@ -23,6 +23,12 @@ arxiv-to-prompt 2303.08774
  # Display LaTeX source without comments
  arxiv-to-prompt 2303.08774 --no-comments
 
+ # Display LaTeX source without appendix sections
+ arxiv-to-prompt 2303.08774 --no-appendix
+
+ # Combine options (no comments and no appendix)
+ arxiv-to-prompt 2303.08774 --no-comments --no-appendix
+
  # Copy to clipboard
  arxiv-to-prompt 2303.08774 | pbcopy
 
@@ -44,9 +50,24 @@ latex_source = process_latex_source("2303.08774")
 
  # Get LaTeX source without comments
  latex_source = process_latex_source("2303.08774", keep_comments=False)
+
+ # Get LaTeX source without appendix sections
+ latex_source = process_latex_source("2303.08774", remove_appendix_section=True)
+
+ # Combine options (no comments and no appendix)
+ latex_source = process_latex_source("2303.08774", keep_comments=False, remove_appendix_section=True)
  ```
 
+ ### Projects Using arxiv-to-prompt
+
+ Here are some projects and use cases that leverage arxiv-to-prompt:
+
+ - [arxiv-latex-mcp](https://github.com/takashiishida/arxiv-latex-mcp): MCP server that uses arxiv-to-prompt to fetch and process arXiv LaTeX sources for precise interpretation of mathematical expressions in scientific papers.
+ - [arxiv-tex-ui](https://github.com/takashiishida/arxiv-tex-ui): chat with an llm about an arxiv paper by using the latex source.
+
+ If you're using arxiv-to-prompt in your project, please submit a pull request to add it to this list!
+
  ### References
 
  - Inspired by [files-to-prompt](https://github.com/simonw/files-to-prompt).
- - Reused some code from [paper2slides](https://github.com/takashiishida/paper2slides).
+ - Reused some code from [paper2slides](https://github.com/takashiishida/paper2slides).
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "arxiv-to-prompt"
- version = "0.1.1"
+ version = "0.2.1"
  description = "transform arXiv papers into a single latex prompt for LLMs"
  readme = "README.md"
  authors = [{ name = "Takashi Ishida" }]
@@ -22,13 +22,19 @@ def main():
          help=f"Custom directory to store downloaded files (default: {default_cache})",
          default=None
      )
+     parser.add_argument(
+         "--no-appendix",
+         action="store_true",
+         help="Remove the appendix section and everything after it"
+     )
 
      args = parser.parse_args()
 
      content = process_latex_source(
          args.arxiv_id,
          keep_comments=not args.no_comments,
-         cache_dir=args.cache_dir
+         cache_dir=args.cache_dir,
+         remove_appendix_section=args.no_appendix
      )
      if content:
          print(content)
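The hunk above adds a `--no-appendix` flag and forwards it to `process_latex_source`. The following is a minimal, self-contained sketch of that flag-to-keyword mapping; `build_parser` and `to_kwargs` are hypothetical helper names used only for illustration and do not appear in the package.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the parser that the package builds in main()."""
    parser = argparse.ArgumentParser(prog="arxiv-to-prompt")
    parser.add_argument("arxiv_id")
    parser.add_argument("--no-comments", action="store_true")
    parser.add_argument("--no-appendix", action="store_true")
    return parser


def to_kwargs(args: argparse.Namespace) -> dict:
    """Translate the negative CLI flags into the keyword arguments seen in the diff."""
    return {
        # --no-comments is inverted: the function asks whether to KEEP comments.
        "keep_comments": not args.no_comments,
        # --no-appendix passes through unchanged: the function asks whether to REMOVE.
        "remove_appendix_section": args.no_appendix,
    }


args = build_parser().parse_args(["2303.08774", "--no-appendix"])
print(to_kwargs(args))
```

Note the asymmetry: `keep_comments` is phrased positively, so its flag is negated, while `remove_appendix_section` is phrased negatively and takes the flag value directly. Both default to the previous behaviour when the flags are omitted.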
@@ -140,6 +140,14 @@ def remove_comments_from_lines(text: str) -> str:
          result.append(''.join(cleaned_line).rstrip())
      return '\n'.join(result)
 
+ def remove_appendix(text: str) -> str:
+     """Remove appendix section and everything after it."""
+     # Find the position of \appendix command
+     appendix_match = re.search(r'\\appendix\b', text)
+     if appendix_match:
+         return text[:appendix_match.start()].rstrip()
+     return text
+
  def flatten_tex(directory: str, main_file: str) -> str:
      """Combine all tex files into one, resolving inputs."""
      def process_file(file_path: str, processed_files: set) -> str:
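The `\b` in the new regular expression matters: it requires a word boundary right after `appendix`, so longer control words that merely start with the same letters are not treated as the appendix marker. A small sketch, reusing the function body from the hunk above (the sample strings are made up for illustration):

```python
import re


def remove_appendix(text: str) -> str:
    """Same logic as the added function: cut at the first appendix command."""
    appendix_match = re.search(r'\\appendix\b', text)
    if appendix_match:
        return text[:appendix_match.start()].rstrip()
    return text


# The bare command, followed by a newline, matches and truncates the text.
print(repr(remove_appendix("Body\n\\appendix\n\\section{Proofs}")))

# A longer control word such as \appendixname is left alone, because \b
# needs a non-word character (or the end of the text) right after "appendix".
print(repr(remove_appendix("See \\appendixname{} A for details.")))
```

The first call returns `'Body'`; the second returns its input unchanged. Because only the first match is used, everything after the first `\appendix` is dropped, including `\end{document}` and any bibliography commands that follow it.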
@@ -184,7 +192,8 @@ def flatten_tex(directory: str, main_file: str) -> str:
 
              # Process the command normally
              input_file = match.group(1)
-             if not input_file.endswith('.tex'):
+             # Only add .tex extension if the file has no extension at all
+             if not os.path.splitext(input_file)[1]:
                  input_file += '.tex'
              input_path = os.path.join(directory, input_file)
              return process_file(input_path, processed_files)
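To see what the extension change does in practice, the two rules can be compared side by side. This is an illustrative sketch: `old_rule` and `new_rule` are hypothetical names wrapping the removed and added conditions, and the file names are examples rather than files from any particular paper.

```python
import os


def old_rule(input_file: str) -> str:
    """0.1.1 behaviour: append .tex unless the name already ends with .tex."""
    if not input_file.endswith('.tex'):
        input_file += '.tex'
    return input_file


def new_rule(input_file: str) -> str:
    """0.2.1 behaviour: append .tex only if the name has no extension at all."""
    if not os.path.splitext(input_file)[1]:
        input_file += '.tex'
    return input_file


for name in ["chapter1", "already.tex", "main.bbl", "sections/intro", "results.v2"]:
    print(f"{name!r:18} old={old_rule(name)!r:22} new={new_rule(name)!r}")
```

Under the old rule `\input{main.bbl}` resolved to `main.bbl.tex`, which normally does not exist; the new rule leaves it as `main.bbl`. One consequence worth knowing is that any dotted base name, such as `results.v2`, is now treated as already having an extension and no longer gets `.tex` appended.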
@@ -201,7 +210,7 @@ def flatten_tex(directory: str, main_file: str) -> str:
 
  def process_latex_source(arxiv_id: str, keep_comments: bool = True,
                           cache_dir: Optional[str] = None,
-                          use_cache: bool = False) -> Optional[str]:
+                          use_cache: bool = False, remove_appendix_section: bool = False) -> Optional[str]:
      """
      Process LaTeX source files from arXiv and return the combined content.
 
@@ -210,6 +219,7 @@ def process_latex_source(arxiv_id: str, keep_comments: bool = True,
          keep_comments: Whether to keep LaTeX comments in the output
          cache_dir: Custom directory to store downloaded files
          use_cache: Whether to use cached files if they exist (default: False)
+         remove_appendix_section: Whether to remove the appendix section and everything after it
 
      Returns:
          The processed LaTeX content or None if processing fails
@@ -234,6 +244,10 @@ def process_latex_source(arxiv_id: str, keep_comments: bool = True,
      if not keep_comments:
          content = remove_comments_from_lines(content)
 
+     # Remove appendix if requested
+     if remove_appendix_section:
+         content = remove_appendix(content)
+
      return content
 
  def check_source_available(arxiv_id: str) -> bool:
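In the hunk above, comment removal runs before appendix removal. That order has a visible effect when a source contains a commented-out `\appendix`: the regex does not know about comments, so the result depends on whether comments were stripped first. The sketch below illustrates this under stated assumptions: `strip_comments` is a deliberately simplified, hypothetical stand-in for the package's `remove_comments_from_lines`, and `postprocess` is a hypothetical wrapper mirroring only the two steps shown in the diff.

```python
import re


def remove_appendix(text: str) -> str:
    """Same logic as the function added in this release."""
    appendix_match = re.search(r'\\appendix\b', text)
    if appendix_match:
        return text[:appendix_match.start()].rstrip()
    return text


def strip_comments(text: str) -> str:
    """Hypothetical, simplified comment stripper (not the package's implementation).

    Drops everything from an unescaped % to the end of each line.
    """
    lines = text.split("\n")
    return "\n".join(re.sub(r'(?<!\\)%.*', '', line).rstrip() for line in lines)


def postprocess(content: str, keep_comments: bool = True,
                remove_appendix_section: bool = False) -> str:
    """Mirror the order of operations in the diff: comments first, then appendix."""
    if not keep_comments:
        content = strip_comments(content)
    if remove_appendix_section:
        content = remove_appendix(content)
    return content


source = "Intro\n% \\appendix (disabled)\nConclusion\n\\appendix\nProofs"

# Comments kept: the commented-out command is the first match, so the cut lands there.
print(repr(postprocess(source, keep_comments=True, remove_appendix_section=True)))

# Comments stripped first: only the real command remains, so "Conclusion" survives.
print(repr(postprocess(source, keep_comments=False, remove_appendix_section=True)))
```

With comments kept, the output is truncated at the commented-out line; with comments removed first, the text up to the real `\appendix` is preserved. Combining `--no-comments` with `--no-appendix` is therefore the safer choice for sources that may contain disabled appendix markers.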
@@ -9,6 +9,7 @@ from arxiv_to_prompt.core import (
      remove_comments_from_lines,
      check_source_available,
      flatten_tex,
+     remove_appendix,
  )
 
  # Test fixtures
@@ -176,3 +177,97 @@ Text with escaped \\% and then % \\input{commented_file3}
      assert "\\include{commented_file2}" in result
      assert "\\input{commented_file3}" in result
      assert "\\input{nonexistent_file}" in result
+
+
+ def test_remove_appendix():
+     """Test appendix removal functionality."""
+     test_cases = [
+         # Basic appendix removal
+         (
+             "Main content\n\n\\appendix\nAppendix content",
+             "Main content"
+         ),
+         # No appendix to remove
+         (
+             "Main content only",
+             "Main content only"
+         ),
+         # Appendix with sections
+         (
+             "Introduction\n\\section{Method}\nContent\n\\appendix\n\\section{Additional Info}\nMore stuff",
+             "Introduction\n\\section{Method}\nContent"
+         ),
+         # Multiple appendix commands (should remove from first one)
+         (
+             "Content\n\\appendix\nFirst appendix\n\\appendix\nSecond appendix",
+             "Content"
+         ),
+         # Appendix at the beginning
+         (
+             "\\appendix\nAll appendix content",
+             ""
+         ),
+     ]
+
+     for input_text, expected in test_cases:
+         result = remove_appendix(input_text)
+         assert result == expected, f"Failed for input: {input_text}"
+
+
+ def test_process_latex_with_appendix_removal(sample_arxiv_id, temp_cache_dir):
+     """Test processing LaTeX source with appendix removal."""
+     # Test with appendix removal
+     result = process_latex_source(
+         sample_arxiv_id,
+         keep_comments=True,
+         cache_dir=str(temp_cache_dir),
+         remove_appendix_section=True
+     )
+     assert result is not None
+     assert "\\documentclass" in result
+
+     # Check that appendix was removed (if it existed)
+     assert "\\appendix" not in result
+
+
+ def test_input_file_extensions(temp_cache_dir):
+     """Test that input files with existing extensions are not modified."""
+     # Create test directory and files
+     tex_dir = temp_cache_dir / "test_extensions"
+     tex_dir.mkdir(parents=True)
+
+     # Create main file with various input commands
+     main_file = tex_dir / "main.tex"
+     main_content = """\\documentclass{article}
+ \\begin{document}
+ \\input{chapter1}
+ \\input{main.bbl}
+ \\input{mystyle.sty}
+ \\input{config.cls}
+ \\input{already.tex}
+ \\end{document}
+ """
+     main_file.write_text(main_content)
+
+     # Create the files that should be included
+     files_to_create = [
+         ("chapter1.tex", "Chapter 1 content"),
+         ("main.bbl", "Bibliography content"),
+         ("mystyle.sty", "Style content"),
+         ("config.cls", "Class content"),
+         ("already.tex", "Already tex content"),
+     ]
+
+     for filename, content in files_to_create:
+         file_path = tex_dir / filename
+         file_path.write_text(content)
+
+     # Run the flatten_tex function
+     result = flatten_tex(str(tex_dir), "main.tex")
+
+     # Check that all files were included correctly
+     assert "Chapter 1 content" in result
+     assert "Bibliography content" in result
+     assert "Style content" in result
+     assert "Class content" in result
+     assert "Already tex content" in result
File without changes