together 0.2.4__tar.gz → 0.2.5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {together-0.2.4 → together-0.2.5}/PKG-INFO +114 -24
- {together-0.2.4 → together-0.2.5}/README.md +108 -20
- together-0.2.5/examples/antihallucination.jsonl +238 -0
- together-0.2.5/examples/jokes_dataset.json +304 -0
- together-0.2.5/examples/jokes_example.py +14 -0
- {together-0.2.4 → together-0.2.5}/pyproject.toml +5 -4
- {together-0.2.4 → together-0.2.5}/src/together/__init__.py +4 -10
- {together-0.2.4 → together-0.2.5}/src/together/cli/cli.py +8 -9
- {together-0.2.4 → together-0.2.5}/src/together/commands/chat.py +2 -6
- {together-0.2.4 → together-0.2.5}/src/together/commands/complete.py +35 -41
- {together-0.2.4 → together-0.2.5}/src/together/commands/files.py +7 -22
- {together-0.2.4 → together-0.2.5}/src/together/commands/finetune.py +15 -15
- {together-0.2.4 → together-0.2.5}/src/together/commands/image.py +6 -8
- {together-0.2.4 → together-0.2.5}/src/together/commands/models.py +25 -35
- together-0.2.5/src/together/complete.py +110 -0
- {together-0.2.4 → together-0.2.5}/src/together/error.py +12 -2
- {together-0.2.4 → together-0.2.5}/src/together/files.py +24 -86
- {together-0.2.4 → together-0.2.5}/src/together/finetune.py +78 -181
- together-0.2.5/src/together/image.py +44 -0
- together-0.2.5/src/together/models.py +107 -0
- together-0.2.5/src/together/utils.py +175 -0
- together-0.2.5/src/together/version.py +1 -0
- together-0.2.4/src/together/complete.py +0 -165
- together-0.2.4/src/together/config.py +0 -389
- together-0.2.4/src/together/image.py +0 -78
- together-0.2.4/src/together/models.py +0 -155
- together-0.2.4/src/together/utils/utils.py +0 -75
- together-0.2.4/src/together/version.py +0 -1
- {together-0.2.4 → together-0.2.5}/.github/workflows/check_code_quality.yml +0 -0
- {together-0.2.4 → together-0.2.5}/.github/workflows/rdme-docs.yml +0 -0
- {together-0.2.4 → together-0.2.5}/.github/workflows/upload-to-pypi.yml +0 -0
- {together-0.2.4 → together-0.2.5}/.gitignore +0 -0
- {together-0.2.4 → together-0.2.5}/LICENSE +0 -0
- {together-0.2.4 → together-0.2.5}/src/together/cli/__init__.py +0 -0
- {together-0.2.4 → together-0.2.5}/src/together/commands/__init__.py +0 -0
- {together-0.2.4/src/together/utils → together-0.2.5/src/together/tools}/__init__.py +0 -0
- {together-0.2.4/src/together/utils → together-0.2.5/src/together/tools}/conversation.py +0 -0
--- together-0.2.4/PKG-INFO
+++ together-0.2.5/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: together
-Version: 0.2.4
+Version: 0.2.5
 Summary: Python client for Together's Cloud Platform!
 Project-URL: Homepage, https://github.com/togethercomputer/together
 Project-URL: Bug Tracker, https://github.com/togethercomputer/together/issues
@@ -211,16 +211,18 @@ Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Operating System :: POSIX :: Linux
 Classifier: Programming Language :: Python :: 3
 Requires-Python: >=3.6
-Requires-Dist: requests
+Requires-Dist: requests==2.31.0
 Requires-Dist: sseclient-py==1.7.2
-Requires-Dist: tqdm
-Requires-Dist: typer
+Requires-Dist: tqdm==4.66.1
+Requires-Dist: typer==0.9.0
 Provides-Extra: quality
 Requires-Dist: black~=23.1; extra == 'quality'
 Requires-Dist: mypy>=1.3.0; extra == 'quality'
 Requires-Dist: ruff<=0.0.259,>=0.0.241; extra == 'quality'
 Requires-Dist: types-requests>=2.31.0.1; extra == 'quality'
 Requires-Dist: types-tqdm>=4.65.0.0; extra == 'quality'
+Provides-Extra: tokenize
+Requires-Dist: transformers>=4.33.2; extra == 'tokenize'
 Description-Content-Type: text/markdown
 
 The [Together Python Library](https://pypi.org/project/together/) is the official Python client for Together's API platform, providing a convenient way for interacting with the Together APIs and enables easy integration of the inference API with your applications.
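The hunk above replaces open-ended requirements (`requests`, `tqdm`, `typer`) with exact version pins. As an illustrative sketch of what exact pinning implies for an existing environment (the pin list is copied from the diff; the helper function is our own, not part of the `together` package):

```python
# Exact pins that together 0.2.5 declares (copied from the PKG-INFO diff above).
PINS = {"requests": "2.31.0", "sseclient-py": "1.7.2", "tqdm": "4.66.1", "typer": "0.9.0"}

def find_mismatches(installed):
    """Return (name, installed_version, pinned_version) for each pin that the
    given environment does not satisfy exactly; an absent package reports None."""
    return [(name, installed.get(name), want)
            for name, want in PINS.items()
            if installed.get(name) != want]

# An environment with an older requests now conflicts with the 0.2.5 pins:
env = {"requests": "2.28.0", "sseclient-py": "1.7.2", "tqdm": "4.66.1", "typer": "0.9.0"}
print(find_mismatches(env))  # [('requests', '2.28.0', '2.31.0')]
```

Exact pins make installs reproducible, at the cost of more frequent resolver conflicts with other packages that require different versions of the same dependencies.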
@@ -390,37 +392,36 @@ Refer to the [Files docs](https://docs.together.ai/docs/python-files) on the cor
 
 Files uploaded for training, fine-tuning and validation must be in [jsonlines](https://jsonlines.org/) format.
 
-
+In the example below we provide a link for you to download a jsonl file locally that serves as an example of the correct file formatting. In your terminal or commandline, navigate to the folder you want to download the file, paste the command below and hit enter.
 
-```
-
+```console
+curl https://huggingface.co/datasets/clam004/antihallucination_dataset/resolve/main/antihallucination.jsonl -o antihallucination.jsonl
 ```
 
-
-[{'text': '<human>: Why is this joke funny? Q: What does a computer do when it is tired? A: It crashes!\nRosey: Did your computer ever stop working (freeze) or turn itself off when you wanted to use it? If it did, you can say your computer crashed. Computer crashes are frustrating because you cannot get your work done, and sometimes you lose the work you finished most recently, unless you saved it. Thankfully, they do not happen too often.'},
-{'text': "<human>: I'm going to give you a joke. After I give you the joke, please explain it: Q: Where can you go to practice math on New Year’s Eve? A: Times Square!\nRosey: To get better at math, people often people often practice addition (+) or subtraction (-) problems; then they work on multiplication (x) and division (÷), and all the way through calculus and beyond. Practicing multiplication is also known as practicing your times tables. You can say what is seven times nine (7 x 9), or you can can say what is seven multiplied by nine. They mean the same thing, times or multiplied by."},
-{'text': '<human>: Explain this joke: Q: When does it rain money? A: When there is change in the weather!\nRosey: “The rain in Spain falls mainly on the plain,” according to the song from My Fair Lady. In Peru, they just wish it would rain! But nowhere does it rain money. Rain is water that falls from the sky as part of the evaporation/water cycle. When it is sunny outside, and then the weather changes to cloudy or rainy, we say that there is a change in the weather.'},
-{'text': '<human>: Q: What happens when winter arrives? A: Autumn leaves! Why is this joke funny?\nRosey: In the northern hemisphere winter officially starts on December 21 (winter solstice, astronomical winter), but for many places in the north, it is already cold. Did you know that there is more than one way to mark the start of winter? Meteorologists, people who study the weather, talk about meteorological winter which starts on December 1. When talking about seasons, winter comes after fall; fall is also known as autumn.'},
-{'text': '<human>: Q: Where do roses sleep? A: In a flower bed! Why is this joke funny?\nRosey: According to many surveys, roses are one of the most popular flowers. Although they have thorns, some people say that they are not that hard to grow.'}]
-```
+This will download the dataset to a file called `antihallucination.jsonl`. Below are two examples of lines you will find in this `.jsonl` file.
 
-
-
-
-together.Files.save_jsonl(together.jokes_list, "jokes.jsonl")
+```Text JSONL
+{"text": "<truth>Wilhelm Windelband (May 11, 1848 - October 22, 1915) was a German philosopher of the Baden School. Windelband is now mainly remembered for the terms \"nomothetic\" and \"idiographic\", which he introduced. These have currency in psychology and other areas, though not necessarily in line with his original meanings. Windelband was a Neo-Kantian who protested other Neo-Kantians of his time and maintained that \"to understand Kant rightly means to go beyond him\". Against his positivist contemporaries, Windelband argued that philosophy should engage in humanistic dialogue with the natural sciences rather than uncritically appropriating its methodologies. His interests in psychology and cultural sciences represented an opposition to psychologism and historicism schools by a critical philosophic system. Windelband relied in his effort to reach beyond Kant on such philosophers as Georg Wilhelm Friedrich Hegel, Johann Friedrich Herbart, and Hermann Lotze. Closely associated with Windelband was Heinrich Rickert. Windelband's disciples were not only noted philosophers, but sociologists like Max Weber and theologians like Ernst Troeltsch and Albert Schweitzer.<generated>Wilhelm Windelband (15 March 1848 – 18 September 1915) was a German philosopher of the late 19th and early 20th centuries. He is now remembered mainly for the terms \"nomothetic\" and \"idiographic,\" which he introduced. He also wrote on history, psychology, the philosophy of religion, values, and other topics. He was a neo-Kantian who protested other neo-Kantians of his time and maintained a critical position towards psychologism. Windelband is known as one of the founders of the \"Baden School\" of neo-Kantianism. He was a student of Kuno Fischer and Franz Brentano. His students included Edmund Husserl, Adolf Reinach, Carl Stumpf, and Richard von Mises.<eval>Wilhelm Windelband (15 March 1848 – 18 September 1915) was a German philosopher of the late 19th and early 20th centuries.<minor_inaccurate>He is now remembered mainly for the terms \"nomothetic\" and \"idiographic,\" which he introduced.<accurate>He also wrote on history, psychology, the philosophy of religion, values, and other topics.<accurate>He was a neo-Kantian who protested other neo-Kantians of his time and maintained a critical position towards psychologism.<accurate>Windelband is known as one of the founders of the \"Baden School\" of neo-Kantianism.<accurate>He was a student of Kuno Fischer and Franz Brentano.<major_inaccurate>His students included Edmund Husserl, Adolf Reinach, Carl Stumpf, and Richard von Mises.<major_inaccurate><stop>"}
+{"text": "<truth>Admiral of the Fleet Matthew Aylmer, 1st Baron Aylmer (ca. 1650 - 18 August 1720) was a Royal Navy officer. He was one of the captains who sent a letter to Prince William of Orange, who had just landed at Torbay, assuring the Prince of the captains' support; the Prince's response ultimately led to the Royal Navy switching allegiance to the Prince and the Glorious Revolution of November 1688. Aylmer saw action at the Battle of Bantry Bay in May 1689, at the Battle of Beachy Head in July 1690, and again at the Battle of Barfleur in May 1692 during the Nine Years' War. Aylmer became Commander-in-Chief of the Navy on 12 November 1709. However, when Aylmer met a French squadron and convoy, he was only able to capture one merchantman and the 56-gun \"Superbe\": the new Harley ministry used this failure as an excuse to remove him as Commander-in-Chief and did so a few months later. Following the accession of George I and the appointment of the Townshend ministry, Aylmer was reappointed Commander-in-Chief on 5 November 1714. He was also appointed Governor of Greenwich Hospital: in this post he founded the Royal Hospital School for the sons of seamen.<generated>Matthew Aylmer, 1st Baron Aylmer (1708–1794) was an Irish soldier and colonial administrator. He was born in Dublin, the son of a barrister, and was educated at Trinity College, Dublin. He joined the British Army in 1727 and served in the War of the Austrian Succession and the Seven Years' War. He was promoted to lieutenant-general in 1772 and was appointed Governor of Gibraltar in 1775. He was made a baron in 1782 and was appointed Commander-in-Chief of the British forces in North America in 1783. He was recalled in 1790 and died in London in 1794. He was buried in Westminster Abbey.<eval>Matthew Aylmer, 1st Baron Aylmer (1708–1794) was an Irish soldier and colonial administrator.<minor_inaccurate>He was born in Dublin, the son of a barrister, and was educated at Trinity College, Dublin.<minor_inaccurate>He joined the British Army in 1727 and served in the War of the Austrian Succession and the Seven Years' War.<minor_inaccurate>He was promoted to lieutenant-general in 1772 and was appointed Governor of Gibraltar in 1775.<major_inaccurate>He was made a baron in 1782 and was appointed Commander-in-Chief of the British forces in North America in 1783.<major_inaccurate>He was recalled in 1790 and died in London in 1794.<major_inaccurate>He was buried in Westminster Abbey.<major_inaccurate><stop>"}
 ```
 
+This dataset teaches your model a special task using special sequences not found elsewhere. It teaches your model how to check another model's generated text against a ground truth and annotate the generated text for hallucinations. We made up special sequences `<truth>`, `<generated>`, `<eval>` and `<stop>` in order to do this. Read more about how to come up with your own special sequences [here](https://docs.together.ai/docs/fine-tuning-task-specific-sequences).
+
 Use `together.Files.check` to check if your jsonl file has the correct format. Also take a look at it with the editor of your choice.
 
 ```python
-resp = together.Files.check(file="
+resp = together.Files.check(file="antihallucination.jsonl")
 print(resp)
 ```
 
 If the file format is correct, the `is_check_passed` field will be True
 
 ```
-{'is_check_passed': True,
+{'is_check_passed': True,
+ 'model_special_tokens': 'we are not yet checking end of sentence tokens for this model',
+ 'file_present': 'File found',
+ 'file_size': 'File size 0.001 GB',
+ 'num_samples': 238}
 ```
 
 To check if your data contains `model_special_tokens` (we are still expanding this to include more models and tokens) use:
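The `together.Files.check` call shown in this hunk runs the library's own validation. A minimal local sketch of the same idea (our own code, not the library's implementation) just verifies that each line parses as JSON and carries a string `text` field:

```python
import json

def check_jsonl_lines(lines):
    """Minimal local sketch of a jsonl format check: every line must parse as
    JSON and carry a string "text" field, mirroring {"text": "my sample string"}."""
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return {"is_check_passed": False, "error_line": lineno}
        if not isinstance(record.get("text"), str):
            return {"is_check_passed": False, "error_line": lineno}
    return {"is_check_passed": True, "num_samples": len(lines)}

good = ['{"text": "<truth>ground truth<generated>model text<eval>notes<stop>"}']
bad = ['{"text": "ok"}', '{"prompt": "missing the text field"}']
print(check_jsonl_lines(good))  # {'is_check_passed': True, 'num_samples': 1}
print(check_jsonl_lines(bad))   # {'is_check_passed': False, 'error_line': 2}
```

Like the server-side checker, it reports the first offending line rather than scanning the whole file, which keeps error messages actionable.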
@@ -450,7 +451,6 @@ print(resp)
 'resulted in only 10 samples. Our minimum is 100 samples. ',
 'model_special_tokens': 'we are not yet checking end of sentence tokens '
 'for this model',
-'num_samples_w_eos_token': 0,
 'text_field': 'No "text" field was found on line 7 of the the input '
 'file.Expected format: {"text":"my sample string"}.see '
 'https://docs.together.ai/docs/fine-tuning for more '
@@ -468,13 +468,21 @@ The checker will look at the jsonl file to see if:
 In the `resp`, we will report the first line where the formatting error occurs, print the line, and the data format documentation on our website. Next lets upload a good file:
 
 ```python
-together.Files.upload(file="
+resp = together.Files.upload(file="antihallucination.jsonl")
+file_id = resp["id"]
 ```
 
 You will get back the file `id` of the file you just uploaded
 
 ```
-{'filename': '
+{'filename': 'antihallucination.jsonl',
+ 'id': 'file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8',
+ 'object': 'file',
+ 'report_dict': {'is_check_passed': True,
+ 'model_special_tokens': 'we are not yet checking end of sentence tokens for this model',
+ 'file_present': 'File found',
+ 'file_size': 'File size 0.001 GB',
+ 'num_samples': 238}}
 ```
 
 You will get back the file `id` of the file you just uploaded, but if you forget it, you can get the `id`'s of all the files you have uploaded using `together.Files.list()`. You'll need these `id`'s that start with `file-960be810-4d....` in order to start a fine-tuning job
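The upload response embeds a `report_dict` with the same fields as `together.Files.check`. A hedged sketch of how downstream code could guard on that report before reusing the id (the helper is hypothetical, not part of the package; the response dict is the shape shown in this hunk):

```python
def extract_file_id(upload_resp):
    """Pull the file id out of an upload response shaped like the example above,
    refusing to continue when the embedded format check did not pass."""
    report = upload_resp.get("report_dict", {})
    if not report.get("is_check_passed"):
        raise ValueError("upload's format check failed; fix the file before fine-tuning")
    return upload_resp["id"]

resp = {"filename": "antihallucination.jsonl",
        "id": "file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8",
        "object": "file",
        "report_dict": {"is_check_passed": True, "num_samples": 238}}
print(extract_file_id(resp))  # file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8
```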
@@ -509,7 +517,7 @@ Run and manage your fine-tuning jobs, enabling you to tune all model layers, con
 
 Refer to the [Fine-tuning docs](https://docs.together.ai/docs/python-fine-tuning) on how to get started.
 
-Now that you have a valid file uploaded to together, you can finetune any of the models listed [here](https://docs.together.ai/docs/models-fine-tuning)
+Now that you have a valid file uploaded to together, you can finetune any of the models listed [here](https://docs.together.ai/docs/models-fine-tuning) using `together.Finetune.create`
 
 ```python
 resp = together.Finetune.create(
@@ -527,6 +535,25 @@ fine_tune_id = resp['id']
 print(resp)
 ```
 
+Unless you set `confirm_inputs=False` in `together.Finetune.create`, or `--quiet` in the CLI, there will be a confirmation step to make sure you are aware of any defaults or arguments that needed to be reset from their original inputs for this specific finetune job. Type `y` then `Enter` to submit the job, or anything else to abort.
+
+```
+10-02-2023 11:14:27 - together.finetune - WARNING - Batch size must be 144 for togethercomputer/llama-2-70b-chat model. Setting batch size to 144 (finetune.py:114)
+Note: Some hyperparameters may have been adjusted with their minimum/maximum values for a given model.
+
+Job creation details:
+{ 'batch_size': 144,
+  'learning_rate': 1e-05,
+  'model': 'togethercomputer/llama-2-70b-chat',
+  'n_checkpoints': 1,
+  'n_epochs': 4,
+  'suffix': None,
+  'training_file': 'file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8',
+  'wandb_key': 'xxxx'}
+
+Do you want to submit the job? [y/N]
+```
+
 The response `resp` has alot of information for you that you can retrieve later with `together.Finetune.retrieve` using the `fine_tune_id` for this job. You can find this `fine_tune_id` in `resp['id']` and use it to check in on how your finetune job is doing.
 
 ```python
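The hunk above says the `fine_tune_id` can be fed back into `together.Finetune.retrieve` to check on a job. A generic polling sketch that accepts any retrieve-style callable (the terminal status strings here are assumptions, not documented constants; adjust to whatever your responses actually contain):

```python
import time

def wait_for_finetune(retrieve, fine_tune_id, poll_seconds=60, max_polls=None):
    """Poll a retrieve-style callable (e.g. together.Finetune.retrieve) until the
    job reaches a terminal state, sleeping poll_seconds between attempts."""
    polls = 0
    while max_polls is None or polls < max_polls:
        resp = retrieve(fine_tune_id=fine_tune_id)
        # Assumed terminal statuses; verify against real responses.
        if resp.get("status") in ("completed", "error", "cancelled"):
            return resp
        polls += 1
        time.sleep(poll_seconds)
    raise TimeoutError(f"{fine_tune_id} still running after {polls} polls")
```

Injecting the callable keeps the sketch testable with a fake and avoids hard-coding any particular client version.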
@@ -565,6 +592,69 @@ together.Models.ready("carlton/ft-dd93c727-f35e-41c2-a370-7d55b54128fa-2023-08-1
 {'ready': 'model is ready for start, status code:1'}
 ```
 
+### Using a Downloaded Model
+
+The model will download as a `tar.zst` file
+
+```python
+together.Finetune.download(
+    fine_tune_id="ft-eb167402-98ed-4ac5-b6f5-8140c4ba146e",
+    output = "my-model/model.tar.zst"
+)
+```
+
+To uncompress this filetype on Mac you need to install zstd.
+
+```
+brew install zstd
+cd my-model
+zstd -d model.tar.zst
+tar -xvf model.tar
+cd ..
+```
+
+Within the folder that you uncompress the file, you will find a set of files like this:
+`ls my-model`
+
+```
+tokenizer_config.json
+special_tokens_map.json
+pytorch_model.bin
+generation_config.json
+tokenizer.json
+config.json
+```
+
+Use the folder path that contains these `.bin` and `.json` files to load your model
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+tokenizer = AutoTokenizer.from_pretrained("./my-model")
+
+model = AutoModelForCausalLM.from_pretrained(
+    "./my-model",
+    trust_remote_code=True,
+).to(device)
+
+input_context = "Space Robots are"
+input_ids = tokenizer.encode(input_context, return_tensors="pt")
+output = model.generate(input_ids.to(device), max_length=128, temperature=0.7).cpu()
+output_text = tokenizer.decode(output[0], skip_special_tokens=True)
+print(output_text)
+```
+
+```
+Space Robots are a great way to get your kids interested in science. After all, they are the future!
+```
+
+## Colab Tutorial
+
+Follow along in our Colab (Google Colaboratory) Notebook Tutorial [Example Finetuning Project](https://colab.research.google.com/drive/11DwtftycpDSgp3Z1vnV-Cy68zvkGZL4K?usp=sharing).
+
 ## Chat
 
 The `chat` command is a CLI-based chat application that can be used for back-and-forth conversations with models in a pre-defined format.
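The new README text above lists the files an extracted model folder should contain before it is handed to `from_pretrained`. A small sanity-check sketch (our own hypothetical helper, using the file names shown in the diff):

```python
import pathlib

# File names taken from the `ls my-model` listing in the diff above.
EXPECTED = {"config.json", "tokenizer.json", "tokenizer_config.json",
            "special_tokens_map.json", "generation_config.json", "pytorch_model.bin"}

def looks_like_model_dir(path):
    """Return True when the folder contains every file from the listing above,
    i.e. it is plausibly a fully extracted checkpoint and not a partial unpack."""
    present = {p.name for p in pathlib.Path(path).iterdir()}
    return EXPECTED <= present
```

Running this before `AutoModelForCausalLM.from_pretrained("./my-model")` gives a clearer failure than the loader's own error when the archive was only partially extracted.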
--- together-0.2.4/README.md
+++ together-0.2.5/README.md
@@ -165,37 +165,36 @@ Refer to the [Files docs](https://docs.together.ai/docs/python-files) on the cor
 
 Files uploaded for training, fine-tuning and validation must be in [jsonlines](https://jsonlines.org/) format.
 
-
+In the example below we provide a link for you to download a jsonl file locally that serves as an example of the correct file formatting. In your terminal or commandline, navigate to the folder you want to download the file, paste the command below and hit enter.
 
-```
-
+```console
+curl https://huggingface.co/datasets/clam004/antihallucination_dataset/resolve/main/antihallucination.jsonl -o antihallucination.jsonl
 ```
 
-
-[{'text': '<human>: Why is this joke funny? Q: What does a computer do when it is tired? A: It crashes!\nRosey: Did your computer ever stop working (freeze) or turn itself off when you wanted to use it? If it did, you can say your computer crashed. Computer crashes are frustrating because you cannot get your work done, and sometimes you lose the work you finished most recently, unless you saved it. Thankfully, they do not happen too often.'},
-{'text': "<human>: I'm going to give you a joke. After I give you the joke, please explain it: Q: Where can you go to practice math on New Year’s Eve? A: Times Square!\nRosey: To get better at math, people often people often practice addition (+) or subtraction (-) problems; then they work on multiplication (x) and division (÷), and all the way through calculus and beyond. Practicing multiplication is also known as practicing your times tables. You can say what is seven times nine (7 x 9), or you can can say what is seven multiplied by nine. They mean the same thing, times or multiplied by."},
-{'text': '<human>: Explain this joke: Q: When does it rain money? A: When there is change in the weather!\nRosey: “The rain in Spain falls mainly on the plain,” according to the song from My Fair Lady. In Peru, they just wish it would rain! But nowhere does it rain money. Rain is water that falls from the sky as part of the evaporation/water cycle. When it is sunny outside, and then the weather changes to cloudy or rainy, we say that there is a change in the weather.'},
-{'text': '<human>: Q: What happens when winter arrives? A: Autumn leaves! Why is this joke funny?\nRosey: In the northern hemisphere winter officially starts on December 21 (winter solstice, astronomical winter), but for many places in the north, it is already cold. Did you know that there is more than one way to mark the start of winter? Meteorologists, people who study the weather, talk about meteorological winter which starts on December 1. When talking about seasons, winter comes after fall; fall is also known as autumn.'},
-{'text': '<human>: Q: Where do roses sleep? A: In a flower bed! Why is this joke funny?\nRosey: According to many surveys, roses are one of the most popular flowers. Although they have thorns, some people say that they are not that hard to grow.'}]
-```
+This will download the dataset to a file called `antihallucination.jsonl`. Below are two examples of lines you will find in this `.jsonl` file.
 
-
-
-
-together.Files.save_jsonl(together.jokes_list, "jokes.jsonl")
+```Text JSONL
+{"text": "<truth>Wilhelm Windelband (May 11, 1848 - October 22, 1915) was a German philosopher of the Baden School. Windelband is now mainly remembered for the terms \"nomothetic\" and \"idiographic\", which he introduced. These have currency in psychology and other areas, though not necessarily in line with his original meanings. Windelband was a Neo-Kantian who protested other Neo-Kantians of his time and maintained that \"to understand Kant rightly means to go beyond him\". Against his positivist contemporaries, Windelband argued that philosophy should engage in humanistic dialogue with the natural sciences rather than uncritically appropriating its methodologies. His interests in psychology and cultural sciences represented an opposition to psychologism and historicism schools by a critical philosophic system. Windelband relied in his effort to reach beyond Kant on such philosophers as Georg Wilhelm Friedrich Hegel, Johann Friedrich Herbart, and Hermann Lotze. Closely associated with Windelband was Heinrich Rickert. Windelband's disciples were not only noted philosophers, but sociologists like Max Weber and theologians like Ernst Troeltsch and Albert Schweitzer.<generated>Wilhelm Windelband (15 March 1848 – 18 September 1915) was a German philosopher of the late 19th and early 20th centuries. He is now remembered mainly for the terms \"nomothetic\" and \"idiographic,\" which he introduced. He also wrote on history, psychology, the philosophy of religion, values, and other topics. He was a neo-Kantian who protested other neo-Kantians of his time and maintained a critical position towards psychologism. Windelband is known as one of the founders of the \"Baden School\" of neo-Kantianism. He was a student of Kuno Fischer and Franz Brentano. His students included Edmund Husserl, Adolf Reinach, Carl Stumpf, and Richard von Mises.<eval>Wilhelm Windelband (15 March 1848 – 18 September 1915) was a German philosopher of the late 19th and early 20th centuries.<minor_inaccurate>He is now remembered mainly for the terms \"nomothetic\" and \"idiographic,\" which he introduced.<accurate>He also wrote on history, psychology, the philosophy of religion, values, and other topics.<accurate>He was a neo-Kantian who protested other neo-Kantians of his time and maintained a critical position towards psychologism.<accurate>Windelband is known as one of the founders of the \"Baden School\" of neo-Kantianism.<accurate>He was a student of Kuno Fischer and Franz Brentano.<major_inaccurate>His students included Edmund Husserl, Adolf Reinach, Carl Stumpf, and Richard von Mises.<major_inaccurate><stop>"}
+{"text": "<truth>Admiral of the Fleet Matthew Aylmer, 1st Baron Aylmer (ca. 1650 - 18 August 1720) was a Royal Navy officer. He was one of the captains who sent a letter to Prince William of Orange, who had just landed at Torbay, assuring the Prince of the captains' support; the Prince's response ultimately led to the Royal Navy switching allegiance to the Prince and the Glorious Revolution of November 1688. Aylmer saw action at the Battle of Bantry Bay in May 1689, at the Battle of Beachy Head in July 1690, and again at the Battle of Barfleur in May 1692 during the Nine Years' War. Aylmer became Commander-in-Chief of the Navy on 12 November 1709. However, when Aylmer met a French squadron and convoy, he was only able to capture one merchantman and the 56-gun \"Superbe\": the new Harley ministry used this failure as an excuse to remove him as Commander-in-Chief and did so a few months later. Following the accession of George I and the appointment of the Townshend ministry, Aylmer was reappointed Commander-in-Chief on 5 November 1714. He was also appointed Governor of Greenwich Hospital: in this post he founded the Royal Hospital School for the sons of seamen.<generated>Matthew Aylmer, 1st Baron Aylmer (1708–1794) was an Irish soldier and colonial administrator. He was born in Dublin, the son of a barrister, and was educated at Trinity College, Dublin. He joined the British Army in 1727 and served in the War of the Austrian Succession and the Seven Years' War. He was promoted to lieutenant-general in 1772 and was appointed Governor of Gibraltar in 1775. He was made a baron in 1782 and was appointed Commander-in-Chief of the British forces in North America in 1783. He was recalled in 1790 and died in London in 1794. He was buried in Westminster Abbey.<eval>Matthew Aylmer, 1st Baron Aylmer (1708–1794) was an Irish soldier and colonial administrator.<minor_inaccurate>He was born in Dublin, the son of a barrister, and was educated at Trinity College, Dublin.<minor_inaccurate>He joined the British Army in 1727 and served in the War of the Austrian Succession and the Seven Years' War.<minor_inaccurate>He was promoted to lieutenant-general in 1772 and was appointed Governor of Gibraltar in 1775.<major_inaccurate>He was made a baron in 1782 and was appointed Commander-in-Chief of the British forces in North America in 1783.<major_inaccurate>He was recalled in 1790 and died in London in 1794.<major_inaccurate>He was buried in Westminster Abbey.<major_inaccurate><stop>"}
 ```
 
+This dataset teaches your model a special task using special sequences not found elsewhere. It teaches your model how to check another model's generated text against a ground truth and annotate the generated text for hallucinations. We made up special sequences `<truth>`, `<generated>`, `<eval>` and `<stop>` in order to do this. Read more about how to come up with your own special sequences [here](https://docs.together.ai/docs/fine-tuning-task-specific-sequences).
+
 Use `together.Files.check` to check if your jsonl file has the correct format. Also take a look at it with the editor of your choice.
 
 ```python
-resp = together.Files.check(file="
+resp = together.Files.check(file="antihallucination.jsonl")
 print(resp)
 ```
 
 If the file format is correct, the `is_check_passed` field will be True
 
 ```
-{'is_check_passed': True,
+{'is_check_passed': True,
+ 'model_special_tokens': 'we are not yet checking end of sentence tokens for this model',
+ 'file_present': 'File found',
+ 'file_size': 'File size 0.001 GB',
+ 'num_samples': 238}
 ```
 
 To check if your data contains `model_special_tokens` (we are still expanding this to include more models and tokens) use:
@@ -225,7 +224,6 @@ print(resp)
 'resulted in only 10 samples. Our minimum is 100 samples. ',
 'model_special_tokens': 'we are not yet checking end of sentence tokens '
 'for this model',
-'num_samples_w_eos_token': 0,
 'text_field': 'No "text" field was found on line 7 of the the input '
 'file.Expected format: {"text":"my sample string"}.see '
 'https://docs.together.ai/docs/fine-tuning for more '

In the `resp`, we will report the first line where the formatting error occurs, print the line, and link to the data format documentation on our website. Next let's upload a good file:

```python
resp = together.Files.upload(file="antihallucination.jsonl")
file_id = resp["id"]
```

You will get back the file `id` of the file you just uploaded

```
{'filename': 'antihallucination.jsonl',
 'id': 'file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8',
 'object': 'file',
 'report_dict': {'is_check_passed': True,
                 'model_special_tokens': 'we are not yet checking end of sentence tokens for this model',
                 'file_present': 'File found',
                 'file_size': 'File size 0.001 GB',
                 'num_samples': 238}}
```

If you forget the file `id`, you can retrieve the `id`s of all the files you have uploaded using `together.Files.list()`. You'll need these `id`s that start with `file-960be810-4d....` in order to start a fine-tuning job.
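Assuming the `together.Files.list()` response wraps the same file records as the upload response shown above in a `data` list (that wrapper is an assumption here), pulling out the ids is a one-liner. The sketch uses an inline stand-in payload so it runs without an API key:

```python
# Stand-in for together.Files.list(); the "data" wrapper is an assumption,
# and each entry mirrors the upload response shown above.
listing = {
    "data": [
        {"filename": "antihallucination.jsonl",
         "id": "file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8",
         "object": "file"},
    ]
}

# Collect just the ids needed to start fine-tuning jobs
file_ids = [f["id"] for f in listing["data"]]
print(file_ids)
```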

Refer to the [Fine-tuning docs](https://docs.together.ai/docs/python-fine-tuning) on how to get started.

Now that you have a valid file uploaded to together, you can finetune any of the models listed [here](https://docs.together.ai/docs/models-fine-tuning) using `together.Finetune.create`

```python
resp = together.Finetune.create(
    # ...
)

fine_tune_id = resp['id']
print(resp)
```

Unless you set `confirm_inputs=False` in `together.Finetune.create`, or pass `--quiet` in the CLI, there will be a confirmation step to make sure you are aware of any defaults or arguments that needed to be reset from their original inputs for this specific finetune job. Type `y` then `Enter` to submit the job, or anything else to abort.

```
10-02-2023 11:14:27 - together.finetune - WARNING - Batch size must be 144 for togethercomputer/llama-2-70b-chat model. Setting batch size to 144 (finetune.py:114)
Note: Some hyperparameters may have been adjusted with their minimum/maximum values for a given model.

Job creation details:
{ 'batch_size': 144,
  'learning_rate': 1e-05,
  'model': 'togethercomputer/llama-2-70b-chat',
  'n_checkpoints': 1,
  'n_epochs': 4,
  'suffix': None,
  'training_file': 'file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8',
  'wandb_key': 'xxxx'}

Do you want to submit the job? [y/N]
```

The response `resp` has a lot of information that you can retrieve later with `together.Finetune.retrieve` using the `fine_tune_id` for this job. You can find this `fine_tune_id` in `resp['id']` and use it to check on how your finetune job is doing.
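A common pattern is to poll the job until it reaches a terminal state. The sketch below keeps the loop self-contained by taking the retriever as a callable; in real use you would pass something like `lambda i: together.Finetune.retrieve(fine_tune_id=i)`, and the exact status strings here are assumptions:

```python
import time

def poll_until_done(retrieve, fine_tune_id, interval=10.0):
    # retrieve: callable returning a dict with a "status" field,
    # e.g. lambda i: together.Finetune.retrieve(fine_tune_id=i)
    while True:
        status = retrieve(fine_tune_id)["status"]
        # Terminal status names are assumptions for illustration
        if status in ("completed", "error", "cancelled"):
            return status
        time.sleep(interval)

# Stub standing in for the API so the sketch is runnable as-is
_responses = iter(["pending", "running", "completed"])
final = poll_until_done(lambda _id: {"status": next(_responses)}, "ft-xxxx", interval=0)
print(final)  # prints "completed"
```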

```python
together.Models.ready("carlton/ft-dd93c727-f35e-41c2-a370-7d55b54128fa-2023-08-1...")
```

```
{'ready': 'model is ready for start, status code:1'}
```

### Using a Downloaded Model

The model will download as a `tar.zst` file

```python
together.Finetune.download(
    fine_tune_id="ft-eb167402-98ed-4ac5-b6f5-8140c4ba146e",
    output="my-model/model.tar.zst",
)
```

To uncompress this filetype on Mac you need to install `zstd`.

```
brew install zstd
cd my-model
zstd -d model.tar.zst
tar -xvf model.tar
cd ..
```

Within the folder where you uncompressed the file, you will find a set of files like this:

`ls my-model`

```
tokenizer_config.json
special_tokens_map.json
pytorch_model.bin
generation_config.json
tokenizer.json
config.json
```

Use the folder path that contains these `.bin` and `.json` files to load your model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("./my-model")

model = AutoModelForCausalLM.from_pretrained(
    "./my-model",
    trust_remote_code=True,
).to(device)

input_context = "Space Robots are"
input_ids = tokenizer.encode(input_context, return_tensors="pt")
output = model.generate(input_ids.to(device), max_length=128, temperature=0.7).cpu()
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```

```
Space Robots are a great way to get your kids interested in science. After all, they are the future!
```

## Colab Tutorial

Follow along in our Colab (Google Colaboratory) Notebook Tutorial [Example Finetuning Project](https://colab.research.google.com/drive/11DwtftycpDSgp3Z1vnV-Cy68zvkGZL4K?usp=sharing).

## Chat

The `chat` command is a CLI-based chat application that can be used for back-and-forth conversations with models in a pre-defined format.
|