Commit 81d7b15

Many minor updates to README (#883)
1 parent 98dc0b3 commit 81d7b15

File tree: 2 files changed (+68, -21 lines)

README.md

Lines changed: 67 additions & 20 deletions
@@ -271,7 +271,7 @@ and slow down your queries to accommodate.
 You can also specify them manually with any rate limit string that matches the specification in the [limits](https://limits.readthedocs.io/en/stable/quickstart.html#rate-limit-string-notation) module:
 
 ```bash
-pqa --summary_llm_config '{"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}}' ask 'Are there nm scale features in thermoelectric materials?'
+pqa --summary_llm_config '{"rate_limit": {"gpt-4o-2024-11-20": "30000 per 1 minute"}}' ask 'Are there nm scale features in thermoelectric materials?'
 ```
 
 Or by adding into a `Settings` object, if calling imperatively:
@@ -282,8 +282,8 @@ from paperqa import Settings, ask
 answer_response = ask(
     "What manufacturing challenges are unique to bispecific antibodies?",
     settings=Settings(
-        llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
-        summary_llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
+        llm_config={"rate_limit": {"gpt-4o-2024-11-20": "30000 per 1 minute"}},
+        summary_llm_config={"rate_limit": {"gpt-4o-2024-11-20": "30000 per 1 minute"}},
     ),
 )
 ```
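The `--summary_llm_config` flag in the hunk above takes this rate-limit mapping as a JSON string. A minimal standard-library sketch of constructing and sanity-checking that string (the model name and limit are the values from the hunk; nothing here is paper-qa API):

```python
import json

# The same rate-limit mapping shown in the diff, as a Python dict.
rate_limit_config = {"rate_limit": {"gpt-4o-2024-11-20": "30000 per 1 minute"}}

# Serialize to the JSON string that would be passed to `pqa --summary_llm_config`.
cli_argument = json.dumps(rate_limit_config)

# Round-tripping through json.loads recovers the original mapping,
# confirming the string is valid JSON before handing it to the CLI.
assert json.loads(cli_argument) == rate_limit_config
print(cli_argument)
```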
@@ -405,12 +405,13 @@ asyncio.run(main())
 
 ### Choosing Model
 
-By default, PaperQA2 uses OpenAI's `gpt-4o-2024-08-06` model for:
-
-- `summary_llm`: Re-ranking and summarizing evidence passages
-- `llm`: Generating the final answer
-- `agent_llm`: Making tool selection decisions
+By default, PaperQA2 uses OpenAI's `gpt-4o-2024-11-20` model for the
+`summary_llm`, `llm`, and `agent_llm`.
+Please see the [Settings Cheatsheet](#settings-cheatsheet)
+for more information on these settings.
 
+We use the [`lmi`](https://github.com/Future-House/ldp/tree/main/packages/lmi) package for our LLM interface,
+which in turn uses `litellm` to support many LLM providers.
 You can adjust this easily to use any model supported by `litellm`:
 
 ```python
@@ -428,6 +429,7 @@ To use Claude, make sure you set the `ANTHROPIC_API_KEY`
 
 ```python
 from paperqa import Settings, ask
+from paperqa.settings import AgentSettings
 
 answer_response = ask(
     "What manufacturing challenges are unique to bispecific antibodies?",
@@ -769,9 +771,9 @@ will return much faster than the first query and we'll be certain the authors ma
 
 | Setting | Default | Description |
 | -------------------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------------- |
-| `llm` | `"gpt-4o-2024-08-06"` | Default LLM for most things, including answers. Should be 'best' LLM. |
+| `llm` | `"gpt-4o-2024-11-20"` | Default LLM for most things, including answers. Should be 'best' LLM. |
 | `llm_config` | `None` | Optional configuration for `llm`. |
-| `summary_llm` | `"gpt-4o-2024-08-06"` | Default LLM for summaries and parsing citations. |
+| `summary_llm` | `"gpt-4o-2024-11-20"` | Default LLM for summaries and parsing citations. |
 | `summary_llm_config` | `None` | Optional configuration for `summary_llm`. |
 | `embedding` | `"text-embedding-3-small"` | Default embedding model for texts. |
 | `embedding_config` | `None` | Optional configuration for `embedding`. |
@@ -809,7 +811,7 @@ will return much faster than the first query and we'll be certain the authors ma
 | `prompt.summary_json_system` | `summary_json_system_prompt` | System prompt for JSON summaries. |
 | `prompt.context_outer` | `CONTEXT_OUTER_PROMPT` | Prompt for how to format all contexts in generate answer. |
 | `prompt.context_inner` | `CONTEXT_INNER_PROMPT` | Prompt for how to format a single context in generate answer. Must contain 'name' and 'text' variables. |
-| `agent.agent_llm` | `"gpt-4o-2024-08-06"` | Model to use for agent. |
+| `agent.agent_llm` | `"gpt-4o-2024-11-20"` | Model to use for agent making tool selections. |
 | `agent.agent_llm_config` | `None` | Optional configuration for `agent_llm`. |
 | `agent.agent_type` | `"ToolSelector"` | Type of agent to use. |
 | `agent.agent_config` | `None` | Optional kwarg for AGENT constructor. |
@@ -898,10 +900,19 @@ You can read more about the search syntax by typing `zotero.iterate?` in IPython
 
 ### Paper Scraper
 
-If you want to search for papers outside of your own collection, I've found an unrelated project called [paper-scraper](https://github.com/blackadad/paper-scraper) that looks
+If you want to search for papers outside of your own collection, I've found an unrelated project called [`paper-scraper`](https://github.com/blackadad/paper-scraper) that looks
 like it might help. But beware, this project looks like it uses some scraping tools that may violate publisher's rights or be in a gray area of legality.
 
+First, install `paper-scraper`:
+
+```bash
+pip install git+https://github.com/blackadad/paper-scraper.git
+```
+
+Then run with it:
+
 ```python
+import paperscraper
 from paperqa import Docs
 
 keyword_search = "bispecific antibody manufacture"
@@ -924,6 +935,9 @@ print(session)
 To execute a function on each chunk of LLM completions, you need to provide a function that can be executed on each chunk. For example, to get a typewriter view of the completions, you can do:
 
 ```python
+from paperqa import Docs
+
+
 def typewriter(chunk: str) -> None:
     print(chunk, end="")
 
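The callback in the hunk above is just a function invoked once per streamed chunk. A self-contained sketch of the same `typewriter` pattern, driven by a hard-coded list of chunks instead of a live LLM stream (the chunk text is made up for illustration):

```python
received: list[str] = []


def typewriter(chunk: str) -> None:
    # Print each chunk without a trailing newline for a typewriter effect,
    # and record it so the full completion can be reassembled afterwards.
    print(chunk, end="")
    received.append(chunk)


# Simulated stream of completion chunks.
for chunk in ["Bispecific ", "antibodies ", "are ", "hard ", "to ", "make."]:
    typewriter(chunk)
print()  # terminate the line once the stream ends

full_text = "".join(received)
```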
@@ -1011,17 +1025,49 @@ with open("my_docs.pkl", "rb") as f:
 ## Reproduction
 
 Contained in [docs/2024-10-16_litqa2-splits.json5](docs/2024-10-16_litqa2-splits.json5)
-are the question IDs
-(correspond with [LAB-Bench's LitQA2 question IDs](https://github.com/Future-House/LAB-Bench/blob/main/LitQA2/litqa-v2-public.jsonl))
-used in the train and evaluation splits,
-as well as paper DOIs used to build the train and evaluation splits' indexes.
-The test split remains held out.
-Example on how to use LitQA for evaluation can be found in [aviary.litqa](https://github.com/Future-House/aviary/tree/main/packages/litqa#running-litqa).
+are the question IDs used in train, evaluation, and test splits,
+as well as paper DOIs used to build the splits' indexes.
+
+- Train and eval splits: question IDs come from
+  [LAB-Bench's LitQA2 question IDs](https://github.com/Future-House/LAB-Bench/blob/main/LitQA2/litqa-v2-public.jsonl).
+- Test split: question IDs come from
+  [aviary-paper-data's LitQA2 question IDs](https://huggingface.co/datasets/futurehouse/aviary-paper-data).
+
+There are multiple papers slowly building PaperQA, shown below in [Citation](#citation).
+To reproduce:
+
+- `skarlinski2024language`: train and eval splits are applicable.
+  The test split remains held out.
+- `narayanan2024aviarytraininglanguageagents`: train, eval, and test splits are applicable.
+
+Example on how to use LitQA for evaluation can be found in
+[aviary.litqa](https://github.com/Future-House/aviary/tree/main/packages/litqa#running-litqa).
 
 ## Citation
 
 Please read and cite the following papers if you use this software:
 
+```bibtex
+@article{narayanan2024aviarytraininglanguageagents,
+  title = {Aviary: training language agents on challenging scientific tasks},
+  author = {
+    Siddharth Narayanan and
+    James D. Braza and
+    Ryan-Rhys Griffiths and
+    Manu Ponnapati and
+    Albert Bou and
+    Jon Laurent and
+    Ori Kabeli and
+    Geemi Wellawatte and
+    Sam Cox and
+    Samuel G. Rodriques and
+    Andrew D. White},
+  journal = {arXiv preprint arXiv:2412.21154},
+  year = {2024},
+  url = {https://doi.org/10.48550/arXiv.2412.21154},
+}
+```
+
 ```bibtex
 @article{skarlinski2024language,
   title = {Language agents achieve superhuman synthesis of scientific knowledge},
@@ -1035,8 +1081,8 @@ Please read and cite the following papers if you use this software:
     Manvitha Ponnapati and
     Samuel G. Rodriques and
     Andrew D. White},
-  year = {2024},
   journal = {arXiv preprint arXiv:2409.13740},
+  year = {2024},
   url = {https://doi.org/10.48550/arXiv.2409.13740}
 }
 ```
@@ -1052,6 +1098,7 @@ Please read and cite the following papers if you use this software:
     Samuel G. Rodriques and
     Andrew D. White},
   journal = {arXiv preprint arXiv:2312.07559},
-  year = {2023}
+  year = {2023},
+  url = {https://doi.org/10.48550/arXiv.2312.07559}
 }
 ```

paperqa/settings.py

Lines changed: 1 addition & 1 deletion
@@ -450,7 +450,7 @@ class AgentSettings(BaseModel):
 
     agent_llm: str = Field(
         default=CommonLLMNames.GPT_4O.value,
-        description="Model to use for agent.",
+        description="Model to use for agent making tool selections.",
     )
 
     agent_llm_config: dict | None = Field(
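The `Field(default=..., description=...)` idiom in this hunk attaches both a default value and human-readable documentation to a model attribute. A rough standard-library analogue using `dataclasses` (the class name and default here are illustrative assumptions, not paper-qa code):

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentSettingsSketch:
    # Default and description mirror the values shown in the diff.
    agent_llm: str = field(
        default="gpt-4o-2024-11-20",
        metadata={"description": "Model to use for agent making tool selections."},
    )


settings = AgentSettingsSketch()
# The metadata is retrievable via dataclasses.fields(), similar in spirit to
# how pydantic exposes a Field's description.
description = fields(AgentSettingsSketch)[0].metadata["description"]
```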
