Commit c0512db
Add a script to automatically add publications to website (#4)

Add a script which runs via the CI once a month. The script uses OpenAlex to search for all relevant articles, querying for the keywords "gysela", "gyselax" and "gyselalib" amongst articles published since a given date. This date is 2019-01-01 by default (which corresponds to the most recent article currently indexed on the site), but in practice it is the date at which the workflow last ran successfully, minus 2 months. The articles found are filtered to:

- Remove preprints
- Keep articles with one of the keywords in the title or abstract
- Keep articles where one of the authors is one of the people in https://github.com/gyselax/gyselax.github.io/tree/main/content/authors

For each of the remaining articles, a `cite.bib` file and an `index.md` file are created in an appropriately named sub-folder of https://github.com/gyselax/gyselax.github.io/tree/main/content/publication. If any publications are added, a branch is created with these changes and a PR is opened. As long as PRs are merged regularly, the same article should not appear in more than one PR. Once the PR has been created, it can be pruned manually before merging if any articles were added inappropriately.
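For reference, the article search boils down to one OpenAlex works query per keyword. A minimal sketch of the request the script issues, mirroring the parameters used in scripts/update_publications.py below (the date stands in for the computed CHECK_FROM value):

import requests

# One query per keyword; the date is a placeholder for CHECK_FROM
response = requests.get(
    "https://api.openalex.org/works",
    params={
        "search": "gysela",
        "filter": "from_publication_date:2019-01-01",
        "per-page": 100,
    },
    timeout=30,
)
response.raise_for_status()
for work in response.json().get("results", []):
    print(work.get("type"), work.get("title"))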
1 parent 61bb73f

3 files changed: +353 additions, −0 deletions
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
name: Update Publications

on:
  schedule:
    # Runs at 03:00 on the first day of each month
    - cron: '0 3 1 * *'
  workflow_dispatch: # allows manual trigger

jobs:
  update-publications:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests pyyaml unidecode

      - name: Get last run date
        env:
          GH_TOKEN: ${{ github.token }} # gh needs a token to query run history
        run: |
          last_run_iso=$(gh run list --workflow "Update Publications" --status success --limit 1 --json createdAt --jq '.[0].createdAt' 2>/dev/null || echo "")
          if [ -z "$last_run_iso" ]; then
            echo "No last run found"
            last_run_iso="2019-01-01T00:00:00Z" # fallback default
          fi
          # Remove 2 months to allow lots of time for indexing
          CHECK_FROM=$(date -u -d "$last_run_iso -2 months" +"%Y-%m-%d")
          echo "CHECK_FROM=$CHECK_FROM"
          echo "CHECK_FROM=$CHECK_FROM" >> $GITHUB_ENV

      - name: Create branch
        run: |
          branch_name="update-publications-$(date +'%Y%m%d')"
          echo "branch_name=${branch_name}" >> $GITHUB_ENV

      - name: Run publication update script
        run: python scripts/update_publications.py

      - name: Check for changes
        id: check_changes
        run: |
          NEW_FILES=$(git ls-files --other --exclude-standard content/publication)
          if [ -z "${NEW_FILES}" ]; then
            echo "No new publications found."
            echo "has_new=false" >> $GITHUB_OUTPUT
          else
            echo "has_new=true" >> $GITHUB_OUTPUT
          fi

      - name: Commit changes
        if: steps.check_changes.outputs.has_new == 'true'
        run: |
          # A fresh runner has no git identity, so set one before committing
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git checkout main
          git checkout -b $branch_name
          git add content/publication
          git commit -m "Automated update of publications" || echo "No changes to commit"

      - name: Push branch
        if: steps.check_changes.outputs.has_new == 'true'
        run: git push origin HEAD
        env:
          GH_TOKEN: ${{ github.token }}

      - name: Create Pull Request
        if: steps.check_changes.outputs.has_new == 'true'
        run: |
          branch_name=$(git rev-parse --abbrev-ref HEAD)
          gh pr create \
            --title "Update publications" \
            --body "Automated update of publications since ${CHECK_FROM}." \
            --base main \
            --head $branch_name
        env:
          GH_TOKEN: ${{ github.token }}
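The 2-month rewind in "Get last run date" is what gives slowly indexed articles a second chance: each run re-examines a window that overlaps the previous one, and the duplicate check in the script keeps overlapping windows from producing duplicate entries. A rough Python equivalent of the shell date arithmetic (a sketch only: it approximates "2 months" as 61 days, where GNU date uses calendar months, and the timestamp is a hypothetical createdAt value):

import datetime

last_run_iso = "2025-03-01T03:00:12Z"  # hypothetical createdAt of the last run
last_run = datetime.datetime.fromisoformat(last_run_iso.replace("Z", "+00:00"))
check_from = (last_run - datetime.timedelta(days=61)).strftime("%Y-%m-%d")
print(check_from)  # 2024-12-30 -> articles indexed since then are re-examined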

scripts/update_publications.py

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
import datetime
import os
from pathlib import Path

import requests
import unidecode
import yaml

# === Config ===
AUTHOR_DIR = Path(__file__).parent.parent / "content" / "authors"
PUBLICATION_DIR = Path(__file__).parent.parent / "content" / "publication"
# The abbreviations file sits next to this script, so resolve it relative to
# __file__ rather than to the current working directory
VENUE_ABBREVIATIONS_FILE = Path(__file__).parent / "venue_abbreviations.yml"
# Default matches the workflow fallback, so the script can also be run locally
CHECK_FROM = os.environ.get('CHECK_FROM', '2019-01-01')

existing_slugs = {p.stem for p in PUBLICATION_DIR.iterdir() if p.is_dir()}

# === Helpers ===
def load_abbrev_map():
    if VENUE_ABBREVIATIONS_FILE.exists():
        with open(VENUE_ABBREVIATIONS_FILE) as f:
            return yaml.safe_load(f)
    return {}

def load_key_authors():
    key_authors = []
    for md_file in AUTHOR_DIR.rglob("*.md"):
        with open(md_file, encoding="utf-8") as f:
            content = f.read()
        if content.startswith("---"):
            front_matter = content.split("---", 2)[1]
            data = yaml.safe_load(front_matter)
            if "name" in data and "organizations" in data:
                orgs = [o["name"] for o in data.get("organizations", []) if "name" in o]
                key_authors.append({
                    "name": " ".join(data["name"].split(" ")[1:]),
                    "organizations": orgs
                })
    return key_authors

def load_known_dois():
    dois = set()
    for md_file in PUBLICATION_DIR.rglob("*.md"):
        with open(md_file, encoding="utf-8") as f:
            content = f.read()
        if content.startswith("---"):
            front_matter = content.split("---", 2)[1]
            data = yaml.safe_load(front_matter)
            if "doi" in data:
                dois.add(data["doi"])
    return dois

def get_first_author_surname(authorships):
    if authorships:
        first_author = authorships[0]["author"]["display_name"]
        surname = first_author.split()[-1]
        return unidecode.unidecode(surname).lower()
    return "unknown"

def get_all_authors(authorships):
    return " and ".join(a["author"]["display_name"] for a in authorships) if authorships else "Unknown"

def author_matches(work_authorships, key_authors):
    for a in work_authorships:
        author_name = a["raw_author_name"]
        institutions = [i["raw_affiliation_string"] for i in a["affiliations"]]
        for ka in key_authors:
            if (ka["name"].lower() in author_name.lower()) and \
               any(org.lower() in instit.lower() for org in ka["organizations"] for instit in institutions):
                return True
    return False

def make_slug(meta, abbrev_map):
    surname = get_first_author_surname(meta["authorships"])
    if meta["venue_full"] in abbrev_map:
        venue = abbrev_map[meta["venue_full"]]['slug']
    else:
        venue = meta["venue_full"]
    year = str(meta["year"])
    slug_base = f"{surname}-{venue}-{year}"
    slug = slug_base
    i = 2
    # Disambiguate when several publications share surname, venue and year
    while slug in existing_slugs:
        slug = f'{slug_base}_{i}'
        i += 1
    existing_slugs.add(slug)
    return slug

def reconstruct_abstract(inverted_index):
    """Rebuild the abstract from OpenAlex's inverted index (word -> positions)."""
    if not inverted_index:
        return ""
    positioned = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    return " ".join(word for _, word in sorted(positioned))

def extract_metadata(work, abbrev_map):
    """Extract shared metadata for front_matter and bibtex."""
    title = work.get("title") or ""
    authorships = work.get("authorships", [])
    authors_list = [a["author"]["display_name"] for a in authorships]
    authors_bibtex = get_all_authors(authorships)
    surname = get_first_author_surname(authorships)
    venue_host = (work.get("host_venue") or {}).get("display_name")
    venue_primary = ((work.get("primary_location") or {}).get("source") or {}).get("display_name")
    venue_full = venue_primary or venue_host or ""
    year = work.get("publication_year", "")
    doi = work.get("doi")
    # OpenAlex already expresses the DOI as a https://doi.org/ URL
    url = doi
    pub_date = work.get("publication_date", "1900-01-01")
    biblio = work.get("biblio", {})
    volume = biblio.get("volume")
    issue = biblio.get("issue")
    first_page = biblio.get("first_page")
    last_page = biblio.get("last_page")
    pages = f"{first_page}--{last_page}" if first_page and last_page else None
    abstract = reconstruct_abstract(work.get("abstract_inverted_index"))
    return {
        "title": title,
        "authors_list": authors_list,
        "authors_bibtex": authors_bibtex,
        "authorships": authorships,
        "venue_full": venue_full,
        "year": year,
        "doi": doi,
        "url": url,
        "pub_date": pub_date,
        "volume": volume,
        "issue": issue,
        "pages": pages,
        "surname": surname,
        "abstract": abstract
    }

def to_bibtex(meta, slug, abbrev_map):
    if meta["venue_full"] in abbrev_map:
        venue = abbrev_map[meta["venue_full"]]['bibtex']
    else:
        venue = meta["venue_full"]
    fields = {
        "title": meta["title"],
        "author": meta["authors_bibtex"],
        "journal": venue,
        "year": meta["year"],
        "volume": meta["volume"],
        "number": meta["issue"],
        "pages": meta["pages"],
        "doi": meta["doi"],
        "url": meta["url"]
    }
    lines = [f"@article{{{slug},"]
    lines.extend(f"  {k} = {{{v}}}," for k, v in fields.items() if v)
    lines[-1] = lines[-1].rstrip(",")  # drop trailing comma
    lines.append("}")
    return "\n".join(lines)

def write_index_md(folder, meta):
    front_matter = {
        "title": meta["title"],
        "subtitle": "",
        "summary": "",
        "authors": meta["authors_list"],
        "tags": [],
        "categories": [],
        "date": meta["pub_date"],
        "lastmod": datetime.datetime.now().isoformat(),
        "featured": False,
        "draft": False,
        "image": {"caption": "", "focal_point": "", "preview_only": False},
        "projects": [],
        "publishDate": datetime.datetime.now().isoformat(),
        "publication_types": ["1"],
        "abstract": meta["abstract"],
        "publication": meta["venue_full"],
        "doi": meta["doi"] or ""
    }
    index_md = "---\n" + yaml.dump(front_matter, sort_keys=False) + "---\n"
    (folder / "index.md").write_text(index_md, encoding="utf-8")

# === Main ===
def main():
    abbrev_map = load_abbrev_map()
    key_authors = load_key_authors()
    dois = load_known_dois()

    for project_name in ('gysela', 'gyselax', 'gyselalib'):
        url = "https://api.openalex.org/works"
        params = {
            "search": project_name,
            "filter": f"from_publication_date:{CHECK_FROM}",
            "per-page": 100
        }
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        results = data.get("results", [])
        print(f"Found {len(results)} results for {project_name} since {CHECK_FROM}")

        for work in results:
            # Discard preprints
            if work.get("type") == "preprint":
                continue

            meta = extract_metadata(work, abbrev_map)

            # Discard preprints typed as articles but hosted on arXiv
            if "arxiv" in meta["venue_full"].lower():
                continue

            # Check relevance
            gysela_in_title = project_name in meta["title"].lower()
            gysela_in_abstract = project_name in meta["abstract"].lower()
            written_by_key_author = author_matches(meta["authorships"], key_authors)
            if not (gysela_in_title or gysela_in_abstract) and \
               not written_by_key_author:
                print("Discarding citation : ", meta["title"], meta["authors_list"])
                continue

            # Discard if already found
            if meta["doi"] in dois:
                continue
            dois.add(meta["doi"])

            print("Saving :")
            print("  ", meta["title"])
            print("  ", meta["authors_list"])
            if gysela_in_title or gysela_in_abstract:
                print("Mentioning Gysela prominently")
            if written_by_key_author:
                print("Written by permanent contributor")
            print()

            slug = make_slug(meta, abbrev_map)
            folder = PUBLICATION_DIR / slug
            folder.mkdir(parents=True, exist_ok=True)

            # Write index.md
            write_index_md(folder, meta)

            # Write cite.bib
            bibtex = to_bibtex(meta, slug, abbrev_map)
            (folder / "cite.bib").write_text(bibtex, encoding="utf-8")

if __name__ == "__main__":
    main()
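OpenAlex ships abstracts as an inverted index (word mapped to its positions) rather than plain text, which is why the reconstruct_abstract helper re-sorts words by position before joining them. A small illustration with invented data:

# Invented inverted index for "Gysela simulates plasma turbulence"
inverted = {"Gysela": [0], "simulates": [1], "plasma": [2], "turbulence": [3]}

positioned = [(pos, word) for word, positions in inverted.items() for pos in positions]
print(" ".join(word for _, word in sorted(positioned)))
# -> "Gysela simulates plasma turbulence"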

scripts/venue_abbreviations.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
Journal of Computational Physics:
  slug: jcp
  bibtex: "J. Comput. Phys."
Journal of Plasma Physics:
  slug: jpp
  bibtex: "J. Plasma Phys."
Computer Physics Communications:
  slug: cpc
  bibtex: "Comput. Phys. Commun."
Concurrency and Computation Practice and Experience:
  slug: ccpe
  bibtex: "Concurrency and Computation Practice and Experience"
Plasma Physics and Controlled Fusion:
  slug: ppcf
  bibtex: "Plasma Phys. Controlled Fusion"
SMAI Journal of Computational Mathematics:
  slug: smai
  bibtex: "SMAI Journal of Computational Mathematics"
Communications Physics:
  slug: cp
  bibtex: "Commun. Phys."
The International Journal of High Performance Computing Applications:
  slug: ijhpca
  bibtex: "Int. J. High Perform. Comput. Appl."
Physics of Plasmas:
  slug: po-p
  bibtex: "Phys. Plasmas"
Nuclear Fusion:
  slug: nf
  bibtex: "Nucl. Fusion"
Physical review. E:
  slug: pre
  bibtex: "Phys. Rev. E"
Physical Review Letters:
  slug: prl
  bibtex: "Phys. Rev. Lett."
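The top-level keys must match the venue display names returned by OpenAlex exactly (including oddities like "Physical review. E"); unknown venues fall back to their full name in both the slug and the bibtex. A quick sketch of the lookup as make_slug and to_bibtex use it, run from the repository root (the example slug is hypothetical):

from pathlib import Path
import yaml

# Load the abbreviation map the same way load_abbrev_map does
abbrev_map = yaml.safe_load(Path("scripts/venue_abbreviations.yml").read_text())

venue = "Journal of Computational Physics"
print(abbrev_map[venue]["slug"])    # jcp -> folder name, e.g. a hypothetical smith-jcp-2024
print(abbrev_map[venue]["bibtex"])  # J. Comput. Phys. -> journal field of cite.bib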
