Commit 6259b14

Merge branch 'develop' into main
2 parents 7db9876 + 93bb0f0 commit 6259b14

6 files changed: 63 additions, 74 deletions

CHANGES.md

Lines changed: 36 additions & 57 deletions
@@ -1,103 +1,94 @@
-Change log for microarchiver
-============================
+# Change log for microarchiver

-Version 1.12.0
---------------
+## Version 1.12.1
+
+This release fixes a couple of issues:
+
+* The default network timeout was too short to get large PDF files from micropublication.org. Fixed by tripling the timeout duration.
+* Image conversion exceeded an internal default in the Python Pillow package being used for image conversion. Fixed by disabling the size check.
+
+
+# Version 1.12.0

 Reports can now be written in _both_ CSV and HTML formats.


-Version 1.11.0
---------------
+# Version 1.11.0

-* Add support for producing reports in HTML format.
 * Add support for specifying the title of the report.
 * Fix incorrect count of articles in ZIP file comments.


-Version 1.10.7
---------------
+# Version 1.10.7

 * Test for more signs of failure in `upload-to-pmc.sh`.
 * Make some very tiny tweaks to the format of logs.


-Version 1.10.6
---------------
+# Version 1.10.6

 * Assume the use of Python `virtualenv` to lock in a specific Python environment.
 * Fix a bug in one of the workflow scripts in which the lack of a mail message body caused the mail command to hang indefinitely.


-Version 1.10.5
---------------
+# Version 1.10.5

 * Add new helper function to run `curl` in the upload script for PMC.
 * Fix inconsistency in the PMC upload script, wherein the user and password variables were not the same name as the cron variables actually used.


-Version 1.10.4
---------------
+# Version 1.10.4

 * Fix bug in date handling in workflow scripts. The value of the `--after-date` argument to `microarchiver` was set to the date it ran, which caused it to miss articles published on the date that it ran. The value of the date should have been modified to include the day it last ran so that the date comparison was correct. (Thanks to Nick Stiffler for catching and reporting the problem.)


-Version 1.10.3
---------------
+# Version 1.10.3

 (Mistaken release -- ignore this.)


-Version 1.10.2
---------------
+# Version 1.10.2

 * Update the workflow scripts and associated crontab template.


-Version 1.10.1
---------------
+# Version 1.10.1

 * Fix behavior when DataCite has no data for an article: `microarchiver` was _meant_ to flag the article and keep going, but instead it treated it as a fatal error.
 * Fix some documentation errors about the numeric codes returned by `microarchiver`.
 * Minor other improvements.


-Version 1.10.0
---------------
+# Version 1.10.0

 This version changes the behavior of the `-@` command-line option, such that exceptions encountered when running with the `-@` option do _not_ cause `microarchiver` to drop into an interactive debugger. The old behavior turned out to be unhelpful in practice, and moreover, it mixed two behaviors into one command-line flag. The latter was problematic when running `microarchiver` from scripts.


-Version 1.9.4
---------------
+# Version 1.9.4

 This version removes an unnecessary dependency on wxPython. A GUI interface was never completed for Microarchiver, and while the starting code is still in the code base in case we try to build a GUI, it doesn't have to be hooked in at this point. Removing the internal references to the GUI code allows the wxPython requirement to be removed, which in turn simplifies and speeds up installation.


-Version 1.9.3
---------------
+# Version 1.9.3

 * Add missing Python package requirement to requirements.txt.
 * Simplify PMC upload script.


-Version 1.9.2
---------------
+# Version 1.9.2

 * Fix broken logos and images in README.md.
 * Replace local version of `debug.py` with the use of [Sidetrack](https://github.com/caltechlibrary/sidetrack).
 * Use newer approach to recording version and other metadata in `__init__.py` and the release procedure codified in `Makefile`.
 * Minor internal changes.


-Version 1.9.1
--------------
+# Version 1.9.1

 * Fix [issue #2](https://github.com/caltechlibrary/microarchiver/issues/2): volume number in file names is incorrectly determined


-Version 1.9.0
--------------
+# Version 1.9.0

 * Support output for PMC using new command-line option `-s`.
 * Rename the JATS XML file after the pattern _issn_-_volume_-_doi_.xml, to make it more compatible with output generated for PMC.
@@ -107,14 +98,12 @@ Version 1.9.0
 * Some internal code changes.


-Version 1.8.0
--------------
+# Version 1.8.0

 * Instead of quitting with an error if the file given to `-a` is empty, `microarchiver` will now just print a warning.


-Version 1.7.0
--------------
+# Version 1.7.0

 * Store JATS XML for each article, as well as any image referenced in the JATS data. Images are converted to uncompressed TIFF before being stored.
 * Perform JATS validation for each article by default.
@@ -127,70 +116,60 @@ Version 1.7.0
 * Fix miscellaneous bugs.


-Version 1.6.3
--------------
+# Version 1.6.3

 * Catch and handle no-content errors more gracefully.
 * Detect mangled XML returned by micropublication.org and handle it more gracefully.


-Version 1.6.2
--------------
+# Version 1.6.2

 * Fix crasher in writing comment into zip file because of reference to no-longer-existing package attribute.


-Version 1.6.1
--------------
+# Version 1.6.1

 * Fix broken handling of debug trace output destination.
 * Update `README.md` to describe changes to debug flag.


-Version 1.6.0
--------------
+# Version 1.6.0

 * Change the debug flag `-@` to accept an argument for where to send the debug output trace. The behavior change of `-@` is not backward compatible.
 * Put metadata in `setup.cfg` and change how Microarchiver gets the metadata internally.


-Version 1.5.1
--------------
+# Version 1.5.1

 * Fix bug in propagating network failures up to the top of main.
 * Fix case of variable being shadowed inside a block.


-Version 1.5.0
--------------
+# Version 1.5.0

 * Added new `-g` option to print the raw XML article list from the server.
 * Did very minor internal code refactoring.


-Version 1.4.0
--------------
+# Version 1.4.0

 * Added new `scripts` subdirectory with script for use with cron.
 * Fixed behavior: if there are no articles to archive, don't create the output directory either.


-Version 1.3.0
--------------
+# Version 1.3.0

 * Now if there are no articles to archive, it won't create a zip file.


-Version 1.2.0
--------------
+# Version 1.2.0

 * Improved installation instructions.
 * Changed debug flag from `-Z` to `-@`.
 * Internal code changes for message printing & colorization.


-Version 1.1.0
--------------
+# Version 1.1.0

 * **Backwards incompatible change**: command-line arguments have been significantly changed in terms of names and shortcut letters.
 * Addition of new `-d` command-line argument, for getting only articles published after a certain date.

microarchiver/__main__.py

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@
 # "OSError: image file is truncated (10 bytes not processed)"
 ImageFile.LOAD_TRUNCATED_IMAGES = True

+# This is to prevent Pillow from warning "DecompressionBombWarning: Image size
+# (100153418 pixels) exceeds limit of 89478485 pixels ..."
+Image.MAX_IMAGE_PIXELS = None
+
 import microarchiver
 from microarchiver import print_version
 from .exceptions import *

microarchiver/network.py

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ def addurl(text):
         return (text + ' for {}').format(url)

     try:
-        req = timed_request('get', url, stream = True)
+        req = timed_request('get', url, stream = True, timeout = 60)
     except requests.exceptions.ConnectionError as ex:
         if recursing >= _MAX_RECURSIVE_CALLS:
             raise NetworkFailure(addurl('Too many connection errors'))

scripts/archive-in-portico

File mode changed: 100755 → 100644
Lines changed: 16 additions & 12 deletions
@@ -45,7 +45,7 @@ today=$(date +%Y-%m-%d)
 datestampfile=$PORTICO_OUTPUT/last-run-date
 failurefile=$PORTICO_OUTPUT/last-failures

-# Today's run will be written in a subdirctory. Note the subdirectory name
+# Today's run will be written in a subdirectory. Note the subdirectory name
 # includes the current time, not just today's date, because otherwise we
 # would overwrite the previous data if we ran run multiple times per day.
 now=$(date +%Y-%m-%d-%H%M)
@@ -91,26 +91,30 @@ fi

 # Run microarchiver separately on past failures, leaving the results unpackaged
 # so that we can add to them the results of today's run.
+rerun_count=0
 if [[ -f $failurefile ]]; then
     echo "=== Running microarchiver on past failures ===" >> $log
     echo "" >> $log
     # Note the use of -Z to prevent zip'ing the final results.
-    run_microarchiver -s portico -Z -C -a $failurefile -o $outputdir -r $report -@ $debuglog
+    run_microarchiver -s portico -Z -C -a $failurefile -o $outputdir -r $report \
+        -f csv,html -t Past_failures_retried -@ $debuglog
     echo "" >> $log
 fi
+mv $outputdir/report.html $outputdir/rerun-report.html

 echo "=== Running microarchiver for new articles ===" >> $log
 echo "" >> $log
-thisreport=$outputdir/latest-report.csv
-thisdebuglog=$outputdir/latest-debug.log
+this_report=$outputdir/latest-report.csv
+this_debuglog=$outputdir/latest-debug.log
 # This will add new articles to any existing ones from the past failures
 # code above, and this time will zip up the final result.
-run_microarchiver -s portico -C -d $afterdate -o $outputdir -r $thisreport -@ $thisdebuglog
+run_microarchiver -s portico -C -d $afterdate -o $outputdir -r $this_report \
+    -f csv,html -t $today -@ $this_debuglog

 # Combine separate report files, leave that, & delete the intermediate files.
-tail -n +2 $thisreport >> $report
-tail $thisdebuglog >> $debuglog
-rm -f $thisdebuglog $thisreport
+tail -n +2 $this_report >> $report
+tail $this_debuglog >> $debuglog
+rm -f $this_debuglog $this_report

 # Did we have any failures? If so, note them for next time.
 grep -i "missing," $outputdir/*report.csv | cut -f2 -d',' > $failurefile
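
The merge and failure-detection steps that close this hunk can be tried in isolation. The sketch below assumes a report with a header row, a status in the first column, and an article identifier in the second. The column names, sample rows, and DOIs are invented for illustration and are not microarchiver's actual report format.

```shell
# Scratch directory so the example does not touch real output.
workdir=$(mktemp -d)
report="$workdir/report.csv"
this_report="$workdir/latest-report.csv"

# Invented sample data: a header row plus one row per article.
printf 'Status,DOI\nmissing,10.0000/example.1\n' > "$report"
printf 'Status,DOI\nok,10.0000/example.2\nmissing,10.0000/example.3\n' > "$this_report"

# Append everything except the header row (tail -n +2 starts output at line 2).
tail -n +2 "$this_report" >> "$report"
rm -f "$this_report"

# Collect the second comma-separated field of every row containing "missing,".
failures=$(grep -i "missing," "$workdir"/*report.csv | cut -f2 -d',')
echo "$failures"

rm -rf "$workdir"
```

With this sample data the two "missing" identifiers are printed, one per line. When the glob matches more than one file, `grep` prefixes each match with the file name, which `cut -f2` still skips as long as the path contains no comma.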
@@ -136,13 +140,13 @@ echo $today > $datestampfile

 grep -F "Total articles" $log | \
     sed 's/Total //g;1 s/articles/Past failures retried/;2 s/articles/New &/' | \
-    mail -s "Portico archiving results for $today" -a $report -a $log $EMAIL_SUCCESS
+    mail -s "Portico archiving results for $today" \
+        -a $outputdir/latest-report.html -a $outputdir/rerun-report.html \
+        -a $log $EMAIL_SUCCESS


 # Post the report to Slack ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 run_slack chat send --channel $SLACK_CHANNEL --color "#00ff00" \
-    --title "microarchiver successfully completed Portico upload" \
+    --title "Portico run for micropublications.org completed." \
     --text "There were $(wc -l < $failurefile) articles skipped."
-run_slack file upload --channels $SLACK_CHANNEL --file $report \
-    --comment "Here is the record of what was uploaded:"
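
The `sed` expression in the mail pipeline relabels the "Total articles" lines by position: the first becomes the retried-failures count and the second becomes the new-articles count. A small sketch with invented log lines follows; the exact text microarchiver prints after "Total articles" is an assumption here.

```shell
# Two invented summary lines, in the order the two runs would log them.
summary=$(printf 'Total articles: 2\nTotal articles: 5\n' |
    sed 's/Total //g;1 s/articles/Past failures retried/;2 s/articles/New &/')

echo "$summary"
# prints:
#   Past failures retried: 2
#   New articles: 5
```

Because the relabeling is positional, a log holding only one such line (for example, when there is no failure file and the retry run is skipped) would have that single line labeled "Past failures retried".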

scripts/helpers.sh

Lines changed: 5 additions & 3 deletions
@@ -60,11 +60,13 @@ EOF
     fi

     # Run microarchiver with arguments and save output in $log
-    microarchiver $@ >> $log 2>&1
+    count=$(microarchiver $@ 2>&1 | tee -a $log | grep "Total articles" | cut -d ' ' -f3)

-    # Was it a successful run? If not, send mail & quit.
+    # If successful, return the num. of articles written, else send mail & quit
     status=$?
-    if (($status > 0 && $status < 100)); then
+    if (($status == 0)); then
+        echo $count
+    else
         case "$status" in
             1) cause="No network detected" ;;
             2) cause="The user interrupted program execution" ;;
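
One point worth checking in the new `count=$(...)` line: unless `set -o pipefail` is enabled elsewhere in the script, `$?` after a pipeline is the status of its last command (`cut` here), not of `microarchiver`. The sketch below shows the effect and one portable way to keep both the count and the real exit status by going through a temporary file. `fake_archiver`, its output line, and its exit code are hypothetical stand-ins, not microarchiver's real behavior.

```shell
# Hypothetical stand-in for microarchiver: prints a summary line, then fails.
fake_archiver() {
    echo "Total articles: 7"   # assumed output format, not confirmed by the source
    return 3
}

log=$(mktemp)
tmp=$(mktemp)

# Pipeline form: $? is the status of `cut`, so the failure is masked.
count=$(fake_archiver 2>&1 | tee -a "$log" | grep "Total articles" | cut -d ' ' -f3)
pipeline_status=$?

# Temp-file form: the status comes from fake_archiver itself.
status=0
fake_archiver > "$tmp" 2>&1 || status=$?
cat "$tmp" >> "$log"
count=$(grep "Total articles" "$tmp" | cut -d ' ' -f3)

echo "pipeline_status=$pipeline_status status=$status count=$count"
# prints: pipeline_status=0 status=3 count=7

rm -f "$tmp" "$log"
```

In bash specifically, `${PIPESTATUS[0]}` read immediately after a pipeline, or `set -o pipefail`, are alternatives to the temporary file.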

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

 [metadata]
 name = microarchiver
-version = 1.12.0
+version = 1.12.1
 summary = Archives articles from microPublication.org
 description = Create archives of articles from microPublication.org.
 author = Michael Hucka, Tom Morrell
