Skip to content

[Re] The principal components of natural images #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
IainDaviesMaths opened this issue Jul 17, 2020 · 31 comments
Closed

[Re] The principal components of natural images #45

IainDaviesMaths opened this issue Jul 17, 2020 · 31 comments

Comments

@IainDaviesMaths
Copy link

Original article: The Principal Components of Natural Images by Hancock et al (1991),
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=044DFF560FC2E801B579C2F23D268B44?doi=10.1.1.41.192&rep=rep1&type=pdf*

PDF URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/article.pdf
Metadata URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/metadata.yaml
Code URL: https://github.com/IainDaviesMaths/Reproduction-Hancock

Scientific domain: Computational Neuroscience
Programming language: Julia
Suggested editor: -

@cJarvers
Copy link

Oh, this looks interesting. @(whoever becomes editor for this), I'd be happy to review.

@rougier
Copy link
Member

rougier commented Jul 23, 2020

Thanks for your submission, we'll assign an editor soon (hopefully, because of summer vacation it may take some time). and@cJarvers volunteered to review.

@rougier
Copy link
Member

rougier commented Jul 23, 2020

@ThomasA Can you edit this submission ? @cJavers already proposed to review.

@otizonaizit
Copy link
Member

@rougier : in case @ThomasA does not react, I can edit this. I am not familiar with Julia, but I am interested in the reproduction of this classic paper :-)

@rougier
Copy link
Member

rougier commented Aug 21, 2020

That would be great, thanks.

@rougier rougier assigned rougier and otizonaizit and unassigned rougier Aug 21, 2020
@ThomasA
Copy link

ThomasA commented Aug 27, 2020

Sorry, I lost a bit track of ReScience over the summer. I see that an editor has been assigned now.

@otizonaizit
Copy link
Member

@cJarvers : thanks for the offer of reviewing this submission. Are you still up for it?
@pberkes : would you like to review this?

@otizonaizit
Copy link
Member

Guidelines for reviewers are here: https://rescience.github.io/edit/

@pberkes
Copy link

pberkes commented Aug 27, 2020

@otizonaizit : Gladly!

@cJarvers
Copy link

@otizonaizit : Yes, I'd still be happy to review.

@pberkes
Copy link

pberkes commented Sep 8, 2020

The authors successfully reproduce the findings of Hancock’s “PCA of natural images” paper, with minor deviations from the original results that are easily accounted for. The paper is well written and well organized. It’s good work, I don’t have any further comment.

The code is provided in two languages, Matlab and Julia. Unfortunately, I do not have a Matlab license so I could not verify that that part of the code works as intended. As far as I’m concerned, the Matlab code could be removed from the repository as it is redundant and Matlab is not an open language.

The Julia code is readable and decently organized. It runs as intended and produces some version of the figure in the paper. The random seed has not been fixed, so it produces slightly different images for each run.

For instance, this is the image shown in the paper and shipped with the repository:
julia-fig6 copy

Running the code I obtain this version
julia-fig6

I don’t know where the editors stand on this issue, but personally I think that the random seed should be a global parameter, and running the code with the same setting would always return exactly the same images.

Regarding the code repository: while I do appreciate the effort of the authors in making their research reproducible, the GitHub repository could be organized better.

  • Use directories to organize the content: input data, output results, Matlab, and Julia code in separate directories
  • Please add some instructions for running the code. E.g. for Julia one needs to run setup.jl first and then make run-julia
  • The version of packages that need to be installed for Julia is specified in Project.toml, it would be useful to also have the Matlab version and a list of any Matlab optional packages

Thank you for reproducing this classic!

@otizonaizit
Copy link
Member

@otizonaizit : Yes, I'd still be happy to review.

@cJarvers: do yo uhave a timeline for your review?

@otizonaizit
Copy link
Member

otizonaizit commented Sep 8, 2020

@IainDaviesMaths :

  • I agree with @pberkes that the random seed should be a global parameter
  • Regarding the Matlab code, I'd like to wait for @cJarvers and see if they can review it. In that case the code should stay. If not, then we may need to find someone who has a Matlab license and can review the code...
  • Can you comment/act on the other observations by @pberkes ?

@cJarvers
Copy link

@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest.
I have a Matlab license, so I should be able to run & review that part of the code as well.

@otizonaizit
Copy link
Member

@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest.
I have a Matlab license, so I should be able to run & review that part of the code as well.

Perfect, thanks! I just wanted to have a rough timeline, please take your time :-)

@cJarvers
Copy link

Review of Davies & Eglen: [Re] The principal components of natural images

Here is my review, @otizonaizit & @IainDaviesMaths

This review based on commit IainDaviesMaths/Reproduction-Hancock@519b169

Success of the replication

The authors successfully replicated the results of the original article. There are some minor deviations, but these are within the range of what should be expected, given that the additional natural images and the text images were not available anymore. The authors discuss these differences sufficiently.

The only place where I feel that more detailed commentary would be helpful is the reproduction of the text experiment. The authors state that "[f]or this to be reproduced, further experimental modifications had to be made to the text image", but what exactly these modifications were remains unclear. A suitable font had to be chosen (which one?) - anything else? Since the authors state that these results seem to be brittle, it might be nice to also include the result for a different font.

Reproducibility

The authors include a makefile, which makes rerunning their code easy. Since the code is provided in both Matlab and Julia, I'll briefly comment on each below.

Julia:

I ran the code with Julia 1.5.1.

As @pberkes already noted, the Julia code runs without problems, but does not produce perfectly reproducible results. I agree that the most crucial fix would be to set the random seeds.

Another potential problem for reproducibility is package management. The authors provide a setup.jl script that installs the required dependencies. However, the versions that get installed are not necessarily the same ones as the authors used, so the behavior of the code might change in the future. However, the repo already contains the solution to this problem: the Manifest.toml and Project.toml files document the exact package versions that were used. Instead of letting the user run the setup.jl script, the authors could add the following lines to the beginning of their runall.jl script:

using Pkg
Pkg.activate(".")
Pkg.instantiate()

In that case it might also be good to remove the setup.jl script to avoid confusion.

Matlab:

I ran the code with Matlab version R2019b.

Similar to the Julia code, the Matlab version runs without problems and generates Figures that resemble those in the paper, but have subtle differences (for example, the sign of he principal components is flipped) since the random seeds are not fixed. As with the Julia code, this should be rectified by setting random seeds.

Running make all generates figures 2 to 8 with Matlab, leaving out figure 1. Since figure 1 only concatenates the input images, this is not a big problem, but for completeness sake it might be better to generate this figure as well (just like the Julia code).

Since Matlab is not an open language, I also tried running the code with Octave 5.2.0. This did not throw any errors and the results for figure 1 were correct, but the other figures ran for a long time. I gave up after a few hours. I know it should be possible to make it run faster in Octave using the JIT compiler or something, but I don't have enough experience with that to go down that road.

As far as I understand it, the policy of ReScience is that the code has to be in an open source language, so not Matlab. This replication is an interesting edge case, since there is also an open source implementation (in Julia). The question is whether both should be published. I guess this is for the editorial team to decide.

Clarity of the code

I agree with @pberkes that the structure of the repository should be cleaned up. There should be separate subfolders for the images, the generated figures, the Julia code and the Matlab code.

In general, the code is clear and easy to read. There are only a few improvements I would suggest. All of these are suggestions, not requirements. If the authors disagree with me, I would nevertheless recommend publication.

Matlab:

In the Matlab code, there are some code duplications. For example, the functions LearningProcess and LearningProcessRotated are nearly identical, except for the rotation in the latter. One could just make rotation a parameter of the function and pass 0 if the images should not be rotated. Another example occurs in PlotFigure7.m and PlotFigure8.m. Here, the calculation of the mean gray level and the learning process are included in the script, even though the code is basically identical to the code in the functions MeanGrayLevel and LearningProcess. It would be better to just replace the code blocks by calls to these functions.

In general, I think the Matlab code would be a bit more readable if spaces were added around operators.

Julia:

Since the Julia code is very close to the Matlab code, the same improvements (reduction of code duplications, more spaces) could be made here. One additional recommendation concerns the documentation. Currently, each function is documented with an initial comment, Matlab-style. The convention in Julia is to use docstrings (see here). This has the advantage that Julias inbuilt help features (e.g., at the REPL) can be used. So simply cutting the text of each documentation comment and pasting it as a string atop the function would be an improvement. Again, this is only a recommendation.

Clarity of the article

The article is excellent. It is easy to understand what was done, which parts of the replication were difficult etc.

Some small issues

While reading the paper and code, I found a few small things (typos etc.) that could be improved:

  • In the introduction, the original paper is cited as "Hancock et al. 1991", but in the references the date is given as 1992.
  • In figure 7, the example image at half scale seems to be centered on a space, so that no letters are visible. It would be better to actually show letters here. Also the figure caption references (a), (b) and (c), but these letters are not actually used in the figure.
  • The file sagerupdate.m should probably be called sangerupdate.m
  • In that file in the comment on line 16, Sager should be Sanger
  • In PlotFigure6.m the comment in line 36 says "Double scale", but should probably be "Half scale"
  • The same holds for hancock.jl line 469
  • The code in lines 378, 379, 798-800, 803, 817-820 and 835-837 in hancock.jl has been commented out and could be removed to increase clarity

TL;DR

The replication was successful. The article is excellent. The code is also well structured and easily readable, however there are two problems with reproducibility (lack of random seeds; suboptimal package management) that the authors should fix before publication. In addition, the repo should be cleaned up.

Since one of the two implementation is in Matlab, the editorial team needs to decide whether that is appropriate for publication in ReScience, or whether only the Julia code should be published.

Finally, there are some improvements to the code that I would recommend (reducing duplications, adding spaces around operators, adding docstrings), but these are only suggestions.

@otizonaizit
Copy link
Member

Thank you @cJarvers for your review!

@IainDaviesMaths : both reviewers agree that the paper is fit for publication. You just need to address the reviewers comments, especially regarding the random seed and the re-organization of the code. Do you have an estimate when this can happen?

@rougier , @khinsen , @benoit-girard , @oliviaguest : in this submission we have a reproduction with code both in Julia and Matlab. The code in both languages has been reviewed, even if the Matlab code had some difficulties running under Octave. We have had this discussion already, but I think an official statement by the Editors in Chief would be good: do we accept code submissions in Matlab? Given that in this particular case we have corresponding code in Julia, it is maybe worth to relax the submission policy? Should the authors remove the Matlab code from the submission?

@oliviaguest
Copy link
Member

@otizonaizit seems like a discussion that deserves it's own thread IMHO!

@rougier
Copy link
Member

rougier commented Sep 18, 2020

In this specific case, I think we can relax the recommendation and include the Matlab code as well, expecially if it works under Octave. But I agree with @oliviaguest that we should open a thread to decide on a proper guideline.

@IainDaviesMaths
Copy link
Author

Thank you everyone for the reviews and feedback on the paper! Myself and Stephen are in the process of making all the suggested edits and should be done in the next few days.

@IainDaviesMaths
Copy link
Author

Most edits have been done now - just a few minor Julia edits to be made which Stephen is on top of. Will alert when the article is in final draft.

@IainDaviesMaths
Copy link
Author

Hi all - I think that's all your suggestions added to the code and article now! Let me know if there's anything else to be done.

@rougier
Copy link
Member

rougier commented Oct 26, 2020

@otizonaizit 🛎️

@otizonaizit
Copy link
Member

@pberkes @cJarvers : I find the reorganized repo and the other changes quite good, so if there are no objections from you two I'd accept the paper and contact @IainDaviesMaths fo the details of the publication process! Please write a comment here in any case.

@pberkes
Copy link

pberkes commented Oct 26, 2020

@otizonaizit no objections from me!

@cJarvers
Copy link

@otizonaizit no objections from me either. The changes address all the issues I raised. Great work by the authors!

@otizonaizit
Copy link
Member

Great. Congratulations @IainDaviesMaths : the paper is accepted! I will follow up shortly with a PR to the paper repo for the update of metadata.

@IainDaviesMaths
Copy link
Author

@otizonaizit thanks for the info about proceeding, I think it is ready for publication now!

@otizonaizit
Copy link
Member

otizonaizit commented Oct 27, 2020

Great! Published:

The article will appear on https://rescience.github.io/read/ as soon as these two PRs are merged by one of the editors in chief (@rougier ?)

@sje30
Copy link

sje30 commented Oct 27, 2020

Super, thanks! Well done Iain for getting this great work published.

@rougier
Copy link
Member

rougier commented Nov 2, 2020

Should be online now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment