-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[rfile] Introduce RFile first prototype #19958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the work!
Seeing the code sparked a lot of questions. Many of those might be either addressed in later PRs and some might be irrelevant, so feel free to "Resolve" those.
Test Results0 tests 0 ✅ 0s ⏱️ Results for commit bb01c06. ♻️ This comment has been updated with latest results. |
978733c
to
7430133
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, but in general I'll leave review to others...
868ac3e
to
6cbc229
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice take-off!
6cbc229
to
ca1fd6c
Compare
da895e4
to
51f718e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Just a few small things below.
/// Given `path`, returns the TKey corresponding to the object at that path (assuming the path is fully split, i.e. | ||
/// "a/b/c" always means "object 'c' inside directory 'b' inside directory 'a'"). | ||
/// IMPORTANT: `path` must have been validated/normalized via ValidateAndNormalizePath() (see RFile.cxx). | ||
TKey *GetTKey(std::string_view path) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider:
TKey *GetTKey(std::string_view path) const; | |
TKey *GetKey(std::string_view path) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use TKey
in RFile whenever I want to point out that it's specifically a TKey
- which should only be used internally and never exposed in the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is internal, it does not really matter that much. I still find it a bit awkward unless we envisioned the possibility that there would eventually be a RKey
.
} else if (key && !GetROOT()->IsBatch()) { | ||
R__LOG_WARNING(RFileLog()) << "Tried to get object '" << path << "' of type " << cls->GetName() | ||
<< " but that path contains an object of type " << key->GetClassName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is debatable. What if the user wants to use this in a script used interactively? For example, the inner part of the script might be interrogating what 'user type' of file this before moving forward (eg. do something for file containing TTree
with one name and another for files containing a RNTuple
with a same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is certainly debatable and I'm still not sure about the best compromise. I guess a flag could help, though it's a bit unergonomic. Another option would be to use R__LOG_INFO which iirc is off by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this as the I/O meeting
RFile is a modern and minimalistic interface to ROOT files, both local and remote, that can be used instead of TFile | ||
when the following conditions are met: | ||
- you want a simple interface that makes it easy to do things right and hard to do things wrong; | ||
- you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities; | ||
- you want more robustness and better error reporting for those operations; | ||
- you want clearer ownership semantics expressed through the type system rather than having objects "automagically" | ||
handled for you via implicit ownership of raw pointers. | ||
RFile doesn't try to cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not | ||
a 1:1 replacement for them. It is meant to simplify the most common use cases and make them easier to handle by | ||
minimizing the amount of ROOT-specific quirks and conforming to more standard C++ practices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the tone a bit harsh. Consider:
RFile is a modern and minimalistic interface to ROOT files, both local and remote, that can be used instead of TFile
when you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities. It provides:
- a simple interface that makes it easy to do things right and hard to do things wrong,
- more robustness and better error reporting for those operations,
- clearer ownership semantics expressed through the type system.
RFile doesn't cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not
a 1:1 replacement for them. It is meant to simplify the most common use cases by following newer standard C++ practices.
## Directories | ||
Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all. | |
Unlike TFile, the RFile class does not function as a directory. Moreover, there is no RDirectory class provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think this is better than the current version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reads (better) to me 'english-wise'. Unlike
is more (american) English than Differently from
. The rest of that sentence is rephrase in a function way rather than indirectly focusing on the C++ semantic of inheritance. The In fact ... at all.
was making the 2nd sentences stand (too much) as a reinforcement of the first rather than its owns message. Instead of saying 'we also do not plan to offer a standalone RDirectory facility', the In fact ... at all
can be (mis?) read as 'RFile is not a directory solely because we are not implementing RDirectory [but if we did implement it, RFile would inherit from it]' (the last part is a slight exaggeration to clarify the drift)'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for the "unlike", I agree.
The point of the sentence for me was specifically to focus on the C++ semantic rather than the functionality (see here), as I think it's a relatively big change from both TFile and the older RFile prototype. However, rereading the sentence, maybe the intention was not that clear.
I have a bit of a problem with the "function as a directory" wording as it's not very clear in my opinion.
Perhaps I should just drop the paragraph altogether, I don't think it adds much value to the documentation after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can indeed make sense to drop the first 2 sentences and start at Directories are still ...
maybe rephrased now as (for example)
Directories are semantically supported by the `RFile` interfaces as they are a concrete part of the ROOT binary format.
Directories are interacted with solely via the use of filesystem-like string-based paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if the updated phrasing works for you :)
Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all. | ||
Directories are still an existing concept in RFile (since they are a concept in the ROOT binary format), | ||
but they are usually interacted with indirectly, via the use of filesystem-like string-based paths. If you Put an object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but they are usually interacted with indirectly, via the use of filesystem-like string-based paths. If you Put an object
Is the "usually" intentional? It implies that there is (or will be) other to interact with the directories. Is that the intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it implies that we "reserve the right" to also provide directory querying functionality at some point (which we know we'll probably need in some form)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that 'usually' is effectively expressing "reserve the right". In the contrary, it reads as if there is already other ways. A closer adverb might 'for now only interacted with indirectly.'
kPutAllowOverwrite = 0x1, | ||
kPutOverwriteKeepCycle = 0x2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we document these flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are internal and pretty self-explanatory, but I can add doc comments to them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it took me to look at the code to understand the 2nd one, I first read it as:
"kPut Overwrite[/change] [the default behavior] Keep Cycle" [aka 'problably' not keep the cycle]
while I guess it is meant to be read as
"[The] kPut Overwrite [operation will] Keep [the existing last] Cycle"
The misunderstanding is based on my misunderstanding in that context of "Overwrite" as a noun or as a verb. The fact that in kPutAllowOverwrite
the 2nd word is treated as a verb lead my brain to infer that the 2nd word of the other enum was also to be treated as verb :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I added documentation to them in the followup PR.
|
||
EXPECT_FALSE(file->Get<TH1D>("inexistent")); | ||
EXPECT_FALSE(file->Get<TH1F>("hist")); | ||
EXPECT_TRUE(file->Get<TH1>("hist")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we delete the result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, the return value is a unique_ptr
:)
{ | ||
FileRaii fileGuard("does_not_exist.root"); | ||
|
||
// make sure that the file really does not exist, in case a previous test didn't clean it up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any test that intentionally create the file named 'does_not_exist.root' ? The comment seems to imply there is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
// This succeeds because Update creates the file if it doesn't exist. | ||
EXPECT_NO_THROW(RFile::Update("does_not_exist.root")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answer: https://github.com/root-project/root/pull/19958/files#r2411952125.
I recommend using a different name:
EXPECT_NO_THROW(RFile::Update("does_not_exist.root")); | |
FileRaii updateFileGuard("created_by_update.root"); | |
gSystem->Unlink(updateFileGuard.GetPath().c_str()); | |
EXPECT_NO_THROW(RFile::Update("created_by_update.root")); |
EXPECT_EQ(hist->GetDirectory(), nullptr); | ||
ASSERT_NE(hist, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EXPECT_EQ(hist->GetDirectory(), nullptr); | |
ASSERT_NE(hist, nullptr); | |
ASSERT_NE(hist, nullptr); | |
EXPECT_EQ(hist->GetDirectory(), nullptr); |
Otherwise in case of failure of the ASSERT_NE
, the actual result is a segmentation fault in the EXPECT_EQ
{ | ||
FileRaii fileGuard("test_rfile_wrong.root.1"); | ||
ROOT::TestSupport::CheckDiagsRAII diagsRaii; | ||
diagsRaii.requiredDiag(kWarning, "ROOT.File", "preferred file extension is \".root\"", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a warning rather than an Info? (Reminder: tools can tell whether a file is a ROOT file by reading the first few bytes of the file (MIME format)).
|
||
static void CheckExtension(std::string_view path) | ||
{ | ||
if (ROOT::EndsWith(path, ".xml")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is .xml
so special?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because TFile::Open
has a plugin that tries to open the file as XML if that's the extension and we don't want to support XML files in RFile (there might be more unwanted extensions that I missed, but this is the one I'm aware of).
Edit: I guess there's at least also .zip
, we should ban that as well for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also .zip
:
etc/plugins/TArchiveFile/P010_TZIPFile.C: gPluginMgr->AddHandler("TArchiveFile", ".+[.]zip$", "TZIPFile",
etc/plugins/TFile/P080_TXMLFile.C: gPluginMgr->AddHandler("TFile", ".+[.]xml$", "TXMLFile",
This PR adds a first version of the new RFile prototype, implementing basic Open/Append/Recreate/Get/Put methods.
For a sneak peek of the final shape of the prototype, see here
Checklist: