Skip to content

Conversation

firewave
Copy link
Collaborator

@firewave firewave commented May 3, 2024

No description provided.

@firewave

This comment was marked as resolved.

@firewave

This comment was marked as outdated.

@firewave
Copy link
Collaborator Author

The callgrind CI job shows a small reduction in Ir: 64,070,509,276 -> 63,862,736,388.

@firewave firewave force-pushed the charbuf branch 3 times, most recently from 0085e46 to cb2981d Compare June 21, 2024 03:20
@firewave

This comment was marked as outdated.

@firewave

This comment was marked as resolved.

@firewave firewave force-pushed the charbuf branch 2 times, most recently from 10eaa80 to 0b40ad2 Compare July 15, 2024 08:10
@firewave

This comment was marked as outdated.

@firewave firewave force-pushed the charbuf branch 2 times, most recently from 2e85422 to df92530 Compare September 23, 2024 17:29
@firewave firewave marked this pull request as ready for review September 23, 2024 18:40
@firewave
Copy link
Collaborator Author

Maybe we could use more descriptive names like Buffer or FromFile suffixes. That would also make sense for the TokenList::createTokens() functions.

I did that - please have a look.

@firewave
Copy link
Collaborator Author

I might also still need to add convenience stream versions to the interface. And we would need tests for that - I am not so sure anymore about adding essentially dead code.

@danmar
Copy link
Owner

danmar commented Oct 3, 2024

I just don't like this. A istream parameter is better than 2 const uint8_t*, int size parameters imo. An istream is also more explicit than a string_view or a span.

@firewave
Copy link
Collaborator Author

firewave commented Oct 3, 2024

I just don't like this. A istream parameter is better than 2 const uint8_t*, int size parameters imo. An istream is also more explicit than a string_view or a span.

I will add the other safer versions in this PR as well and try to find a way to opt into the unsafe version so it can still be used internally.

@firewave
Copy link
Collaborator Author

Will update after danmar/simplecpp#496 has been merged.

@firewave
Copy link
Collaborator Author

Will update after danmar/simplecpp#496 has been merged.

That change is actually not necessary, so I will just rebase. It will allow further cleanups though which will be made in a follow-up.

@firewave firewave force-pushed the charbuf branch 3 times, most recently from 0b25bb8 to d3e381e Compare August 21, 2025 12:34
@firewave firewave marked this pull request as ready for review August 21, 2025 13:05
Copy link

@@ -799,10 +799,9 @@ unsigned int CppCheck::check(const FileWithDetails &file)
return returnValue;
}

unsigned int CppCheck::check(const FileWithDetails &file, const std::string &content)
unsigned int CppCheck::checkBuffer(const FileWithDetails &file, const uint8_t* data, std::size_t size)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike this interface. 1 parameter is switched to 2 parameters and it's not because we want to separate them.
isn't modern c++ guidelines advocating that span/string_view is used instead of raw-pointer+size this is a step in opposite direction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an rather internal interface and this removes all unnecessary wrappers and shows the actual intention.

As usual the changes made are incremental. The next step after this is to merge danmar/simplecpp#496 and get it downstream so further cleanups can be made. With the code clean and more interfaces into simplecpp available it should be clearer on how to proceed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you want to have is performance, getting rid of memory allocation and data copy, right?

What I want is a function interface that does not have a pointer+size pair and I would like a type that makes it as clear as possible that it is the raw code to process.

In my humble opinion we do not have conflicting requests. How about this:

struct FileData {
    const char* data;
    std::size_t dataSize;
};

unsigned int check(const FileWithDetails &file, const FileData &fileData);

Copy link
Owner

@danmar danmar Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not be against that FileData would have similar interface as string_view (with only the methods that we use). We can then have the option (using #ifdef) to just use a string_view alias instead:

using FileData = std::string_view;

And I guess that could lead to improved static analysis etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is rather awkward and I would like to have something which can be used without adjusting the code when we ever switch. But I do not want to do this in this PR. And without a clean base that cannot be properly tested and we also need a clean baseline to profile against.

As mentioned above that will be done in follow ups.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does more than just adjust that single interface and it will be changed in the future. We need clean code to move to something else instead of some intermediate mishmash. Moving to something else intermediate will only complicate things if I want to compare the implementations - which would actually not be possible at all to compare since there will always be a mix of various interfaces if we never had it clean to begin with.

Copy link
Owner

@danmar danmar Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be changed in the future

I would like to understand what the plan is really. I don't want to have char buffers all over the place. I suggest we remove the buffer inputs from simplecpp directly

string_view is safer than a plain old char pointer. the reason string_view was invented was to improve safety and expressiveness.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I don't understand why we should convert to char buffers everywhere first and then change to something else later. that sounds like "intermediate mishmash" to me :-(

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we remove the buffer inputs from simplecpp directly

ah.. I see that there is work in progress in danmar/simplecpp#496 👍

It would feel much better if we merge 496 first and then use the safe interface..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to understand what the plan is really.

The plan is to get rid of all unnecessary (inefficient) wrappers so data can be passed through directly and the interfaces can be cleaned up.

This PR for instance gets rid of the std::istream input which was never necessary.

And as outlined above. The next steps are getting the modern simplecpp::TokenList interface in so we clean up more things. And to clean up more of the tests.

The production code would then be clean and only the test might still have some mishmash but the interfaces for that can then be moved there and the production code can get some thin wrappers which allows for modern stuff to drop-in. (As using Qt implicitly switches to C++17 that could already be used and tested).

Doing this all as whole would be hard-to-review, making profiling the various implementations impossible and also might make it harder to pinpoint potential issues.

I don't want to have char buffers all over the place.

In production code we do not have any actual char buffers at all. We only pass as such. The buffers are where data comes from memory (fuzzer, democlient) or string literals in tests. In the latter case we can deduce it via template and can mostly pass it on.

I suggest we remove the buffer inputs from simplecpp directly

That is what danmar/simplecpp#496 is essentially doing.

the reason string_view was invented was to improve safety and expressiveness.

You mainly gain bounds safety and simplified interfaces. But it is still a massive footgun because people think it is all safe but if you use it wrong it might be even harder to figure out what is wrong than before (and also lifetime dependencies). But let's stay on topic...

and I don't understand why we should convert to char buffers everywhere first and then change to something else later. that sounds like "intermediate mishmash" to me :-(

We no longer do that in production code after this PR. The buffer goes straight into simplecpp::TokenList.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants