How to log user defined POD struct in C++

Question

I need to add logging to a legacy C++ project that contains hundreds of user-defined structs/classes. These structs contain only primitive types such as int, float, char[], and enum.

The contents of the objects need to be logged, preferably in a human-readable form, though that is not a must as long as the object can be reconstructed.

Instead of writing different serialization methods for each class, is there any alternative method?

If there are no pointers and the types are trivially copyable, you can `write` and `read` them. Not human readable but that's the best you can get. If there are pointers, no such luck. — n. 'pronouns' m., Jul 06 '16 at 08:10
Reflection is possible with C++, not easy but possible http://stackoverflow.com/a/11748131/5076707 — Pumkko, Jul 06 '16 at 08:22
There is a quite magical library which was posted on `reddit/cpp` recently. It can automatically generate reflection capability for pod types like this. However, it uses some crazy tricks to accomplish this, and requires C++14. Technically it causes UB since it attempts to recover pointers to structure members by careful arithmetic rather than proper member pointers (it figures out the types but can't know their names). But it's pretty damn clever. https://github.com/apolukhin/magic_get The situation you are describing is like the only time I would personally consider to use it. — Chris Beck, Jul 06 '16 at 09:14
What about preprocessing the headers & generating the serialization code? — lorro, Jul 06 '16 at 12:29
@Iorro can you elaborate on that? Something similar to Ira Baxter's answer? — Guangyu, Jul 07 '16 at 02:18
@ChrisBeck Very interesting project, and it seems to fit this situation. However, I assume MSVC doesn't have a fully C++14-compliant compiler yet, so the `__cplusplus` check won't pass. Will invest some time to try it out. :) — Guangyu, Jul 07 '16 at 06:31
@n.m. tried that, write binary is the easy part, read without knowing the format is quite painful. — Guangyu, Jul 07 '16 at 06:56
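
The raw-byte approach suggested in the comments might look roughly like the sketch below. It assumes the struct is trivially copyable and contains no pointers; `Sample`, `write_raw`, and `read_raw` are hypothetical names chosen for illustration, and a `std::string` stands in for the log file.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical legacy struct, for illustration only.
struct Sample {
    int   id;
    float value;
    char  name[8];
};

// Append the raw bytes of a trivially copyable object to a byte buffer.
template <typename T>
void write_raw(std::string& out, const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte logging requires a trivially copyable type");
    out.append(reinterpret_cast<const char*>(&obj), sizeof(T));
}

// Rebuild an object from the bytes starting at `offset`.
// Returns false if fewer than sizeof(T) bytes remain.
template <typename T>
bool read_raw(const std::string& in, std::size_t offset, T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte logging requires a trivially copyable type");
    if (offset > in.size() || in.size() - offset < sizeof(T)) return false;
    std::memcpy(&obj, in.data() + offset, sizeof(T));
    return true;
}
```

The reader has to know which type sits at which offset, and the bytes are only meaningful to a build with the same layout (padding, type sizes, endianness), which is the difficulty raised in the last comment above.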

score 0 · Answer 1 · answered Jul 06 '16 at 08:06

Since C++ does not have reflection there is no way for you to dynamically inspect the members of an object at runtime. Thus it follows that you need to write a specific serialization/streaming/logging function for each type.

If all the different types had members of the same name, then you could write a template function to handle them, but I assume that is not the case.

Gerardo Hernandez · Answer 2 · 2016-07-06T08:27:42.227

0

As C++ does not have reflection this is not that easy. If you want to avoid a verbose solution you can use a variadic template.

E.g.

    class MyStruct {
    private:
        int a;
        float f;

    public:
        void log() { log_fields(a, f); }
    };

where `log_fields()` is the variadic template. It would need to handle all the basic types found in those user-defined types (by specialization or overloading), plus a recursive case.

edited Jul 06 '16 at 08:27

answered Jul 06 '16 at 08:21


Using this approach, OP still has to include a log() definition manually for each of his hundreds of structs. – Ira Baxter Jul 06 '16 at 09:27

Ira Baxter · Accepted Answer · 2016-07-07T12:40:47.900

What you want is a Program Transformation System (PTS). These are tools that can read source code, build compiler data structures (usually ASTs) that represent the source code, and allow you to modify the ASTs and regenerate source code from the modified AST.

These are useful because they "step outside" the language, and thus have no language-imposed limitations on what you can analyze or transform. So it doesn't matter if your language doesn't have reflection for everything; a good PTS will give you full access to every detail of the language, including such arcana as comments and radix on numeric literals.

Some PTSes are specific to a targeted language (e.g., "Jackpot" is only usable for Java). A really good PTS is provided with a description of an arbitrary programming language, and can then manipulate that language. That description has to enable the PTS to parse the code, analyze it (build symbol tables at least) and prettyprint the parsed/modified result.

Good PTSes will allow you to write the modifications you want to make using source-to-source transformations. These are rules specifying changes written in roughly the following form:

   if you see *this*, replace it by *that* when *condition*

where this and that are patterns using the syntax of the target language being processed, and condition is a predicate (test) that must be true to enable the rule to be applied. The patterns represent well-formed code fragments, and typically allow metavariables to represent placeholders for arbitrary subfragments.

You can use PTSes for a huge variety of program manipulation tasks. For OP's case, what he wants is to enumerate all the structs in the program, pick out the subset of interest, and then generate a serializer for each selected struct as a modification to the original program.

To be practical for this particular task, the PTS must be able to parse and name resolve (build symbol tables) C++. There are very few tools that can do this: Clang, our DMS Software Reengineering Toolkit, and the Rose compiler.

A solution using DMS looks something like this:

domain Cpp~GCC5;  -- specify the language and specific dialect to process

pattern log_members( m: member_declarations ): statements = TAG;
      -- declares a marker we can place on a subtree of struct member declarations

rule serialize_typedef_struct(s: statement, m: member_declarations, i: identifier):
           statements->statements
   = "typedef struct { \m } \i;" -> 
     "typedef struct { \m } \i;
      void \make_derived_name\(serialize,\i) ( *\i argument, s: stream )
          { s << "logging" << \toString\(\i\);
            \log_members\(\m\)
          }"
      if selected(i); -- make sure we want to serialize this one

rule generate_member_log_list(m: member_declarations, t: type_specification, n: identifier): statements -> statements
   " \log_members\(\t \n; \m\)" -> " s << \n; \log_members\(\m\) ";

rule generate_member_log_base(t: type_specification, n: identifier): statements -> statements
   " \log_members\(\t \n; \)" -> " s << \n; ";

ruleset generate_logging {
   serialize_typedef_struct,
   generate_member_log_list,
   generate_member_log_base 
}

The domain declaration tells DMS which specific language front-end to use. Yes, GCC5 as a dialect is different than VisualStudio2013, and DMS can handle either.

The pattern log_members is used as a kind of transformational pointer, to remember that there is some work to do. It wraps a sequence of struct member_declarations as an agenda (tag). What the rules do is first mark structs of interest with log_members to establish the need to generate the logging code, and then generate the member logging actions. The log_members pattern acts as a list; it is processed one element at a time until a final element is processed, and then the log_members tag vanishes, having served its purpose.

The rule serialize_typedef_struct is essentially used to scan the code looking for suitable structs to serialize. When it finds a typedef for a struct, it checks that struct is one that OP wants serialized (otherwise one can just leave off the if conditional). The meta-function selected is custom-coded (not shown here) to recognize the names of structs of interest. When a suitable typedef statement is found, it is replaced by the typedef (thus preserving it), and by the shell of a serializing routine containing the agenda item log_members holding the entire list of members of the struct. (If the code declares structs in some other way, e.g., as a class, you will need additional rules to recognize the syntax of those cases). Processing the agenda item by rewriting it repeatedly produces the log actions for the individual members.

The rules are written in DMS rule syntax; the C++ patterns are written inside metaquotes " ... " so that DMS can distinguish rule syntax from C++ syntax. Placeholder variables v are declared in the rule header according to their syntactic categories, and show up in the metaquoted patterns using an escape notation \v. [Note the unescaped i in the selected function call: it isn't inside metaquotes]. Meta-functions and pattern references inside the metaquotes are similarly escaped, hence the initially odd-looking \log_members\( ... \), including the escaped pattern name and escaped meta-parentheses.

The two rules generate_member_log_xxx handle the general and final cases of log generation. The general case handles one member with more members to do; the final case handles the last member. (A slight variant would be to process an empty members list by rewriting to the trivial null statement ;). This is essentially walking down a list until you fall off the end. We "cheat" and write rather simple logging code, counting on overloading of stream writes to handle the different datatypes that OP claims he has. If he has more complex types requiring special treatment (e.g., pointer to...) he may want to write specialized rules that recognize those cases and produce different code.

The ruleset generate_logging packages these rules up into a neat bundle. You can trivially ask DMS to run this ruleset over entire files, applying rules until no rules can be further applied. The serialize_typedef_struct rule finds the structs of interest, generating the serializing function shell and the log_members agenda item, which is then repeatedly rewritten to produce the serialization of the members.

This is the basic idea. I haven't tested this code, and there are usually some surprising syntax variations you end up having to handle, which means writing a few more rules along the same lines.

But once implemented, you can run this rule over the code to get serialized results. (One might implement selected to reject named structs that already have a serialization routine, or alternatively, add rules that replace any existing serialization code with newly generated code, ensuring that the serialization procedures always match the struct definition). There's the obvious extension to generating a serialized struct reader.

You can arguably implement these same ideas with Clang and/or the Rose Compiler. However, those systems do not offer you source-to-source rewrite rules, so you have to write procedural code to climb up and down trees, inspect individual nodes, etc. It is IMHO a lot more work and a lot less readable.

And when you run into your next "C++ doesn't reflect that", you can tackle the problem with the same tool :-}

Thanks for the long answer, very interesting way to solve the problem. — Guangyu, Jul 07 '16 at 02:10
