Best way to handle dirty state in an ORM model

Question

I don't want anyone saying "you should not reinvent the wheel, use an open source ORM"; I have an immediate requirement and cannot switch.

I'm writing a small ORM that supports caching. Even without caching I would need this feature anyway, to know whether an object has to be written to storage. The pattern is DataMapper.

Here is my approach:

I want to avoid runtime introspection (i.e. guessing attributes).
I don't want to use a CLI code generator to generate getters and setters (really I use the NetBeans one, using ALT+INSERT).
I want the model to be as close as possible to a POPO (plain old PHP object): private attributes and "hardcoded" getters and setters for each attribute.

I have an abstract class called AbstractModel that all the models inherit from. It has a public method called isDirty() and a private (or protected, if needed) attribute called is_dirty. isDirty() must return true or false depending on whether the object's data has changed since it was loaded.

The issue is: is there a way to raise the internal flag "is_dirty" without coding $this->is_dirty = true in each setter? I mean: I want the setters to be just $this->attr = $value most of the time, unless business logic requires extra code.

Another limitation is that I cannot rely on __set, because in the concrete model class the attributes already exist as private, so __set is never called from the setters.
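The limitation can be reproduced with a minimal sketch. The `Person` class and its `magicCalls` counter are hypothetical names used only for illustration; the point is that PHP invokes `__set()` only for properties that are inaccessible from the calling scope, and a declared private property is accessible from inside its own class.

```php
<?php
// Hypothetical class, only to show when __set() is (not) invoked.
class Person
{
    private $name;           // declared, so it is accessible inside the class
    public $magicCalls = 0;  // counts how often __set() ran

    public function __set($attr, $value)
    {
        $this->magicCalls++;
    }

    public function setName($value)
    {
        $this->name = $value; // direct write: __set() is skipped
    }

    public function getName()
    {
        return $this->name;
    }
}

$p = new Person();
$p->setName('Diego');   // inside the class: no magic call
$p->name = 'outside';   // outside the class: private is inaccessible, __set() runs
```

After these two calls `magicCalls` is 1 and the name is still 'Diego', so a dirty flag placed in `__set()` would never see the write made by the setter.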

Any ideas? Code examples from other ORMs are welcome.

One of my ideas was to modify the NetBeans setters template, but I think there should be a way of doing this without relying on the IDE.

Another thought I had was to create the setters and then rename the private attributes with an underscore prefix or something similar. That way the setter would fall through to __set, which could deal with the "is_dirty" flag, but this breaks the POPO concept a little, and it's ugly.

score 8 · Accepted Answer · edited Apr 09 '20 at 11:34

_{Attention!
My opinion on the subject has somewhat changed in the past month. While the answer here is still valid, when dealing with large object graphs I would recommend using the Unit of Work pattern instead. You can find a brief explanation of it in this answer}

I'm a bit confused about how what-you-call-Model relates to the ORM, especially since in MVC the Model is a layer (at least, that's how I understand it), and your "Models" seem to be more like Domain Objects.

I will assume that what you have is code that looks like this:

  $model = new SomeModel;
  $mapper = $ormFactory->build('something');

  $model->setId( 1337 );
  $mapper->pull( $model );

  $model->setPayload('cogito ergo sum');

  $mapper->push( $model );

And I will assume that what-you-call-Model has two methods designed to be used by data mappers: getParameters() and setParameters(). I also assume that you call isDirty() before the mapper stores what-you-call-Model's state, and cleanState() when the mapper pulls data into what-you-call-Model.

_{BTW, if you have a better suggestion than setParameters() and getParameters() for moving values to and from data mappers, please share, because I have been struggling to come up with something better. This looks like an encapsulation leak to me.}

This would make the data mapper methods look like:

  public function pull( Parametrized $object )
  {
      if ( !$object->isDirty() )
      {
          // there were NO conditions set on clean object
          // or the values have not changed since last pull
          return false; // or maybe throw exception
      }

      $data = array(); // placeholder: read the matching record from storage here

      $object->setParameters( $data );
      $object->cleanState();

      return true; // or leave this out if you throw exceptions instead
  }

  public function push( Parametrized $object )
  {
      if ( !$object->isDirty() )
      {
          // there is nothing to save, go away
          return false; // or maybe throw exception
      }

      $data = $object->getParameters();
      // save values in storage
      $object->cleanState();

      return true; // or leave this out if you throw exceptions instead
  }

_{In the code snippet, Parametrized is the name of an interface that the object should implement; in this case it declares the methods getParameters() and setParameters(). It has such a strange name because in OOP the implements keyword means has-abilities-of, while extends means is-a.}
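As a rough sketch of what such an interface and one implementing class might look like: only the names `Parametrized`, `getParameters()`, `setParameters()`, `setId()` and `setPayload()` come from the answer; the array-based signatures and the `id`/`payload` keys are assumptions made for illustration.

```php
<?php
// Assumed shape of the interface; only the two method names come from the answer.
interface Parametrized
{
    public function getParameters();             // values handed to the mapper
    public function setParameters(array $data);  // values received from the mapper
}

class SomeModel implements Parametrized
{
    private $id;
    private $payload;

    public function setId($id) { $this->id = $id; }
    public function setPayload($payload) { $this->payload = $payload; }

    public function getParameters()
    {
        return array('id' => $this->id, 'payload' => $this->payload);
    }

    public function setParameters(array $data)
    {
        // copy only the keys this object knows about, ignore the rest
        if (array_key_exists('id', $data)) { $this->id = $data['id']; }
        if (array_key_exists('payload', $data)) { $this->payload = $data['payload']; }
    }
}

$model = new SomeModel();
$model->setParameters(array('id' => 1337, 'payload' => 'cogito ergo sum', 'extra' => 1));
$params = $model->getParameters();
```

Here `$params` holds only `id` and `payload`; the unknown `extra` key is dropped, which keeps the mapper from writing attributes the object never declared.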

Up to this point you should already have something similar...

Now here is what the isDirty() and cleanState() methods should do:

  public function cleanState()
  {
      $this->is_dirty = false;
      $temp = get_object_vars($this);
      unset( $temp['variableChecksum'] );
      // checksum should not be part of itself
      $this->variableChecksum = md5( serialize( $temp ) );
  }

  public function isDirty()
  {
      if ( $this->is_dirty === true )
      {
          return true;
      }

      $temp = get_object_vars($this);
      unset( $temp['variableChecksum'] );
      // checksum should not be part of itself
      $current = md5( serialize( $temp ) );

      // compare without overwriting the stored checksum, so repeated
      // calls keep reporting the change until cleanState() is called
      return $this->variableChecksum !== $current;
  }
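To see the checksum idea end to end, here is a minimal sketch under stated assumptions: the `User` class, its attributes and the `checksum()` helper are hypothetical, and the explicit `is_dirty` flag is left out for brevity. One thing worth knowing before relying on it is that `get_object_vars()` is scope-sensitive: called from the abstract parent, it does not see attributes that a subclass declares as private, so the tracked attributes have to be protected (or public).

```php
<?php
abstract class AbstractModel
{
    private $variableChecksum = null;

    public function cleanState()
    {
        $this->variableChecksum = $this->checksum();
    }

    public function isDirty()
    {
        return $this->variableChecksum !== $this->checksum();
    }

    private function checksum()
    {
        // Called in AbstractModel's scope: sees this class's own properties
        // and protected/public ones of subclasses, but NOT subclass privates.
        $vars = get_object_vars($this);
        unset($vars['variableChecksum']); // checksum should not be part of itself
        return md5(serialize($vars));
    }
}

class User extends AbstractModel
{
    protected $name;  // tracked by the checksum
    private $secret;  // invisible to AbstractModel::checksum()

    public function setName($v) { $this->name = $v; }     // plain setter, no flag
    public function setSecret($v) { $this->secret = $v; }
}

$u = new User();
$u->setName('Ann');
$u->cleanState();
$a = $u->isDirty();   // false: nothing changed since cleanState()
$u->setSecret('x');
$b = $u->isDirty();   // false: the private attribute is not seen
$u->setName('Bob');
$c = $u->isDirty();   // true: a protected attribute changed
```

The setters stay as plain assignments, which is what the question asked for; the price is that strictly private attributes in the concrete class go untracked with this variant.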

Thanks for pointing out the Model vs Domain Object issue. I have used some frameworks and it's always confusing; even some ORMs (like [RedBeans](http://www.redbeanphp.com/manual/models_and_fuse)) call the Domain Objects "models". And thanks for formatting my question. The hash approach looks good, and for this local hash you can even use crc32, since the possibility of a collision is far, far, far away. — Diego, Jun 14 '12 at 17:37
Effectively the interface of "Model" (DomainObject) has two methods, `initFromStore(array $data)` and `getDataToStore():array`, with more or less the same concept that you use for `getParameters()` and `setParameters()` (I try not to use "get" and "set" on methods that do not belong to the business logic, when possible). `isDirty()` is managed internally by the AbstractModel when the object is loaded or stored. The encapsulation leak is there, yes, but it is (at least to me) tolerable; the other option is to use Reflection, but I think it's slower. — Diego, Jun 14 '12 at 17:40
@Diego, yes, the idea is to inherit `isDirty()` and `cleanState()`, and in both methods you use checksums to see if anything has changed. *Domain Object* should be able to tell if it has been altered. — tereško, Jun 14 '12 at 17:47
@Diego additionally, if your *Domain Object* is composed of other `objects`, you can use the result of **`md5( print_r($this, true) )`** to detect changes deeper in the object graph. Also, this would let you differentiate between localized and compositional changes. — tereško, Jun 14 '12 at 17:50
Thanks for the suggestion of print_r. Persistence of the inner objects is something I will have to deal with at some point. I was tempted to implement a store() method on AbstractModel and then propagate the call to the inner objects (by looping and checking instanceof for each attribute or element of a list), but that breaks the DataMapper pattern. I will probably ask about it on Stack Overflow at some point. — Diego, Jun 14 '12 at 18:57
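The `print_r` suggestion from the comments could be sketched roughly like this. All class names here (`DeepTrackedModel`, `Customer`, `Address`) are hypothetical, and temporarily blanking the stored checksum is one assumed way of keeping it out of its own input. Unlike `get_object_vars()`, `print_r()` also dumps private attributes and nested objects, but its text output does not distinguish some values (for example `null` and an empty string), so treat it as a sketch rather than a drop-in solution.

```php
<?php
abstract class DeepTrackedModel
{
    private $checksum = null;

    public function cleanState()
    {
        $this->checksum = $this->deepChecksum();
    }

    public function isDirty()
    {
        return $this->checksum !== $this->deepChecksum();
    }

    private function deepChecksum()
    {
        $saved = $this->checksum;
        $this->checksum = null;             // keep the checksum out of its own input
        $sum = md5(print_r($this, true));   // dump covers privates and nested objects
        $this->checksum = $saved;
        return $sum;
    }
}

class Address
{
    public $city;
    public function __construct($city) { $this->city = $city; }
}

class Customer extends DeepTrackedModel
{
    private $address;  // private, and an object itself
    public function __construct(Address $address) { $this->address = $address; }
    public function getAddress() { return $this->address; }
}

$customer = new Customer(new Address('Lima'));
$customer->cleanState();
$before = $customer->isDirty();            // false
$customer->getAddress()->city = 'Quito';   // change deeper in the object graph
$after = $customer->isDirty();             // true
```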

score 1 · Answer 2 · answered Jun 07 '12 at 21:51

I would make a proxy for setting, for example:

class BaseModel {

   protected function _set($attr, $value) {
      $current = $this->_get($attr);
      if($value !== $current) {
         $this->is_dirty = true;
      }

      $this->$attr = $value;
   }
}

Then each child class would implement its setters by calling _set() and never set the property directly. Further, you can always inject more class-specific code into each subclass's _set and just call parent::_set($attr, $processedValue) if needed. Then, if you want to use magic methods, you make those proxy to a property method that proxies to _set. I suppose this isn't very POPO though.
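A fuller sketch of the proxy idea, with the pieces the snippet leaves implicit filled in as assumptions: the `_get()` body, the `isDirty()` accessor and the `Article` class are hypothetical. Note that the attribute has to be protected rather than private, otherwise `BaseModel` cannot reach it through `$this->$attr`.

```php
<?php
class BaseModel
{
    protected $is_dirty = false;

    public function isDirty()
    {
        return $this->is_dirty;
    }

    // Assumed counterpart of _set(); the answer calls it but does not show it.
    protected function _get($attr)
    {
        return $this->$attr;
    }

    protected function _set($attr, $value)
    {
        if ($value !== $this->_get($attr)) {
            $this->is_dirty = true;   // raise the flag only on a real change
        }
        $this->$attr = $value;
    }
}

class Article extends BaseModel
{
    protected $title;  // protected so BaseModel can read and write it

    public function getTitle()
    {
        return $this->_get('title');
    }

    public function setTitle($title)
    {
        // class-specific processing first, then the shared proxy
        $this->_set('title', trim($title));
    }
}

$article = new Article();
$fresh = $article->isDirty();     // false
$article->setTitle('  Hello  ');
$changed = $article->isDirty();   // true
```

Each setter still contains one line of plumbing, but it is the same line everywhere, so it fits an IDE setter template.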

prodigitalson

Yeah, I thought about that, and I already have a proxy (protected abstract on AbstractModel) to load the attributes at loading time, so that I can load the attributes from the parent class (remember that the concrete model has private attributes). I think that a proxy plus modified setter templates is the better way, but I should then make the attributes protected on the concrete class, or implement those proxies on each class (or use traits, but I still don't feel confident on that ground). I will wait to see if someone comes up with something better. – Diego Jun 08 '12 at 02:09
I do exactly this in a good-sized MVC PHP project. I use protected variables in my models and then override _get and _set, allowing me to look for custom getters and setters first (i.e. setFoo) before just assigning the attribute. This has proven to be very flexible. – greg Jun 13 '14 at 16:35

score 0 · Answer 3 · answered Dec 18 '13 at 16:56

Though this post is old, how about using events to notify listeners when the object becomes dirty? I would approach the solution with events.
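The answer does not name an event API, so the following is only one possible reading: the model keeps a list of callbacks and calls them whenever an attribute really changes. `ObservableModel`, `onChange()` and `notify()` are hypothetical names. The setter still has to call `notify()`, so this does not remove the per-setter line the question wanted to avoid; it only moves the bookkeeping out of the model.

```php
<?php
// Hypothetical names throughout; the answer does not specify an event API.
class ObservableModel
{
    private $listeners = array();
    private $name;

    // Register a callback that runs whenever an attribute actually changes.
    public function onChange(callable $listener)
    {
        $this->listeners[] = $listener;
    }

    public function setName($name)
    {
        if ($name === $this->name) {
            return; // same value: nothing to announce
        }
        $this->name = $name;
        $this->notify('name');
    }

    private function notify($attr)
    {
        foreach ($this->listeners as $listener) {
            call_user_func($listener, $this, $attr);
        }
    }
}

$changed = array();
$model = new ObservableModel();
$model->onChange(function ($m, $attr) use (&$changed) {
    $changed[] = $attr; // a mapper or unit of work could record the object here
});
$model->setName('Ann');  // listener runs
$model->setName('Ann');  // unchanged value, listener does not run
```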

Victor Odiah