14

I'm really struggling with a recurring OOP / database concept.

Please allow me to explain the issue with pseudo-PHP-code.

Say you have a "user" class, which loads its data from the users table in its constructor:

class User {
    public $name;
    public $height;

    public function __construct($user_id) {
        $result = Query the database where the `users` table has `user_id` of $user_id
        $this->name= $result['name'];
        $this->height = $result['height'];
    }
}

Simple, awesome.

Now, we have a "group" class, which loads its data from the groups table joined with the groups_users table and creates user objects from the returned user_ids:

class Group {
    public $type;
    public $schedule;
    public $users;

    public function __construct($group_id) {
        $result = Query the `groups` table, joining the `groups_users` table,
                    where `group_id` = $group_id
        $this->type = $result['type'];
        $this->schedule = $result['schedule'];

        foreach ($result['user_ids'] as $user_id) {
            // Make the user objects
            $users[] = new User($user_id);
        }
    }
}

A group can have any number of users.

Beautiful, elegant, amazing... on paper. In reality, however, making a new group object...

$group = new Group(21);  // Get the 21st group, which happens to have 4 users

...performs 5 queries instead of 1. (1 for the group and 1 for each user.) And worse, if I make a community class, which has many groups in it that each have many users within them, an ungodly number of queries are ran!

The Solution, Which Doesn't Sit Right To Me

For years, the way I've got around this, is to not code in the above fashion, but instead, when making a group for instance, I would join the groups table to the groups_users table to the users table as well and create an array of user-object-like arrays within the group object (never using/touching the user class):

class Group {
    public $type;
    public $schedule;
    public $users;

    public function __construct($group_id) {
        $result = Query the `groups` table, joining the `groups_users` table,
                    **and also joining the `users` table,**
                    where `group_id` = $group_id
        $this->type = $result['type'];
        $this->schedule = $result['schedule'];

        foreach ($result['users'] as $user) {
            // Make user arrays
            $users[] = array_of_user_data_crafted_from_the_query_result;
        }
    }
}

...but then, of course, if I make a "community" class, in its constructor I'll need to join the communities table with the communities_groups table with the groups table with the groups_users table with the users table.

...and if I make a "city" class, in its constructor I'll need to join the cities table with the cities_communities table with the communities table with the communities_groups table with the groups table with the groups_users table with the users table.

What an unmitigated disaster!

Do I have to choose between beautiful OOP code with a million queries VS. 1 query and writing these joins by hand for every single superset? Is there no system that automates this?

I'm using CodeIgniter, and looking into countless other MVC's, and projects that were built in them, and cannot find a single good example of anyone using models without resorting to one of the two flawed methods I've outlined.

It appears this has never been done before.

One of my coworkers is writing a framework that does exactly this - you create a class that includes a model of your data. Other, higher models can include that single model, and it crafts and automates the table joins to create the higher model that includes object instantiations of the lower model, all in a single query. He claims he's never seen a framework or system for doing this before, either.

Please Note: I do indeed always use separate classes for logic and persistence. (VOs and DAOs - this is the entire point of MVCs). I have merely combined the two in this thought-experiment, outside of an MVC-like architecture, for simplicity's sake. Rest assured that this issue persists regardless of the separation of logic and persistence. I believe this article, introduced to me by James in the comments below this question, seems to indicate that my proposed solution (which I've been following for years) is, in fact, what developers currently do to solve this issue. This question is, however, attempting to find ways of automating that exact solution, so it doesn't always need to be coded by hand for every superset. From what I can see, this has never been done in PHP before, and my coworker's framework will be the first to do so, unless someone can point me towards one that does.

And, also, of course I never load data in constructors, and I only call the load() methods that I create when I actually need the data. However, that is unrelated to this issue, as in this thought experiment (and in the real-life situations where I need to automate this), I always need to eager-load the data of all subsets of children as far down the line as it goes, and not lazy-load them at some future point in time as needed. The thought experiment is concise -- that it doesn't follow best practices is a moot point, and answers that attempt to address its layout are likewise missing the point.

EDIT : Here is a database schema, for clarity.

CREATE TABLE `groups` (
  `group_id` int(11) NOT NULL,  <-- Auto increment
  `make` varchar(20) NOT NULL,
  `model` varchar(20) NOT NULL
)

CREATE TABLE `groups_users` ( <-- Relational table (many users to one group)
  `group_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL
)


CREATE TABLE `users` (
  `user_id` int(11) NOT NULL, <-- Auto increment
  `name` varchar(20) NOT NULL,
  `height` int(11) NOT NULL,
)

(Also note that I originally used the concepts of wheels and cars, but that was foolish, and this example is much clearer.)

SOLUTION:

I ended up finding a PHP ORM that does exactly this. It is Laravel's Eloquent. You can specify the relationships between your models, and it intelligently builds optimized queries for eager loading using syntax like this:

Group::with('users')->get();

It is an absolute life saver. I haven't had to write a single query. It also doesn't work using joins, it intelligently compiles and selects based on foreign keys.

Leng
  • 2,590
  • 1
  • 17
  • 28
  • If you think about it, you could only have 3 queries. 1 to get city dealerships, 2 to get dealership cars, and 1 to get car wheels. How you retrieve the data is up to you, but you can still have elegant OOP code to reflect the objects. – Kenny Thompson Nov 22 '13 at 02:40
  • @KennyThompson I suspect that dealership would actually be closer to something like a [repository](http://martinfowler.com/eaaCatalog/repository.html)-like structure. Just because your DB contains 10'000 cars, does not mean that it is practical to turn them all in objects. – tereško Nov 22 '13 at 02:46
  • @Kenny No, you only need 1 query to get a city (and -all- of its contained elements). You simply perform joins. However, that is not OOP. – Leng Nov 22 '13 at 02:49
  • I feel like (reading other comments as well) we are forgetting the fact we can get a SET of data back from the database and instantiate objects for each item in the retrieved set. dealer(41) can then query one time to get a set of 50 cars and the cars can each get a set of wheels. – Kenny Thompson Nov 22 '13 at 03:04
  • You can have one repository method to get a subset of cars (total 1 query) and then get each cars wheels (1 query each). So if a page needed to display 10 cars, that would equal 11 queries (instead of 50 as claimed above). – Kenny Thompson Nov 22 '13 at 03:14
  • I'd say this is a good example which will benefit from a data-abstraction layer or DAOs. http://www.odi.ch/prog/design/php/guide.php – James P. Nov 22 '13 at 04:19
  • @James - Yes, I use MVC frameworks. So my VO-like classes, and the Models are the DAOs that perform the DB queries. I simply combined them here for simplicity and clarity, but the same issues persist regardless. – Leng Nov 22 '13 at 04:22
  • @James I might be totally missing something, though. – Leng Nov 22 '13 at 04:23
  • 1
    @Leng this is what is called the N+1 problem which is a classic with ORMs. You could have a DAO specific to groups using a more efficient query. Here is one article on the subject, specific to PHP. Other concepts you can look up are lazy and eager loading. http://webadvent.org/2011/a-stitch-in-time-saves-nine-by-paul-jones – James P. Nov 22 '13 at 13:58
  • 1
    @James Perfect, you've answered my question... query-and-stitch is precisely what I proposed in my solution in my question above, and it's what I've been doing for years. I suppose when my coworker releases this framework he's designing, he will be the first to have created truly OOP eager-fetching object-including models (where you never have to construct the joins and stitches yourself - once you create your child class, all parent classes that relate to it can do the joins to eager fetch all related children themselves.) – Leng Nov 22 '13 at 20:56
  • @Kenny Yes, that is exactly what I proposed in my solution (getting a SET of data back and instantiating objects for each of them). That is -exactly- the problem - that needs to be automated, or you'll have to write the joins yourself for each superset of each child to retrieve them all in a single query. My coworker is writing a framework now that does exactly this, and I'm -shocked- that he's the first to do so. – Leng Nov 22 '13 at 21:01
  • 1
    @Leng If looking to add lazy loading I've heard that the Hibernate framework in Java uses a proxy design pattern. – James P. Nov 23 '13 at 02:54

3 Answers3

9

Say you have a "wheel" class, which loads its data from the wheels table in its constructor

Constructors should not be doing any work. Instead they should contain only assignments. Otherwise you make it very hard to test the behavior of the instance.

Now, we have a "car" class, which loads its data from the cars table joined with the cars_wheels table and creates wheel objects from the returned wheel_ids:

No. There are two problems with this.

Your Car class should not contain both code for implementing "car logic" and "persistence logic". Otherwise you are breaking SRP. And wheels are a dependency for the class, which means that the wheels should be injected as parameter for the constructor (most likely - as a collection of wheels, or maybe an array).

Instead you should have a mapper class, which can retrieve data from database and store it in the WheelCollection instance. And a mapper for car, which will store data in Car instance.

$car = new Car;
$car->setId( 42 );
$mapper = new CarMapper( $pdo );
if ( $mapper->fetch($car) ) //if there was a car in DB
{
    $wheels = new WheelCollection;
    $otherMapper = new WheelMapper( $pdo );

    $car->addWheels( $wheels );
    
    $wheels->setType($car->getWheelType());
    // I am not a mechanic. There is probably some name for describing 
    // wheels that a car can use
    $otherMapper->fetch( $wheels );
}

Something like this. The mapper in this case are responsible for performing the queries. And you can have several source for them, for example: have one mapper that checks the cache and only, if that fails, pull data from SQL.

Do I really have to choose between beautiful OOP code with a million queries VS. 1 query and disgusting, un-OOP code?

No, the ugliness comes from fact that active record pattern is only meant for the simplest of usecases (where there is almost no logic associated, glorified value-objects with persistence). For any non-trivial situation it is preferable to apply data mapper pattern.

..and if I make a "city" class, in its constructor I'll need to join the cities table with the cities_dealerships table with the dealerships table with the dealerships_cars table with the cars table with the cars_wheels table with the wheels table.

Jut because you need data about "available cares per dealership in Moscow" does not mean that you need to create Car instances, and you definitely will not care about wheels there. Different parts of site will have different scale at which they operate.

The other thing is that you should stop thinking of classes as table abstractions. There is no rule that says "you must have 1:1 relation between classes and tables".

Take the Car example again. If you look at it, having separate Wheel (or even WheelSet) class is just stupid. Instead you should just have a Car class which already contains all it's parts.

$car = new Car;
$car->setId( 616 );

$mapper = new CarMapper( $cache );
$mapper->fetch( $car );

The mapper can easily fetch data not only from "Cars" table but also from "Wheel" and "Engines" and other tables and populate the $car object.

Bottom line: stop using active record.

P.S.: also, if you care about code quality, you should start reading PoEAA book. Or at least start watching lectures listed here.

my 2 cents

Community
  • 1
  • 1
tereško
  • 56,151
  • 24
  • 92
  • 147
  • Yes, that's how his original code was made, but I seemed to fail logically =] You can always replace the line with `$wheels->setCarId( $car->getId() );` or if you want to get wheel for more then one car: `$wheels->forCars(array($id1, $id2, .. etc));`. The API might change, but the underlaying idea is: separate SQL from [domain logic](http://c2.com/cgi/wiki?DomainObject). – tereško Nov 22 '13 at 02:48
  • Updated ... the whole `$wheel = new WheelCollection` bit on second look actually seems somewhat .. emm ... really dumb thing to do. – tereško Nov 22 '13 at 03:19
  • Thank you so much. I am trying so hard to create the pseudo-code for your system, [please see this PasteBin](http://pastebin.com/JPW1dyci). As you can see, I'm stuck on what to put in the `WheelCollection` class, as well as in the `WheelMapper`'s `fetch()` method, which accepts a `WheelCollection` class object, whilst the `CarMapper`'s `fetch()` method accepts a `Car` class object. – Leng Nov 22 '13 at 03:20
  • I see you've added onto your answer. Unfortunately, I do indeed need a `Wheel` class. I need to grab individual wheels without needing to grab a car. Bear in mind that this is just a thought experiment, there are no cars or wheels anywhere. It might as well be persons and groups, bricks and walls, letters and words, etc. The parts that make a whole. – Leng Nov 22 '13 at 03:27
  • In general, you can think of "collection class" as glorified array (or maybe a registry). It need methods for setting up some conditions, for adding new "things" to it and for retrieving individual instance from it. The "add stuff" method will used by mapper, when getting data from storage, or by some external code, when adding new "things" before mapper stores them. Maybe [this](http://stackoverflow.com/a/11943107/727208) helps a bit. – tereško Nov 22 '13 at 03:32
  • Thank you so much - I'm so sorry, but I'm simply not getting it. Tere, I would be -so- grateful if you could [complete the pseudo-code in this PasteBin](http://pastebin.com/JPW1dyci) so that I can see what you mean. I'm especially confused by why the `CarMapper`'s `fetch()` method takes a `Car` object whilst the `WheelMapper`'s `fetch()` method takes a "glorified array (or maybe a registry)" of `Wheel` objects. – Leng Nov 22 '13 at 03:35
  • And the bit about "your classes are not required to mirror tables" still applies. Even if the code is just for study. It is even possible that you application will have domain objects that have no tables in DB at all. For example, when working with "reports". It will map data from multiple table (logs, news, comments, etc.) to a `ReportEntry` instance, but there will be no underlaying persistent storage for it. If you separate persistence and domain logic, you are not constrained by "must have table" rule of thumb. – tereško Nov 22 '13 at 03:38
  • I understand that, and I have written countless classes that do not have state or underlying datasets. Thank you so much for your help, but I'm still very confused, and would be just so grateful if you could please, please [complete the pseudo-code in this Pastebin](http://pastebin.com/JPW1dyci) that I've constructed to attempt to understand your methodology. – Leng Nov 22 '13 at 03:41
  • If that would be too time consuming for you, I understand, and I still appreciate your help and will study intensely all of the links that you have given me, especially the ones relating to data mapping and domain objects. – Leng Nov 22 '13 at 03:46
  • I am not sure, if this is what you were looking for, but I added some code that's relevant for the collection: https://gist.github.com/teresko/a22e67ee19360d63946e . Though I find myself struggling with the "wheel has an ID for the car" concept in your example. It's causing some major cognitive dissonance. – tereško Nov 22 '13 at 03:57
  • Thank you Teresko, I am looking at your code now. To clarify, I never said that a wheel has a car_id. Cars, however, have wheel_ids. One wheel_id for each wheel. Two cars can have the same wheel_ids. – Leng Nov 22 '13 at 04:02
  • The `cars` table doesn't contain wheel information. The `cars_wheels` table is a relational table between `cars` and `wheels`, assigning multiple `wheel`s to individual `car`s. – Leng Nov 22 '13 at 04:03
  • So basically, if you manage to fetch car by it's ID, then you can extract the `wheel_id` from it, which you assign to the collection to be used as condition. And then you fetch the wheels and populate the collections. BTW, there is one more nice thing you can take advantage off: if you need to pull out the wheels by different conditions (not only the number , but for example by whether it is fron or back wheel) you can store the `Wheel` instance in two arrays, because that will not duplicate the amount of objects (objects are assigned by handlers). – tereško Nov 22 '13 at 04:09
  • I've added the database schema to the bottom of my question for clarity. – Leng Nov 22 '13 at 04:16
  • Reposted it as public: https://gist.github.com/teresko/7594789 , which should let you edit it, but I honestly don't see the point. This is only vague example. Also, you current scheme means that same wheel can be on multiple cars **simultaneously** =/ – tereško Nov 22 '13 at 04:18
  • Is it possible to achieve this in an MVC? If so, where do collection classes get created? Are they models? Where are mappers? – Leng Nov 22 '13 at 04:19
  • I could pay you for your time if you'd like to enlighten me on Skype: screenname JunkGrave – Leng Nov 22 '13 at 04:20
  • Domain objects and collections of domain objects are created in [service layer](http://martinfowler.com/eaaCatalog/serviceLayer.html) (well, technically, if you were trying to use TDD, they are create by factory which you would inject in the services) same as mappers. The interaction between persistence (mappers) and domain model's implementation (domain objects) is called "application logic" and that is the primary responsibility of services. In the linked page the "MVC model" would be all three of concentric circles together. – tereško Nov 22 '13 at 04:30
  • I want the same wheel on multiple cars simultaneously! The same user can be in multiple groups, can he not? – Leng Nov 22 '13 at 04:30
  • Imagine it as users and groups. I want a group to always contain all users. The same user can be in multiple groups. Any time I fetch a group object, I want all user objects within it to be fetched. – Leng Nov 22 '13 at 04:31
  • Model in MVC is not the same as "domain model". Domain model is a term that describes the accumulated knowledge-base in the project. Domain objects are code, that has been written to implement *part* of that knowledge. The MVC model on other hand is a layer (one of two major layers in MVC, alongside "presentation layer"). Presentation layer (controllers and views .. and some other things) interact with model layer via services. – tereško Nov 22 '13 at 04:34
  • As for the has-and-belongs-to-many, the point was that wheels and cars was just a bad example. – tereško Nov 22 '13 at 04:36
  • Very true - look at my updated question. It uses a much better thought experiment now. Humble apologies for the confusion. – Leng Nov 22 '13 at 04:39
  • Back to "MVC thing". You also have to understand that most of modern frameworks do not really implement MVC architecture. Instead they focus on mimicking Rails. Rails is really good at what it was meant for (rapid prototyping), but for that it required to sacrifice some parts. It simplified model layer by merging domain logic with persistence and moving application logic to the controllers. At the same time the views were replaced with templates and UI logic too was moved to controllers. It kinda tends to create mess, when you try to do more then fast prototyping. – tereško Nov 22 '13 at 04:41
  • Haha yeah, I'm still looking for the little remnants I've missed in my find/replace... – Leng Nov 22 '13 at 04:43
  • Thanks again for all your help. I wish I had a clue of where to go from here. Do you think I should abandon MVC's? Why are they so popular if I can't do something -so- simple in them? – Leng Nov 22 '13 at 04:50
  • Abandon - no. But you should understand what is actually the goal of it. The list of lecture, that I linked you to, should help. It starts with simple concepts and gradually delves in more complicated OOP ideas. As for your code, you should simply focus on separating the persistence from the domain of logic. That will solve your immediate problem. As for "why is MVC so popular?" part - well .. it is popular for advertising frameworks. In reality, frameworks do not implement MVC. The user's code does or (quite often) doesn't. – tereško Nov 22 '13 at 04:59
  • I am, indeed, going to watch these lectures - I'm already started on the first. :) Thanks! – Leng Nov 22 '13 at 05:00
  • For the record: James Paulson has, in his comment to my answer, identified this is as the "N+1 Problem", and there is an article on it [here](http://webadvent.org/2011/a-stitch-in-time-saves-nine-by-paul-jones). It proposes as a solution, -exactly- the same solution that I propose in my question, which I've been doing for years already, which is why I created this question in the first place - to find a way to automate the solution! It looks like my coworker will be the first person to ever write a PHP framework that does this, I'll let you guys know when he releases it. – Leng Nov 22 '13 at 21:05
  • And tere, I do indeed always separate persistence from the logic of my objects. (That's the entire point of MVCs.) In my examples, I've merely combined them for simplicity of the presentation of the issue. – Leng Nov 22 '13 at 21:11
2

ActiveRecord in Rails implements the concept of lazy loading, that is deferring database queries until you actually need the data. So if you instantiate a my_car = Car.find(12) object, it only queries the cars table for that one row. If later you want my_car.wheels then it queries the wheels table.

My suggestion for your pseudo code above is to not load every associated object in the constructor. The car constructor should query for the car only, and should have a method to query for all of it's wheels, and another to query it's dealership, which only queries for the dealership and defers collecting all of the other dealership's cars until you specifically say something like my_car.dealership.cars

Postscript

ORMs are database abstraction layers, and thus they must be tuned for ease of querying and not fine tuning. They allow you to rapidly build queries. If later you decide that you need to fine tune your queries, then you can switch to issuing raw sql commands or trying to otherwise optimize how many objects you're fetching. This is standard practice in Rails when you start doing performance tuning - look for queries that would be more efficient when issued with raw sql, and also look for ways to avoid eager loading (the opposite of lazy loading) of objects before you need them.

Chris Bloom
  • 3,383
  • 1
  • 29
  • 45
  • Thank you so much for your response. Of course, if I have a `class User` which contains a `public $friends` array that is made up of other user objects, I only populate it with `user->getFriends()` when I actually need that data. I'm familiar with that practice, and in that way, I can actually use `user` objects for each friend in the array. However, in the cars/wheels situation, I -always- need -every- car to have -every- wheel, and -every- dealership to have -every- car, etc. Thus, said design model will not work. – Leng Nov 22 '13 at 02:42
  • "The car constructor should query for the car only" - That's not what a constructor should do. – Ja͢ck Nov 22 '13 at 02:45
  • Also, if instantiating a `user` object loads data with a query, then calling `user->getFriends()` is going to perform a query for every single friend a user has, as it instantiates a `user` object for each friend, which brings us back to the same issue. – Leng Nov 22 '13 at 02:45
  • Leng, then wouldn't you then optimize `user->getFriends()` so that it fetchs all of the friends at once in a single query? I'm having trouble thinking of a reasonable scenario in which one would always need to have ALL of the data ALL of the time. That seems like an antipattern. – Chris Bloom Nov 22 '13 at 02:47
  • @Jack - conceivably your constructor could do whatever you needed it to do. I was just saying in the context of his pseudo code, I wouldn't include fetching all child and parent models in whatever method finds and instantiates that one object. – Chris Bloom Nov 22 '13 at 02:49
  • @Chrisbloom7 Just because you *can* do anything you want doesn't mean that you should; "doing stuff" in a constructor is not considered a good practice. – Ja͢ck Nov 22 '13 at 02:57
  • @Chrisbloom But each user object needs to load itself, otherwise I have to replicate the loading code and queries in the `getFriends()` function, which would mean breaking OOP and instead follow exactly the same 2nd example I had, where I simply write every query to join relational tables together by hand. – Leng Nov 22 '13 at 02:57
  • @Jack In reality, my constructors instead call a separate load() function. – Leng Nov 22 '13 at 02:58
  • @Chrisbloom Thank you for your explanation of ORMs. So CodeIgniter's Database Forge class is indeed an ORM. – Leng Nov 22 '13 at 03:00
  • @Jack you're missing the forrest for the trees. Yes, a constructor is a bad place to put a method to load something from the database. I was commenting on what he had in his pseudo code, which by definition is not real code meant for execution. The point was to avoid eager loading of associated data when instantiating a single object. – Chris Bloom Nov 22 '13 at 03:04
  • 1
    @Leng - It depends on how you write your `user` class. Maybe you write it so that you can instantiate it using data that has been prefetched, so that you do a single lookup of all friends, then pass each row's data into the User constructor. There are ways to do it, it just takes more code the more efficient you want it to be. – Chris Bloom Nov 22 '13 at 03:06
  • @Chrisbloom7 I'm aware of what you were trying to say; it's just something to keep in mind lest your statement is somehow misinterpreted by others :) also, there's only one 'r' in my forests heh – Ja͢ck Nov 22 '13 at 03:07
  • I ended up finding a Rails-like ORM in Laravel's [Eloquent](http://laravel.com/docs/eloquent). I'm super happy with it. Thanks again for your help, Chris! :) – Leng Jan 03 '14 at 00:52
0

In general, I'd recommend having a constructor that takes effectively a query row, or a part of a larger query. How do do this will depend on your ORM. That way, you can get efficient queries but you can construct the other model objects after the fact.

Some ORMs (django's models, and I believe some of the ruby ORMs) try to be clever about how they construct queries and may be able to automate this for you. The trick is to figure out when the automation is going to be required. I do not have personal familiarity with PHP ORMs.

Sam Hartman
  • 5,613
  • 2
  • 19
  • 34
  • Apologies, but I do not understand what you mean when you say that I should have a "constructor that takes effectively a query row". I assume you mean this: `public function __construct($query_row) {`. Can you please give a pseudo-code example of a `wheel` and a `car` class from my question's thought-experiment using your proposed design? – Leng Nov 22 '13 at 02:32
  • This sounds like a mapper. – James P. Nov 22 '13 at 14:00