360

I work with a lot of web applications that are driven by databases of varying complexity on the backend. Typically, there's an ORM layer separate from the business and presentation logic. This makes unit-testing the business logic fairly straightforward; things can be implemented in discrete modules and any data needed for the test can be faked through object mocking.

But testing the ORM and database itself has always been fraught with problems and compromises.

Over the years, I have tried a few strategies, none of which completely satisfied me.

  • Load a test database with known data. Run tests against the ORM and confirm that the right data comes back. The disadvantage here is that your test DB has to keep up with any schema changes in the application database, and might get out of sync. It also relies on artificial data, and may not expose bugs that occur due to stupid user input. Finally, if the test database is small, it won't reveal inefficiencies like a missing index. (OK, that last one isn't really what unit testing should be used for, but it doesn't hurt.)

  • Load a copy of the production database and test against that. The problem here is that you may have no idea what's in the production DB at any given time; your tests may need to be rewritten if data changes over time.

Some people have pointed out that both of these strategies rely on specific data, and a unit test should test only functionality. To that end, I've seen suggested:

  • Use a mock database server, and check only that the ORM is sending the correct queries in response to a given method call.

What strategies have you used for testing database-driven applications, if any? What has worked the best for you?

friedo
  • I think you should still have database indexes in a test environment for cases like unique indexes. – dtc Feb 15 '19 at 17:35
  • I personally don't mind this question here, but if we go by the rules, this question is not for _stackoverflow_; rather, it is for the _softwareengineering.stackexchange_ website. – ITExpert Aug 30 '19 at 02:44

7 Answers

160

I've actually used your first approach with quite some success, but in a slightly different way that I think would solve some of your problems:

  1. Keep the entire schema and all the scripts for creating it in source control, so that anyone can create the current database schema after a checkout. In addition, keep sample data in data files that get loaded as part of the build process. As you discover data that causes errors, add it to your sample data to check that those errors don't re-emerge (see the sketch after this list).

  2. Use a continuous integration server to build the database schema, load the sample data, and run tests. This is how we keep our test database in sync (rebuilding it at every test run). Though this requires that the CI server have access to, and ownership of, its own dedicated database instance, having our DB schema built three times a day has dramatically helped find errors that probably would not have been found until just before delivery (if not later). I can't say that I rebuild the schema before every commit. Does anybody? With this approach you won't have to (well, maybe we should, but it's not a big deal if someone forgets).

  3. For my group, user input is done at the application level (not db) so this is tested via standard unit tests.
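
A minimal sketch of what that checkout-and-build bootstrap can look like, assuming JUnit 4 and an embedded H2 instance; the db/schema.sql and db/sample-data.sql paths are hypothetical stand-ins for whatever your build checks out of source control:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.h2.tools.RunScript;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public abstract class DatabaseTestBase {
    protected static Connection connection;

    @BeforeClass
    public static void buildSchemaAndLoadSampleData() throws Exception {
        // DB_CLOSE_DELAY=-1 keeps the in-memory DB alive between connections.
        connection = DriverManager.getConnection("jdbc:h2:mem:ci;DB_CLOSE_DELAY=-1", "sa", "");
        // Both scripts live in source control next to the application code,
        // so the schema under test is always the one that was checked out.
        RunScript.execute(connection, new FileReader("db/schema.sql"));
        RunScript.execute(connection, new FileReader("db/sample-data.sql"));
    }

    @AfterClass
    public static void tearDown() throws Exception {
        connection.close();
    }
}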

Loading Production Database Copy:
This was the approach used at my last job. It was a huge pain because of a couple of issues:

  1. The copy would get out of date relative to the production version.
  2. Changes would be made to the copy's schema and wouldn't get propagated to the production systems. At this point we'd have diverging schemas. Not fun.

Mocking Database Server:
We also do this at my current job. After every commit we execute unit tests against the application code that have mock DB accessors injected. Then, three times a day, we execute the full DB build described above. I definitely recommend both approaches.

Mark Roddy
  • Loading a production database copy also has security and privacy implications. Once it gets big, taking a copy of it and putting it into your dev environment can be a big deal. – WW. Mar 29 '09 at 23:43
  • Honestly, this is a huge pain. I'm new to testing and I also wrote an ORM I want to test. I already used your first method, but read that it doesn't make the test a unit test. I use specific DB engine functionality, so mocking a DAO is going to be hard. I think I'll just use my current method since it works and others use it. Automated tests rock, btw. Thanks. – frostymarvelous Aug 06 '11 at 14:04
  • I manage two different large projects; in one of them this approach was perfect, but we've had a lot of trouble trying to implement it in the other project. So I think it depends on how easily the schema can be recreated each time the tests execute. I'm currently working on finding a new solution for this latter problem. – Cross Apr 12 '13 at 15:37
  • In this case, it's definitely worth it to use a database versioning tool like Roundhouse - something that can run migrations. This can be run on any DB instance and should make sure that schemas are up to date. In addition, when migration scripts are written, test data should be written as well - keeping migrations and data in sync. – jedd.ahyoung May 03 '16 at 19:31
  • Better to use monkey patching and mocking, and to avoid write operations. – Nickpick Aug 24 '18 at 09:45
  • It's worth mentioning that the production database may contain users' private information, so it requires extra processing for legal reasons. I would rather stick to the first approach. – 이준형 May 17 '20 at 12:47
59

I'm always running tests against an in-memory DB (HSQLDB or Derby) for these reasons:

  • It makes you think about which data to keep in your test DB and why. Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why, and if something breaks, it wasn't me!!" ;)
  • It makes sure the database can be recreated with little effort in a new place (for example when we need to replicate a bug from production)
  • It helps enormously with the quality of the DDL files.

The in-memory DB is loaded with fresh data once the tests start, and after most tests I invoke ROLLBACK to keep it stable. ALWAYS keep the data in the test DB stable! If the data changes all the time, you can't test.

The data is loaded from SQL, a template DB or a dump/backup. I prefer dumps if they are in a readable format because I can put them in VCS. If that doesn't work, I use a CSV file or XML. If I have to load enormous amounts of data ... I don't. You never have to load enormous amounts of data :) Not for unit tests. Performance tests are another issue and different rules apply.
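
A sketch of the rollback-per-test pattern described above, assuming JUnit 4 and H2; the account table is made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.After;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class InMemoryDbTest {
    private static Connection connection;

    @BeforeClass
    public static void loadFreshData() throws Exception {
        connection = DriverManager.getConnection("jdbc:h2:mem:test;DB_CLOSE_DELAY=-1", "sa", "");
        try (Statement st = connection.createStatement()) {
            st.execute("CREATE TABLE account (id INT PRIMARY KEY, name VARCHAR(100))");
            st.execute("INSERT INTO account VALUES (1, 'known-test-user')");
        }
    }

    @Before
    public void beginTransaction() throws Exception {
        connection.setAutoCommit(false);
    }

    @After
    public void rollBack() throws Exception {
        // Undo whatever the test changed: the data stays stable for the next test.
        connection.rollback();
    }

    @Test
    public void findsTheKnownRow() throws Exception {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT name FROM account WHERE id = 1")) {
            assertTrue(rs.next());
        }
    }
}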

Aaron Digulla
  • Is speed the only reason for using (specifically) an in-memory DB? – rinogo Jan 31 '14 at 22:16
  • I guess another advantage might be its "throwaway" nature - no need to clean up after yourself; just kill the in-memory DB. (But there are other ways to accomplish this, such as the ROLLBACK approach you've mentioned) – rinogo Jan 31 '14 at 22:17
  • The advantage is that each test can choose its strategy individually. We have tests which do the work in child threads, which means that Spring will always commit the data. – Aaron Digulla Feb 03 '14 at 08:36
  • @Aaron: we are following this strategy too. I would like to know: what is your strategy to assert that the in-memory model has the same structure as the real DB? – Guillaume Jul 14 '14 at 08:37
  • @Guillaume: I'm creating all databases from the same SQL files. H2 is great for this since it supports most of the SQL idiosyncrasies of the major databases. If that doesn't work, then I use a filter which takes the original SQL and converts it into the SQL for the in-memory database. – Aaron Digulla Jul 14 '14 at 08:57
  • @Aaron, thank you for this advice. Have you tried this with a very large schema? We have hundreds of tables, and we test only a rather small subset of the schema. Our current strategy is a bit weak: we programmatically recreate the subset. Since the full in-memory DB (we use H2 too) is recreated before each test, I fear that we'll have some performance lag if we recreate the full schema... – Guillaume Jul 14 '14 at 09:58
  • @Guillaume: Please post a question. I'll answer this there. – Aaron Digulla Jul 14 '14 at 13:25
  • @AaronDigulla 'Just hauling your production DB into a test system translates to "I have no idea what I'm doing or why"' +1 – Matthew Skelton Dec 18 '15 at 21:46
14

I have been asking this question for a long time, and I think there is no silver bullet here.

What I currently do is mock the DAO objects and keep an in-memory representation of a good collection of objects that represent interesting cases of data that could live in the database.

The main problem I see with that approach is that you're covering only the code that interacts with your DAO layer, but never testing the DAO itself, and in my experience a lot of errors happen in that layer as well. I also keep a few unit tests that run against the database (for the sake of using TDD or quick testing locally), but those tests never run on my continuous integration server, since we don't keep a database for that purpose and I think tests that run on a CI server should be self-contained.
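
For illustration, a hand-rolled in-memory DAO of the kind described above; UserDao and User are hypothetical names standing in for the real model:

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class User {
    final long id;
    final String email;
    User(long id, String email) { this.id = id; this.email = email; }
}

interface UserDao {
    Optional<User> findById(long id);
    void save(User user);
}

// The in-memory stand-in: seeded with the "interesting cases" of data and
// injected into the business logic under test instead of the real DAO.
class InMemoryUserDao implements UserDao {
    private final Map<Long, User> rows = new ConcurrentHashMap<>();

    InMemoryUserDao(User... seed) {
        for (User u : seed) rows.put(u.id, u);
    }

    @Override public Optional<User> findById(long id) { return Optional.ofNullable(rows.get(id)); }
    @Override public void save(User user) { rows.put(user.id, user); }
}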

Another approach I find very interesting, but not always worthwhile since it is a little time-consuming, is to create the same schema you use for production on an embedded database that runs within the unit tests.

Even though there's no question this approach improves your coverage, there are a few drawbacks, since you have to stay as close as possible to ANSI SQL to make it work both with your current DBMS and with the embedded replacement.

No matter what you think is more relevant for your code, there are a few projects out there that may make this easier, like DbUnit.
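
For example, DbUnit can reset the embedded schema to a well-known dataset before each test. A sketch, where dataset.xml is a hypothetical FlatXML file on the test classpath:

import org.dbunit.IDatabaseTester;
import org.dbunit.JdbcDatabaseTester;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.junit.After;
import org.junit.Before;

public class CommentDaoDbUnitTest {
    private IDatabaseTester tester;

    @Before
    public void setUp() throws Exception {
        // Points at the embedded database that carries the production schema.
        tester = new JdbcDatabaseTester("org.h2.Driver",
                "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1", "sa", "");
        IDataSet dataSet = new FlatXmlDataSetBuilder()
                .build(getClass().getResourceAsStream("/dataset.xml"));
        tester.setDataSet(dataSet);
        tester.onSetup();  // resets the tables to the dataset's contents
    }

    @After
    public void tearDown() throws Exception {
        tester.onTearDown();
    }
}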

kolrie
13

Even if there are tools that allow you to mock your database in one way or another (e.g. jOOQ's MockConnection, which can be seen in this answer - disclaimer: I work for jOOQ's vendor), I would advise against mocking larger databases with complex queries.

Even if you just want to integration-test your ORM, beware that an ORM issues a very complex series of queries to your database, which may vary in

  • syntax
  • complexity
  • order (!)

Mocking all of that to produce sensible dummy data is quite hard, unless you're actually building a little database inside your mock which interprets the transmitted SQL statements. Having said that, use a well-known integration-test database that you can easily reset with well-known data, against which you can run your integration tests.
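
For a sense of what the MockConnection route involves, here is a sketch based on jOOQ's mock API (the field names are made up). Anything beyond trivially pattern-matched SQL quickly turns into the "little database inside your mock" described above:

import java.sql.Connection;
import org.jooq.DSLContext;
import org.jooq.Field;
import org.jooq.Record2;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import org.jooq.tools.jdbc.MockConnection;
import org.jooq.tools.jdbc.MockDataProvider;
import org.jooq.tools.jdbc.MockResult;

public class MockConnectionSketch {
    public static Connection mockedConnection() {
        MockDataProvider provider = ctx -> {
            DSLContext create = DSL.using(SQLDialect.H2);
            Field<Integer> id = DSL.field("ID", Integer.class);
            Field<String> title = DSL.field("TITLE", String.class);

            // The provider has to pattern-match whatever SQL the ORM decides
            // to generate -- syntax, complexity, and order included.
            if (ctx.sql().toUpperCase().startsWith("SELECT")) {
                Result<Record2<Integer, String>> result = create.newResult(id, title);
                result.add(create.newRecord(id, title).values(1, "Hello"));
                return new MockResult[] { new MockResult(1, result) };
            }

            // Pretend every other statement affected one row.
            return new MockResult[] { new MockResult(1) };
        };
        return new MockConnection(provider);
    }
}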

Lukas Eder
5

I use the first approach (running the code against a test database). The only substantive issue I see you raising with this approach is the possibility of schemas getting out of sync, which I deal with by keeping a version number in my database and making all schema changes via a script which applies the changes for each version increment.

I also make all changes (including to the database schema) against my test environment first, so it ends up being the other way around: after all tests pass, apply the schema updates to the production host. I also keep a separate pair of testing and application databases on my development system, so that I can verify there that the DB upgrade works properly before touching the real production box(es).
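
A sketch of the version-number scheme, assuming a one-row schema_version table and numbered upgrade scripts (db/upgrade-1.sql, db/upgrade-2.sql, ...) kept in source control; all of the names here are hypothetical:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SchemaUpgrader {
    public static void upgradeTo(Connection conn, int targetVersion) throws Exception {
        int current;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT version FROM schema_version")) {
            rs.next();
            current = rs.getInt(1);
        }

        // Apply each increment's script in order and bump the recorded version,
        // so any copy of the DB can be brought up to the current schema.
        for (int v = current + 1; v <= targetVersion; v++) {
            String script = new String(Files.readAllBytes(Paths.get("db/upgrade-" + v + ".sql")));
            try (Statement st = conn.createStatement()) {
                st.execute(script);  // assumes the driver accepts multi-statement scripts
                st.executeUpdate("UPDATE schema_version SET version = " + v);
            }
        }
    }
}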

Dave Sherohman
3

I'm using the first approach, but with a few differences that address the problems you mentioned.

Everything that is needed to run tests for the DAOs is in source control. That includes the schema and the scripts to create the DB (Docker is very good for this). If an embedded DB can be used, I use it for speed.

The important difference from the other approaches described is that the data required for a test is not loaded from SQL scripts or XML files. Everything (except some dictionary data that is effectively constant) is created by the application using utility functions/classes.

The main purpose is to make the data used by a test

  1. very close to the test,
  2. explicit (using SQL files for data makes it very hard to see which piece of data is used by which test), and
  3. isolated from unrelated changes.

It basically means that these utilities let you declaratively specify, in the test itself, only the things essential for the test, and omit the irrelevant ones.

To give some idea of what this means in practice, consider a test for some DAO which works with Comments on Posts written by Authors. In order to test CRUD operations for such a DAO, some data has to be created in the DB. The test would look like:

@Test
public void savedCommentCanBeRead() {
    // The builder declaratively specifies the entity with only the attributes
    // relevant for this specific test; missing attributes are generated with
    // reasonable values.
    // The factory's responsibility is to create the entity (and all entities
    // it requires -- in our example, the Author) in the DB.
    Post post = factory.create(PostBuilder.post());

    Comment comment = CommentBuilder.comment().forPost(post).build();

    sut.save(comment);

    Comment savedComment = sut.get(comment.getId());

    // This checks the fields that are stored directly.
    assertThat(savedComment, fieldwiseEqualTo(comment));
    // Fields that are generated during save are checked separately.
    assertThat(savedComment.getGeneratedField(), equalTo(expectedValue));
}

This has several advantages over SQL scripts or XML files with test data:

  1. Maintaining the code is much easier (adding a mandatory column, for example, to some entity that is referenced in many tests, like Author, does not require changing lots of files/records, only a change in the builder and/or factory; a sketch of such a builder follows this list).
  2. The data required by a specific test is described in the test itself, not in some other file. This proximity is very important for test comprehensibility.
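
A hypothetical sketch of such a builder; Post, Author, and the attribute names are stand-ins for the application's real model:

class Author {
    final String name;
    Author(String name) { this.name = name; }
}

class Post {
    final String title;
    final Author author;
    Post(String title, Author author) { this.title = title; this.author = author; }
}

class PostBuilder {
    // Reasonable generated defaults; a test overrides only what it cares about.
    private String title = "title-" + System.nanoTime();
    private Author author = new Author("author-" + System.nanoTime());

    static PostBuilder post() { return new PostBuilder(); }

    PostBuilder withTitle(String title) { this.title = title; return this; }
    PostBuilder by(Author author) { this.author = author; return this; }

    Post build() { return new Post(title, author); }
}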

Rollback vs Commit

I find it more convenient for tests to commit when they are executed. Firstly, some effects (for example, DEFERRED CONSTRAINTS) cannot be checked at all if a commit never happens. Secondly, when a test fails, the data can be examined in the DB, since it has not been reverted by a rollback.

Of course, this has the downside that a test may produce broken data and cause failures in other tests. To deal with this I try to isolate the tests. In the example above, every test may create a new Author, and all other entities are created related to it, so collisions are rare. For the remaining invariants that can potentially be broken but cannot be expressed as DB-level constraints, I use programmatic checks for erroneous conditions that run after every single test (they run in CI, but are usually switched off locally for performance reasons).
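
As an illustration, such a post-test check can be as simple as an @After method running a consistency query; the tables, the connection field, and JUnit's assertEquals are assumed from the surrounding test class:

@After
public void checkInvariants() throws Exception {
    // Example invariant: every comment must reference an existing post, even
    // where the FK could not be expressed as a DB-level constraint.
    try (Statement st = connection.createStatement();
         ResultSet rs = st.executeQuery(
                 "SELECT COUNT(*) FROM comment c"
                 + " LEFT JOIN post p ON c.post_id = p.id WHERE p.id IS NULL")) {
        rs.next();
        assertEquals("orphaned comments found", 0, rs.getInt(1));
    }
}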

Roman Konoval
  • If you seed the database using entities and the ORM instead of SQL scripts, it also has the advantage that the compiler will force you to fix the seed code if you make changes to your model. Only relevant if you use a statically typed language, of course. – daramasala Jun 22 '19 at 21:09
  • So for clarification: are you using the utility functions/classes throughout your application, or just for your tests? – Ella Mar 17 '20 at 22:58
  • @Ella these utility functions usually are not needed outside of test code. Think for example about `PostBuilder.post()`. It generates some values for all mandatory attributes of the post. This is not needed in production code. – Roman Konoval Mar 18 '20 at 21:52
2

For a JDBC-based project (directly or indirectly, e.g. JPA, EJB, ...), you can mock up not the entire database (in that case it would be better to use a test DB on a real RDBMS), but only the JDBC level.

The advantage is the abstraction that comes with this approach, as the JDBC data (result sets, update counts, warnings, ...) are the same whatever the backend: your prod DB, a test DB, or just some mock data provided for each test case.

With the JDBC connection mocked up for each case, there is no need to manage a test DB (cleanup, only one test at a time, reloading fixtures, ...). Every mock connection is isolated, and there is no need to clean up. Only the minimal required fixtures are provided in each test case to mock up the JDBC exchange, which helps avoid the complexity of managing a whole test DB.
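
As a generic illustration of mocking at the JDBC level (using Mockito here for brevity; Acolyte, below, packages the same idea as a proper JDBC driver with fixture utilities):

import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcLevelMockExample {
    public static Connection mockedConnection() throws Exception {
        ResultSet rs = mock(ResultSet.class);
        when(rs.next()).thenReturn(true, false);        // exactly one row
        when(rs.getString("name")).thenReturn("Alice");

        PreparedStatement ps = mock(PreparedStatement.class);
        when(ps.executeQuery()).thenReturn(rs);

        Connection conn = mock(Connection.class);
        when(conn.prepareStatement(anyString())).thenReturn(ps);
        return conn;  // inject into the code under test in place of a real pool
    }
}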

Acolyte is my framework, which includes a JDBC driver and utilities for exactly this kind of mock-up: http://acolyte.eu.org .

cchantep