136

Are soft deletes a good idea or a bad idea?

Instead of actually deleting a record in your database, you would just flag it as IsDeleted = true, and upon recovery of the record you could just flag it as False.

Is this a good idea?

Is it a better idea to physically delete the record, then move it to an archive database, and if the user wants the record back, then software will look for the record in the archive and recreate it?

Danny Beckett
  • 18,294
  • 21
  • 100
  • 129
001
  • 55,049
  • 82
  • 210
  • 324
  • 17
    @Brian - they use shredders. – Omar Apr 15 '10 at 04:03
  • 30
    Use delete timestamps, not flags. – Dave Jarvis Apr 24 '10 at 00:45
  • 13
    I worked on a application that used an `IsDeleted` attribute on a table. The result was frequent reoccurring bugs. The problem should be obvious: every query that a regular user wanted to run against the table involved `non-deleted` data, meaning 99.9% of queries involving this table had to add `...AND IsDeleted = 'N'` to its `WHERE` clause. Naturally, it was often omitted: either the coder forgot to add it or weren't aware they had to add it in the first place. Of course, there was no requirement for this in the original spec, the coder was using their initiative... – onedaywhen Mar 05 '12 at 10:28
  • 2
    @onedaywhen - the problem you described can easily be fixed by using a `VIEW`. If `IsDeleted` is added afterwards, it should be even possible to rename the original table, say `mytable` to `mytable_all` and then name the view as `mytable` and add another view called `mytable_deleted`. That way no references to the table need to be changed. – tuomassalo Sep 05 '13 at 09:58
  • @DaveJarvis - Wouldn't it be better to use flags and delete timestamps ? That way you can know which ones are soft deleted. You can also find out when they were deleted. – Steam Dec 19 '13 at 19:31
  • 1
    @blasto: Use audit tables and audit columns (deleted, created, updated, and by whom) to track why rows were deleted. Depending on the problem domain, knowing why data was deleted might not be relevant. If it is relevant, then add a notes column: the audit tables will trace the revision history. – Dave Jarvis Dec 19 '13 at 23:30
  • It might be a little late but I suggest everyone to check Pinal Dave's blog post about logical/soft delete: http://stackoverflow.com/a/26125927/538387 – Tohid Sep 30 '14 at 16:42

15 Answers15

100

I say it's a bad idea, generally (with some exceptions, perhaps).

First, your database should be backed up regularly, so you should never be in a situation where you would lose data permanently because of a DELETE (unless it's a deletion of just-added data, of course).

Second, a soft delete like this means you now have to include a WHERE IsDeleted = false clause in every query on this table (and so much worse if you're JOINing these tables). A mistake here would be caught as soon as a user or tester noticed a deleted record showing up again, which might take some time. Also, it would be easy for a developer to omit the WHERE clause from COUNT(*) queries, which might take even longer to discover (I worked on one project where this had been happening for years; not many records were ever "deleted", so the totals were close to what was expected and no one noticed).

Finally, a soft delete will work on a table with artificial keys, but potentially won't work on a table with a natural primary key (e.g. you "delete" someone from a table keyed by Social Security Number - what do you do when you need to add him back? Please don't say "include IsDeleted in a compound primary key".).

In a design review, I would expect the developer to demonstrate an awareness of the costs and benefits and to present an excellent reason for doing soft deletes in this manner. "Why not do it?" is not an excellent reason.

MusiGenesis
  • 71,592
  • 38
  • 183
  • 324
  • 6
    It's fairly trivial to create a view and materialize it if need be, that excludes all is_deleted records. All queries can then be run against the view. As far as recovering from a backup - say for something like re-enabling a user account - you really don't want to go running to the archives to restore their data. You'd like to set is_deleted = false. This is a fairly primitive (and unlikely) scenario as you'd probably use a disabled field - but the point is the same. For some cases, to be able to recover the deleted data instantly could be worthwhile. – Josh Smeaton Apr 01 '10 at 13:06
  • 40
    @Josh: *everything* in SQL programming is fairly trivial; it's the accumulation of lots of trivial things that starts to cause problems. As I said in my last sentence, I would expect a developer to have a good reason for increasing the complexity of the system in this way. I'm disappointed but not at all surprised to see so many developers say they do this as a matter of course, regardless of the actual needs of their application. – MusiGenesis Apr 01 '10 at 13:14
  • 1
    @MG, "unless it's a deletion of just-added data, of course", Your RDBMS doesn't have a transaction log? you can't do point in time recovery? – Stephanie Page Apr 05 '10 at 13:56
  • 2
    Let me support MG. You have a timestamp field. You frequently range scan it via an index... everything between sysdate -2 and sysdate. If you soft delete frequently that range scan could return a high percentage of rows that will have to be filtered out. Filtering = expensive, range scan = cheap. – Stephanie Page Apr 05 '10 at 14:01
  • 1
    @Josh, if you're going to materialize the view, why not just make it a table from the state. Mviews are a good way to fix something that's broken - I'd hesitate to design that way from the start. – Stephanie Page Apr 05 '10 at 14:03
  • @steph I was explicitly referring to coding against the exclusion of is-deleted rows. It'd be much nicer to not have to remember to exclude the column in every where clause. In general I agree with you though. – Josh Smeaton Apr 05 '10 at 20:57
  • 1
    How do you deal with Report when you are using the hard delete. E.g. a common eCommerce site, once product has been deleted, how are you going to display the historical order report? – Vincent Jul 21 '11 at 02:47
  • @JoshSmeaton - This is why most systems partition their tables on the deletion flag. – Travis J Apr 11 '12 at 18:58
  • 3
    @TravisJ I no longer agree with the concept of soft deletes in the majority of cases. 'Just incase' isn't a good reason for soft deletes I think. – Josh Smeaton Apr 11 '12 at 22:32
  • 6
    Also worth mentioning: a `UNIQUE` index in combination with soft deletes can make for a mess. E.g. trying to register a deleted username will fail, even though it is not in use. – Danny Beckett Jul 13 '13 at 21:38
  • 3
    +1 for not using soft deletes unless you really have reasons to do that. What is not mentioned here is that referential integrity stops working the right way with soft deleted rows. If you mark row as deleted you should mark every child rows as deleted as well and if you don't do that DBMS will not prevent such logical deletion. So "not having problems with cascade delete" is not a case here. Why don't you remove the whole FK then? It would also allow you to delete rows without checking for the referenced records. – Nicolai Shestakov Jan 17 '14 at 10:40
  • 2
    @Vincent, if you need to use deleted data in your app, you're not dealing with _delete_ operation, it has another business meaning. In case "IsDeleted" is used only for archiving purpose, I would use separate table. And if you are worried about possible data loss on wrong deletes, what about wrong updates? Maybe you should consider using fully functional audit in this case? – Nicolai Shestakov Jan 17 '14 at 11:03
  • @MusiGenesis - are there any examples of where soft-delete can be disastrous ? – Erran Morad Apr 13 '14 at 01:53
  • 1
    I agree with the notion that "it depends" and "by default, don't soft-delete". There are other strategies to consider, one of which is replication to a soft-deleted archive table updated via triggers. The biggest reasons NOT to soft-delete are performance (filtering), storage space (grow-only behavior), noise (out of 2,423,023 records, only two records are legit), and hassle (setting up views, always being wary to avoid hard deletes, etc). Hard deleting makes life easier and simpler. Not to mention cascade delete feature is built into SQL Server which can make life even easier. – Jon Davis May 15 '14 at 19:57
93

It's never a bad idea to avoid potential data loss.

I always soft-delete. In cases where the database needs to be scrubbed of one or more records, I generally employ either a two-step process of soft deletion and then emptying a "recycle bin" of records, or a document-management-style approach where document records can be aged away, and then go through an approval process prior to hard deletion.

Robert Harvey
  • 168,684
  • 43
  • 314
  • 475
  • 4
    The system often still needs the data for integrity, auditing or history of change... soft delete away! Use cleanup processes for real deletions etc... PK :-) – Paul Kohler Mar 31 '10 at 01:49
  • 35
    +1: `I always soft-delete.` – KMån Mar 31 '10 at 05:45
  • 3
    so you don't have to check for where deleted=0 on every query, you can make a view for the table that does that for you. you can then make a view called recycle_bin or similar that shows only deleted records (doing a union on tables that do soft deletes on their common fields) – Neil McGuigan Jan 29 '12 at 10:25
  • 2
    +1 for using a safe, hybrid, 2-step process – StartupGuy May 07 '13 at 18:19
  • How do you handle relations of the soft-deleted data? For example you delete a record in `company_departments` table. Then you have `users` table and there is a FK `department_id` in there. Would you in this case update users table and set `department_id` FK to null where it matches the soft deleted department? – Primoz Rome Sep 13 '16 at 07:07
  • @PrimozRome: Soft deletes do not require that. They only require that you set the flag. If you want to purge older records, you do it in the usual way. – Robert Harvey Sep 13 '16 at 07:19
  • @RobertHarvey yeah I understand that, but if I have soft deleted a record ant that record's ID is a foreign key to some other table, that table needs to update that foreign key right? – Primoz Rome Sep 13 '16 at 07:23
32

It depends on the circumstances. I could see situations where you are legally required to truly delete something. Maybe someone has requested that their social security number be permanently removed from your system. Or maybe you have a duplicate record that you want to consolidate into a single record. Keeping the duplicate hanging around with a deleted flag might not be advantageous.

There is also one technical disadvantage: You can't do cascading deletions, which automatically clear out any references to the deleted data to prevent foreign key violations. This isn't necessarily a big issue, but it's something to keep in mind.

Otherwise, I think it's a good idea.

devuxer
  • 39,573
  • 46
  • 163
  • 281
24

If you're going to use soft deletion, it's a good idea to have a deleted_date field, instead of an is_deleted field. You get a nice piece of extra data instead of just the bit field.

Josh Smeaton
  • 43,953
  • 24
  • 121
  • 160
  • 3
    +1 for using DateTime fields instead of Bits... the number of old systems I have to manage with an "IsSomething" and "DateSomethinged" is annoyingly high... – Keith Williams Apr 11 '10 at 22:00
  • We have recently been asked to create the isDeleted bit as a column directly in the table... requiring it's value to be managed. We previously had this bit in the view, calculated based on the business rule of deleteDate IS NOT NULL and deleteDate < Getdate(). Now I'm begrudgingly complying. – Brandon Wittwer Jan 15 '14 at 15:25
  • @BrandonWittwer, why? If you had a `first_name` column, and a `last_name` column, would you also have a `full_name` column just because the application wanted had a `full_name` attribute, or would you compute it in the app? By the way, the `deleteDate < Getdate()` should be redundant. Also, without knowing your backend, perhaps you could implement a computed column so you don't have to manage maintaining the integrity of two separate columns yourself. http://technet.microsoft.com/en-us/library/ms191250(v=sql.105).aspx – Josh Smeaton Jan 15 '14 at 21:41
  • @JoshSmeaton, you've exactly framed my frustration. Data which can be constituted entirely from other data in the same record should DEFINITELY be part of a view. If materialization matters (as a performance consideration), a computed column may be considered. Of course, in this case, a computed isn't possible, because the getDate() in there...non-deterministic. – Brandon Wittwer Jul 31 '14 at 12:29
  • 1
    and technically SomeStatusesDateField < GetDate() is necessary when you can postdate the effect... like PublishDate = three weeks from now... isPublished is false until then. – Brandon Wittwer Jul 31 '14 at 12:30
20

One of the major problem for soft delete is those unwanted data will potentially affects the db performance. Several years ago one of my Client requested me to do soft delete on all database items, my solution to that is to move all "deleted" items to a backup table, instead of leaving it to the current running tables.

xandy
  • 26,791
  • 8
  • 57
  • 63
  • 4
    +1 for moving the data to another table! – scunliffe Mar 31 '10 at 01:58
  • 15
    You might instead partition the table by "IsDeleted" to mitigate performance implications if its possible to do so in your db. Then you don't have to worry about two different tables. – Michael Petito Mar 31 '10 at 02:49
17

It's a good idea when and if an invalid delete is absolutely catastrophic and recovery should be simple. It's also a good idea if you want to keep track of everything that has ever been and "delete" really only means "hide." Meaning, it's up to the situation.

Anthony Pegram
  • 114,815
  • 25
  • 210
  • 245
9

I will not try to be "politically correct about it". If you are advocating soft-delete then you need to go for a brain checkup.

1) First, what exactly are you achieving by not deleting the rows in table? Just the fact that sometime in future you can access those rows, right? So why not just create an archive Table and move the rows there? what is wrong with that?

2) With soft-delete you are creating unnecessary query on is_active or query on some time-stamp column. That is just waste when you would be writing simpler queries. Yes, it will work with a view but are views not an extra appendage? Every view is an extra SQL, extra performance cost, down under in any commercial RDBMS everything is a table only. There is nothing magical about views apart from the fact that you do not know how to write queries on top of tables.

3) Yes, it will work with a View or MV. But then I have seen queries in production doing FTS and everything still works! The wonders of modern hardware and solid software. But then that does not make it right either. So by same logic, just because it works does not mean it is RIGHT

4) The complexities of soft delete never ever stops at a simple select.

A) Suppose you had a UNIQUE constraint. Now you soft-delete a row but the column with UNIQUE constraint is still there. When you want to add the same data back in, you cannot do that without additional "tricks".

B) You may have associations going from Table A to Table B and when you soft delete something from Table A, you need to ensure that independent queries on Table B take care of that fact. Suppose a typical detail page was working on some detail_id.

Now a master_id is soft deleted but you still have permalinks with detail_id of that master_id everywhere. When you do hard delete on master_id, those details simply do not exist. Now with soft delete they still exists and they have to be aware of the fact that their master_id is in soft-delete mode.

it will not stop at a simple Table_A.is_active = 0 or 1 stage.

5) Doing hard deletes is simple and right.

A) No one has to add anything extra or worried about anything anywhere.

  1. Your application logic is simpler
  2. Your database is smaller
  3. Your queries are faster

Just archive the data + related pieces and you should be good.

rjha94
  • 4,012
  • 3
  • 27
  • 36
  • 13
    That is a VERY presumptive answer. Without having all details for every situation how can you conclude that there is absolutely never any reason for using soft-deletes? Never say never. It shows you lack imagination. – StartupGuy May 07 '13 at 18:03
  • 3
    And without sharing your details - are you not making assumptions? soft-deletes are not good and I stand by my statement. share your case and I will point out your flaws. – rjha94 May 08 '13 at 19:50
  • 7
    Mmmm, I would say the popular consensus on this page, shows that you stand alone. Not only do people here document different scenarios for where it may be employed, but also practical ways to make it useful. If you choose to suggest that all these people require a "brain checkup", that is your prerogative and it reflects on you. – StartupGuy Jun 08 '13 at 19:28
  • That is fine. Life is about taking stand and validating your stand with evidence. It is not always about going with the majority. Sorry, I do not want to take any flame baits. I have seen people writing queries doing FTS and even that works! I guess all depends on where you set your bar in life. If the bar is not high, anything works. – rjha94 Jun 09 '13 at 14:29
  • 4
    Hard delete is not the only way to go. There are scenarios where you cant just hard-delete the records. Sometimes i need a reference of the deleted records ( linked inside another table ) even if it has been deleted, this is done by soft-deleting it. Moving the records inside " Archive table " is not as simple as you describe it, it will not stop at a simple move from a to b. – Wanny Miarelli Mar 01 '15 at 14:32
  • 1
    4A is a very, very good point – malhal May 14 '15 at 23:46
  • 2
    At one point, popular consensus was that Sun goes around Earth and people were burnt on stake for saying otherwise. if you really think soft deletes are a good idea then you have not managed any complex databases over a period of time. whatever you are trying to achieve by doing soft-deletes can be achieved w/o soft delete too in a clean way. So far people have just whined about being *presumptive* but no one has s upplied any data. Soft deletes are *Lazy* solution to a problem that should be done with archiving. why is the data hanging around in this table if you don't need it? – rjha94 May 19 '15 at 17:45
  • 1
    I really didn't expect the [Galileo gambit](http://rationalwiki.org/wiki/Galileo_gambit) on Stack Overflow, but hey, here it is. – Zano Oct 05 '15 at 13:10
  • 1
    @Zano if people count popular votes and don;t supply any rational argument then it is a Galileo-isque situation only. However I don;t have to recant because puny stack overflow users don;t have Pope;s influence. soft-deletes are a bad idea. The right way is archiving. – rjha94 Oct 20 '15 at 16:24
  • @rjha94: perhaps I'm missing something, but aren't hard deletes impossible in some (if not many) scenarios? You stated that "whatever you are trying to achieve by doing soft-deletes can be achieved w/o soft delete too in a clean way", but have not given any example on how to deal with the classic case of deleting a Customer when there is a Transactions table with many records pointing back to it. You can't just cascade delete all the transactions that customer made. Nor can you hard delete invoices because a product is hard-deleted. So, what is the clean solution here? – BCA Oct 19 '16 at 15:42
8

Soft deletes would also allow you to revoke DELETE privileges from the database account used by the application.

Daniel Vassallo
  • 312,534
  • 70
  • 486
  • 432
  • *sigh* A database account used by an application? That would be lovely. – aehiilrs Mar 31 '10 at 17:02
  • 12
    Not that it would be useful for anything except human mistakes.. If I'm a malicious user and gain access to it I can still "DELETE" them by doing `UPDATE table SET Field1 = 0, Field2 = 0, ... WHERE 1` – Thomas Bonini Mar 31 '10 at 17:03
5

Some times soft deletes are necessary. For example, say you have an Invoice table that references a Products table. Once you have created an Invoice with a specific Product you could then never delete that Product (and if your RI is set up correctly it won't let you).

This specific scenario assumes that you'll never want to delete the Invoices, which in a real company you probably wouldn't want to delete historical financial data.

Though there are many other cases where you would not be able to delete some data as a side effect of a dependency up the chain not being deletable for reasons business or other.

joshperry
  • 37,916
  • 15
  • 83
  • 99
  • 2
    Isn't this more of a "suspend" than a "delete"? So I would argue that "suspended" records is appropriate when you need historic data and "deleting" records is appropriate when you do not need historic data (or means to easily reverse a delete). – Paul Fleming Apr 05 '13 at 12:52
  • 1
    Very true. I sell the idea in terms of it's actual implementation and may name the fields "hidden_on" (timestamp) and "hidden_by" (user id). This is because the record is just supposed to be hidden from the end-user but not the administrators. For administration all records must still be visible as per normal except that records that are "hidden" (deleted) for users would be marked as such to the admins. If I had an archive table for every standard table in my DB my schema would be twice as large. Yuk. – StartupGuy May 07 '13 at 18:07
4

It depends on the data. Some data cannot be deleted due to legal/audit requirements.

Social networking sites on the other hand should provide an option to delete an account with all associated data, including contact info, photos, messages, etc. It's a real annoyance if they don't, e.g. Facebook.

armandino
  • 14,813
  • 16
  • 65
  • 76
4

in oracle, if you add the primary key to a recycle_bin table you make up, then add a row level security policy, you can suppress the values from all queries when the row is in the recycle bin, removing the pk from the recycle bin will automatically restore all data. no need to change your other queries to accomodate the logic.

Randy
  • 15,788
  • 1
  • 33
  • 51
3

It comes with a cost, though, because you need to update your queries and indexes to be able to exclude the deleted rows.

Maybe instead of toggling a flag, move it to another "trash can" table.

Also, one could say that is only a partial solution, because it covers only deletes, but when you update a row, you are still overwriting the old value.

In general, I'd say never delete anything unless you really have to. Disk space is cheap these days. Of course, there are limits, there is data that you are legally bound to erase, there is data that is really not all that important, and maybe you do not need to keep the old data online and in the same table (an archive somewhere would also work).

Thilo
  • 241,635
  • 91
  • 474
  • 626
1

Just to add a cent. I always soft-delete; though it does cost the performance, but very slightly. Think about the cost, when your customer complains regarding your software that stopped functioning after she performed certain actions that even she can't remember. Well, this may be a fat example, but you would never know what went wrong, who did what, what was before and what was inserted afterwards. In that case this would come handy. This functionality comes handy for auditing purpose, and many a customer requests for auditing reports of this sort.

Also, in most workflow based applications, it comes as a software feature/requirement that customer is interested in the "actions" performed on a work item; what values were assigned and who processed it, etc.

KMån
  • 9,776
  • 2
  • 29
  • 40
  • 1
    And that's exactly why it doesn't work. You're assuming every action is an insert. But what about updates ? They won't be audited, and you can't see who performed either the update or the insert this way. Soft deletes are a cheap (and disfunctional) way to not have to put in the work to build a working revision control. – Stefan Steiger Feb 19 '14 at 14:54
0

I am a fan of soft-deletes. Primarily to prevent cascading deletes. However, it takes additional code so that if you are SELECTing a child object, it joins to the parent (and all parent!) objects to make sure none of them are deleted. Alternatively you can cascade the soft-delete, but if you want to restore them later you may not know which children had already been deleted and which were deleted due to the cascade.

Additionally, I keep a revision date time and revision username on each object, so that I know who modified (or soft-deleted) it last. Then for an audit trail, I create a *History (like CustomerHistory) table that is inserted after every UPDATE to the original table. This way after an object is modified or soft-deleted, I have a record of who performed the action as well as the last known state of the object.

Brad Nabholz
  • 2,298
  • 18
  • 19
  • 2
    Wouldn't it be easier to prevent cascading deletes by un-checking the "CASCADE DELETES" box (or however it's done in your database of choice)? – MusiGenesis Apr 06 '10 at 03:52
0

I encountered soft-deletes for the following broad scenarios:

CASE 1: remove the record from being user/code visible, but have the record at the DB level since the business is interested in knowing it had that records.
These requirements are mostly driven by the business & usually at the core is perhaps a legal requirement (like @joshperry & @armandino scenarios) to have the previous record in the database & create a new record for every change made. At this point, I would look at CASE 2 & evaluate if it satifys the requirements before having an IsDeleted flag

CASE 2: audit trails to keep track of the evolution of a record - there are tons of decent articles online for keeping audit trails of records in a database

HTH.

Sunny
  • 6,086
  • 2
  • 23
  • 26