How much duplicated code do you tolerate?

Question

In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree extracting the duplicated logic would hurt readability a little.

I know DRY is guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?

As many times as I'm willing to fix the same bug. In other words, none/never/nada/nil/zero. — No Refunds No Returns, Feb 19 '10 at 17:37
Could you provide some code? It's hard for me to think of a duplicated code that looks better than a non-duplicated one — Samuel Carrijo, Feb 19 '10 at 17:50
I can't copy the actual code here because of the intellectual property. But the issue is in part about not being able to name the abstraction with something meaningful and therefore making the code harder to understand. I'll try to get back with an example. — Sylvain, Feb 19 '10 at 18:29
if (str == null || str.length() == 0) vs isStringNullOrEmpty(str) which is better, I know which is easier to type over and over. I know which is more descriptive and maintainable if I want to add another test. If one line of code can be refactored out to a function that is more descriptive, I can't imagine in 25 years of programming 15 lines of code that can't be named to a function. — , Feb 19 '10 at 18:50
@fuzzy: isStringNullOrEmpty is *less* descriptive when you "want to add another test". That code is just a bad example in general. — , Feb 24 '10 at 00:34
I have to accept an answer at some point. Since this question is subjective, I will just accept the answer with the most votes. Thank you all for your very valuable input. — Sylvain, Feb 24 '10 at 22:29
I once had problem with code like that, it was combination of while/break and extra code in different places. Doing parent class and 4 children classes would make the code look a lot more complicated. — IAdapter, Feb 25 '10 at 22:48
Well I don't know. How much duplicate code can you tolerate? — Tim Post, Feb 20 '10 at 11:39

Nick Dandoulakis · Accepted Answer · 2010-02-20T11:37:57.493

67

Refactoring: Improving the Design of Existing Code

The Rule of Three

The first time you do something, you just do it. The second time you do
something similar, you wince at the duplication, but you do the duplicate
thing anyway. The third time you do something similar, you refactor.

Three strikes and you refactor.

Coders at Work

Seibel: So for each of these XII calls you're writing an implementation.
Did you ever find that you were accumulating lots of bits of very similar code?

Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted
that piece of code it's like, alright, time to stop cutting and pasting and put it in a subroutine.

edited Feb 20 '10 at 11:37

answered Feb 19 '10 at 17:32

Nick Dandoulakis

40,422
14
98
136

8

+1 - Also, the rule of three isn't just a heuristic, it's something that's been studied. It applies to small redundancies as well as whole frameworks (Product Line driven development). Your payoff is actually right around the 2.5 mark for getting back what it costs to do reuse vs. copy/hack. – Chris Kessel Feb 19 '10 at 17:35
5

I believe it should be the "rule of two". See my answer to this OP's question. – Ira Baxter Feb 19 '10 at 17:40
1

The first time isn't a duplicate. The second time is. The third time would be the second duplicate, which is the hint that we should instead refactor. The key here is that you don't *always* want to try to break something out to make it reusable when you've only used it once. – Steven Sudit Feb 19 '10 at 18:56
1

Thus the tempered need to abstract, and the need to *track* where the code clones are. If you can do that, you don't need to abstract as much. If you can't do that, you'd better abstract on the second one, because you'll never find that second clone manually when the first needs updating in two years. – Ira Baxter Feb 19 '10 at 19:47
As in *Planescape*, rules of three applies. At the first time there is usually no time to do that. At the second time, there usually won't be enough time to debug one thing for three times. That's when I refactor. – gruszczy Feb 19 '10 at 22:38
The three times are really true - I do it once, do it twice, and on the third time, I say, "Screw it, time to refactor." – Jimmie Lin Feb 20 '10 at 11:54
@Chris: This rule makes no sense to me, unless coupled with some unit tests to go or the tool Ira has built. Do you have any reference for the studies? – Vinko Vrsalovic Feb 20 '10 at 11:57

score 41 · Answer 2 · answered Feb 19 '10 at 17:33

41

I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.

Saying that it'll hurt readability only suggests that you are bad at picking names :-)

answered Feb 19 '10 at 17:33

Vinko Vrsalovic

244,143
49
315
361

10

+1 for the comment about picking good names. It's hard for me to see how calling a method with a sensible name would be less readable than a dozen lines of code . . . especially because when you see the same method called a second time you don't have to read a bunch of lines before deciding "Yep, same thing as before" – Tim Goodman Feb 19 '10 at 18:10
7

I tolerate none either. Duplicated code will always haunt you in the future. – GmonC Feb 19 '10 at 18:15
+1 for "picking bad names". Addition: It may also be because you picked an unfortunate refactoring. Try to refactor out a part that is meaningful on its own (as far as possible), then refactoring will actually *improve* readability (by hiding implementation details behind a nice method name). – sleske Feb 20 '10 at 14:14
1

I would like to tolerate none, but real world budget constraints of how much time I can spend on a project can dictate how much refactoring I can do. The alternative is to refactor code for free. – Peter M Feb 20 '10 at 20:12
@Peter M: Exactly. Many of my projects I can't tolerate. The fact that they work and are in production doesn't change that. I'd love to improve them. – Vinko Vrsalovic Feb 20 '10 at 20:40
3

Perhaps the reason it's hard to find a name is because it doesn't map to a nice, packageable concept. Perhaps the repeated code isn't a sensible stand-alone unit of functionality. It hurts readability if you have to track down the code before reading it, and you have to read it because you split your code into chunks that don't make sense. As sleske said, refactor based on what is meaningful, not *just* what is duplicated. – Darryl Feb 23 '10 at 20:24

score 37 · Answer 3 · answered Feb 19 '10 at 17:22

37

Personally, I prefer keeping code understandable, first and foremost.

DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.

That being said, I do agree that DRY is a good goal to follow, when practical.

answered Feb 19 '10 at 17:22

Reed Copsey

522,342
70
1,092
1,340

6

I agree. DRY is a tool to use to achieve maintainability and readability. DRY itself is not the goal. – Ray Feb 19 '10 at 17:24
True. I think DRY becomes more and more important when the repeated code grows - there's no point in writing a function for a three-liner, because it just doesn't make things easier, it makes it *more complicated*. This changes, however, if these three lines are repeated tens or hundreds of times :) – OregonGhost Feb 19 '10 at 17:24
Can you please say what DRY stands for. – Layke Feb 19 '10 at 17:25
6

@Laykes: "Don't Repeat Yourself." – OregonGhost Feb 19 '10 at 17:25
As long as you have some unit tests (or any other means) to ensure you don't forget to update all repeated portions of code. If you don't have them I'd take the hit on readability. – Vinko Vrsalovic Feb 19 '10 at 17:25
could you give some examples? what is in your mind ok to copy-paste and what is not? – Evgeny Feb 19 '10 at 17:26
Now that I think about it, it would be better if it was "Diminish Repeating Yourself" ;) – OregonGhost Feb 19 '10 at 17:27
I disagree strongly, for the reasons explained in my response to Ira Baxter. – Steven Sudit Feb 19 '10 at 17:32
@Steven: Personally, in general, I agree. You should refactor, *if it's going to make the code easier to read*. There are cases, however, where sometimes the refactor does hurt readability, and the OP was suggesting that this is one of those cases. In those (rare) situations, I'd personally leave the dup. – Reed Copsey Feb 19 '10 at 17:39
1

@Reed: Exactly. So rare I haven't found one. This discussion would be a lot more enlightening if Sly would provide us with some code as Samuel says in a comment to the question. – Vinko Vrsalovic Feb 19 '10 at 17:57
3

@Vinko: The only times I've ever seen this are when the total lines of code are VERY short (2-3 lines total), AND you have logic that must run in between the duplicated lines. This can be refactored using a functional style to create DRY, but it tends to escalate the maintenance cost to do so, since it almost always forces introduction of a method pointer/lambda/delegate/etc (depending on language). – Reed Copsey Feb 19 '10 at 18:01
@Reed: Certainly, any rule might have an exception, but I've yet to see one for this rule. – Steven Sudit Feb 19 '10 at 18:47
@Reed: I think that the maintenance cost of introducing a functional style refactor would only be high on languages where doing it would require the use of tons of boilerplate code. – Vinko Vrsalovic Feb 19 '10 at 22:52
1

If it is something like if(test == -1 || test > 5) then I would just repeat it - but if it is a large piece of code like querying a Database and retrieving a set of data (from different tables), I'd definitely make a subroutine. The cost to refactor a large, full-of-duplicates application is large: In that case, I wouldn't do it. If you fix early, good. Otherwise, screw it. Just leave it duplicated. :P – Jimmie Lin Feb 20 '10 at 11:57
1

@Jimmie: Nice example :-). Even "if(test == -1 || test > 5)" could be a candidate for refactoring. If you could e.g. replace it with a call "checkInputBounds(test)", that would both remove duplication (thus easing maintenance) *and* make it easier to understand, because the method explains the *meaning* of the comparison. – sleske Feb 20 '10 at 14:17
1

@sleske: Agreed. The benefits of encapsulating the predicate is that two magic numbers are hidden away and replaced with some hint at their purpose. – Steven Sudit Feb 22 '10 at 15:10

Ira Baxter · Answer 4 · 2010-03-04T15:42:15.560

12

If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.

Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.

Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persausive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).

A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted langauge structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number a languages including C, C++, Java, C#, COBOL and PHP). CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).

I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.

edited Mar 04 '10 at 15:42

answered Feb 19 '10 at 17:25

Ira Baxter

88,629
18
158
311

3

I'd have to agree with you. If the repeated code is repeatedly doing the same thing, whatever it's doing has a name and deserves to be factored out so that it can be seen in terms of the semantics of that name, not the details of the syntax. This simplifies maintenance, and actually makes the code *easier* to read. – Steven Sudit Feb 19 '10 at 17:31
2

+1: This is acually the approach many agile methods recommend: First implement new functionality, copying&pasting as you feel warranted. Then, when it's done, you refactor. The idea being that after the implementation is done, you'll know better what abstraction is appropriate. – sleske Feb 20 '10 at 14:19
@Ira: I really think encouraging cloning makes little sense. Your tool is certainly useful, but I don't follow going from there to saying that the existence of such a tool makes cloning desirable. If people do it because they are not sure yet, they might very well create the method and then create a second method, add a parameter, of even put the lines back if warranted, there are alternatives to cloning. – Vinko Vrsalovic Feb 20 '10 at 14:44
1

You can't construct the abstraction until you understand what should be abstracted. If you could look at a block of code and magically know how to produce the abstraction, then you could adopt a purist policy. Since you can't, you realistically have to do this process in stages... of which the first is to actually clone. Now, when you are *done* (and it may not be just that piece you are modifying) you *should* try to abstract. But... your language may not have a suitable abstraction mechanism (COBOL doesn't have generics) ... so if you can't declone, you should track. – Ira Baxter Feb 20 '10 at 16:13
... and of course there's the real problem of you facing a legacy system which has clones (this is a certainty, I'll take a bet on 10% minimum on anything over 100K SLOC), and you have to maintain that. So... is that block that you are editing a clone of something? You need tools to find and track the ones you haven't had a chance to hunt down and kill. – Ira Baxter Feb 20 '10 at 16:16
1

Fully agree when the languages don't allow for a suitable abstraction. But when they do, you absolutely can abstract without understanding it fully (potentially wrongly) and then refactor the abstraction when you see it failing. You might even end up concretizing that wrong abstraction if you realize it was a mistake. – Vinko Vrsalovic Feb 20 '10 at 16:22
@Ira: I totally agree on the usefulness of your tool (I've already said so.) What I don't agree with is that cloning is to be encouraged. – Vinko Vrsalovic Feb 20 '10 at 16:24
@Vinko: "... cloning makes little sense..." Er, how do you stop it? I have said explicitly you should try to declone after copy-paste-edit and there I think we agree. You can try weasel-wording the decloning attempt ("... there are alternatives to cloning...") but that's just decloning in my eyes. But if you fail to declone there needs to be some way to manage it. – Ira Baxter Feb 20 '10 at 17:09
@Ira: You stripped the 'encouraging' part that was my whole point. I think **encouraging** cloning is what makes little sense, which is what I understood you were advocating: "Clone away, with no worries, for the tool will handle it afterwards". You should still worry about it, even with a great clone tracking tool. – Vinko Vrsalovic Feb 20 '10 at 17:12
@Vinko: That's because I didn't say "encourage", you did. People clone code because they think they know what it does, and believe they can change it, and believe that's much easier than re-inventing what it does. So there's an abstraction ("what they think it does") and a parameterization ("how it needs to be different"), and work saved. So cloning will continue (whether I encourage it or not). Yes, we agree that one should worry about it. – Ira Baxter Feb 20 '10 at 17:18
@Ira: I just mentioned it because you misquoted me. But we seem to be in agreement for the most part. For the record, I've never claimed you could stop it, you can't. But you should take action (be it code reviews, thorough unit tests, tools or all of the above and more.) – Vinko Vrsalovic Feb 20 '10 at 17:24

score 8 · Answer 5 · answered Feb 19 '10 at 17:53

As soon as you repeat anything you're creating multiple places to have make edits if you find that you've made a mistake, need to extend it, edit, delete or any other of the dozens of other reasons you might come up against that force a change.

In most languages, extracting a block to a suitably named method can rarely hurt your readability.

It is your code, with your standards, but my basic answer to your "how much?" is none ...

score 3 · Answer 6 · answered Feb 19 '10 at 17:24

3

you didn't say what language but in most IDEs it is a simple Refactor -> Extract Method. How much easier is that, and a single method with some arguments is much more maintainable than 2 blocks of duplicate code.

answered Feb 19 '10 at 17:24

1

@fuzzy: My point is not that extracting the code would be hard; it would take a few second to do. My point is that it would hurt readability because it would raise the level of abstraction. – Sylvain Feb 19 '10 at 17:26
2

@Sly: I find it hard to believe you cannot find a name for the function that would make it at least as readable as the original. – Vinko Vrsalovic Feb 19 '10 at 17:30
2

It might hurt readability of the abstracted code, but likely not; its good to know what the variation points are explicitly. But the call site, where it is used, will now say "DoFrobbish(...)" where Frobbish is well understood by the maintainers of the system. That's far more understandable than, "here's 15 lines of code, you guess what it does". (PS: Is it 15 lines? 17? 12? how do you know?) Invoking a well-named abstraction is a big win for the reader. – Ira Baxter Feb 19 '10 at 17:31
@Ira Baxter: Except that a random dozen lines or so does not constitute a nameable function. If you can point at the repetition and describe what it says, that's one thing, but breaking programs apart based on code structure is a bad thing. – David Thornley Feb 19 '10 at 17:35
2

@David: Why have you repeated the code if it's not describable or 'function-able'? – Vinko Vrsalovic Feb 19 '10 at 17:37
1

@David: Yes, the block of code has to have an identifiable purpose. See my answer this question. If it does, you should be able to give that purpose a good (perhaps domain-specific) name. – Ira Baxter Feb 19 '10 at 17:39

score 3 · Answer 7 · answered Feb 19 '10 at 17:26

3

Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.

answered Feb 19 '10 at 17:26

IF that one line does something in specific, I agree. – Steven Sudit Feb 19 '10 at 17:33
3

You do need to set a lower threshold. a*b is a clone of x*y if you parameterize it, but it isn't an interesting clone. I've been building clone detectors for about 10 years (http://www.semanticdesigns.com/Products/Clone) and my experience suggests that "several lines" is a good threshold for detecting them. I'll agree, however, that if you have a "one line clone" that has enough structure and sufficient business- or technology relevance, you should refactor. My clone detector measures "clone mass" rather than "lines" as a way to come closer to this ideal. – Ira Baxter Feb 19 '10 at 17:37
Yeah, "don't duplicate one line" is clearly overkill . . . even if you wrap it in a function, you'll be duplicating the function call, so how is that any better? If adding the function increases the length of your code more than removing the duplication decreases it, it's probably a sign that you shouldn't have bothered. – Tim Goodman Feb 19 '10 at 18:16
5

@Tim : "How is that any better?" It's better because it gives a name to that statement/expression. In fact, for that reason, I often extract a method for single statement even if that statement is not duplicated elsewhere. For instance, I tend to do that a lot with LINQ expressions. Just for the sake of readability. – Sylvain Feb 19 '10 at 18:23
1

@Tim: Presumably, the one line you're wrapping is complex and contains some non-obvious logic that is better hidden behind an explanatory name. For example, if it looks up a value in a table and evaluates to true when the value is in some range, this would be better as IsLegalSize() or whatver domain-specific name is best. – Steven Sudit Feb 19 '10 at 18:53
1

@Sly, @Steven: Fair enough, I could certainly see how a one line method could enhance readability if the one line is something that's hard to understand. Of course in that case the duplication wasn't really the problem (although I suppose having a hard to understand line occur twice is twice as bad as having it occur once...) – Tim Goodman Feb 19 '10 at 19:33
@Tim: Good point. The real issue isn't the presence of duplication but the absence of encapsulation. – Steven Sudit Feb 22 '10 at 15:12

score 3 · Answer 8 · answered Feb 19 '10 at 22:55

Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.

Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.

This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.

score 2 · Answer 9 · answered Jul 06 '10 at 08:29

2

I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.

The best line of code is a line of code not written.

answered Jul 06 '10 at 08:29

Turing Complete

919
2
11
19

score 1 · Answer 10 · answered Feb 19 '10 at 17:24

1

The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.

answered Feb 19 '10 at 17:24

Darryl

5,326
1
21
30

2

Why would it hurt readability? – Steven Sudit Feb 19 '10 at 18:05
1

First of all, because the original question said it did. Both he and his classmate agreed. But beyond simply taking the OP at his word, let's look at why that might be. If the driver for what and when you refactor is purely motivated by duplication of code, the repeated code chunk may not map to a nice abstract concept. If you have to locate and examine the code to understand how it affects the calling function, you've added complexity instead of removing it. Duplication by itself isn't reason enough to pull out a chunk of code. Duplication plus a nice concept to wrap it in is. – Darryl Feb 23 '10 at 20:17
@StevenSudit Consider abstracting out an if/else logic flow into a generic method that takes some number of functions as its arguments ("if successful do f(x), else do g(x), etc). I'd argue that this is less readable compared to an idiomatic if/else flow duplicated across methods. I'd also say the abstraction adds rigid constraints that might cause maintainability issues later (if one client changes its if/else flow requirement). – Noel Jul 31 '16 at 11:30

score 1 · Answer 11 · answered Feb 19 '10 at 17:35

In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.

If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.

Unless your dev is actually Ayende though I would hold tight to DRY and get the readability through intention revealing methods.

BH

score 0 · Answer 12 · answered Feb 19 '10 at 17:25

It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.

score 0 · Answer 13 · answered Feb 19 '10 at 17:33

Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.

That being said, there are issues here.

If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)

You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

Arguably, you can refactor out the dozen lines, then further refactor the new method to break it up further to reflect its internal cohesion. — Steven Sudit, Feb 19 '10 at 18:54
Or factor out a larger number of lines and use parameters, perhaps. However, a dozen lines by themselves do not necessarily have much cohesion. — David Thornley, Feb 19 '10 at 19:05

How much duplicated code do you tolerate?

13 Answers13

Linked