Questions tagged [cpd]

CPD: Copy and Paste Detector: a tool for finding where source code has been duplicated/cloned.

There are three fundamental types of these tools:

  • Those that match text strings or lines exactly; they have essentially zero knowledge of the actual language being processed. These find exact clones; changes in formatting or additional comments prevent detection of larger matches. They can be fast and scalable, but only find exact copies, and thus don't produce good answers if the cloned code has been edited, which is the common case. Summary: cheap, easy, weak detection ability.
  • Token-based detectors. These detectors know roughly how to break source code into its constituent atoms ("tokens") such as identifiers, numbers, keywords, operators, comments and whitespace. Knowledge of whitespace and comments allows the detector to match code that has been reformatted. Ignoring the content of identifiers and numbers allows such detectors to match code where names have been changed or different values have been used. But these detectors don't understand language structure, and tend to report sequences such as "} {" as clones even though they are uninteresting. As a consequence, token-based detectors have to match rather long sequences of tokens to avoid producing a lot of false positive matches. Summary: better, but requires very long matches to avoid a flood of false positives.
  • Structure-based detectors. These know the tokens and the language structure. Like token detectors, reformatting doesn't prevent matches. Unlike token detectors, these tools only identify clones that match language structures, such as expressions, statements, or blocks; they can never propose "} {". So they can find smaller clones reliably. They can also allow gaps between the identical parts that match structures, so they can recognize two identical statements separated by a third, differing statement as a clone with the third statement as a parameter. This allows detection of sophisticated clones. Summary: slower, but more accurate and more interesting clones detected.
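
To make the token-based approach concrete, here is a minimal Python sketch. It is not how PMD's CPD or any other real tool is implemented; the tokenizer regex, the tiny `KEYWORDS` set, and the function names `normalize` and `find_clones` are all assumptions made for illustration, and comments in the scanned source are not handled at all.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude tokenizer: identifiers, numbers, then any
# other single non-whitespace character. Whitespace is dropped entirely.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # illustrative only


def normalize(source):
    """Tokenize source, replacing identifiers with ID and numbers with NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok[0].isdigit():
            tokens.append("NUM")
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def find_clones(tokens, min_tokens):
    """Group start positions whose next min_tokens tokens are identical."""
    windows = defaultdict(list)
    for i in range(len(tokens) - min_tokens + 1):
        windows[tuple(tokens[i:i + min_tokens])].append(i)
    return [positions for positions in windows.values() if len(positions) > 1]


# Renamed variables, changed literals and different spacing still match.
print(normalize("total = price * 3 ;") == normalize("sum=cost*7;"))  # True

# A long enough window reports only the real repeated statement.
print(find_clones(normalize("a = b + 1 ; c = d + 2 ;"), 6))  # [[0, 6]]

# A very short window reports the uninteresting "} {" sequence as a clone.
print(find_clones(normalize("} { } {"), 2))  # [[0, 2]]
```

The last call shows why such detectors need a large minimum match length: with a window of two tokens, "} {" is reported as duplicated even though it carries no meaning.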

[Thanks to Semantic Designs for this background knowledge].

See http://en.wikipedia.org/wiki/Duplicate_code for more details.

30 questions
18
votes
3 answers

PMD/CPD: Ignore bits of code using comments

Is there a way to tell PMD to ignore checking parts of code for duplication? For example, can I do something like this: // CPD-Ignore-On ... // CPD-Ignore-Off Currently I have PMD set up like this using Maven, but don't see any arguments that would…
digiarnie
  • 20,378
  • 29
  • 75
  • 124
7
votes
2 answers

using cpd on python

I am trying to run the GUI version of CPD on my Python codebase, but no duplicate code is returned even when I set the min chunk size to 1. My code isn't that good. Has anyone ever had any success running CPD on a Python project?
mkoryak
  • 54,015
  • 59
  • 193
  • 252
7
votes
4 answers

Suppress warnings from CPD for C/C++ code

We are using PMD Copy Paste Detector (CPD) to analyze our C and C++ code. However, there are a few parts of the code that are very similar, but with a good reason and we would like to suppress the warnings for these parts. The documentation of PMD…
Arno Moonen
  • 1,054
  • 2
  • 9
  • 27
5
votes
1 answer

What's CPD of the Sonarqube?

I work with Sonarqube every day in my job. But I realized that I don't know what CPD means. I see phrases like "INFO: CPD calculation finished", etc. I would like some help to understand this.
5
votes
1 answer

PMD CPD exclude methods like equals and hashcode?

I cannot find an option how to tell PMD-CPD to skip specific methods. We use generated equals() and hashCode() methods, so the methods look often very similar and CPD reports a lot of them as duplicate code. I can use some //NOPMD comments in the…
hkais
  • 328
  • 3
  • 15
3
votes
1 answer

intellij idea: terminal window: making a filepath/file clickable

I'm implementing the Maven CPD PMD plug-in to spot (and fail the build on) any instances of code duplication present in the project. This all works fine. However, the error output to the terminal in IntelliJ IDEA is in the form: Terminal…
Luke_P
  • 559
  • 5
  • 20
3
votes
1 answer

Jenkins static code analysis for sbt project

I have an sbt project with the findbugs4sbt and cpd4sbt plugins. This project is built by Jenkins with the Static Code Analysis Plug-ins. I run "sbt findbugs" and "sbt cpd" build steps after compile and see target/findbugs/report.xml and target/cpd/cpd.xml…
Dmitry Meshkov
  • 911
  • 7
  • 18
3
votes
1 answer

PMD/CPD can't detect duplicate code

I am new to PMD/CPD. I have configured PMD in my maven project as below: org.parent CustRestExampleOsgi 1.0 pom CustRestExampleOsgii
Amrit
  • 2,053
  • 4
  • 23
  • 38
3
votes
1 answer

Sonar CPD detecting blocks duplications

I have done a lot of analysis on how Sonar CPD detects duplicate blocks, but I am not able to figure out exactly what process it uses to detect duplicated blocks or lines of code. Is there a minimum number of lines? For example if I am writing as…
satheesh
  • 1,327
  • 6
  • 27
  • 40
2
votes
0 answers

copy paste detector - how to avoid delegate pattern

We use CPD and it works very well. Some interfaces are implemented in multiple classes. These classes share the implementation code using the 'delegate' pattern [ http://en.wikipedia.org/wiki/Delegation_pattern ]. The resulting code sometimes gets bigger…
Jayan
  • 16,628
  • 12
  • 79
  • 131
2
votes
1 answer

How to get PMD maven plugin to skip generated source code?

So I'm creating a maven plugin using the maven-plugin-plugin. The HelpMojo in maven-plugin-plugin generates a java source file. Unfortunately, PMD is picking this up and complaining about it. Is there a way to have PMD ignore just a single source…
Jonathan S. Fisher
  • 7,058
  • 6
  • 37
  • 78
2
votes
0 answers

PMD detecting package and import statements as duplicate code

Is there any configuration in PMD to prevent it from identifying package statements and import statements as duplicate code? Note that we are running PMD as a Sonar plugin. Specifically, package declarations in multiple classes within the same…
Anand
  • 43
  • 5
1
vote
2 answers

How to use static code analyzer CPD ignoreLiterals and ignoreIdentifiers from command-line?

That's about PMD static analyzer's feature: Copy-Paste Detector. Yes, I read http://pmd.sourceforge.net/cpd.html thoroughly. But if I run CPD from ant-task, I can fine-tune its work by specifying ignoreLiterals and ignoreIdentifiers. How can I…
Andrey Regentov
  • 3,499
  • 3
  • 27
  • 39
1
vote
1 answer

How to discover repeating code lines with PMD

A popular attempt to bypass Salesforce Apex code coverage rules are code busters where one statement is used in thousands of repeating lines. We have found variations like i=1; or a++; or a=b; with endless variations for the variable name. All have…
stwissel
  • 19,390
  • 6
  • 44
  • 90
1
vote
1 answer

How can I ignore Annotations when using Maven CPD?

I know there is an option to ignoreAnnotations in the CPD CLI reference guide but I can't seem to get this to work using maven pmd:cpd plugin. When I view the mvn pmd page it doesn't list 'ignoreAnnotations' as a usable parameter but seems like it…