17

I am about to join a PHP project that has been developed over the course of several years. It is going to be huge and sparsely documented, with many files and piles of code, and no consistent quality level is to be expected.

How would you go about gathering as much information as possible about what is going on?

  • Autoloading is not to be expected, at least not extensively, so inclued might do a good job of revealing the interdependencies.

  • Having phpDocumentor digest the project files might give an idea about which classes/methods/functions are present.

  • Maybe phpCallGraph for method/function relations.

  • Profiling some generic use cases with Xdebug to gain an idea about the hierarchies and concepts.

  • Inspecting important log-files ... checking out warnings, deprecated usages, errors.

  • phpinfo().

  • Maybe extracting all comments and processing them into an HTML file.
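
As a rough complement to the include-dependency idea above, here is a minimal sketch that scans PHP source text for `include`/`require` statements with literal paths. This is a static regex approximation, not what inclued does (inclued records includes at runtime). It assumes paths are written as plain quoted strings; anything built from variables or concatenation is skipped. The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches include/require (optionally with _once) followed by a quoted literal path.
# Dynamic paths such as `include $file;` are deliberately not matched.
INCLUDE_RE = re.compile(
    r"\b(?:include|require)(?:_once)?\s*\(?\s*['\"]([^'\"]+)['\"]",
    re.IGNORECASE,
)


def find_includes(php_source):
    """Return the literal paths included by one PHP source string."""
    return INCLUDE_RE.findall(php_source)


def most_included(sources, top=10):
    """Count how often each literal path is included across all files.

    `sources` maps a file name to that file's PHP source text.
    """
    counts = Counter()
    for code in sources.values():
        counts.update(find_includes(code))
    return counts.most_common(top)
```

Files that show up at the top of `most_included` are likely candidates for the shared bootstrap or library code and are a reasonable place to start reading.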

I haven't covered unit tests, databases, ....

What would you do? What are your experiences with mentioned tools to get the most out of them?

You can assume any condition necessary.

What statistical information could be useful to extract?

Does anybody have experience with those tools?

EDIT from "PHP Tools for quality check":

EDIT 2 from Bryan Waters' answer:

Setting up a deployment / build / CI cycle for PHP projects - suggested by Pekka

EDIT 3

Just found this PDF of a talk by Gabriele Santini - "Statistical analysis of the code - Listen to your PHP code". This is like a gold mine.

Raffael

5 Answers

3

I agree that your question already contains most of the answers.

This is what I would probably do. I would start with Sebastian Bergmann's tools, especially phploc, so you can get an idea of the scope of the mess (codebase) you are looking at. It gives you class counts, function counts, etc., not just lines of code.

Next I would look in the Apache logs or Google Analytics and get the top 10 most requested PHP URLs. I'd set up Xdebug with profiling, run through those top 10 requests, and get the files and call tree for each. (You can view these with a cachegrind viewer such as KCachegrind.)
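
Pulling the top 10 out of an access log could look roughly like the sketch below. It assumes the log uses the Common or Combined Log Format and that the interesting URLs end in `.php`; if the site uses URL rewriting, the filter would need to change. `top_php_urls` is a name invented for this example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# The request part of a Common/Combined Log Format line,
# e.g. "GET /index.php?id=1 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE)\s+(\S+)\s+HTTP/[\d.]+"')


def top_php_urls(log_lines, n=10):
    """Return the n most requested .php paths as (path, hits) pairs."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip malformed or unsupported lines
        # Drop the query string so /index.php?id=1 and ?id=2 count together.
        path = urlsplit(match.group(1)).path
        if path.endswith(".php"):
            counts[path] += 1
    return counts.most_common(n)
```

In practice you would pass an open log file, for example `top_php_urls(open("access.log"))`, where the file name is whatever your server is configured to write.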

Finally, I'd read through the entire execution path of one or two of those traces, whichever are most representative of the whole. I'd use my Eclipse IDE, but printing them out and going to town with a highlighter is valid as well.

The top 10 method might fail you if there are multiple systems cobbled together. You should see quickly with Xdebug whether the top 10 are coded similarly or if each is a unique island.

I would look at the MySQL databases and try to understand what they are all for, especially looking at table prefixes; you may have a couple of different apps stacked on top of each other. If there are large parts of the DB not touched by the top 10, you need to go hunting for the sub-apps. If you find other sub-apps, run them through the Xdebug profiler and then read through one of the paths that is representative of that sub-app.
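
To make the table-prefix check concrete, here is a small sketch that groups table names by the part before the first underscore. It assumes you already have the list of names (for example from `SHOW TABLES`) and that prefixes are separated by `_`. The function name and the table names in the usage note are hypothetical.

```python
from collections import defaultdict


def group_by_prefix(table_names, separator="_"):
    """Group table names by the text before the first separator.

    Tables without a separator are collected under the empty prefix "".
    """
    groups = defaultdict(list)
    for name in table_names:
        prefix, sep, _rest = name.partition(separator)
        key = prefix if sep else ""
        groups[key].append(name)
    return dict(groups)
```

For instance, `group_by_prefix(["wp_posts", "wp_users", "shop_orders"])` puts the first two tables under `"wp"` and the third under `"shop"`, which hints at two separate applications sharing one database.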

Now go back and look at your scope numbers from phploc and see what percentage of the codebase (probably count classes, or functions) was untouched during your review.
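
The percentage itself is simple set arithmetic. The sketch below assumes you have two lists of function (or class) names: everything defined in the codebase, and everything that appeared in your profiler traces. phploc only reports counts, so the name lists would have to come from elsewhere; `untouched_share` is an illustrative name.

```python
def untouched_share(all_functions, touched_functions):
    """Return the percentage of known functions that never showed up in a trace."""
    all_set = set(all_functions)
    if not all_set:
        return 0.0
    # Names seen in traces but not defined in the codebase (e.g. built-ins) are ignored.
    untouched = all_set - set(touched_functions)
    return 100.0 * len(untouched) / len(all_set)
```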

You should now have a basic understanding of the most often run code and an idea of how many nooks, crannies, and closets for skeleton storage there are.

Bryan Waters
  • phploc is a GOLD suggestion! This is just the tool I was already thinking about writing myself. / Looking for similarities in the code paths for the top 10 is also gold - intuitive, yet easy to neglect. / Relating app to code to DB, then looking for unused parts in the DB and figuring out which code/app they are related to - interesting strategy! - Thanks! – Raffael Apr 03 '11 at 10:29
  • Thanks for the points! I hope you get things sorted out on your new project quickly, and I hope the previous php coders were better than most :) – Bryan Waters Apr 04 '11 at 14:46
2

Perhaps you can set up a continuous integration environment. In this environment you could gather all the statistics you want.

Jenkins is a fine CI server with loads of plugins and documentation.

pderaaij
  • What kind of statistics could I gather with Jenkins? – Raffael Mar 26 '11 at 17:11
  • All the statistics you named already. It just uses the output of the plugins to generate a report. You could take a look at: phpcpd, phpcodesniffer, pdepend, pmd – pderaaij Mar 26 '11 at 17:22
  • And anything I didn't mention as well? – Raffael Mar 26 '11 at 17:23
  • Yes, every statistic tool which outputs an xml file can be handled by Jenkins/Hudson. Jenkins is a fork of Hudson. If you google around for php and Hudson/Jenkins you will find loads of information about these tools – pderaaij Mar 26 '11 at 17:37
2

For checking problems which could be expected (duplicate code, potential bugs...), you could use some of those tools:

https://stackoverflow.com/questions/4202311/php-tools-for-quality-check

HTH

PS. I have to say that your question, IMO, already contains a lot of great answers.

Frosty Z
  • thanks ... those answers are the ones that just popped up into my mind immediately ... though being a humble and curious person, I guess there are more powerful means to that end. – Raffael Mar 28 '11 at 18:13
1

If you're into statistical stuff, have a look at the CRAP index (Change Risk Analysis and Predictions), which estimates how risky a method is to change by combining its cyclomatic complexity with its test coverage.

There is a nice two-part introductory article:

First part
Second part
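
For reference, the commonly published CRAP formula for a method m is comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp is the cyclomatic complexity and cov is the test coverage in percent. A minimal sketch of that calculation follows; the function name is made up, and the inputs are assumed to come from whatever complexity and coverage tools you already run.

```python
def crap_score(complexity, coverage_percent):
    """CRAP score for one method.

    complexity: cyclomatic complexity of the method (>= 1)
    coverage_percent: test coverage of the method, 0 to 100
    """
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity
```

A fully covered method scores just its complexity, while an untested method scores complexity squared plus complexity, so untested complex code stands out quickly.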

Imi Borbas
1

Having both built, and suffered from, huge spaghetti-y legacy PHP projects, I think there is only so much you will be able to do using analysis tools. Most of them will simply tell you that the project is of terrible quality :)

Unit Testing and source code documentation tools usually need some active contribution inside the code to produce usable results. That said, they all are surely worth trying out - I'm not familiar with phpCallGraph and Sebastian Bergmann's tools. Also, phpDocumentor may be able to make some sense out of at least parts of the code. PHPXref is also a cool tool to get an overview, here's a (slow) demo.

The best way to start could be just taking a simple task that needs to be done in the code base and fighting your way through the jungle: following includes, trying to grasp the library structure, etc., until the job is done. If you're joining a team, maybe have somebody nearby you can ask for guidance.

I would concentrate on making the exploring process as convenient as possible. Some (trivial) points include:

  • Use an IDE that can quickly take you to function/method, class and variable definitions

  • Have a debugger running

  • Absolutely keep everything under source control and commit every change

  • Have an environment that allows you to easily deploy a change for testing, and as easily switch to a different branch or roll everything back altogether. Here is a related question on how to set something like that up: Setting up a deployment / build / CI cycle for PHP projects

Pekka