During camera calibration, the usual advice is to use many images (>10) with variations in pose, depth, etc. However, I notice that the fewer images I use, the smaller the reprojection error tends to be. For example, with 27 images cv::calibrateCamera returns 0.23, while with just 3 images I get 0.11. This may be because calibration solves a least-squares problem for an overdetermined system: fewer views means fewer residuals to satisfy, so the optimizer can fit them more closely.
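For reference, this is roughly how I obtain that number (a minimal sketch; it assumes the per-view objectPoints/imagePoints correspondences have already been collected, e.g. with cv::findChessboardCorners, and simply reports the RMS value returned by cv::calibrateCamera):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <iostream>
#include <vector>

double calibrate(const std::vector<std::vector<cv::Point3f>>& objectPoints,
                 const std::vector<std::vector<cv::Point2f>>& imagePoints,
                 cv::Size imageSize,
                 cv::Mat& cameraMatrix, cv::Mat& distCoeffs)
{
    std::vector<cv::Mat> rvecs, tvecs;
    // The return value is the RMS reprojection error, computed over the
    // *same* images that were used to fit the intrinsics and extrinsics.
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << " px\n";
    return rms;
}
```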
QUESTIONS:
Do we actually use the reprojection error as an absolute measure of how good a calibration is? For example, if I calibrate with 3 images and get 0.11, and then calibrate with 27 other images and get 0.23, can we really say that "the first calibration is better"?
OpenCV uses the same images both to estimate the calibration parameters and to compute the reprojection error. Isn't that a form of overfitting? Wouldn't it be more correct to use two different sets: one to compute the calibration parameters (a training set) and one to compute the error (a test set)? In that case, I would use the same test set to calculate the error for all my calibration results from different training sets. Wouldn't that be fairer?
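Something like the following is what I have in mind (a sketch, assuming hypothetical per-view test correspondences testObj/testImg; the intrinsics come from the training-set calibration, and only the pose of each test view is re-estimated with cv::solvePnP, so the error should reflect how well the fixed intrinsics generalize):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

double testSetRms(const std::vector<std::vector<cv::Point3f>>& testObj,
                  const std::vector<std::vector<cv::Point2f>>& testImg,
                  const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs)
{
    double sumSq = 0.0;
    size_t n = 0;
    for (size_t i = 0; i < testObj.size(); ++i) {
        cv::Mat rvec, tvec;
        // Fit only the extrinsics of this test view; intrinsics stay fixed.
        cv::solvePnP(testObj[i], testImg[i], cameraMatrix, distCoeffs,
                     rvec, tvec);

        // Project the board points with the held-out intrinsics and
        // accumulate squared pixel residuals against the detections.
        std::vector<cv::Point2f> projected;
        cv::projectPoints(testObj[i], rvec, tvec, cameraMatrix, distCoeffs,
                          projected);
        for (size_t j = 0; j < projected.size(); ++j) {
            cv::Point2f d = projected[j] - testImg[i][j];
            sumSq += d.x * d.x + d.y * d.y;
            ++n;
        }
    }
    return std::sqrt(sumSq / n);  // RMS over all test points
}
```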