This is pretty well studied, as mcdowella's comment indicates. Here is how the Cantor-Zassenhaus random algorithm works for the case where you want to find the roots of a polynomial, instead of the more general factorization.
Note that in the ring of polynomials with coefficients mod p, the product x(x-1)(x-2)...(x-p+1) has all possible roots, and equals x^p-x by Fermat's Little Theorem and unique factorization in this ring.
Set g = GCD(f,x^p-x). Using Euclid's algorithm to compute the GCD of two polynomials is fast in general, taking a number of steps that is logarithmic in the maximum degree. It does not require you to factor the polynomials. g has the same roots as f in the field, and no repeated factors.
Because of the special form of x^p-x, with only two nonzero terms, the first step of Euclid's algorithm can be done by repeated squaring, in about 2 log_2 (p) steps involving only polynomials of degree no more than twice the degree of f, with coefficients mod p. We may compute x mod f, x^2 mod f, x^4 mod f, etc, then multiply together the terms corresponding to nonzero places in the binary expansion of p to compute x^p mod f, and finally subtract x.
Repeatedly do the following: Choose a random d in Z/p. Compute the GCD of g with r_d = (x+d)^((p-1)/2)-1, which we can again compute rapidly by Euclid's algorithm, using repeated squaring on the first step. If the degree of this GCD is strictly between 0 and the degree of g, we have found a nontrivial factor of g, and we can recurse until we have found the linear factors hence roots of g and thus f.
How often does this work? r_d has as roots the numbers that are d less than a nonzero square mod p. Consider two distinct roots of g, a and b, so (x-a) and (x-b) are factors of g. If a+d is a nonzero square, and b+d is not, then (x-a) is a common factor of g and r_d, while (x-b) is not, which means GCD(g,r_d) is a nontrivial factor of g. Similarly, if b+d is a nonzero square while a+d is not, then (x-b) is a common factor of g and r_d while (x-a) is not. By number theory, one case or the other happens close to half of the possible choices for d, which means that on average it takes a constant number of choices of d before we find a nontrivial factor of g, in fact one separating (x-a) from (x-b).