Let $p_1=1+11!$ for convenience. If a prime $p\ge p_1$ divides $1+11!+(11!)! = p_1 + (p_1-1)!$, then by Wilson's theorem

$$(p-1)!\equiv -1\pmod p$$

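And also

$$(p_1-1)!\equiv -p_1\pmod p$$

Since $(p-1)! = (p-1)(p-2)\cdots p_1\cdot(p_1-1)!$, Wilson's congruence reads

$$(p-1)(p-2)\cdots p_1\cdot(p_1-1)!\equiv -1\pmod p$$

and substituting $(p_1-1)!\equiv -p_1$ gives

$$(p-1)(p-2)\cdots p_1\cdot p_1\equiv 1\pmod p$$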

This way I was able to check all the primes from $p_1$ to 74000000 in 12 hours. According to big prime country's heuristic, this gives a 3.4% chance of finding a factor. The algorithm has bad asymptotic complexity: checking a single prime $p$ takes $p-11!$ modular multiplications, so there's not much hope of completing the calculation.
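To sanity-check the congruence on numbers small enough to verify by hand, here is a minimal sketch. The function names and the general parameter `m` (standing in for $p_1$) are my own illustration, not part of the program below, and the Wilson-based check assumes that `p` is prime, that `1 <= m <= p`, and that `p < 2^32` so the products fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// n! mod p, computed directly with about n modular multiplications.
std::uint64_t factorial_mod(std::uint64_t n, std::uint64_t p) {
    std::uint64_t prod = 1 % p;
    for (std::uint64_t i = 2; i <= n; ++i) {
        prod = (prod * i) % p;
    }
    return prod;
}

// Reference check: does p divide m + (m-1)! ?
bool divides_direct(std::uint64_t m, std::uint64_t p) {
    return (m % p + factorial_mod(m - 1, p)) % p == 0;
}

// Wilson-based check: p divides m + (m-1)! exactly when
// (p-1)(p-2)...m * m == 1 (mod p).
// Assumes p is prime, 1 <= m <= p and p < 2^32.
bool divides_wilson(std::uint64_t m, std::uint64_t p) {
    assert(m >= 1 && m <= p && p < (std::uint64_t{1} << 32));
    std::uint64_t prod = m % p;
    for (std::uint64_t i = p - 1; i >= m; --i) {
        prod = (prod * i) % p;
    }
    return prod == 1;
}
```

For instance, with `m = 7` the number is $7 + 6! = 727$, which is itself prime, so `divides_wilson(7, 727)` holds while `divides_wilson(7, 11)` does not, and both agree with `divides_direct`.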

Note that I haven't used the fact that $p_1$ is prime, so maybe that can still help somehow. Here's the algorithm in C++:

```
// compile with g++ main.cpp -o main -lpthread -O3
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include <boost/process.hpp>

namespace bp = boost::process;

constexpr unsigned int p1 = 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 * 11 + 1; // 11!+1
constexpr unsigned int max = 100'000'000; // maximum to trial divide
std::vector<unsigned int> primes;
unsigned int progress = 40; // next progress milestone in millions, only touched by thread 0

// check the primes congruent to 2n+1 mod 16
void trial_division(unsigned int n) {
    for (auto p : primes) {
        if (p % 16 != 2 * n + 1) continue;
        // prod = (p-1)(p-2)...p1 mod p; both factors are below 2^32, so no 64 bit overflow
        std::uint64_t prod = 1;
        for (std::uint64_t i = p - 1; i >= p1; --i) {
            prod = (prod * i) % p;
        }
        // p divides p1 + (p1-1)! exactly when (p-1)(p-2)...p1 * p1 == 1 (mod p)
        if ((prod * p1) % p == 1) {
            std::cout << p << "\n";
        }
        if (n == 0 && p > progress * 1'000'000) {
            std::cout << progress * 1'000'000 << "\n";
            ++progress;
        }
    }
}

int main() {
    bp::ipstream is;
    // this is https://cr.yp.to/primegen.html
    // the size of these primes doesn't really justify using such a specialized tool, I'm just lazy
    bp::child primegen("./primes", std::to_string(p1), std::to_string(max), bp::std_out > is);
    std::string line;
    // building the primes vector
    while (primegen.running() && std::getline(is, line) && !line.empty()) {
        primes.push_back(std::stoul(line));
    }
    // start 8 threads, one for each core on my computer, each checking one residue class mod 16
    // By Dirichlet's theorem on arithmetic progressions they should progress at the same speed
    // the thread for the primes of the form 16k+1 (n = 0) owns the progress counter
    std::vector<std::thread> threads;
    for (unsigned int n = 0; n < 8; ++n) {
        threads.emplace_back(trial_division, n);
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

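Every factor in a modular multiplication is smaller than the prime $p$ being tested, which is at most $10^8$ here (the same order of magnitude as $11!$), so each product stays below $10^{16}$ and standard 64 bit ints suffice.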
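The overflow argument can be stated as a compile-time check. This is only a sketch: `max_prime` and `mulmod` are names I made up for illustration, with `max_prime` set to the $10^8$ search limit used in the program.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical name for the search limit; every operand is below it.
constexpr std::uint64_t max_prime = 100'000'000;

// The largest possible product is (max_prime - 1)^2; make sure it fits in 64 bits.
static_assert(max_prime - 1 <=
                  std::numeric_limits<std::uint64_t>::max() / (max_prime - 1),
              "product of two residues would overflow uint64_t");

// Modular multiplication, valid as long as a, b < p <= max_prime.
constexpr std::uint64_t mulmod(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    return (a * b) % p;
}
```

The same check would still pass for any limit below $2^{32}$, so the 64 bit arithmetic has plenty of headroom for extending the search.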

**EDIT:** Divisor found! $1590429889$

So first of all, the Wilson's theorem trick is slower, not faster, once $p > 2p_1$: it needs about $p - p_1$ multiplications, whereas computing $(p_1-1)!$ modulo $p$ directly needs only about $p_1$. Secondly, the trial division function is almost perfectly parallelizable, since each prime is checked independently, which makes it well suited to a GPU. My friend wrote an implementation that can be found here. It runs on CUDA-compatible Nvidia GPUs. Finding the factor took about 18 hours on an Nvidia GTX Titan X (Pascal).
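The crossover near $2p_1$ can be made concrete by counting multiplications. The following is a rough sketch with hypothetical names (`wilson_cost`, `direct_cost`, `cheaper_method`); it only counts operations under the assumption $p \ge m$ and does not test any divisibility.

```cpp
#include <cstdint>

enum class Method { Wilson, Direct };

// Wilson form (p-1)(p-2)...m * m mod p, starting from 1:
// p - m multiplications for the run, plus one by m. Assumes p >= m.
constexpr std::uint64_t wilson_cost(std::uint64_t m, std::uint64_t p) {
    return p - m + 1;
}

// Direct form (m-1)! mod p, starting from 1 and multiplying by 2..m-1.
// Assumes m >= 2.
constexpr std::uint64_t direct_cost(std::uint64_t m) {
    return m - 2;
}

// Pick whichever form needs fewer modular multiplications.
constexpr Method cheaper_method(std::uint64_t m, std::uint64_t p) {
    return wilson_cost(m, p) <= direct_cost(m) ? Method::Wilson : Method::Direct;
}
```

With $m = p_1 = 39916801$, the Wilson form wins for $p = 74000000$, while for the reported factor $1590429889$, roughly $40p_1$, the direct form is far cheaper.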