2

I am trying to write a program that spawns a bunch of threads and then joins the threads at the end. I want it to be interruptible, because my plan is to make this a constantly running program in a UNIX service.

The idea is that worker_pool will contain all the threads that have been spawned, so terminate can be called at any time to collect them.

I can't seem to find a way to utilize the chan_select crate to do this, because this requires I spawn a thread first to spawn my child threads, and once I do this I can no longer use the worker_pool variable when joining the threads on interrupt, because it had to be moved out for the main loop. If you comment out the line in the interrupt that terminates the workers, it compiles.

I'm a little frustrated, because this would be really easy to do in C. I could set up a static pointer, but when I try and do that in Rust I get an error because I am using a vector for my threads, and I can't initialize to an empty vector in a static. I know it is safe to join the workers in the interrupt code, because execution stops here waiting for the signal.

Perhaps there is a better way to do the signal handling, or maybe I'm missing something that I can do.

The error and code follow:

MacBook8088:video_ingest pjohnson$ cargo run
   Compiling video_ingest v0.1.0 (file:///Users/pjohnson/projects/video_ingest)
error[E0382]: use of moved value: `worker_pool`
  --> src/main.rs:30:13
   |
24 |     thread::spawn(move || run(sdone, &mut worker_pool));
   |                   ------- value moved (into closure) here
...
30 |             worker_pool.terminate();
   |             ^^^^^^^^^^^ value used here after move
<chan macros>:42:47: 43:23 note: in this expansion of chan_select! (defined in <chan macros>)
src/main.rs:27:5: 35:6 note: in this expansion of chan_select! (defined in <chan macros>)
   |
   = note: move occurs because `worker_pool` has type `video_ingest::WorkerPool`, which does not implement the `Copy` trait

main.rs

#[macro_use]
extern crate chan;
extern crate chan_signal;
extern crate video_ingest;

use chan_signal::Signal;
use video_ingest::WorkerPool;
use std::thread;
use std::ptr;

///
/// Starts processing
/// 
fn main() {
    let mut worker_pool = WorkerPool { join_handles: vec![] };

    // Signal gets a value when the OS sent a INT or TERM signal.
    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);

    // When our work is complete, send a sentinel value on `sdone`.
    let (sdone, rdone) = chan::sync(0);

    // Run work.
    thread::spawn(move || run(sdone, &mut worker_pool));

    // Wait for a signal or for work to be done.
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            worker_pool.terminate(); // <-- Comment out to compile
        },
        rdone.recv() => {
            println!("Program completed normally.");
        }
    }
}

fn run(sdone: chan::Sender<()>, worker_pool: &mut WorkerPool)  {
    loop {
        worker_pool.ingest();
        worker_pool.terminate();
    }
}

lib.rs

extern crate libc;

use std::thread;
use std::thread::JoinHandle;
use std::os::unix::thread::JoinHandleExt;
use libc::pthread_join;
use libc::c_void;
use std::ptr;
use std::time::Duration;

pub struct WorkerPool {
    pub join_handles: Vec<JoinHandle<()>>
}

impl WorkerPool {

    ///
    /// Does the actual ingestion
    ///
    pub fn ingest(&mut self) {

        // Use 9 threads for an example.
        for i in 0..10 {
            self.join_handles.push(
                thread::spawn(move || {

                    // Get the videos
                    println!("Getting videos for thread {}", i);
                    thread::sleep(Duration::new(5, 0));
                })
            );
        }
    }

    ///
    /// Joins all threads
    ///
    pub fn terminate(&mut self) {
        println!("Total handles: {}", self.join_handles.len());

        for handle in &self.join_handles {
            println!("Joining thread...");

            unsafe {
                let mut state_ptr: *mut *mut c_void = 0 as *mut *mut c_void;
                pthread_join(handle.as_pthread_t(), state_ptr);
            }
        }

        self.join_handles = vec![];
    }
}
  • 2
    Welcome to Stack Overflow! Do you already understand why stopping an arbitrary thread is a [Very Bad Idea](http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html) (not a language-specific issue)? Beyond that, you need to provide a [MCVE]. Right now, the code presented seems more like a wishlist and an implicit request for the community to write the implementation for you. The surface-visible issue **appears** to be that `WorkerPool` doesn't implement `Copy`, so moving it [transfers ownership](https://doc.rust-lang.org/stable/book/ownership.html). – Shepmaster Oct 13 '16 at 17:35
  • You should also include the *error messages* you receive and show what [research and attempts at fixing it you've already performed](http://meta.stackoverflow.com/q/261592/155423). – Shepmaster Oct 13 '16 at 17:36
  • Thanks for the quick response. I've included the full code of the WorkerPool struct as well as the errors I'm receiving on compile. I don't want to stop the threads; I want to collect them with join. I agree stopping them would not be a good idea. – Paul Howard Johnson Oct 13 '16 at 19:08
  • I was able to remove the first error using guidance from [here](http://stackoverflow.com/questions/28158738/cannot-move-out-of-borrowed-content). Thanks for that tip. I wish I didn't have to make it unsafe, though. – Paul Howard Johnson Oct 14 '16 at 06:07
  • Also, the join isn't working even though it compiles. It looks like I'm getting a memory address for the pthread_t, but the join never completes. – Paul Howard Johnson Oct 14 '16 at 12:57
  • Got the join working by changing the way I initialized the state_ptr. Now I just have that last problem with early termination. The joins work correctly now and terminate waits till they complete, but I still can't get it to work on interrupt; the program just exits since I need to comment out the call to wait on the threads. – Paul Howard Johnson Oct 14 '16 at 13:56

1 Answers1

2

terminate can be called at any time to collect them.

I don't want to stop the threads; I want to collect them with join. I agree stopping them would not be a good idea.

These two statements don't make sense to me. You can only join a thread when it's complete. The word "interruptible" and "at any time" would mean that you could attempt to stop a thread while it is still doing some processing. Which behavior do you want?

If you want to be able to stop a thread that has partially completed, you have to enhance your code to check if it should exit early. This is usually complicated by the fact that you are doing some big computation that you don't have control over. Ideally, you break that up into chunks and check your exit flag frequently. For example, with video work, you could check every frame. Then the response delay is roughly the time to process a frame.

this would be really easy to do in C.

This would be really easy to do incorrectly. For example, the code currently presented attempts to perform mutation to the pool from two different threads without any kind of synchronization. That's a sure-fire recipe to make broken, hard-to-debug code.

// Use 9 threads for an example.

0..10 creates 10 threads.


Anyway, it seems like the missing piece of knowledge is Arc and Mutex. Arc allows sharing ownership of a single item between threads, and Mutex allows for run-time mutable borrowing between threads.

#[macro_use]
extern crate chan;
extern crate chan_signal;

use chan_signal::Signal;
use std::thread::{self, JoinHandle};
use std::sync::{Arc, Mutex};

fn main() {
    let worker_pool = Arc::new(Mutex::new(WorkerPool::new()));

    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);

    let (work_done_tx, work_done_rx) = chan::sync(0);

    let worker_pool_clone = worker_pool.clone();
    thread::spawn(move || run(work_done_tx, worker_pool_clone));

    // Wait for a signal or for work to be done.
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            let mut pool = worker_pool.lock().expect("Unable to lock the pool");
            pool.terminate();
        },
        work_done_rx.recv() => {
            println!("Program completed normally.");
        }
    }
}

fn run(_work_done_tx: chan::Sender<()>, worker_pool: Arc<Mutex<WorkerPool>>)  {
    loop {
        let mut worker_pool = worker_pool.lock().expect("Unable to lock the pool");
        worker_pool.ingest();
        worker_pool.terminate();
    }
}

pub struct WorkerPool {
    join_handles: Vec<JoinHandle<()>>,
}

impl WorkerPool {
    pub fn new() -> Self {
        WorkerPool {
            join_handles: vec![],
        }
    }

    pub fn ingest(&mut self) {
        self.join_handles.extend(
            (0..10).map(|i| {
                thread::spawn(move || {
                    println!("Getting videos for thread {}", i);
                })
            })
        )
    }

    pub fn terminate(&mut self) {
        for handle in self.join_handles.drain(..) {
            handle.join().expect("Unable to join thread")
        }
    }
}

Beware that the program logic itself is still poor; even though an interrupt is sent, the loop in run continues to execute. The main thread will lock the mutex, join all the current threads1, unlock the mutex and exit the program. However, the loop can lock the mutex before the main thread has exited and start processing some new data! And then the program exits right in the middle of processing. It's almost the same as if you didn't handle the interrupt at all.

1: Haha, tricked you! There are no running threads at that point. Since the mutex is locked for the entire loop, the only time another lock can be made is when the loop is resetting. However, since the last instruction in the loop is to join all the threads, there won't be anymore running.

I don't want to let the program terminate before all threads have completed.

Perhaps it's an artifact of the reduced problem, but I don't see how the infinite loop can ever exit, so the "I'm done" channel seems superfluous.

I'd probably just add a flag that says "please stop" when an interrupt is received. Then I'd check that instead of the infinite loop and wait for the running thread to finish before exiting the program.

use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    let worker_pool = WorkerPool::new();

    let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);
    let please_stop = Arc::new(AtomicBool::new(false));

    let threads_please_stop = please_stop.clone();
    let runner = thread::spawn(|| run(threads_please_stop, worker_pool));

    // Wait for a signal
    chan_select! {
        signal.recv() -> signal => {
            println!("received signal: {:?}", signal);
            please_stop.store(true, Ordering::SeqCst);
        },
    }

    runner.join().expect("Unable to join runner thread");
}

fn run(please_stop: Arc<AtomicBool>, mut worker_pool: WorkerPool)  {
    while !please_stop.load(Ordering::SeqCst) {
        worker_pool.ingest();
        worker_pool.terminate();
    }
}
Shepmaster
  • 274,917
  • 47
  • 731
  • 969
  • I really appreciate the detailed answer and edits. When you say the program logic is "poor", do you mean the solution above doesn't solve the program in the best way, or was that part of your footnote joke? What would you do differently? – Paul Howard Johnson Oct 14 '16 at 14:59
  • In regards to your first comment about interrupting the threads, I do want to make sure they complete, and that is why I was using join. I don't want to let the program terminate before all threads have completed. Do you still see a conflict there? – Paul Howard Johnson Oct 14 '16 at 15:01
  • @PaulHowardJohnson by "poor", I mean "probably not the expected behavior". The code does all of that work to accept a signal and do something in response, but the program can *still exit while work is happening*. If it's OK to exit while work is in process, then I wouldn't do anything special to handle an interrupt and just let it kill the program; the default behavior. – Shepmaster Oct 14 '16 at 15:12
  • Also, the footnote isn't intended as a joke, *per se*. It's just another way of pointing out that there seem to be mismatched expectations present in the code. There's no way that there can be any threads running if we only exit the loop when all the threads have stopped running... – Shepmaster Oct 14 '16 at 15:17
  • But that is the crux of the issue: the program can't stop running until all work is completed, because it could potentially corrupt the data that is being received from the calls to get the video objects. If the database insert was happening at the time the program is terminated, then that would be a serious issue. Your `please_stop` code is exactly what I was thinking about before I saw your response. We just need to make sure that the loop exits in an orderly fashion, and then we are guaranteed that all the threads have been joined and we are good to go. – Paul Howard Johnson Oct 14 '16 at 17:15
  • And you are right about the C thing. Rust actually prevented me from making the logic error we are talking about with the need for the please stop signal for the main thread, which is really cool. I was just frustrated :). – Paul Howard Johnson Oct 14 '16 at 18:23