I have a scraper project that uses the asynchronous requests library asks together with trio. I would like to choose how many concurrent tasks are started based on user input, but my code is long and primitive.

I use trio's spawning and nursery object for concurrent tasks (docs: https://trio.readthedocs.io/en/latest/reference-core.html)

Here's my sloppy code:

import trio
import asks
Number_of_workers = input("how many workers do you want?: ") #How many tasks I want between 1 and 5

async def child1(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child2(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child3(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child4(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child5(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def parent():
    s = asks.Session(connections=5)
    async with trio.open_nursery() as nursery:
        if int(Number_of_workers) == 1:
            nursery.start_soon(child1, s)

        elif int(Number_of_workers) == 2:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)

        elif int(Number_of_workers) == 3:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)

        elif int(Number_of_workers) == 4:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)
            nursery.start_soon(child4, s)

        elif int(Number_of_workers) == 5:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)
            nursery.start_soon(child4, s)
            nursery.start_soon(child5, s)
trio.run(parent)

I think you can see what I'm getting at: this example code theoretically works, but it's very long for something that could probably be cut down to far fewer lines of code.

This kind of scheme gets especially long when dealing with 10 or 20 workers, and is always limited to a predefined number of them.

Each child is essentially the same code; it just gets different data (such as the params and the URL) from an external .py module loaded with importlib.

Is there a way to make this code more concise?

Tom

1 Answer

You can use a loop!

async def child(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def parent():
    s = asks.Session(connections=5)
    async with trio.open_nursery() as nursery:
        for i in range(int(Number_of_workers)):  # input() returns a string
            nursery.start_soon(child, s)

Edit: here's a self-contained demo you can run to convince yourself that this does in fact run concurrent tasks. It also demonstrates how you can pass different parameter values to the different tasks, so they do different things – in this case, print different messages:

import trio

Number_of_workers = 10

async def child(i):
    print("child {}: started".format(i))
    await trio.sleep(5)
    print("child {}: finished".format(i))

async def parent():
    async with trio.open_nursery() as nursery:
        for i in range(Number_of_workers):
            nursery.start_soon(child, i)

trio.run(parent)

Try it and see!

Nathaniel J. Smith
  • Right, but that defeats the purpose of concurrent tasks. Your code will run one task, wait for it to finish, and then run it again. The whole point of concurrency is to have multiple tasks running asynchronously at the same time. What I would like is to choose how many concurrent tasks are made based on user input (similar to how scraping software in other programming languages asks how many threads you would like to run). My code above works for that but, as I said, it is quite long. – Tom Apr 13 '19 at 13:13
  • No, `start_soon` schedules the function you give it to start running in a task "soon", and then returns immediately. So the loop starts lots of tasks, and then they all run concurrently. It's just like your original code. I'll edit in a demo. – Nathaniel J. Smith Apr 14 '19 at 18:24