
I'm trying to use the new assignment expression for the first time and could use some help.

Given three lines of log output:

sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""

The intent is to extract just the state (Russia, India and US) and the record count (93, 100, 100), so the desired result is:

[['Russia',93],['India',100],['US',100]]

This requires translating the following steps into Python:

  • Convert each line into a list element
  • Split each line on spaces, e.g. ['Writing', '93', 'records', 'to', '/data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv', '..']
  • Split the fifth token (index 4) on '/' and keep the last element, e.g. Russia.seirdc.March6-900.12.csv
  • Split that element on '.' and keep the first (index 0) element, e.g. Russia
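The four steps map one-to-one onto a plain loop, before any assignment expressions come into play. This is a minimal sketch: the helper name `parse_log` is hypothetical, and the count is converted with `int()` so the result matches the desired output.

```python
sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""


def parse_log(text):
    """Hypothetical helper: return [[state, count], ...], one pair per log line."""
    result = []
    for line in text.splitlines():           # step 1: one element per line
        tokens = line.split(' ')             # step 2: split on spaces
        filename = tokens[4].split('/')[-1]  # step 3: last path component
        state = filename.split('.')[0]       # step 4: text before the first '.'
        result.append([state, int(tokens[1])])
    return result


print(parse_log(sin))
# [['Russia', 93], ['India', 100], ['US', 100]]
```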

Here is my incorrect attempt:

import fileinput
y = [[ z[4].split('/')[-1].split('.')[0],z[1]] 
     for (z:=x.split(' ')) in 
     (x:=sin if sin else fileinput.input()).splitlines())]
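For comparison, here is a sketch of a working version of the attempt (Python 3.8+). An assignment expression cannot be the target of a `for` clause, so `z` is bound in an `if` clause instead; also, a `FileInput` object is already iterable line by line and has no `.splitlines()` method, so the line source is chosen first. The name `lines` is illustrative, and the `> 4` filter is an added assumption that skips lines with too few tokens.

```python
import fileinput

sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""

# Pick the line source up front: a list of strings from `sin`, or the
# FileInput iterator (stdin / files named on the command line) if `sin` is empty.
lines = sin.splitlines() if sin else fileinput.input()

# The walrus lives in the `if` clause, so `z` is bound before the element
# expression reads it; split() with no argument also drops a trailing '\n'.
y = [
    [z[4].split('/')[-1].split('.')[0], int(z[1])]
    for x in lines
    if len(z := x.split()) > 4
]
print(y)
```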
martineau
StephenBoesch

3 Answers


For what it's worth, you can also do this with a regex, which may well be preferable (and faster) here:

import re
[list(reversed(l)) for l in re.findall(r'Writing (\d+).+/([A-Za-z]+)\.', sin)]

Or, to convert the count to an int and for readability (as suggested by @chepner in the comments):

[[country, int(count)] for count, country in re.findall(r'Writing (\d+).+/([A-Za-z]+)\.', sin)]
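A variant of the same regex idea, as a sketch: named groups make each capture self-describing, and `re.finditer` yields match objects that can be indexed by group name (`m['state']`, available since Python 3.6). The pattern name `LOG_LINE` is illustrative, and the pattern assumes every line has the exact "Writing N records to" prefix.

```python
import re

sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""

# `.*` is greedy but cannot cross a newline, so it backtracks to the last '/'
# on each line; [A-Za-z]+ then captures the state name up to the next '.'.
LOG_LINE = re.compile(r'Writing (?P<count>\d+) records to .*/(?P<state>[A-Za-z]+)\.')

result = [[m['state'], int(m['count'])] for m in LOG_LINE.finditer(sin)]
print(result)
```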
Jab
  • `[ [country, int(count)] for count, country in ... ]` would be more readable (and match the requested output better). – chepner Mar 09 '20 at 18:24
  • Useful approach. I do want to use the `walrus` for many other data munging tasks that do not lend themselves to clever heuristics, but specifically for text parsing your way makes a lot of sense. The addition by @chepner is also helpful. – StephenBoesch Mar 09 '20 at 18:27
  • Oh, you just removed the `list(reversed(...))`. I think that is also worth mentioning (rather than losing it completely). – StephenBoesch Mar 09 '20 at 18:27
  • I removed the reverse as it's more readable and it makes converting the count to int easier as well – Jab Mar 09 '20 at 18:31
  • Yeah, I "got" that, but the `reversed()` trick is worth keeping in the toolkit. I've absorbed it already, but future readers will only see the end product and not that (interesting) intermediate solution. – StephenBoesch Mar 09 '20 at 18:32

Is this good enough?

[[(wrds := line.split())[4].split("/")[-1].split('.')[0], wrds[1]] for line in sin.splitlines()]

I find the assignment expression redundant here. You can also do this:

[[line.split('/')[-1].split('.')[0], line.split()[1]] for line in sin.splitlines()]
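Another way to split each line only once, without a walrus, is a second `for` clause over a one-element list, which binds `parts` once per line. This is a sketch of a different idiom, not either approach above; CPython 3.9+ optimizes `for y in [expr]` inside comprehensions to a plain assignment.

```python
sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""

# Two levels of `for`: the outer one walks the lines, the inner one runs exactly
# once per line and gives the split tokens a name, so split() is called once.
result = [
    [parts[4].split('/')[-1].split('.')[0], int(parts[1])]
    for line in sin.splitlines()
    for parts in [line.split()]
]
print(result)
```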
ori6151
  • The assignment is not redundant: your second one does the `split()` twice. Imagine if that were an expensive operation. I'm awarding because this is a good answer. Actually, the second one is kind of clever but also cheating: it relies on '/' appearing only in the last space-delimited token. – StephenBoesch Mar 09 '20 at 18:14
  • @javadba I agree with you, but I meant in this specific problem. Also, if my answer is what you were looking for, you can click the checkmark next to my answer. Thanks :) – ori6151 Mar 09 '20 at 18:16
  • If you have a chance: I would actually like to see the doubly nested structure. Consider not using a trick to collapse this into a single level: i.e. the OP is a toy example but the intent is to understand how to do multiple levels of nesting (where the multi levels are truly needed) and with the assignment expression. – StephenBoesch Mar 09 '20 at 18:52
  • Re: the doubly nested structure, you can hold off on this: I'm going to create a separate question in which it will _not_ be possible to squash the levels. It will involve grouping and aggregation operations on numerical data. – StephenBoesch Mar 09 '20 at 19:22

Here's one way:

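results = []
# Splitting on '..' breaks the text at each line's trailing " .." marker.
for line in sin.split('..'):
    # The final fragment is just "\n", which has only one token; skip it.
    if len(z := line.split(' ')) > 1:
        results.append([line.split('/')[-1].split('.')[0], z[1]])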
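The walrus is also a natural fit for reading a log as a stream rather than as one string. In this sketch, `io.StringIO` stands in for a real log file (an assumption for the demo); `readline()` returns `''` at end of file, which ends the `while` loop.

```python
import io

sin = """Writing 93 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/Russia.seirdc.March6-900.12.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/India.seirdc.March6-900.6.csv ..
Writing 100 records to /data/newstates-900.03-07_07/top100.newstates-900.03-07_07/US.seirdc.March6-900.15.csv ..
"""

results = []
with io.StringIO(sin) as f:  # stand-in for open('some.log') in real use
    # Bind each line and test it for end-of-file in a single expression.
    while line := f.readline():
        # Bind the tokens once; skip blank or short lines.
        if len(z := line.split()) > 4:
            results.append([z[4].split('/')[-1].split('.')[0], int(z[1])])
print(results)
```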