Of course, other ideas except running char by char iteratively are welcome as well, run-time is important to me.
Of course, any regex also has to run through the string character by character, too. Don't rule out the "naive" solution so easily: it turns out the simple way is more efficient than all three of the posted answers so far.
Here's a solution like your "naive" one: but it doesn't require a stack, because there is only one kind of open-bracket. Even with multiple kinds of brackets, you only need a stack if you also want to detect when the brackets are mismatched.
def chars_at_level(s):
out = ['<']
nesting_level = 0
for c in s:
if c == '<':
nesting_level += 1
elif c == '>':
nesting_level -= 1
elif nesting_level == 1:
out.append(c)
out.append('>')
return ''.join(out)
Example:
>>> s = '<5629476219<421fsaas42f>14222<2412f2<2421savsar21>12vsaf21>412<<<142avsa1>1a24>421>421>'
>>> chars_at_level(s)
'<562947621914222412421>'
Now for the performance comparison. It beats the other three solutions, though Seb's solution is close.
>>> timeit(lambda: chars_at_level(s))
7.594452977000401
>>> timeit(lambda: parse(s)) # Seb's solution using re.sub
7.817124693000096
>>> timeit(lambda: regex_sub(s)) # bobble bubble's recursive regex
9.322779934999744
>>> timeit(lambda: nested_list(s)) # Ajax1234's nested list solution
17.795835303999866
However, Seb's solution is much less efficient in the worst case, on strings like <<<<<<1>>>>>>
, because it does O(n) replacements on strings of length O(n), giving a running time of O(n²). The other two posted solutions still seem to be about O(n) on this kind of string, though I had to increase the system recursion limit for Ajax1234's solution to work. The "naive" solution is still fastest.
>>> t = (1000 * '<') + '1' + (1000 * '>')
>>> timeit(lambda: chars_at_level(t), number=1000)
0.1329130509998322
>>> timeit(lambda: parse(t), number=1000) # Seb's solution using re.sub
31.281542531000014
>>> timeit(lambda: regex_sub(t), number=1000) # bobble bubble's recursive regex
0.705901896999876
>>> timeit(lambda: nested_list(t), number=1000) # Ajax1234's nested list solution
1.1296931150000091
By the way, even if you do want to augment the "naive" solution with a stack, it still only takes O(n) time. It's also fairly trivial to change this algorithm to get the characters at any other nesting level, too.