Regex to filter out CPU info (Python)

Question

I am trying to extract both the CPU model and the CPU frequency from the CPU information below, using a regex in Python.

Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
Genuine Intel(R) CPU T2400 @ 1.83GHz

This is what I have come up with so far, but I am still having a hard time with the second line.

(?(?=.*\sCPU\s@)([a-zA-Z]\d+-\d+[a-zA-Z]+)|\d+.\d+GHz)

I'm looking for something like this in my output:

i5-2520M  2.50GHz
Genuine T2400  1.83GHz

Thank you all in advance.

Here is [a quick example](https://regex101.com/r/oKJOL3/1) that covers both of the cases that you provided... although I am not sure if it will work on other input strings. — Josh Crozier, Feb 26 '17 at 03:58

score 1 · Answer 1 · edited May 23 '17 at 12:16

You can try it out and customize it at this link: https://regex101.com/r/sr3zjR/1

(?x) # Free-spacing mode, to allow comments and a clearer layout

# Matching the first line `i5-2520M`
([^ ]+\s*)(?=CPU\s*@)

# Matching the first line `@ 2.50GHz`
|(?<=CPU)(\s*@\s*\d+\.\d+GHz)

# Matching the second line `CPU T2400`
|(CPU\s*[^ ]+\s*)(?=@)

# Matching the second line `1.83GHz`
|\s*(?<=@)(\s*\d+\.\d+GHz)

Because of how regex works, a single capture group cannot skip over characters in the middle of a sequence. This is why we need several alternatives joined with the | operator, one for each capture group. For more insight, see this other question: Regular expression to skip character in capture group
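A minimal Python sketch of how the alternation above behaves, assuming each CPU string is processed on its own. The names `PATTERN` and `pieces` are only illustrative, and the decimal point is written as `\.` so that it matches a literal dot. Each match fills exactly one capture group, so `lastindex` tells which alternative fired.

```python
import re

# The four alternatives from the pattern above, compiled in free-spacing mode.
PATTERN = re.compile(r"""
    ([^ ]+\s*)(?=CPU\s*@)            # group 1: model right before CPU @
  | (?<=CPU)(\s*@\s*\d+\.\d+GHz)     # group 2: @ and frequency right after CPU
  | (CPU\s*[^ ]+\s*)(?=@)            # group 3: CPU plus model, right before @
  | \s*(?<=@)(\s*\d+\.\d+GHz)        # group 4: frequency right after @
""", re.VERBOSE)


def pieces(line):
    """Return (group_number, stripped_text) for every match found in line."""
    return [(m.lastindex, m.group(m.lastindex).strip())
            for m in PATTERN.finditer(line)]


first = pieces("Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz")
second = pieces("Genuine Intel(R) CPU T2400 @ 1.83GHz")

print(first)   # [(1, 'i5-2520M'), (2, '@ 2.50GHz')]
print(second)  # [(3, 'CPU T2400'), (4, '1.83GHz')]
```

Note that this pattern does not pick up the word `Genuine` on the second line.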

These are golden places to pass by:

user · Accepted Answer · 2017-02-26T05:15:59.830

This answer is somewhat different from the first one I posted. Here I attempt to match exactly what is requested in the question.

This is the new live link for this answer: https://regex101.com/r/sr3zjR/3

(?x) # Free-spacing mode, to allow comments and a clearer layout

# Matching the first line `i5-2520M`                (capture group 1)
([^ ]+\s*)(?=CPU\s*@)

# Matching the first line `@ 2.50GHz`               (capture group 2)
|(?<=CPU)(\s*@\s*\d+\.\d+GHz)

# Matching the `first word` on the second line.     (capture group 3)
# The `\s*$` is used to not match empty lines.
|(^[^ ]+)(?!(?:.*CPU\s*@)|\s*$) 

# Matching the second line `CPU T2400`              (capture group 4)
|(?<=CPU)(\s*[^ ]+\s*)(?=@)

# Matching the second line `1.83GHz`                (capture group 5)
|\s*(?<=@)(\s*\d+\.\d+GHz)

Here, as in the other answer, each capture group holds one of the required elements, so you can manipulate each of them individually by referring to its capture group index.
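As a rough illustration of referring to the groups by index from Python, the sketch below compiles the five alternatives with `re.VERBOSE` and collects, per input line, which group matched what. `CPU_RE` and `by_index` are hypothetical names, the input lines are handled one at a time, and the decimal point is written as `\.`.

```python
import re

# The five alternatives from this answer, in free-spacing mode.
CPU_RE = re.compile(r"""
    ([^ ]+\s*)(?=CPU\s*@)                # 1: model right before CPU @
  | (?<=CPU)(\s*@\s*\d+\.\d+GHz)         # 2: @ and frequency right after CPU
  | (^[^ ]+)(?!(?:.*CPU\s*@)|\s*$)       # 3: first word when no CPU @ follows
  | (?<=CPU)(\s*[^ ]+\s*)(?=@)           # 4: model between CPU and @
  | \s*(?<=@)(\s*\d+\.\d+GHz)            # 5: frequency right after @
""", re.VERBOSE)


def by_index(line):
    """Map capture group index -> stripped text for each match in line."""
    return {m.lastindex: m.group(m.lastindex).strip()
            for m in CPU_RE.finditer(line)}


line1 = by_index("Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz")
line2 = by_index("Genuine Intel(R) CPU T2400 @ 1.83GHz")

print(line1)  # {1: 'i5-2520M', 2: '@ 2.50GHz'}
print(line2)  # {3: 'Genuine', 4: 'T2400', 5: '1.83GHz'}

# Joining the groups in index order gives one summary string per line.
print(" ".join(line2[i] for i in sorted(line2)))  # Genuine T2400 1.83GHz
```

Because every match fills only one group, `lastindex` is enough to tell which element was found.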

In group 2 there is a trick: I match the @ inside the group so that any number of spaces is allowed between it and the preceding word, because the positive look-behind (?<=) does not allow the * operator. If you would rather not match the @, you can change the second group's expression to the one below:

# Matching the first line `2.50GHz`                 (capture group 2)
|(?<=CPU\s@)(\s*\d+\.\d+GHz)

This is the new live link for this change: https://regex101.com/r/sr3zjR/5
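The look-behind restriction can be checked directly in Python's `re` module, which rejects a look-behind of variable width at compile time. The snippet below is only a sketch: the variable names are illustrative, and the fixed-width form assumes exactly one white-space character between `CPU` and `@`.

```python
import re

# A look-behind containing `*` has no fixed width, so re refuses to compile it.
try:
    re.compile(r"(?<=CPU\s*@)\s*\d+\.\d+GHz")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# The fixed-width form compiles and leaves the @ out of the captured text.
fixed = re.compile(r"(?<=CPU\s@)(\s*\d+\.\d+GHz)")
match = fixed.search("Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz")
frequency = match.group(1).strip()

print(variable_width_ok)  # False
print(frequency)          # 2.50GHz
```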

As everywhere else in this answer, we are in free-spacing mode, so a literal white-space must be escaped with \ or written as \s.
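A small sketch of that white-space rule, using Python's `re.VERBOSE` flag as the free-spacing mode. The variable names are illustrative only.

```python
import re

text = "CPU @ 2.50GHz"

# In free-spacing mode an unescaped space is dropped, so this is really "CPU@".
bare = re.compile(r"CPU @", re.VERBOSE)

# Escaping the space, or using \s, keeps it as part of the pattern.
escaped = re.compile(r"CPU\ @", re.VERBOSE)
shorthand = re.compile(r"CPU\s@", re.VERBOSE)

print(bare.search(text))                 # None
print(bare.search("CPU@").group())       # CPU@
print(escaped.search(text).group())      # CPU @
print(shorthand.search(text).group())    # CPU @
```

White-space inside a character class such as `[^ ]` is kept even in this mode, which is why the patterns above can use it unescaped.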
