
I have fetched the page source from Amazon and used a regex to find the names of the monitors I want. On Amazon, the page shows 3 monitors per row, with prices. I only want the monitor at the start of each row, which means the 1st, 4th, 7th match and so on.

https://www.amazon.com/Best-Sellers-Electronics-Computer-Monitors/zgbs/electronics/1292115011

My code is: `(?<='true'>\n\s+)\w.*(?=\n\s+</div>)`

How do I get only the 1st, 4th, 7th, 10th highlighted match (start at match 1, then every third match after it) with a regex? Or would find and replace be a better approach?
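One way to take every third match, if a scripting language is an option, is to collect all matches first and then slice the list. The sketch below is only an illustration: the `SAMPLE` markup is a made-up stand-in for the real page source, and the helper name `every_third_name` is hypothetical. Python's `re` module does not accept a variable-length lookbehind such as `(?<='true'>\n\s+)`, so the sketch swaps in a capture group instead.

```python
import re

# Hypothetical, simplified stand-in for the page source (the real markup differs).
SAMPLE = "\n".join(
    f"<div aria-hidden='true'>\n    Monitor {c}\n</div>" for c in "ABCDEFG"
)

# Capture the text between 'true'> and the closing </div>.
# A capture group replaces the lookbehind/lookahead pair from the question.
NAME = re.compile(r"'true'>\s*(\w[^\n<]*?)\s*</div>")


def every_third_name(source: str) -> list[str]:
    """Return the 1st, 4th, 7th, ... captured name from the source."""
    names = NAME.findall(source)  # all names, in document order
    return names[::3]             # start at index 0, then step by 3


names = NAME.findall(SAMPLE)
first_in_row = every_third_name(SAMPLE)
print(first_in_row)  # ['Monitor A', 'Monitor D', 'Monitor G']
```

The slice `[::3]` relies on match order alone, so it only works if the matches appear in the source in the same order as on the page.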

  • Your link is broken. You should include the text you are trying to match directly in the question. The question already has a bit of a smell, because it looks like you are trying to parse HTML using regex. Instead, consider using an HTML parser. – Tim Biegeleisen Jul 16 '17 at 14:55
  • @TimBiegeleisen Link should work now sorry about that –  Jul 16 '17 at 14:59
  • @TimBiegeleisen Any recommendations? Hmm, it appears the HTML parsers are for Java or Python. Looks like I'll be installing quite a lot –  Jul 16 '17 at 15:01
  • The link is still broken. You have neglected my comment. – Tim Biegeleisen Jul 16 '17 at 15:02
  • @TimBiegeleisen It works when I copy and paste it. For some reason stack overflow seems to create an error –  Jul 16 '17 at 15:17
  • You are using EditPad Pro, I suggest you do not add Notepad++ tag to your question. – Wiktor Stribiżew Jul 16 '17 at 17:27
  • If this is for Notepad++, as tagged before the edit: you can identify the 1st, 4th, 7th item, e.g. by its rank number. In Notepad++, check the *. matches newline* checkbox and try a regex like [`zg_rankNumber[^>]+>\s*\b(?:[147]|1[0369]|2[258])\b.*?'true'>\s*\K[^ – bobble bubble Jul 16 '17 at 17:30
  • @bobblebubble Thank you. I tried that in Notepad++ and EditPad and it gave me an error. In Notepad++ I got: Search "zg_rankNumber[^>]+>\s*\b(?:[147]|1[0369]|2[258])\b.*?'true'>\s*\K[^ –  Jul 17 '17 at 06:44
  • @bobblebubble I will try to modify it to see if it works in EditPad Pro. \K is not supported there, so I have to use something else –  Jul 17 '17 at 07:30
  • Hmm, I have been trying to modify it. I'm not really sure how to use a \K alternative, as \K just does not work in my job. I am also not too sure how to get every nth match. I will keep on digging... –  Jul 17 '17 at 08:35
  • If your tool supports lookbehind of variable length, try something like `(?<=_(?:[147]|1[0369]|2[258])\?[^>]*>(?:]*>\s*){3}]*'true'>\s*)\b[^ – bobble bubble Jul 17 '17 at 10:43
  • @bobblebubble Hmmm no such luck. Did it work for you in editpad? I'll try and change around a few things –  Jul 17 '17 at 11:12
  • Na it worked for me in [regexstorm tool](http://www.regexstorm.net/tester). – bobble bubble Jul 17 '17 at 11:14
  • @bobblebubble Uh okay. I think a lot of regex editors vary in capabilities I guess. Hopefully I can get it working in editpadpro as the job I have tends to have capabilities similar to that. –  Jul 17 '17 at 11:23
  • @bobblebubble Are there any guides on getting nth line of a match or something of the sort. Might help me work out what to change. –  Jul 17 '17 at 11:45
  • You're parsing HTML. If you view the source, you'll see it's not the nth line you need but all 3 enumerated items. You need an identifier from the source (I used the link or the rank number in the first regex). – bobble bubble Jul 17 '17 at 12:38
  • @bobblebubble I can't really understand how [147]|1[0369]|2[258] represent anything I'm viewing in the source. Or how that would get every 3rd? Sorry –  Jul 18 '17 at 01:57
  • In the first pattern I used `zg_rankNumber`; in the second pattern, this part of the link: `/ref=zg_bs_1292115011_1?`, `/ref=zg_bs_1292115011_4?`, `/ref=zg_bs_1292115011_7?`. It's a responsive website: scale your browser window and you have two rows instead of three. You need some item number from the source to get each 3rd item. – bobble bubble Jul 18 '17 at 09:25
  • @bobblebubble Thanks. Are there any guides online, or can you point me in the right direction, so I can use this? I understand what you're saying, but I feel like I would not be able to replicate it. –  Jul 20 '17 at 13:39
  • You can download [notepad++](https://notepad-plus-plus.org/download/v7.4.2.html) and use [this pattern](https://regex101.com/r/Hnum3C/1). If you need a regex tutorial, try [regular-expressions.info](http://www.regular-expressions.info/) or see the [SO regex faq](https://stackoverflow.com/a/22944075/5527985). Wish you good luck. – bobble bubble Jul 20 '17 at 14:47
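
To make the `(?:[147]|1[0369]|2[258])` part of the patterns in the comments less opaque: it is an alternation that spells out the rank numbers 1, 4, 7, ... up to 28, one digit pattern at a time. The short check below is a sketch with made-up variable names; it runs the alternation against the numbers 1 to 30 and compares the result with the arithmetic rule "remainder 1 when divided by 3".

```python
import re

# The alternation from the comments, wrapped in word boundaries as in the first pattern.
RANK = re.compile(r"\b(?:[147]|1[0369]|2[258])\b")

# [147]    -> 1, 4, 7
# 1[0369]  -> 10, 13, 16, 19
# 2[258]   -> 22, 25, 28
matched = [n for n in range(1, 31) if RANK.fullmatch(str(n))]

# The same selection written as arithmetic: every third number starting at 1.
by_modulo = [n for n in range(1, 31) if n % 3 == 1]

print(matched)               # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
print(matched == by_modulo)  # True
```

A regex cannot count, which is why the numbers have to be listed explicitly; in a script, the `n % 3 == 1` test does the same job for any range of ranks.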
