
I'm using OpenOffice and Notepad++.

I need to match roughly the first 1000 characters (or fewer) of a text, ending at the end of a sentence (a period). For example:

"Once upon a time ... around 1000 symbols ... the end.",

Then, on the next search, you get another match of around 1000 characters that ends with a `.`, and so on.

I tried the regex `(?s).*`, which matches everything, and `.{0,1000}`, which stops when it reaches a line break.

I think I need something like `.{0,1000}\.\n\r` or `.{0,1000}\.\S\s`. I also noticed that I need to account for things like "e.g." in the regex; otherwise it matches "...e." and leaves "g." behind. How can I do that?

Wiktor Stribiżew
Stan Hf.
  • what do you mean when you say "**symbols**"? – Vishal Singh Mar 20 '21 at 18:59
  • `.` in regex means any character so you have to escape it like `\.` to capture the end of a sentence that ends with a period. Or you can use `[.]` as well. – Michael Vine Mar 20 '21 at 19:00
  • Vishal, by symbol I mean any character – Stan Hf. Mar 20 '21 at 19:04
  • Use `[\s\S]{1000}.*?\.`. – 41686d6564 Mar 20 '21 at 19:05
  • You want the single line flag. Some regex parsers have a way to specify flags outside the pattern, others allow you to set flags inside the pattern. For the latter, you want `(?s).{0,1000}` – Charlie Armstrong Mar 20 '21 at 19:06
  • Michael, I tried `.{0,1000}\.\n` with no luck. – Stan Hf. Mar 20 '21 at 19:06
  • Does this answer your question? [How do I match any character across multiple lines in a regular expression?](https://stackoverflow.com/questions/159118/how-do-i-match-any-character-across-multiple-lines-in-a-regular-expression) – Charlie Armstrong Mar 20 '21 at 19:08
  • Actually,`(?s).{0,1000}\.` must work for you. – Wiktor Stribiżew Mar 20 '21 at 19:19
  • `[\s\S]{1000}.*?\.` by @41686d6564 works great in Notepad++, thanks. I noticed that I need to include things like "e.g." at the end, otherwise the regex matches "...e." and leaves "g." behind. How can I do that? – Stan Hf. Mar 20 '21 at 19:21
  • Try `(?s).{1000}(?-s).*?\.\B` – Wiktor Stribiżew Mar 20 '21 at 20:03
  • I added e.g. and ?: `(?s).{0,600}(\.|e.g.|\?)`. It works fine, but not totally tested. – Stan Hf. Mar 20 '21 at 20:27
  • @WiktorStribiżew, `(?s).{1000}(?-s).*?\.\B` matches sometimes much more than 1000 characters, it must be 1000 or less. – Stan Hf. Mar 20 '21 at 20:30
  • @StanHf. If it must be 1000 chars at most (which you haven't mentioned in the question) including the dot, then `[\s\S]{1,999}\.` should work for you. This is the same as `(?s).{1,999}\.` (or just `.{1,999}\.` when the `. matches newline` checkbox is checked). – 41686d6564 Mar 20 '21 at 20:56
  • Your requirements are far from clear, we are just trying to follow your comments. Now, you say it must be less than 1K chars, so, my suggestion must be adjusted to `(?s).{1,999}\.\B`. This will match max 1K chars, or less if the `.` occurs closer to the beginning. It can really match just two chars, BTW. – Wiktor Stribiżew Mar 20 '21 at 21:01
  • @WiktorStribiżew, `(?s).{1,999}\.\B` does not match if there is a "?" at the end. It looks like `(?s).{0,1000}(\.|e.g.|\?)` works best. – Stan Hf. Mar 20 '21 at 21:20
  • Ok, you just want to match sentence end punctuation. Use `(?s).{1,1000}[.?!…]\B` – Wiktor Stribiżew Mar 20 '21 at 21:22
  • @WiktorStribiżew, as I said "1000 or less", it means =< 1000, so {0,1000} is the right code. – Stan Hf. Mar 20 '21 at 21:24
  • @WiktorStribiżew, `(?s).{0,1000}[.?!…]\B` works great, thank you so much! – Stan Hf. Mar 20 '21 at 21:35

1 Answer


You can use

(?s).{0,1000}[.?!…]\B

See the regex demo.

Details:

  • `(?s)` - a DOTALL modifier; `.` now matches line break chars
  • `.{0,1000}` - any 0 to 1000 chars
  • `[.?!…]\B` - a `.`, `?`, `!` or `…` that is either at the end of the string or followed by a non-word char.
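
To try the pattern outside Notepad++, here is a minimal sketch in Python. Python's `re` module is an assumption for illustration only (Notepad++ uses a different regex engine), and `split_into_chunks` and the sample text are hypothetical names and data, not part of the original question.

```python
import re

# Pattern from the answer: up to 1000 arbitrary characters (DOTALL lets "."
# cross line breaks), then sentence-ending punctuation that is NOT directly
# followed by a word character (so the "e." inside "e.g." is skipped).
CHUNK_RE = re.compile(r"(?s).{0,1000}[.?!…]\B")


def split_into_chunks(text):
    """Return successive matches, similar to pressing "find next" repeatedly."""
    return [m.group(0) for m in CHUNK_RE.finditer(text)]


# Hypothetical sample text: one 49-character sentence repeated 40 times.
sample = ("Once upon a time there was a fox, e.g. a red one. " * 40).strip()

chunks = split_into_chunks(sample)
for chunk in chunks:
    # Each chunk is at most 1001 characters: up to 1000 plus the punctuation.
    print(len(chunk), repr(chunk[-20:]))
```

Note that a chunk can still end right after a complete abbreviation such as "e.g." when a space follows it, because `\B` only rules out a word character immediately after the punctuation.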
Wiktor Stribiżew