I discovered this issue in connection with Elastic Search queries, but since the ES date format documentation links to the API documentation for the java.time.format.DateTimeFormatter class, the problem is not really ES specific.

Short summary: We are having problems with dates beyond year 9999, or more precisely, with years that have more than four digits.

The documents stored in ES have a date field, which the index descriptor defines with the format "date", corresponding to "yyyy-MM-dd" in the DateTimeFormatter pattern language. We take user input, validate it using org.apache.commons.validator.DateValidator.isValid, also with the pattern "yyyy-MM-dd", and if it is valid, we create an ES query from it. This fails with an exception if the user enters something like 20202-12-03. The search term is probably not intentional, but the expected behaviour would be to find nothing, not for the software to cough up an exception.

The problem is that org.apache.commons.validator.DateValidator internally uses the older SimpleDateFormat class to verify whether the input conforms to the pattern, and SimpleDateFormat interprets "yyyy" as something like: use at least 4 digits, but allow more if required. A SimpleDateFormat with the pattern "yyyy-MM-dd" will thus both parse an input like "20202-07-14" and format a Date object with a year beyond 9999.
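The SimpleDateFormat behaviour described above can be reproduced with a minimal sketch like the following. The class name LegacyYearWidthDemo and its helper methods are made up for illustration, and the non-lenient setting is an assumption about how a strict validator would configure the format:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical demo class, not part of commons-validator or ES.
class LegacyYearWidthDemo {

    // Parses the text with a non-lenient SimpleDateFormat and returns the year,
    // or -1 if the text does not match the pattern.
    static int parseYear(String text) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        try {
            Date date = format.parse(text);
            Calendar calendar = new GregorianCalendar();
            calendar.setTime(date);
            return calendar.get(Calendar.YEAR);
        } catch (ParseException e) {
            return -1;
        }
    }

    // Formats the given calendar date with the same pattern.
    static String formatDate(int year, int month, int day) {
        Calendar calendar = new GregorianCalendar(year, month - 1, day);
        return new SimpleDateFormat("yyyy-MM-dd").format(calendar.getTime());
    }

    public static void main(String[] args) {
        System.out.println(parseYear("20202-07-14"));  // 20202: the five-digit year is accepted
        System.out.println(formatDate(20202, 7, 14));  // 20202-07-14: no sign, no failure
    }
}
```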

The new DateTimeFormatter class is much stricter and takes "yyyy" to mean exactly four digits. It will fail to parse an input string like "20202-07-14" and also fail to format a Temporal object with a year beyond 9999. It is worth noting that DateTimeFormatter itself is capable of handling variable-length fields. The constant DateTimeFormatter.ISO_LOCAL_DATE, for example, is not equivalent to "yyyy-MM-dd": in conformance with ISO 8601, it uses at least four digits for the year but allows more. This constant is created programmatically with a DateTimeFormatterBuilder, not from a pattern string.

ES can't be configured to use the constants defined in DateTimeFormatter, such as ISO_LOCAL_DATE, but only with a pattern string. ES also knows a list of predefined patterns. The documentation occasionally refers to the ISO standard for these, but it seems to be mistaken, ignoring that a valid ISO date string can contain a five-digit year.

I can configure ES with a list of multiple allowed date patterns, e.g. "yyyy-MM-dd||yyyyy-MM-dd". That will allow both four and five digits in the year, but fail for a six-digit year. I can support six-digit years by adding yet another allowed pattern: "yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd", but then it fails for seven-digit years, and so on.

Am I overlooking something, or is it really not possible to configure ES (or a DateTimeFormatter instance using a pattern string) to have a year field with at least four digits (but potentially more), as used by the ISO standard?

jarnbjo
  • I’m not sure I understand, and maybe I need not. Entering a 5 digit year is an error. I see nothing wrong with reporting it as such. Even if you insist on accepting a 5 digit year, no one will enter 6 digits, so whether or not that is reported as an error, who cares? – Ole V.V. Jun 25 '20 at 03:21
  • A 5 digit year is not an error. Why do you think so? – jarnbjo Jun 26 '20 at 15:52
  • *The search term is probably not intentional…* That’s what I call an error. Have you got an explicit requirement to be able to search years after 9999, and if so, why? – Ole V.V. Jun 26 '20 at 20:07
  • [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Years) always allows years 1583 through 9999. Allowing years outside that range is optional. ISO 8601 doesn’t tell you to. – Ole V.V. Jun 27 '20 at 03:52
  • @OleV.V. Searching for the last name 'oqgfqhf' is also probably not intentional, but still not an error. The expected behaviour would be that no records are found. The same applies to the date field. Searching for a valid, but unknown value should return no results, but not be an error. We do not have an explicit requirement to be able to search for oqgfqhf as a last name, nor for dates outside the expected range, nor is 2020-06-28 explicitly mentioned as a search term we are supposed to support. – jarnbjo Jun 28 '20 at 12:48
  • Your code. Your decision. Since you cannot use this thought of mine, please throw it over your shoulder, I got no problem with that. – Ole V.V. Jun 29 '20 at 16:55

3 Answers

Edit

ISO 8601

Since your requirement is to conform with ISO 8601, let’s first see what ISO 8601 says (quoted from the link at the bottom):

To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver. An expanded year representation [±YYYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign instead of the more common AD/BC (or CE/BCE) notation; …

So 20202-12-03 is not a valid date in ISO 8601. If you explicitly inform your users that you accept, say, up to 6 digit years, then +20202-12-03 and -20202-12-03 are valid, and only with the + or - sign.

Accepting more than 4 digits

The format pattern uuuu-MM-dd formats and parses dates in accordance with ISO 8601, also years with more than four digits. For example:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("uuuu-MM-dd");
    LocalDate date = LocalDate.parse("+20202-12-03", dateFormatter);
    System.out.println("Parsed: " + date);
    System.out.println("Formatted back: " + date.format(dateFormatter));

Output:

Parsed: +20202-12-03
Formatted back: +20202-12-03

It works quite similarly for a prefixed minus instead of the plus sign.

Accepting more than 4 digits without sign

    yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd||yyyyyyy-MM-dd||yyyyyyyy-MM-dd||yyyyyyyyy-MM-dd

As I said, this disagrees with ISO 8601. I also agree with you that it isn’t nice. And obviously it will fail for 10 or more digits, but that would fail for a different reason anyway: java.time handles years in the interval -999 999 999 through +999 999 999. So trying yyyyyyyyyy-MM-dd (10 digit year) would get you into serious trouble except in the corner case where the user enters a year with a leading zero.

I am sorry, this is as good as it gets. DateTimeFormatter format patterns do not support all of what you are asking for. There is no (single) pattern that will give you four digit years in the range 0000 through 9999 and more digits for years after that.

The documentation of DateTimeFormatter says about formatting and parsing years:

Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years as per SignStyle.NORMAL. Otherwise, the sign is output if the pad width is exceeded, as per SignStyle.EXCEEDS_PAD.

So no matter which count of pattern letters you go for, you will be unable to parse years with more digits without sign, and years with fewer digits will be formatted with this many digits with leading zeroes.
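The effect of SignStyle.EXCEEDS_PAD on a four-letter year pattern can be checked with a short sketch like this one. The class name SignedYearDemo and the helper parses are made up for the illustration:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo class showing how "uuuu" treats signs and padding.
class SignedYearDemo {

    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Returns true if the text can be parsed into a LocalDate with FORMATTER.
    static boolean parses(String text) {
        try {
            LocalDate.parse(text, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2020-12-03"));    // true: four digits, no sign
        System.out.println(parses("+20202-12-03"));  // true: five digits with sign
        System.out.println(parses("20202-12-03"));   // false: five digits need the sign
        // Years with fewer than four digits are zero-padded when formatting.
        System.out.println(LocalDate.of(987, 12, 3).format(FORMATTER));   // 0987-12-03
        // Years exceeding the pad width get an explicit plus sign.
        System.out.println(LocalDate.of(20202, 12, 3).format(FORMATTER)); // +20202-12-03
    }
}
```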

Original answer

You can probably get away with the pattern u-MM-dd. Demonstration:

    String formatPattern = "u-MM-dd";
    
    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern(formatPattern);
    
    LocalDate normalDate = LocalDate.parse("2020-07-14", dateFormatter);
    String formattedAgain = normalDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", normalDate, formattedAgain);
    
    LocalDate largeDate = LocalDate.parse("20202-07-14", dateFormatter);
    String largeFormattedAgain = largeDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", largeDate, largeFormattedAgain);

Output:

LocalDate: 2020-07-14. String: 2020-07-14.
LocalDate: +20202-07-14. String: 20202-07-14.

Counter-intuitively but very practically, one format letter does not mean 1 digit but rather as many digits as it takes. So the flip side of the above is that years before year 1000 will be formatted with fewer than 4 digits. Which, as you say, disagrees with ISO 8601.

For the difference between pattern letter y and u for year see the link at the bottom.

You might also consider one M and/or one d to accept 2020-007-014, but again, this will cause formatting into just 1 digit for numbers less than 10, like 2020-7-14, which probably isn’t what you want and again disagrees with ISO.

Links

Ole V.V.
  • We want to follow the ISO standard, so a single pattern letter (be it u or y) is not acceptable. – jarnbjo Jun 26 '20 at 15:52

Maybe this will work:

[uuuu][uuuuu][...]-MM-dd

Format specifiers placed between square brackets are optional parts. Format specifiers inside brackets can be repeated to allow for multiple options to be accepted.

This pattern will allow a year number of either four or five digits, but rejects all other cases.

Here is this pattern in action. Note that this pattern is useful for parsing a string into a LocalDate. However, to format a LocalDate instance into a string, the pattern should be uuuu-MM-dd. That is because the two optional year parts cause the year number to be printed twice.
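As a rough sketch of the parse/format asymmetry described above, the following uses one formatter with the two optional year sections for parsing and a separate uuuu-MM-dd formatter for printing. The class and constant names are made up for the illustration:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class: optional year sections are fine for parsing,
// but every optional section is printed when formatting.
class OptionalYearSectionsDemo {

    static final DateTimeFormatter PARSER = DateTimeFormatter.ofPattern("[uuuu][uuuuu]-MM-dd");
    static final DateTimeFormatter PRINTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    public static void main(String[] args) {
        LocalDate fourDigits = LocalDate.parse("2020-07-14", PARSER);
        LocalDate fiveDigits = LocalDate.parse("20202-07-14", PARSER);

        // Formatting with the parsing pattern emits the year once per optional section.
        System.out.println(fourDigits.format(PARSER));

        // Formatting with the plain pattern emits the year once.
        System.out.println(fourDigits.format(PRINTER));
        System.out.println(fiveDigits.format(PRINTER));
    }
}
```

A six-digit year such as 202020-07-14 is still rejected by this parser, since neither optional section matches it.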

Repeating all possible year number digit counts is the closest you can get to making it work the way you expect.

The problem with the current implementation of DateTimeFormatter is that when you specify 4 or more u's or y's, the resolver will try to consume exactly that number of year digits. With fewer than 4, however, it will try to consume as many as possible. I do not know whether this behavior is intentional.

So the intended behavior can be achieved with a formatter builder, but not with a pattern string. As JodaStephen once pointed out, "patterns are a subset of the possible formatters".
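For illustration only, here is one way such a builder-based formatter might look. It is a sketch under the assumption that "at least four digits, more if needed, no plus sign" is the desired behaviour. The class name and the choice of a 9-digit maximum (the width of the largest year java.time supports) are mine, and this does not help with ES, which only accepts pattern strings:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

// Hypothetical demo class: a year of 4 to 9 digits without a mandatory plus sign.
class MinFourDigitYearDemo {

    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            // Minimum width 4, maximum width 9; SignStyle.NORMAL only prints a sign for negative years.
            .appendValue(ChronoField.YEAR, 4, 9, SignStyle.NORMAL)
            .appendPattern("-MM-dd")
            .toFormatter();

    // Returns true if the text can be parsed into a LocalDate with FORMATTER.
    static boolean parses(String text) {
        try {
            LocalDate.parse(text, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2020-07-14"));   // true
        System.out.println(parses("20202-07-14"));  // true: more than four digits, no sign
        System.out.println(parses("999-07-14"));    // false: fewer than four digits
        System.out.println(LocalDate.of(987, 7, 14).format(FORMATTER));    // 0987-07-14
        System.out.println(LocalDate.of(20202, 7, 14).format(FORMATTER));  // 20202-07-14
    }
}
```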


Maybe the characters #, { and }, which are reserved for future use, will be useful in this regard.

MC Emperor
  • I already mentioned the possibility to configure multiple patterns in the question (the ES syntax is different) and why that is not a solution. The configuration you are suggesting here would allow 4 or 5 digit years and not '4 or more' as is required. – jarnbjo Jun 29 '20 at 10:12
  • @jarnbjo Well, from the point of view of Elastic Search, this is a single pattern string. I have updated the answer. – MC Emperor Jun 29 '20 at 12:25

Update

You can use DateTimeFormatterBuilder#appendValueReduced to restrict the year to between 4 and 9 digits.

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeFormatterBuilder;
    import java.time.temporal.ChronoField;

    public class Main {
        public static void main(String[] args) {
            DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                    // Year: minimum width 4, maximum width 9, base value 1000
                    .appendValueReduced(ChronoField.YEAR, 4, 9, 1000)
                    .appendPattern("-MM-dd")
                    .toFormatter();

            // Years with 4, 5, 6 and 7 digits
            String[] dateStrArr = { "2017-10-20", "20171-10-20", "201712-10-20", "2017123-10-20" };
            for (String dateStr : dateStrArr) {
                System.out.println(LocalDate.parse(dateStr, formatter));
            }
        }
    }

Output:

2017-10-20
+20171-10-20
+201712-10-20
+2017123-10-20

Original answer

You can use the pattern [uuuu][u]-MM-dd where [uuuu] conforms to a 4-digit year and [u] can cater to the requirement of any number of digits allowed for a year.

Demo:

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;

    public class Main {
        public static void main(String[] args) {
            // Two optional sections: a four-digit year, or a year of any length
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("[uuuu][u]-MM-dd");
            // Years with 4, 5, 6 and 7 digits
            String[] dateStrArr = { "2017-10-20", "20171-10-20", "201712-10-20", "2017123-10-20" };
            for (String dateStr : dateStrArr) {
                System.out.println(LocalDate.parse(dateStr, formatter));
            }
        }
    }

Output:

2017-10-20
+20171-10-20
+201712-10-20
+2017123-10-20
Arvind Kumar Avinash
  • While it works for parsing, your formatter formats the same dates back as 20172017-10-20, +2017120171-10-20, +201712201712-10-20 and +20171232017123-10-20, contrary to requirements. – Ole V.V. Jun 29 '20 at 17:03
  • [uuuu][u] does not make much sense, since the pattern uuuu is already covered by the pattern u. For parsing purposes, "[uuuu][u]-MM-dd" and "u-MM-dd" are equivalent and, as I already wrote in a comment to Ole's answer, simply using u is not acceptable, since it would allow years with fewer than 4 digits, and that is not wanted. – jarnbjo Jun 29 '20 at 17:26
  • @jarnbjo - I've posted an update which will restrict the digits in the range of `4-9`. Please let me know if it still doesn't fulfil the requirement. – Arvind Kumar Avinash Jun 29 '20 at 18:05
  • I already wrote in the question that I can get a DateTimeFormatter doing what I want by creating it with a DateTimeFormatterBuilder, but it is not possible to use a DateTimeFormatterBuilder to configure Elastic Search. This also does not answer my question. – jarnbjo Jun 29 '20 at 18:40