8

I need to parse date-time strings that arrive in two different formats:

  • 19861221235959Z
  • 1986-12-21T23:59:59Z

The following DateTimeFormatter pattern properly parses the first kind of date string:

DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X")

but fails on the second one as dashes, colons and T are not expected.

My attempt was to use optional sections as follows:

DateTimeFormatter.ofPattern("uuuu[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")

Unexpectedly, this parses the second kind of date strings (the one with dashes), but not the first kind, throwing a

java.time.format.DateTimeParseException: Text '19861221235959Z' could not be parsed at index 0

It's as if optional sections are not being evaluated as optional...

Michael
Cec
  • The `19861221235959` appears to be the year. It doesn't stop at 4 digits when parsing, only has a 4 digit minimum when formatting. – Peter Lawrey Jul 04 '18 at 15:30
  • @Peter Lawrey can you elaborate a bit more on that? I don't understand your point – Cec Jul 04 '18 at 15:33
  • The first number `19861221235959` is too large to be a year so it fails to parse it. – Peter Lawrey Jul 04 '18 at 15:35
  • But with the first pattern it worked without issues... The fact that the second pattern fails, seems as if the optional is not treated as such – Cec Jul 04 '18 at 15:37
  • I take your point that it probably should work, however I suspect you will need to peek at the contents or length and try one or the other format. – Peter Lawrey Jul 04 '18 at 15:44
  • Thank you guys, I'll go for the workaround you suggested, using both formatters and discriminating by string content. I wonder if this is an actual bug in Java... – Cec Jul 04 '18 at 15:49
  • The first format uses "adjacent value parsing", where the first field can be variable width if all subsequent fields are fixed width. The second format does not use adjacent value parsing, because the fields are separated by the dash (they are not adjacent!). See https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatterBuilder.html#appendValue-java.time.temporal.TemporalField-int- – JodaStephen Jul 04 '18 at 17:17
  • @JodaStephen my problem with the docs is that something this important should have been in DateTimeFormatter docs, with a special mention in the optional section part, warning about the way it can break adjacent parsing – Cec Jul 05 '18 at 06:58
  • I’ll just point out that it seems like the unwritten question is parsing both forms of ISO 8601 Date Time formats. As such, it *seems* like you would want the first pattern to actually be: `19861221T235959Z`. – Paul Féraud Mar 05 '20 at 16:37
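
The workaround discussed in the comments above (two formatters, chosen by looking at the string) could be sketched roughly as follows. The class name `TwoFormatParser` and the rule "a dash at index 4 means the extended form" are assumptions made for illustration; they are not from the thread.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TwoFormatParser {
    // Compact form, e.g. 19861221235959Z (pattern taken from the question).
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X");

    // Extended form, e.g. 1986-12-21T23:59:59Z, with mandatory separators.
    private static final DateTimeFormatter EXTENDED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss[,S][.S]X");

    // Assumption: a dash right after the four-digit year marks the extended form.
    static OffsetDateTime parse(String text) {
        boolean extended = text.length() > 4 && text.charAt(4) == '-';
        return OffsetDateTime.parse(text, extended ? EXTENDED : COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));      // 1986-12-21T23:59:59Z
        System.out.println(parse("1986-12-21T23:59:59Z")); // 1986-12-21T23:59:59Z
    }
}
```

Checking the character at index 4 rather than searching for any dash avoids misreading a negative offset such as `-05` at the end of a compact string.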

4 Answers

13

As Peter stated in the comments, the problem is that your pattern is considering the entire string as the year. You can use .appendValue(ChronoField.YEAR, 4) to limit it to four characters:

DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendValue(ChronoField.YEAR, 4)
    .appendPattern("[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")
    .toFormatter();

This parses correctly with both of your examples.
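
For anyone who wants to try it, here is a small self-contained sketch around that formatter. The class name `FixedWidthYearDemo` and the choice of `OffsetDateTime` as the target type are assumptions for the demo, not part of the original answer.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class FixedWidthYearDemo {
    // The year is fixed at exactly four digits; the rest reuses the pattern from the question.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .appendPattern("[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")
            .toFormatter();

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));      // 1986-12-21T23:59:59Z
        System.out.println(parse("1986-12-21T23:59:59Z")); // 1986-12-21T23:59:59Z
    }
}
```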

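If you fancy being even more verbose, you could do the following (note that this version leaves out the optional fractional seconds and makes the offset optional):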

DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendValue(ChronoField.YEAR, 4)
    .optionalStart().appendLiteral('-').optionalEnd()
    .appendPattern("MM")
    .optionalStart().appendLiteral('-').optionalEnd()
    .appendPattern("dd")
    .optionalStart().appendLiteral('T').optionalEnd()
    .appendPattern("HH")
    .optionalStart().appendLiteral(':').optionalEnd()
    .appendPattern("mm")
    .optionalStart().appendLiteral(':').optionalEnd()
    .appendPattern("ss")
    .optionalStart().appendPattern("X").optionalEnd()
    .toFormatter();
Michael
  • It works. Bravo. You should use `X` to parse the offset as in the question, though. – Ole V.V. Jul 04 '18 at 15:52
  • Nice one @Michael, thank you. I'm glad there is an option to avoid maintaining two different patterns. Shame to the Java Docs not mentioning this. – Cec Jul 04 '18 at 15:55
  • I’m curious why it’s enough to state the number of digits in the year and you don’t need to do it for the subsequent numeric fields. Could have to do with the fact that 999999999 is allowed as a year, whereas month can never be more than 12, and so on. – Ole V.V. Jul 04 '18 at 15:56
  • @OleV.V. Yes, precisely. Pretty much every other pattern has an upper bound in terms of number of possible characters. I think year is the only one that could potentially be arbitrarily long – Michael Jul 04 '18 at 15:59
  • Indeed, that was the catch :) – Cec Jul 04 '18 at 16:20
  • @Michael: Thanks for the answer and the explanation, and I understand a year doesn't have an upper bound, but isn't that the point of putting yyyy instead of yy or yyyyyy? That is, shouldn't it be honoring the number of placeholders you provide? – rjcarr Nov 20 '18 at 21:30
  • @rjcarr Nope, it doesn't work that way. See [the documentation](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html): "The count of letters determines the minimum field width below which padding is used..." – Michael Nov 21 '18 at 13:23
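
The minimum-width behaviour quoted from the documentation can be seen on the formatting side with a tiny sketch like this one. The class name `YearWidthDemo` and the sample years are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class YearWidthDemo {
    // Formats only the year of January 1st of the given year with the "uuuu" pattern.
    static String formatYear(int year) {
        return LocalDate.of(year, 1, 1).format(DateTimeFormatter.ofPattern("uuuu"));
    }

    public static void main(String[] args) {
        System.out.println(formatYear(5));     // 0005   (padded up to the minimum width)
        System.out.println(formatYear(1986));  // 1986
        System.out.println(formatYear(12345)); // +12345 (not truncated to four digits)
    }
}
```

Four pattern letters pad short years but do not cap long ones, which is consistent with the parser being willing to read more than four digits for the year.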
2

It’s not clear from the documentation, but my guess is that the following is what happens.

When you use uuuuMMddHHmmss in your format pattern string, the formatter can easily see that there are several adjacent numeric fields and therefore uses the field widths to separate the fields. The first 4 digits are taken to mean the year, and so on.

When instead you use uuuu[-]MM[-]dd['T']HH[:]mm[:]ss, the formatter doesn’t perceive it as adjacent numeric fields. I agree with the comments by Peter Lawrey that it therefore takes a longer run of digits for year and in the end overflows the maximum year (999999999) and throws the exception.
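
The difference between the two patterns can be reproduced with a short sketch such as the one below. The class and method names are hypothetical; the patterns and inputs are the ones from the question.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class AdjacentValueDemo {
    // All numeric fields are adjacent, so the field widths are used to split the digits.
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X");

    // Optional separators sit between the fields, so the fields no longer count as adjacent.
    static final DateTimeFormatter WITH_OPTIONALS =
            DateTimeFormatter.ofPattern("uuuu[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X");

    // Returns true if the text parses with the given formatter, false on a parse error.
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            OffsetDateTime.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(COMPACT, "19861221235959Z"));             // true
        System.out.println(parses(WITH_OPTIONALS, "1986-12-21T23:59:59Z")); // true
        System.out.println(parses(WITH_OPTIONALS, "19861221235959Z"));      // false
    }
}
```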

The solution? Please refer to Michael’s answer.

Ole V.V.
1

Pattern-based DateTimeFormatters are not smart enough to handle both an optional section and adjacent numeric fields with no separator. When the numeric fields are always written without a separator, the pattern understands that the change of pattern letter from u to M means it has to count digits to know which digit belongs to which field. When that is not certain, the pattern doesn't attempt it: it sees one complete numeric field that is not immediately followed by another numeric field, so there is no reason to count digits, and all the digits are treated as part of that one field.

To support both forms, you shouldn't try to build your DateTimeFormatter with a pattern, but rather with a Builder. Get your inspiration from DateTimeFormatter.BASIC_ISO_DATE and the others nearby.
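
One builder-based possibility, which this answer does not spell out, is to wrap one strict formatter per format in `DateTimeFormatterBuilder.appendOptional`. The sketch below assumes that approach; the class name `CombinedFormatterDemo` and the constant names are invented for the example.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class CombinedFormatterDemo {
    // One strict formatter per accepted format.
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X");
    static final DateTimeFormatter EXTENDED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss[,S][.S]X");

    // Each formatter becomes an optional section; whichever one matches consumes the text.
    static final DateTimeFormatter EITHER = new DateTimeFormatterBuilder()
            .appendOptional(COMPACT)
            .appendOptional(EXTENDED)
            .toFormatter();

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, EITHER);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));      // 1986-12-21T23:59:59Z
        System.out.println(parse("1986-12-21T23:59:59Z")); // 1986-12-21T23:59:59Z
    }
}
```

Because each inner formatter keeps its own mandatory separators (or lack of them), a half-and-half string such as `1986-1221T235959Z` should be rejected rather than silently accepted.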

kumesana
-1

At first glance your second format should work for both cases; I'm not sure why it doesn't. By the way, I am curious why you used 'u' rather than 'y' for the year, so I would try 'y' as well, just to see whether it makes a difference.

More generally, you are touching on an interesting point: how to parse a date from an unknown format (imagine that instead of 2 possible formats you are dealing with an unknown number of them). I once wrote a parser like that. The idea I used is described in my article "Java 8 java.time package: parsing any string to date", and you might find it useful. In short, the idea is to keep all supported formats in an external file and try each format in turn until one works.
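
A minimal sketch of the "try each format until one works" idea might look like this. It is not the code from the article: the formats are held in an in-memory list instead of an external file, and the class name `TryEachFormat` is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TryEachFormat {
    // Assumption: the supported formats are known up front and kept in memory.
    static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X"),
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss[,S][.S]X"));

    // Tries every formatter in order and returns the first successful result, if any.
    static Optional<OffsetDateTime> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(OffsetDateTime.parse(text, format));
            } catch (DateTimeParseException ignored) {
                // This format did not match; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z").isPresent());      // true
        System.out.println(parse("1986-12-21T23:59:59Z").isPresent()); // true
        System.out.println(parse("not a date").isPresent());           // false
    }
}
```

Each failed attempt costs an exception, which is the performance concern raised in the comments on this answer.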

Michael Gantman
  • Trying to parse all possible formats one after the other is a performance killer to never ever use in any production code. Furthermore in real world applications the amount of date formats is known and limited (e.g.: APIs don't produce dates in 10 different formats). – Cec Jul 04 '18 at 16:00
  • @Cec obviously you cannot be familiar with all use-cases in the world. In my case the data was coming from unknown sources and could come from anywhere in the world, so yes, I could expect ANY date format (in my case we had over 30-40 different ones). Also the processing was done asynchronously and "off-line", so we could afford less-than-perfect performance but could NOT afford to leave any date unparsed. So I stand by my idea. If you really want to discuss the issue please read my article in full – Michael Gantman Jul 04 '18 at 16:19
  • I'm not trying to discredit your approach to the particular case you had to tackle. I'm just saying that in the common case such an approach is to be avoided. You know just in case that someone was intrigued by the idea of supporting all formats and applied your approach to problems for which it is not suited. – Cec Jul 04 '18 at 16:26
  • I see your point. But on the other hand you have to trust this forum's readers to figure out for themselves what is suitable for their case. In practice the "common" case is actually very rarely the case. There is always some twist. BTW I presented the idea, but you could easily make a lot of optimizations (such as reading all the formats into memory and accessing them there, which will solve a lot of performance issues). So I still disagree with your formulation of "to never ever use in any production code". But still, I do see a valid point in your statement – Michael Gantman Jul 04 '18 at 16:47
  • [`uuuu` versus `yyyy` in `DateTimeFormatter` formatting pattern codes in Java?](https://stackoverflow.com/questions/41177442/uuuu-versus-yyyy-in-datetimeformatter-formatting-pattern-codes-in-java) – Ole V.V. Feb 27 '19 at 09:25