I've spent about 3 hours trying to create a regex to validate a coordinate string that's homebrew (but based on MGRS). I've always had a rough time with regex and this one's starting to make me slam my head on my desk.
I normally wouldn't ask for something this specific, but none of my friends who know regex were available to help, and my own research/attempts to learn at this point are taking too long.
How would I go about creating a regex to validate the following (or is it even possible to do)?
AZ AA 0123456789 0123456789 HG-20
- All whitespace is optional
- The first and second set of 2 characters is case insensitive, but must be a-z and be 2 characters for each section (first is grid zone identifier, second is subsection identifier).
- The easting and northing digit portion can be any arbitrary length, so long as the total number of digits between the 2 "sections" is even (e.g. "1 1" is valid, "04563213245 54986187995" is valid, "5598 549" is invalid). This is the part that I'm really just at a loss for.
- The 'H' is mandatory, but case insensitive (it's a delimiter for the "height" component of the coordinate string)
- The 'G' is optional (it indicates whether to use the location's ground height or not when converting to Unity coordinates)
- The last bit is a float, which can be negative. If a period is there, then a digit must follow.
- The entire height section is optional. If "H" is present, at a minimum a digit must come after (same rules as the last 2 bullet points basically).
I'm using C# and System.Text.RegularExpressions for the regex engine.
The part that's really getting to me is the main digit component. I'm not sure if this is possible (especially considering that the easting and northing can be separated by whitespace).
So far I've been able to come up with:
/^[a-z]{2}\s?[a-z]{2}\s?\d+\s?\d+\s?hg?[-+]?\d+.?\d+?/ig
But it doesn't actually validate whether the length of the coordinates sans whitespace is even or not and isn't able to tell if the entire height section is optional or not (it technically makes the H required...no clue how to subsection it...).
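For reference, here is a minimal sketch of how both gaps might be closed in .NET regex, assuming the rule is exactly as worded above (the total digit count is even, so "12345 123" would pass). A lookahead checks that the digits come in pairs up to the 'H' or the end of the string, and the whole height section sits in an optional non-capturing group. The `CoordinateSyntax` class and `IsValid` method are hypothetical names used only for illustration, and allowing a leading '+' on the height is an assumption carried over from my attempt.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and member names are illustrative only.
public static class CoordinateSyntax
{
    private static readonly Regex Pattern = new Regex(
        @"^[A-Z]{2}\s*[A-Z]{2}\s*" +       // grid zone and subsection identifiers
        @"(?=(?:\d\s*\d\s*)+(?:H|\z))" +   // lookahead: digits pair up until 'H' or end of input
        @"\d+\s*\d+\s*" +                  // easting/northing, at most one run of whitespace between
        @"(?:HG?[-+]?\d+(?:\.\d+)?)?" +    // optional height: H, optional G, signed number
        @"\s*\z",                          // nothing else may follow
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static bool IsValid(string coordinateString)
    {
        return coordinateString != null && Pattern.IsMatch(coordinateString);
    }
}
```

The lookahead only counts digits; the `\d+\s*\d+` part that follows is what restricts the whitespace to a single gap between the easting and northing.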
Ultimately, since I break the string apart in code, I could validate it with actual code (if the digit substring does not have an even length, throw an ArgumentException or ArgumentOutOfRangeException, since this logic happens in the constructor of a Location class). That seems like it would be bad juju if the validation can be done in regex, though.
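The code-based alternative could look roughly like the sketch below. It is only an outline under a few assumptions: `CoordinateChecks` and `ExtractDigits` are hypothetical names, all whitespace is stripped up front (so stray spaces inside the digits would be tolerated), and `ArgumentException` is used for every failure.

```csharp
using System;
using System.Linq;

// Hypothetical helper; the names are illustrative only.
public static class CoordinateChecks
{
    // Returns the digit run between the four identifier letters and the optional 'H',
    // throwing if it is empty, contains non-digits, or has an odd length.
    public static string ExtractDigits(string coordinateString)
    {
        if (coordinateString == null)
            throw new ArgumentNullException(nameof(coordinateString));

        // Drop every whitespace character and normalise the casing.
        var stripped = new string(coordinateString.Where(c => !char.IsWhiteSpace(c)).ToArray())
            .ToUpperInvariant();
        if (stripped.Length < 4)
            throw new ArgumentException("Missing grid zone/subsection identifiers.", nameof(coordinateString));

        // Everything after the identifiers, up to the height delimiter if there is one.
        var body = stripped.Substring(4);
        var hIndex = body.IndexOf('H');
        var digits = hIndex >= 0 ? body.Substring(0, hIndex) : body;

        var allDigits = digits.All(c => c >= '0' && c <= '9');
        if (digits.Length == 0 || digits.Length % 2 != 0 || !allDigits)
            throw new ArgumentException(
                "Easting/northing must be a non-empty, even number of digits.",
                nameof(coordinateString));

        return digits;
    }
}
```

This check covers only the digit portion; the identifiers and the height section would still need their own validation.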
Examples:
AZ AA 012345 012345 HG-20 (valid)
AZAA 012345 012345 HG-20 (valid)
AZAA012345012345HG-20 (valid)
AZ AA 012345 012345 (valid)
AZ AA 012345 01234 (invalid, easting and northing are different lengths)
AZ AA 012345 012345 HG (invalid, must have digits if 'H' is present)
AZ AA 012345 012345 H (invalid, must have digits if 'H' is present)
Thanks!
In case anyone is curious about how my code currently looks:
// Requires: using System.Globalization; (for CultureInfo)
public Location(string coordinateString)
{
    //TODO: Validate string
    //Calling ToUpper() as we never want to worry about casing. Everything is upper case.
    var stripped = coordinateString.ToUpper().Replace(" ", "");
    var gridZonedesignator = stripped.Substring(0, 2);
    var subLocationId = stripped.Substring(2, 2);
    var identifiersRemoved = stripped.Remove(0, 4);
    var heightParsed = identifiersRemoved.Split('H');
    float height = 0;
    bool useGroundHeight = false;

    //If the height component is there, parse ground height flag (if there) and set
    //height.
    if (heightParsed.Length > 1)
    {
        if (heightParsed[1].StartsWith("G"))
        {
            useGroundHeight = true;
            //Remove only the leading 'G'; Remove(0) would delete the whole string.
            heightParsed[1] = heightParsed[1].Remove(0, 1);
        }
        height = float.Parse(heightParsed[1], CultureInfo.InvariantCulture);
    }

    //Since the total number of easting/northing digits must be even,
    //simply divide by 2 to get the number of digits the easting and
    //northing each consist of (accuracy).
    var accuracy = heightParsed[0].Length / 2;

    //It's possible to end up with accuracy 1, 2, or 3, in which case we want
    //to pad 0s to the right as a 1 digit coordinate translates to thousands,
    //not ones as a grid zone is currently 10k x 10k meters.
    //TODO: Base the digits off the scale of a grid zone instead of hard
    //coding to 4. If we change the scale the following will no longer be
    //valid.
    var eastingString = heightParsed[0].Substring(0, accuracy).PadRight(4, '0');
    var northingString = heightParsed[0].Substring(accuracy, accuracy).PadRight(4, '0');

    CoordinateString = coordinateString;
    GridZoneDesignation = gridZonedesignator;
    SubLocationId = subLocationId;
    Easting = eastingString;
    EastingInt = int.Parse(Easting);
    Northing = northingString;
    NorthingInt = int.Parse(Northing);
    IsStartFromGroundHeight = useGroundHeight;
    Height = height;
}