I have a piece of Java code using Apache Spark that joins two dataframes, with the join type chosen by a VM argument: `-DearlyData=TRUE` for an inner join and `-DearlyData=FALSE` for a leftanti join. (Technically, the inner join runs if it is set to `TRUE`, and the leftanti join runs for any other value.)
This is a simplified version of my code:
```java
String earlyData = System.getProperty(Constants.EARLY_DATA);
if (earlyData.equalsIgnoreCase("TRUE")) {
    log.trace("Running Early Data");
    DataBo.processData(earlyDF.join(cassandraDF,
            earlyDF.col(AA).equalTo(example.col(BB))
                .and(earlyDF.col(CC).equalTo(example.col(DD))), "inner")
        .drop(Constants.AA, Constants.CC));
} else {
    log.trace("Running Late Data");
    DataBo.processData(earlyDF.join(cassandraDF,
            earlyDF.col(AA).equalTo(example.col(BB))
                .and(earlyDF.col(CC).equalTo(example.col(DD))), "leftanti")
        .drop(Constants.AA, Constants.CC));
}
```
My code works, but my question is this:
- Should I use an environment variable or a VM argument for the String `earlyData`?
- Are there drawbacks or unforeseen complications of using one versus the other in a conditional like this?
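
For concreteness, the two options being compared could be sketched roughly as below. This is only an illustration, not my actual code: the class `EarlyDataFlag`, the method `resolve`, and the environment variable name `EARLY_DATA` are hypothetical, and I am assuming `Constants.EARLY_DATA` holds the string `"earlyData"` (to match `-DearlyData=...`). Letting the VM argument take precedence over the environment variable is also just an assumption made for the sketch.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: reads the flag from a VM argument (system property)
// first and falls back to an environment variable.
final class EarlyDataFlag {
    // Assumed to match -DearlyData=... (i.e. the value of Constants.EARLY_DATA)
    static final String PROPERTY_KEY = "earlyData";
    // Hypothetical environment variable name, not from the original code
    static final String ENV_KEY = "EARLY_DATA";

    // Takes the sources as parameters so the lookup can be exercised
    // without changing the real JVM properties or process environment.
    public static boolean resolve(Properties props, Map<String, String> env) {
        String value = props.getProperty(PROPERTY_KEY);
        if (value == null) {
            value = env.get(ENV_KEY);
        }
        // Constant-first comparison: returns false instead of throwing a
        // NullPointerException when neither source is set.
        return "TRUE".equalsIgnoreCase(value);
    }

    // Convenience overload using the real JVM properties and environment.
    public static boolean resolve() {
        return resolve(System.getProperties(), System.getenv());
    }

    public static void main(String[] args) {
        String joinType = resolve() ? "inner" : "leftanti";
        System.out.println("Join type: " + joinType);
    }
}
```

With this sketch, running with `-DearlyData=TRUE` or with `EARLY_DATA=TRUE` exported would both select the inner join, and anything else (including neither being set) would select the leftanti join.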