We use Pivotal GemFire as a cache for our data. We recently migrated from GemFire 8.2.1 to 9.5.1 with exactly the same regions, data, and indexes. However, index creation on one region in particular, which has an entry count of 7,284,500, is taking far too long. We use Spring Data GemFire v2.4.1.RELEASE to define the cache server. Below is the configuration of the problematic region:
<gfe:replicated-region id="someRegion"
shortcut="REPLICATE_PERSISTENT" concurrency-level="100"
persistent="true" disk-synchronous="true" statistics="true">
<gfe:eviction action="OVERFLOW_TO_DISK" type="ENTRY_COUNT"
threshold="1000"></gfe:eviction>
</gfe:replicated-region>
Below are the index definitions:
<gfe:index id="someRegion_idx1" expression="o1.var1" from="/someRegion o1" />
<gfe:index id="someRegion_idx2" expression="o2.var2" from="/someRegion o2"/>
<gfe:index id="someRegion_idx3" expression="o3.var3" from="/someRegion o3"/>
<gfe:index id="someRegion_idx4" expression="o4.var4" from="/someRegion o4"/>
<gfe:index id="someRegion_idx5" expression="o5.var5" from="/someRegion o5"/>
<gfe:index id="someRegion_idx6" expression="o6.var6" from="/someRegion o6"/>
<gfe:index id="someRegion_idx7" expression="o7.var7" from="/someRegion o7"/>
<gfe:index id="someRegion_idx8" expression="o8.var8" from="/someRegion o8"/>
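For comparison, here is a sketch of the same definitions using the `define` attribute that the Spring Data GemFire XML namespace offers on `<gfe:index>`. As far as I understand it, this defers creation so that the defined indexes are built together through Geode's `QueryService.createDefinedIndexes()` instead of one pass over the region per index. Whether this helps with the timings described here is an assumption and has not been measured; the attribute's availability should be checked against the Spring Data GemFire version in use.

```xml
<!-- Sketch only: define="true" is assumed to be supported by the SDG version in use. -->
<!-- Ids, expressions, and from clauses are unchanged from the definitions above. -->
<gfe:index id="someRegion_idx1" expression="o1.var1" from="/someRegion o1" define="true"/>
<gfe:index id="someRegion_idx2" expression="o2.var2" from="/someRegion o2" define="true"/>
<!-- someRegion_idx3 through someRegion_idx8 would follow the same pattern. -->
```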
Below is the cache definition:
<gfe:cache
properties-ref="gemfireProperties"
close="true"
critical-heap-percentage="85"
eviction-heap-percentage="75"
pdx-serializer-ref="pdxSerializer"
pdx-persistent="true"
pdx-read-serialized="true"
pdx-ignore-unread-fields="false" />
Below are the Java parameters:
java -Xms50G -Xmx80G -XX:+UseConcMarkSweepGC \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark \
-XX:+UseParNewGC -XX:+UseLargePages \
-XX:+DisableExplicitGC \
-Ddw.appname=$APPNAME \
-Dgemfire.Query.VERBOSE=true \
-Dgemfire.QueryService.allowUntrustedMethodInvocation=true \
-DDistributionManager.MAX_THREADS=20 \
-DDistributionManager.MAX_FE_THREADS=10 \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=11809 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dconfig=/config/location/ \
com.my.package.cacheServer
When run without -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC, we used to get the following error while the indexes were being created:
org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
We tried increasing the member-timeout property from 5000 to 300000, but the same issue persisted.
After adding the GC-related Java parameters above, each index takes around 24 minutes to create, but this time without errors. As a result, the server, which also hosts around 15 other regions, takes far too long to come up. We see no such issue with the other regions. (The region in question has the largest entry count; the other regions have around 500K to 3M entries.)