I have been puzzling over this problem for some time now. I was wandering if there is some "Pandas" like way to get there. I have a simple DataFrame with two columns PivotHigh and PivotLow, representing high values and low values. I need to "connect the dots" in pairs from the lowest low to the highest high. Sounds simple but there is some complications:
- Rows do not clearly alternate, there will be some overlaps.
- There will be runs in a column, when such occurs the highest/lowest value of the run must be selected
In the image I have annotated the "runs" in yellow, and the values (max/min) that should be selected in green
I have been trying something along the lines of:
def find_swings(df_pivots,
column_names={"HighPivots":"HighPivots","LowPivots":"LowPivots"}):
"""
Given a Dataframe of PivotHigh and PivotLow will return a Dataframe describing the pivots joined together.
Parameters:
df_pivots (DataFrame): Contains pivot information
PivotHigh - A sereis of pivot highs
PivotLow - A series of pivot lows
column_names (dictionary): Specify the column names for the relevant columns
Returns:
DataFrame:
StartBar - Start Date of the Bar
EndBar - End Date of the Bar
StartValue - Starting value of the swing
EndValue - Ending value of the swing
Diff - EndValue - StartValue
SwingType - "Up" | "Down"
N_Bars - Number of Bars
Gradient - (EndValue-StartValue)/N_Bars
Angle - Angle of the Swing in Radians
Magnitude - Magnitude of the Swing: Sqrt(Diff^2 + N_Bars^2)
(Next Phase)
ClosePriceArea - Close Price Area: The unsigned area enclosed by the swing
Lower means tight fit on trend
"""
df_ret = pd.DataFrame()
df = df.reindex()
start_index = 0
current_pivot = "PivotLow" if df["PivotLow"].index.min() <= df["PivotHigh"].index.min() else "PivotHigh"
alternate_pivot = "PivotHigh" if current_pivot == "PivotLow" else "PivotLow"
while(True):
swing_end = df_t.loc[df_t[alternate_pivot].notnull()].index[0]
swing_start = df.iloc[df.iloc[:swing_end]]
df_trimmed = df_ret[swing_start:swing_end]
if (current_pivot == "PivotLow"):
df_ret["StartDate"] = df_trimmed.loc[df_trimmed[current_pivot].idxmin(axis=1)]["Date"]
#....
else:
df_ret["StartDate"] = df_trimmed.loc[df_trimmed[current_pivot].idxmax(axis=1)]["Date"]
#....
But from there things seem to snowball out of control.
Is there a way to introduce a new sequential label column on the "groups", then I will be able to perform some max/min aggregation on them?
Or any other general advice/approaches?
Thanks
Data:
Date,PivotHigh,PivotLow
2020-01-01 02:45:00,,7210.59
2020-01-01 04:30:00,,7195.01
2020-01-01 06:30:00,,7204.53
2020-01-01 07:15:00,7209.15,
2020-01-01 11:15:00,7215.0,
2020-01-01 11:30:00,,7198.99
2020-01-01 14:30:00,,7200.19
2020-01-01 15:30:00,7238.14,
2020-01-01 17:15:00,,7221.18
2020-01-01 18:15:00,,7219.01
2020-01-01 20:30:00,,7209.6
2020-01-01 21:00:00,7222.6,
2020-01-01 21:45:00,,7210.73
2020-01-02 00:45:00,7186.18,7179.77
2020-01-02 06:30:00,,7112.27
2020-01-02 10:30:00,7156.65,7139.93
2020-01-02 14:45:00,7137.23,
2020-01-02 19:45:00,6976.17,6965.0
2020-01-02 22:45:00,,6954.34
2020-01-03 04:30:00,7281.24,
2020-01-03 07:15:00,7220.0,
2020-01-03 10:15:00,7361.7,7316.31
2020-01-03 12:15:00,7339.31,7329.66
2020-01-03 13:45:00,7369.61,7319.0
2020-01-03 16:15:00,7401.2,
2020-01-03 16:45:00,,7358.18
2020-01-03 20:00:00,7349.44,
2020-01-03 21:15:00,,7319.45
2020-01-03 23:30:00,,7315.0
2020-01-04 00:00:00,7339.99,
2020-01-04 03:00:00,7332.89,
2020-01-04 05:30:00,7359.99,
2020-01-04 07:00:00,,7341.89
2020-01-04 07:15:00,7354.98,
2020-01-04 10:15:00,7357.39,7343.95
2020-01-04 11:45:00,,7319.19
2020-01-04 15:15:00,7317.58,
2020-01-04 15:30:00,,7304.6
2020-01-04 18:00:00,,7328.74
2020-01-04 18:30:00,7398.0,
2020-01-04 21:30:00,7358.51,7341.03
2020-01-05 02:00:00,7483.28,7441.4
2020-01-05 03:45:00,,7460.0
2020-01-05 04:00:00,7488.87,
2020-01-05 06:00:00,7478.24,
2020-01-05 07:45:00,,7462.22
2020-01-05 08:00:00,7475.0,
2020-01-05 13:00:00,7460.03,7446.36
2020-01-05 16:30:00,7479.52,
2020-01-05 17:45:00,,7460.0