Is there a way to hint about a pandas DataFrame's schema "statically" so that we can get code completion, static type checking, and just general predictability during coding?
I wouldn't mind duplicating the schema information in the code and in a type annotation for this to work.
So maybe something roughly like mypy comment type annotations:
df = pd.DataFrame({'a': [1.0, 2.4, 4.5], 'B': [1,2,3]}) # pd.schema: ('a': np.dtype(float)), ('B': np.dtype(int))
(or better yet have the schema specified in some external JSON file or such)
Then you can imagine things like df. auto-completing during coding to df.a or df.B, or mypy (and any other static code analyzer) being able to infer the type of df.B[0] and such.
Although hopeful, I'm guessing this isn't really possible (or desired...). If so, what would be a good standard for writing good reusable code that returns pd.DataFrame objects with specific columns? So imagine there's a function get_data() -> pd.DataFrame that returns data with columns that are known in advance - how would you make this transparent to a user of this function? Anything smarter / more standardized than just spelling it out in the function's docstring?
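
To make the get_data() scenario concrete, here is a minimal sketch of one convention I could imagine, short of real static support. The names GET_DATA_SCHEMA and check_schema are hypothetical helpers made up for illustration, not part of pandas, and the dtypes are simply the ones from the example above.

```python
import pandas as pd

# Hypothetical convention: the expected columns and dtypes live next to the function.
GET_DATA_SCHEMA = {"a": "float64", "B": "int64"}


def check_schema(df: pd.DataFrame, schema: dict) -> pd.DataFrame:
    """Raise TypeError if df's column names or dtypes differ from schema."""
    actual = {name: str(dtype) for name, dtype in df.dtypes.items()}
    if actual != schema:
        raise TypeError(f"expected {schema}, got {actual}")
    return df


def get_data() -> pd.DataFrame:
    df = pd.DataFrame({"a": [1.0, 2.4, 4.5], "B": [1, 2, 3]})
    # Cast explicitly so the dtypes do not depend on platform defaults.
    df = df.astype(GET_DATA_SCHEMA)
    return check_schema(df, GET_DATA_SCHEMA)
```

This only documents and checks the schema at runtime; it gives an editor or mypy nothing to work with, which is exactly the gap I'm asking about.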