I'm using Pandas to automate the analysis of a variety of 3rd-party reports. Most are in CSV
format.
Assuming only correct files are loaded into the program, I need to:
- identify the origin of the report (3rd party), based on:
  - schema
  - predictable column values
- store historical reports of the same origin
- return the origin, and perhaps some other details about the report
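
To make the identification step concrete, here is a minimal sketch of the dictionary-based approach I have in mind. The origin names (`vendor_a`, `vendor_b`), the columns, the allowed values, and the `identify_origin` helper are all made up for illustration. It assumes an exact match on the set of column names, plus an optional check that certain columns only contain known values.

```python
import io

import pandas as pd

# Hypothetical registry: origin name -> expected columns and predictable values.
REGISTRY = {
    "vendor_a": {
        "columns": {"order_id", "sku", "qty"},
        "values": {},  # no value rules for this origin
    },
    "vendor_b": {
        "columns": {"date", "region", "revenue"},
        "values": {"region": {"EMEA", "APAC", "AMER"}},
    },
}


def identify_origin(df, registry=REGISTRY):
    """Return the first origin whose schema and value rules match df, else None."""
    cols = set(df.columns)
    for origin, spec in registry.items():
        # Schema check: the column names must match exactly (order is ignored).
        if cols != spec["columns"]:
            continue
        # Value check: every non-null entry must be one of the allowed values.
        values_ok = all(
            df[col].dropna().isin(allowed).all()
            for col, allowed in spec["values"].items()
        )
        if values_ok:
            return origin
    return None


# Stand-in for pd.read_csv("some_report.csv")
sample = pd.read_csv(io.StringIO("date,region,revenue\n2020-01-01,EMEA,10\n"))
print(identify_origin(sample))
```

Storing historical reports could then be as simple as filing each DataFrame under the origin this returns, but that part is not shown here.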
I only need to manage 10 reports in the beginning. I imagine it could grow into identifying upwards of several hundred, which is nothing that a flat file and some dictionaries couldn't handle. But why reinvent the wheel, ...
Are there packages to register/identify schemas for a Pandas data analysis workflow?