
I work with a variety of data sources (Salesforce, Google, CSVs, other REST APIs etc.) and types (tabular, key-value) and have a number of different Python wrappers that expose these feeds.

These are difficult to maintain and, at present, don't follow a consistent interface or convention, which I recognise is a bad place to be. The list of sources I need to retrieve data from is growing all the time, so I want to find or come up with a better solution.

My gut feel says to try and design some kind of 'generic' data gateway, built in Python, that can provide a consistent interface to the data I require, regardless of underlying source and method of retrieval (e.g. HTTP request vs. database call). Problem is, I'm struggling to find any concrete examples of this type of setup in use, which to me seems odd, as this appears to be a common problem for many organisations, where data comes from many sources with lots of different schemas and structures.
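To make the idea concrete, here is a rough sketch of the kind of interface I have in mind. Every name in it (`DataSource`, `CsvSource`, `KeyValueSource`, `DataGateway`, `fetch`) is hypothetical and made up for illustration, not taken from an existing library. It assumes each source can be reduced to an iterator of plain dict records, which may well be too simplistic for some feeds.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


class DataSource(ABC):
    """Hypothetical common interface: every source yields plain dict records."""

    @abstractmethod
    def fetch(self, **query: Any) -> Iterator[Record]:
        """Yield records whose fields match the given keyword filters."""


class CsvSource(DataSource):
    """Tabular source backed by CSV text (a file or HTTP body in practice)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def fetch(self, **query: Any) -> Iterator[Record]:
        for row in csv.DictReader(io.StringIO(self._text)):
            # CSV values are always strings, so compare against str(value).
            if all(row.get(k) == str(v) for k, v in query.items()):
                yield dict(row)


class KeyValueSource(DataSource):
    """Key-value source, flattened into the same record shape."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def fetch(self, **query: Any) -> Iterator[Record]:
        for key, value in self._data.items():
            record = {"key": key, "value": value}
            if all(record.get(k) == v for k, v in query.items()):
                yield record


class DataGateway:
    """Single entry point that hides which kind of source is behind a name."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, name: str, source: DataSource) -> None:
        self._sources[name] = source

    def fetch(self, name: str, **query: Any) -> List[Record]:
        if name not in self._sources:
            raise KeyError(f"unknown source: {name!r}")
        return list(self._sources[name].fetch(**query))


gateway = DataGateway()
gateway.register("accounts", CsvSource("id,name\n1,Acme\n2,Globex\n"))
gateway.register("settings", KeyValueSource({"region": "eu"}))

accounts = gateway.fetch("accounts", id=1)
settings = gateway.fetch("settings", key="region")
```

The caller only ever sees `gateway.fetch(name, **filters)`, so a Salesforce or REST adapter would presumably just be another `DataSource` subclass wrapping its own client. Whether that abstraction holds up across very different schemas is exactly what I'm unsure about.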

I'd appreciate any suggestions on how to possibly begin thinking about this, and/or directions towards existing solutions that may help.

Connor Goddard
  • You're probably looking for the term ORM. For databases in Python it's typically SQLAlchemy. Files are usually a bigger problem, but depending on your case, I'd assume PyXB classes generated from an XSD file (for XML), JSON schemas (for JSON) and something similar for CSV. For large-scale ETL processes, people usually convert the source data with ready-made connectors (from tools like Sqoop/Spark) into a staging object that can be used for generic mapping. Spark is known for its DataFrame object, with Parquet files as the typical storage mechanism, which is basically your goal. – dimon222 May 19 '19 at 14:44
  • If your company is open to buying instead of building, there are various data virtualization software packages that can help you merge data from multiple data sources. For example, you could use Denodo (pure virtualization), Periscope (BI with some virtualization) or even Istio (open source integration, telemetry and policy management). – camba1 May 21 '19 at 15:23

0 Answers