
With the following class:

class Test:
    a : str
    b : str

and the following data frame:

output = pd.DataFrame(columns=['a', 'b'])

How can I convert an array or list of Test instances into a pandas DataFrame with matching columns?


Edit:

Let me add a more concrete example:

class Test:
    a: int
    b: int

    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

l = [Test(10, 20), Test(50, 60)]

output = pd.DataFrame(l,
                  columns=['a', 'b'],
                  index=range(len(l)))

and the error I get is:

ValueError: Shape of passed values is (2, 1), indices imply (2, 2)

Akaisteph7
Thomas
  • Are you having trouble with the typical way to create a DataFrame? For instance, `output = pd.DataFrame([test.a, test.b], columns=['a', 'b'])`, where `test = Test()` – PyNoob Jul 24 '19 at 22:59
  • Possible duplicate of [Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?](https://stackoverflow.com/questions/20763012/creating-a-pandas-dataframe-from-a-numpy-array-how-do-i-specify-the-index-colum) – tim Jul 24 '19 at 23:00
  • @PyNoob: I put a concrete example with the error – Thomas Jul 24 '19 at 23:15
  • @tim: the questions may be related, but they're not exactly the same since the other question involves part of the list to become the header, which is not the case here – Thomas Jul 24 '19 at 23:16
  • I'm not sure what you intended to do, but `Test(10, 20)` evaluates to `<__main__.Test object at 0x...>`, which is a single element; so `pd.DataFrame(l)` tells pandas to expect one column and two rows, while `columns=['a', 'b']` implies two columns. Hence the error. – Jack Fleeting Jul 25 '19 at 00:08
  • You should usually try to show an expected output as well. – Akaisteph7 Jul 25 '19 at 02:36
  • @Akaisteph7: yes, I am not used to formulating Pandas questions, but I'm slowly realizing they require quite a lot more detail than in other fields :) – Thomas Jul 25 '19 at 12:41
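
The shape mismatch described in the comments above can be reproduced with a short sketch. It reuses the `Test` class and the list `l` from the question; the names `raw` and `mismatch` are only illustrative. It assumes pandas treats each plain `Test` object as one opaque cell, which is what the reported error message implies.

```python
import pandas as pd

class Test:
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

l = [Test(10, 20), Test(50, 60)]

# Without `columns`, each Test object ends up in a single object-dtype cell,
# so the frame has two rows and only one column.
raw = pd.DataFrame(l)
print(raw.shape)  # (2, 1)

# Requesting two column labels for that single column raises ValueError.
try:
    pd.DataFrame(l, columns=['a', 'b'])
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
print(mismatch)
```

pandas does not look inside arbitrary objects for attributes, so the objects have to be turned into dicts or rows before they are passed to the constructor.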

2 Answers


You can call `vars` on each instance to get its attributes as a dict, then build the DataFrame from the resulting list of dicts:

import pandas as pd

class Test:
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

tests = [Test(10, 20), Test(50, 60)]
# vars(t) returns the instance's attribute dict, e.g. {'a': 10, 'b': 20}
df = pd.DataFrame([vars(t) for t in tests])
Code Different
  • this is working, thanks! can you explain the `[vars(t) ...]` part and why it works? kind of new to python, but very new to pandas (like 4 days :)) – Thomas Jul 25 '19 at 12:38
  • to clarify, I understand vars(t), but I don't understand the vars(t) for ...; I would understand for t in tests: somelist.append(vars(t)) – Thomas Jul 25 '19 at 12:48
  • It's called a list comprehension: basically a one-line loop. `[vars(t) for t in tests]` applies the function `vars` to every element in `tests`. Since `vars(t)` returns a dictionary, `[vars(t) for ...]` returns a list of dictionaries. – Code Different Jul 25 '19 at 12:51
  • I didn’t know about list comprehension; I’m reading about it now, thanks! – Thomas Jul 25 '19 at 14:16
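
For readers with the same question as in the comments above, here is a small sketch that puts the explicit loop and the list comprehension side by side. It reuses `Test` and `tests` from the answer; `rows` and `rows_short` are names made up for this illustration.

```python
import pandas as pd

class Test:
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

tests = [Test(10, 20), Test(50, 60)]

# Explicit loop: append one attribute dict per object.
rows = []
for t in tests:
    rows.append(vars(t))

# List comprehension: the same result in a single expression.
rows_short = [vars(t) for t in tests]

print(rows == rows_short)  # True

# A list of dicts maps onto a DataFrame: keys become columns, each dict a row.
df = pd.DataFrame(rows_short)
print(df)
```

Both forms produce `[{'a': 10, 'b': 20}, {'a': 50, 'b': 60}]`, so choosing between them is a matter of style.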

Another way is to read each instance's `__dict__` attribute directly, which is the same dict that `vars` returns:

df = pd.DataFrame([test.__dict__ for test in tests])
Akaisteph7
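
As a hedged aside, the sketch below checks that `vars(t)` and `t.__dict__` are the same dict for an ordinary instance. It also shows one possible variant, not taken from either answer, that reads a fixed list of attribute names with `getattr`. The name `cols` is hypothetical, and the variant assumes every object has all the listed attributes.

```python
import pandas as pd

class Test:
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

tests = [Test(10, 20), Test(50, 60)]

# For a regular instance, vars(obj) returns obj.__dict__ itself.
print(vars(tests[0]) is tests[0].__dict__)  # True

# Variant (an assumption for illustration): pick the attributes explicitly.
# Each object becomes one row, with values in the order given by `cols`.
cols = ['a', 'b']
df = pd.DataFrame([[getattr(t, c) for c in cols] for t in tests], columns=cols)
print(df)
```

The explicit variant may be useful when the objects carry extra attributes that should not become columns, or when the class defines `__slots__` and therefore has no `__dict__`.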