
I'm still only a few months into Python, so please excuse the ugly code. I have a dataset composed of unique IDs. Consider this format of 3 rows, each with 3 IDs:

zList = [[1412,2521,53522],
[52632,1342,1453],
[3413,342,25232]]

I am attempting to replace each ID with some corresponding data (First Name, Last Name, State, etc). Ideal output looks like this:

resultList = [[Bob, Smith, Ohio, Jane, Doe, Texas, John, Smith, Alaska],
[Jim, Bob, California, Jack, White, Virginia, John, Smith, Nevada],
[Hank, Black, Kentucky, Sarah, Hammy, Florida, Joe, Blow, Mississippi]]

I realize that it would be cleaner to add a new dimension to the results, since I am essentially expanding each ID into a new list. I avoided this because I assumed it would be easier to keep it flat, and I fear iterating through anything over 2 dimensions! Willing to consider all options...

The data I am using to match against is what you might expect:

matchData = [[1412, Bob, Smith, Ohio, lots of additional data],
[2521, Jane, Doe, Texas, Lots of Additional Data],
[3411, Jim, Black, New York, Lots of Additional Data],
[...etc...]]

Here is how I have been attempting this:

import pprint

resultList = []
for i, valz in enumerate(zList):
    for j, ele in enumerate(valz):
        check = False
        for k, valm in enumerate(matchData):
            if ele == valm[0]:
                resultList.append(valm)
                check = True
                break
        if check == False:
            print "error during rebuild"
pprint.pprint(resultList, width=400)

Now, although it almost works, it's missing 2 key things that I can't figure out. My code dumps everything into one big list. I must be able to preserve the order and logical separation from the original dataset (remember, the original dataset was 3 rows of 3 IDs).

I also need to throw an error if there is no match found. You can see my attempt in the code above, but it does not work properly. I have tried adding this after my first if statement:

elif all(ele not in valm[15]):
    check = False

But I get this error: "TypeError: argument of type 'int' is not iterable"

Yu Hao
nodoze

2 Answers


I think your main problem is how the lists are structured. From what it looks like, you have 3 "entries" per row in zList and per row in resultList. I would recommend changing zList to a one-dimensional list and putting each entry of resultList in its own list (inside resultList), like so:

zList = [ 1412, 2521, 53522, 52632, 1342, 1453, 3413, 342, 25232 ]
resultList = [[ "Bob", "Smith", "Ohio" ],[ "Jane", "Doe", "Texas" ],[ "John", "Smith", "Alaska" ],
          [ "Jim", "Bob", "California" ],[ "Jack", "White", "Virginia" ],[ "John", "Smith", "Nevada" ],               
          [ "Hank", "Black", "Kentucky" ],[ "Sarah", "Hammy", "Florida" ],[ "Joe", "Blow", "Mississippi"]]

Now you can check that both lists have the same length (9 in this case):

>>> len(zList) == len(resultList)
True
>>> len(zList)
9

From here, you can use dictionaries or lists. Being a novice programmer, you may not be familiar with dictionaries yet, so check out the documentation.

List:

Just loop over the length of the list, add it to a newlist, and append that newlist to your output list like so:

zList = [...]
resultList = [[...]]
matchList = [] #or whatever you want to call it

for i in range(len(zList)): #the index is needed; enumerate would also work
    element_list = []
    element_list.append(zList[i]) #zList[i] is the value enumerate would yield alongside i
    for j in resultList[i]:  #the index is not needed, so use the value
        element_list.append(j)
    matchList.append(element_list)

>>> print matchList
[1412, 'Bob', 'Smith', 'Ohio']
[2521, 'Jane', 'Doe', 'Texas']
[53522, 'John', 'Smith', 'Alaska']
[52632, 'Jim', 'Bob', 'California']
[1342, 'Jack', 'White', 'Virginia']
[1453, 'John', 'Smith', 'Nevada']
[3413, 'Hank', 'Black', 'Kentucky']
[342, 'Sarah', 'Hammy', 'Florida']
[25232, 'Joe', 'Blow', 'Mississippi'] #split in separate lines for clarity here

To add more data, simply extend the lists inside resultList. For example, to add a job:

resultList = [[ "Bob", "Smith", "Ohio", "Tech Support" ], ...
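The index loop above can also be written without an index by pairing the two lists with zip. This is only a sketch on shortened sample data, and it assumes both lists have the same length and order:

```python
# Shortened sample data in the flat layout suggested in this answer.
zList = [1412, 2521, 53522]
resultList = [["Bob", "Smith", "Ohio"],
              ["Jane", "Doe", "Texas"],
              ["John", "Smith", "Alaska"]]

# zip pairs the n-th ID with the n-th entry, so no index is needed.
matchList = [[identifier] + entry for identifier, entry in zip(zList, resultList)]

for row in matchList:
    print(row)
```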

Dictionaries

I think this is the simpler way to go. Create a dict, then use the elements from zList as keys, with the corresponding elements from resultList as the values, like so:

matchDict = {}
for n in range(len(zList)): #need the index, remember?
    matchDict[zList[n]] = resultList[n]

>>> print matchDict
{ 1412 : ['Bob', 'Smith', 'Ohio'] ,
  1453 : ['John', 'Smith', 'Nevada'] ,
  25232 : ['Joe', 'Blow', 'Mississippi'] ,
  53522 : ['John', 'Smith', 'Alaska'] ,
  3413 : ['Hank', 'Black', 'Kentucky'] ,
  342 : ['Sarah', 'Hammy', 'Florida'] ,
  52632 : ['Jim', 'Bob', 'California'] ,
  2521 : ['Jane', 'Doe', 'Texas'] ,
  1342 : ['Jack', 'White', 'Virginia']  }

Note: you can look up elements in a dictionary by their keys, so print matchDict[1412] -> ["Bob", "Smith", "Ohio"]. Likewise, you can expand the data by adding more info to resultList, as shown above.
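If the grouping of the original data matters (three IDs per row), one option is to leave zList nested and look up each ID row by row. A minimal sketch, assuming the nested zList from the question and a dictionary shaped like matchDict; rebuild_rows is a hypothetical helper name and the sample data are trimmed to two rows:

```python
# Sample data taken from the question, trimmed for brevity.
zList = [[1412, 2521, 53522],
         [52632, 1342, 1453]]

matchDict = {
    1412: ["Bob", "Smith", "Ohio"],
    2521: ["Jane", "Doe", "Texas"],
    53522: ["John", "Smith", "Alaska"],
    52632: ["Jim", "Bob", "California"],
    1342: ["Jack", "White", "Virginia"],
    1453: ["John", "Smith", "Nevada"],
}


def rebuild_rows(rows, lookup):
    """Hypothetical helper: replace every ID by its data, one output row per input row."""
    result = []
    for row in rows:
        flat_row = []
        for identifier in row:
            # lookup[identifier] raises KeyError if the ID is unknown.
            flat_row.extend(lookup[identifier])
        result.append(flat_row)
    return result


groupedList = rebuild_rows(zList, matchDict)
print(groupedList)
```

Because an unknown ID raises KeyError, a failed match cannot pass silently, and each output row still corresponds to one row of the original zList.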

Yu Hao
Matthew
  • this is helpful, but it is very important that I retain the logical separation from the original zList. ie: (bob, jane, and john from group 1), (jim, jack, and john, from group 2). im not sure if your answer takes that into consideration? – nodoze Sep 03 '14 at 08:32

To get cleaner code, you should consider using a class to encapsulate the data.

Let's see:

class Person(object):
    def __init__(self, identifier, firstname, name, state):
        self.id = identifier
        self.firstname = firstname
        self.name = name
        self.state = state

    def __repr__(self):
        return "<{0} {1} (id : {2}) living in {3}>".format(self.firstname, self.name, self.id, self.state)

    def as_list(self):
        return [self.firstname, self.name, self.state]

class PersonList(list):
    def __init__(self, *args, **kwargs):
        list.__init__(self, *args, **kwargs)

    def getById(self, identifier):
        """ Return the person in this list whose id equals the requested identifier. """
        # filter(boolean function, iterable) -> keeps only the elements for which the function returns True.
        # The lambda is an inline function: for any given object x, it returns x.id == identifier.
        # The list is filtered to keep only the elements whose id attribute equals identifier. See https://docs.python.org/3.4/library/functions.html#filter
        tmp = list(filter(lambda x: x.id == identifier, self))
        if len(tmp)==0:
            raise Exception('Searched for a Person whose id is {0}, but no one have this id.'.format(identifier))
        elif len(tmp) > 1:
            raise Exception('Searched for a Person whose id is {0}, and many people seem to share this id. id are supposed to be unique.'.format(identifier))
        return tmp[0]

##CONSTANTS##   
#id list - modified so that we do not have to instantiate 9 Person objects
ids = [[1412,2521,3411],#bob, jane, jim
        [3411,1412,1412],#jim, bob, bob
        [3411,2521,2521]]#jim, jane, jane

#person list 
index=PersonList([Person(1412, 'Bob', 'Smith', 'Ohio'),
         Person(2521, 'Jane', 'Doe', 'Texas'),
         Person(3411, 'Jim', 'Black', 'New York')])

def computeResult(id_list, personList): 
    # look up every id while keeping the row structure of id_list
    people = [ [personList.getById(identifier) for identifier in subList] for subList in id_list]

    resultList= []
    for sublist in people:
        tmp = []
        for person in sublist:
            tmp += person.as_list()
        resultList.append(tmp)

    return resultList

if __name__ == "__main__":

    print(computeResult(ids, index))

In my opinion, the code is harder to write because you are using the wrong data structure. An application should handle things like Person objects instead of lists of lists of strings. Still, it's your app; I just recommend considering PersonList as a better structure for your data than this ugly list of lists. On the other hand, if the ids are unique, as I think they are, and you manage to put your data into a dictionary such as

index={1412 : Person(...), 2500 : Person(...), ...}

or

index={1412: ['Bob', 'Doe', ...], 2500 : [...], ...}

It would definitely be more practical, as you could then delete the PersonList class and just use index.get(1412), for example, to get the data corresponding to that id.
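A rough sketch of that idea, assuming rows shaped like the matchData in the question (the ID first, then the data); build_index and find are hypothetical names. Keep in mind that index.get(...) returns None for an unknown id rather than raising, so an explicit check is needed if a missing id must be treated as an error:

```python
# Rows in the shape used in the question: the ID first, then the data.
matchData = [
    [1412, "Bob", "Smith", "Ohio"],
    [2521, "Jane", "Doe", "Texas"],
    [3411, "Jim", "Black", "New York"],
]


def build_index(rows):
    """Hypothetical helper: map each ID to the rest of its row."""
    index = {}
    for row in rows:
        identifier = row[0]
        if identifier in index:
            # IDs are supposed to be unique, so a repeat is an error.
            raise ValueError("duplicate id: {0}".format(identifier))
        index[identifier] = row[1:]
    return index


def find(index, identifier):
    """Hypothetical helper: like index.get, but an unknown ID is an error."""
    data = index.get(identifier)  # None when the ID is not in the dict
    if data is None:
        raise KeyError("no match for id {0}".format(identifier))
    return data


index = build_index(matchData)
print(find(index, 1412))
```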

EDIT: added an example trace, as requested.

This script is saved in a file named "sof.py".

python3
>>> import sof
>>> sof.index
[<Bob Smith (id : 1412) living in Ohio>, <Jane Doe (id : 2521) living in Texas>, <Jim Black (id : 3411) living in New York>]
>>> sof.index.getById(666)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vaisse/Bureau/sof.py", line 25, in getById
  raise Exception('Searched for a Person whose id is {0}, but no one have this id.'.format(identifier))
Exception: Searched for a Person whose id is 666, but no one have this id.

As you can see, in case of an error everything stops. If this isn't the behaviour you want, you can instead return None, keep a trace somewhere of what failed rather than raising an Exception, and continue processing the data. Take a look at https://docs.python.org/3.1/library/warnings.html if you want your app to keep running even when an error occurs. Otherwise, simply raising an exception is enough.
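For the variant that does not stop, here is a minimal sketch using the standard warnings module. The people dict and lookup_or_none are hypothetical stand-ins for the index above: a missing id yields None, is recorded in a list, and triggers a warning, while processing continues.

```python
import warnings

# Hypothetical stand-in for the index of known people.
people = {
    1412: ["Bob", "Smith", "Ohio"],
    2521: ["Jane", "Doe", "Texas"],
}


def lookup_or_none(index, identifier, missing):
    """Return the data for identifier, or None after recording the failure."""
    if identifier not in index:
        missing.append(identifier)  # keep a trace of what failed
        warnings.warn("no match for id {0}".format(identifier))
        return None
    return index[identifier]


missing_ids = []
row = [lookup_or_none(people, i, missing_ids) for i in [1412, 666, 2521]]
print(row)
print(missing_ids)
```

After the run, missing_ids lists every id that failed to match, which can be checked once at the end instead of stopping at the first failure.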

Arthur Vaïsse
  • i will look this over. i like the ability to raise exceptions on errors. the original list can have hundreds of 'groups of 3' so it is important to know if there was an error while trying to match. curious if you could post example output? – nodoze Sep 03 '14 at 08:37