Python 201: How to sort a dictionary by value

The other day I was asked if there was a way to sort a dictionary by value. If you use Python regularly, then you know that a dictionary is a mapping type with no built-in notion of sorted order (under the hood it's a hash table, and before Python 3.7 it didn't even guarantee insertion order). Regardless, I needed a way to sort a nested dictionary (i.e. a dictionary of dictionaries) based on a value in the nested dictionaries so I could iterate over the keys in the specified order. We'll spend some time looking at an implementation I found.

After Googling for ideas, I came across an answer on StackOverflow that did most of what I wanted. I had to modify it slightly to make it sort on my nested dictionary values, but that was surprisingly easy. Before we get to the answer, we should take a quick look at the data structure. Here is a variation of the beast, minus the private parts that were removed for your safety:


mydict = {'0d6f4012-16b4-4192-a854-fe9447b3f5cb': 
          {'CLAIMID': '123456789',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '365.64', 'EXPDATE': '20120831'}, 
          'fe614868-d0c0-4c62-ae02-7737dea82dba': 
          {'CLAIMID': '45689654', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '185.55', 'EXPDATE': '20120831'}, 
          'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': 
          {'CLAIMID': '98754651',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '93.00', 'EXPDATE': '20120831'},
          'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': 
          {'CLAIMID': '789464321',
           'CLAIMDATE': '20120508', 'AMOUNT': '0.00',
           'EXPDATE': ''}, 
          'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': 
          {'CLAIMID': '897987945', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '62.66', 'EXPDATE': '20120831'}, 
          '77ad6dd4-5704-4060-9c38-6a93721ef98e': 
          {'CLAIMID': '23212315',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '41.05', 'EXPDATE': '20120831'}
          }

Now we know what we're dealing with. Let's take a quick look at the slightly modified answer I came up with:

sorted_keys = sorted(mydict.keys(), key=lambda y: (mydict[y]['CLAIMID']))

That's a pretty spiffy one-liner, but I think it's a little confusing. Here's my understanding of how it works. The sorted function takes an iterable (here, the dict's keys) and returns a new sorted list. The key argument is a function that sorted calls once for each item to get the value to compare on, which in this case is an anonymous function (the lambda). The lambda is passed one outer key at a time; it looks that key up in mydict and returns the inner value we want to sort on, which in this case is 'CLAIMID'. Personally I find lambdas a little confusing, so I usually spend a little time deconstructing them into a named function just so I can understand them a little better. So without further ado, here's a function version of the same script:

#----------------------------------------------------------------------
def func(key):
    """Return the CLAIMID of the inner dict stored under the outer key."""
    return mydict[key]['CLAIMID']

sorted_keys = sorted(mydict.keys(), key=func)

for key in sorted_keys:
    print(mydict[key]['CLAIMID'])
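
If you want the inner dictionaries along with the outer keys, one option is to sort the (key, value) pairs from items() instead of the keys alone. The sketch below is written for Python 3 and uses a trimmed-down, made-up stand-in for mydict (the short keys 'a', 'b', 'c' and the name claims are placeholders, not part of the original data):

```python
# Hypothetical, shortened stand-in for the nested dictionary above
claims = {
    'a': {'CLAIMID': '300', 'AMOUNT': '93.00'},
    'b': {'CLAIMID': '100', 'AMOUNT': '365.64'},
    'c': {'CLAIMID': '200', 'AMOUNT': '0.00'},
}

# Each item is an (outer_key, inner_dict) tuple, so the key function
# reads the field straight from the inner dict (item[1])
sorted_items = sorted(claims.items(), key=lambda item: item[1]['CLAIMID'])

for outer_key, inner in sorted_items:
    print(outer_key, inner['CLAIMID'])
```

This saves the second lookup into the dictionary inside the loop, at the cost of a slightly busier lambda.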

And just for fun, let's write a script that can sort the nested dictionary by ANY of the keys inside it.

mydict = {'0d6f4012-16b4-4192-a854-fe9447b3f5cb': 
          {'CLAIMID': '123456789',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '365.64', 'EXPDATE': '20120831'}, 
          'fe614868-d0c0-4c62-ae02-7737dea82dba': 
          {'CLAIMID': '45689654', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '185.55', 'EXPDATE': '20120831'}, 
          'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': 
          {'CLAIMID': '98754651',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '93.00', 'EXPDATE': '20120831'},
          'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': 
          {'CLAIMID': '789464321',
           'CLAIMDATE': '20120508', 'AMOUNT': '0.00',
           'EXPDATE': ''}, 
          'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': 
          {'CLAIMID': '897987945', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '62.66', 'EXPDATE': '20120831'}, 
          '77ad6dd4-5704-4060-9c38-6a93721ef98e': 
          {'CLAIMID': '23212315',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '41.05', 'EXPDATE': '20120831'}
          }

outer_keys = list(mydict.keys())
print("outer keys:")
for outer_key in outer_keys:
    print(outer_key)

print("*" * 40)
# Every inner dict has the same fields, so read them from the first one
inner_keys = list(mydict[outer_keys[0]].keys())

for key in inner_keys:
    sorted_keys = sorted(mydict.keys(), key=lambda y: (mydict[y][key]))
    print("sorted by: " + key)
    print(sorted_keys)
    for outer_key in sorted_keys:
        print(mydict[outer_key][key])
    print("*" * 40)
    print()

This code works, but it doesn't give the results I expected. Try running this and you'll notice that the output is kind of weird. The sorting is being done on strings, so all the values that look like numbers are compared character by character: the CLAIMID '123456789' sorts ahead of '23212315' even though it's the bigger number. Oops! Most people would want the numbers sorted like numbers, so we need to do a quick conversion of the number-like values into integers or floats. Here's the final version of the code (yes, it's a little sloppy):

mydict = {'0d6f4012-16b4-4192-a854-fe9447b3f5cb': 
          {'CLAIMID': '123456789',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '365.64', 'EXPDATE': '20120831'}, 
          'fe614868-d0c0-4c62-ae02-7737dea82dba': 
          {'CLAIMID': '45689654', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '185.55', 'EXPDATE': '20120831'}, 
          'ca1aa579-a9e7-4ade-80a3-0de8af4bcb21': 
          {'CLAIMID': '98754651',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '93.00', 'EXPDATE': '20120831'},
          'ccb8641f-c1bd-45be-8f5e-e39b3be2e0e3': 
          {'CLAIMID': '789464321',
           'CLAIMDATE': '20120508', 'AMOUNT': '0.00',
           'EXPDATE': ''}, 
          'e1c445c2-5148-4a08-9b7e-ff5ed51c43ed': 
          {'CLAIMID': '897987945', 
           'CLAIMDATE': '20120508', 
           'AMOUNT': '62.66', 'EXPDATE': '20120831'}, 
          '77ad6dd4-5704-4060-9c38-6a93721ef98e': 
          {'CLAIMID': '23212315',
           'CLAIMDATE': '20120508', 
           'AMOUNT': '41.05', 'EXPDATE': '20120831'}
          }

outer_keys = list(mydict.keys())
print("outer keys:")
for outer_key in outer_keys:
    print(outer_key)

print("*" * 40)
# Every inner dict has the same fields, so read them from the first one
inner_keys = list(mydict[outer_keys[0]].keys())

# Convert number-like strings to int or float; leave blank values alone
for outer_key in outer_keys:
    for inner_key in inner_keys:
        value = mydict[outer_key][inner_key]
        if value == "":
            continue
        try:
            mydict[outer_key][inner_key] = int(value)
        except ValueError:
            mydict[outer_key][inner_key] = float(value)

for key in inner_keys:
    # Python 3 can't compare "" with a number, so sort blank values last
    sorted_keys = sorted(mydict.keys(),
                         key=lambda y: (mydict[y][key] == "", mydict[y][key]))
    print("sorted by: " + key)
    print(sorted_keys)
    for outer_key in sorted_keys:
        print(mydict[outer_key][key])
    print("*" * 40)
    print()
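
An alternative I'd consider, if you'd rather not overwrite the strings in the dictionary, is to do the conversion inside the key function. This is only a sketch for Python 3: the helper name as_number and the cut-down claims dict are made up for illustration, and treating a blank value as infinity (so it sorts last) is my assumption about what you'd want:

```python
def as_number(text):
    """Return text as a float; blank strings sort after everything else."""
    if text == "":
        return float("inf")
    return float(text)

# Hypothetical, shortened stand-in for the nested dictionary above
claims = {
    'a': {'CLAIMID': '123456789', 'EXPDATE': '20120831'},
    'b': {'CLAIMID': '45689654', 'EXPDATE': ''},
    'c': {'CLAIMID': '98754651', 'EXPDATE': '20120831'},
}

# Plain string comparison goes character by character
by_string = sorted(claims, key=lambda k: claims[k]['CLAIMID'])
# Converting in the key function compares the values as numbers
by_number = sorted(claims, key=lambda k: as_number(claims[k]['CLAIMID']))
# The blank EXPDATE ends up at the back of the list
by_expdate = sorted(claims, key=lambda k: as_number(claims[k]['EXPDATE']))

print(by_string)
print(by_number)
print(by_expdate)
```

The dictionary itself is left untouched here, which may matter if the values get written back out somewhere as strings.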

So now we have it sorted in a way that's more natural to human perceptions. There's one other way we could do this, and that's sorting the data the way we want BEFORE we put it into our data structure. However, that only works if the structure remembers insertion order: an OrderedDict from the collections module (available since Python 2.7) or, in Python 3.7 and later, a regular dict. You can read about it in the official documentation.
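
Here's a rough sketch of the sort-before-inserting idea for Python 3, again with a made-up, shortened claims dict standing in for the real data. It sorts the items first and then loads them into an OrderedDict, which remembers the order they went in:

```python
from collections import OrderedDict

# Hypothetical, shortened stand-in for the nested dictionary above
claims = {
    'a': {'CLAIMID': 300},
    'b': {'CLAIMID': 100},
    'c': {'CLAIMID': 200},
}

# Sort the (key, value) pairs first, then insert them in that order
ordered = OrderedDict(
    sorted(claims.items(), key=lambda item: item[1]['CLAIMID'])
)

for outer_key in ordered:
    print(outer_key, ordered[outer_key]['CLAIMID'])
```

Keep in mind that the order only reflects when each key was inserted; anything added later goes to the end rather than into its sorted position.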

Now you know what I know about this topic. I'm sure my readers will have other solutions or ways to do it too. Feel free to mention them or link to them in the comments.

Copyright © 2024 Mouse Vs Python | Powered by Pythonlibrary