Planet Linux Australia

Syndicate content
Planet Linux Australia - http://planet.linux.org.au
Updated: 1 hour 25 min ago

Jonathan Adamczewski: Learn more about caches & library implementations. Use fewer linked lists.

Sat, 2015-01-03 14:27

CppCon 2014 talk by Chandler Carruth:

“Efficiency with Algorithms, Performance with Data Structure”

It starts slow, but there’s plenty of good things in this one:

  • Don’t use linked lists [0]
  • std::map is terrible
  • Computers are made of memory hierarchies [1]
  • [If you’re going to use it, ] know how the standard library works [2]
  • std::unordered_map has a ridiculously long name and it isn’t a good hash map

 

[0] The thing about linked lists is that they’re at their worst when there is no spatial locality between the elements. Because of the way that std::list allocates elements, this will almost always be the case.

Linked lists can be useful for gathering data, but they’re expensive to traverse and modify. If you need to traverse the list more than once, consider using the first traversal to repack the list into a vector.

 

[1] The presentation does give some insight into the relative cost of cached memory accesses, and it would be remiss of me to not link to Tony Albrecht’s interactive cache performance visualizer. And std::vector is recommended over std::list for the spatial benefits (elements known to be near other elements, which is good). But the value of ordered accesses was overlooked in the presentation.

Put your data in a vector and process it in order from one end to the other. Any decent CPU will detect the ordered access and prefetch the data into the cache for you, and less time will be spent waiting for that data. It’s like magic.

 

[2] The presentation included two very specific examples regarding the inner workings of the standard library, one for std::vector, and one for std::unordered_list. It is right and good that you understand the workings of any library that you are using, but there were more general principles that were imho overlooked:

  • For std::vector and std::list (and not exclusive to them): JIT memory allocation is terrible — use what you know about the problem to minimize and manage memory allocations. To quote from Andreas Fredrikssons’ fine rant (which you should read):

    Never make memory management decisions at the lowest level of a system.
  • Always explicitly keep a copy of intermediate results that are used more than once. Don’t trust the compiler to optimize for you — more to the point, understand why compilers legitimately can’t optimize away your lazy copypasta. (Mike Acton’s CppCon keynote has some material that covers this)