Skip navigation.
 
mlRe: Best Way To Lookup From a Huge Table
FROM : E. Wing
DATE : Fri Mar 21 00:58:13 2008

You really should profile to find your bottlenecks, especially when
the STL is concerned. My personal experience has been that gcc poorly
optimizes STL code automatically for you and you must go in and
profile to eliminate the real bottlenecks.

A real world case I dealt with a couple years back, we were loading in
a data set containing around 36,000 objects which contained lots of
fields of numbers and strings. We were slurping a Lua file and copying
all data into a C++ hash_map (extension). We needed to do full copies
of the data so I'm sure we had maps with full-blown std::strings and
not pointers to std::strings.

The performance was slow for us on a 1.3GHz Powerbook under Tiger/gcc
4.0. It took over a minute and a half to load. The knee-jerk reaction
was to blame Lua, but when we built and ran the same code under Visual
Studio 7.1/Windows XP on an almost comparable system, the runtime was
under half a second.

So we used Shark. In about 7 iterations (maybe 20 minutes), we got the
Mac down to about half a second.

>From memory, these are some of the things we learned about gcc.

- Turn on optimization flags or you will get very slow STL code
- hash_map was killing us because we were inserting like std::map (one
at a time) so we think we were getting hit when the hash_map needed to
be grown. Switching to std::map actually helped a little (but not
overwhelmingly).
- Do not use the overloaded bracket operators. Removing these and
using .find() and insert() directly produced magnitudes faster code
for us
- Avoid creating temporary/intermediate objects. gcc didn't seem to
optimize these out.

I think there were several other things we learned, but I can't
remember them off hand.

I don't use Visual Studio that much, but I was impressed at how good
their optimizers worked for our STL cases. (This was not the first run
in we've had with slow STL/gcc performance vs. Visual Studio.)



Anyway, my hunch is you shouldn't have any problem with
std::map<std::string, std::string> with 41k elements as long as you
use the profiler.


-Eric

Related mailsAuthorDate
mlBest Way To Lookup From a Huge Table Karan Lyons Mar 13, 22:11
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 13, 22:27
mlRe: Best Way To Lookup From a Huge Table Jens Alfke Mar 13, 23:30
mlRe: Best Way To Lookup From a Huge Table Chris Hanson Mar 15, 01:31
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 15, 01:55
mlRe: Best Way To Lookup From a Huge Table Chris Hanson Mar 15, 02:15
mlRe: Best Way To Lookup From a Huge Table Ben Trumbull Mar 15, 02:39
mlRe: Best Way To Lookup From a Huge Table Scott Ribe Mar 15, 03:35
mlRe: Best Way To Lookup From a Huge Table Paul Thomas Mar 16, 10:19
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 17, 01:26
mlRe: Best Way To Lookup From a Huge Table Thomas Davie Mar 17, 17:31
mlRe: Best Way To Lookup From a Huge Table James Hober Mar 17, 19:38
mlRe: Best Way To Lookup From a Huge Table Scott Ribe Mar 19, 01:22
mlRe: Best Way To Lookup From a Huge Table E. Wing Mar 21, 00:58
mlRe: Best Way To Lookup From a Huge Table Michael Ash Mar 21, 05:48
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 21, 17:13
mlRe: Best Way To Lookup From a Huge Table Michael Ash Mar 21, 21:37
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 21, 21:51
mlRe: Best Way To Lookup From a Huge Table Michael Ash Mar 21, 22:03
mlRe: Best Way To Lookup From a Huge Table John Stiles Mar 21, 22:08
mlRe: Best Way To Lookup From a Huge Table Thomas Engelmeier Mar 22, 19:46