Welcome to RocketMonkeys.com!

This is my personal site, where I store my rants, pictures, and movie reviews. Have a look around, register and leave comments.
-James


Clash of the Titans

Posted by james on March 30, 2011

Rating: 5

Links: IMDB | Apple Trailers

I saw the previews, and they made one thing clear: this was going to be a special-effects monster fest.

I actually expected a lot worse, which is probably why I didn't hate this movie. It's not a good movie, but it's also really not as awful as you'd expect.

The most painful part: the lame character setup at the beginning. Instead of introducing all the characters (the comic relief, the wise old sage, and so on), they could have just shown each one briefly with a title (e.g. "Young innocent soldier who will most definitely die quickly") and spared us the setup. Heck, at that point they don't even need names (not that I remember any of them).

It was a little distracting to see so many recognizable actors and actresses, mainly because they didn't really seem to fit. The first half of the movie (aka the real-life, non-CGI part) was a little... lame. I can't remember another movie where I noticed the makeup as much. It seemed like they cut some corners on the costumes.

But the second half was enjoyable (minus the Jar Jar Binks-era Medusa CGI). Why? Because it was just a CGI-laden monster fest. And Sam Worthington was probably the single best thing about the movie. Probably the only memorable character here (though seeing Liam Neeson in yet another "god" role is... well, let's just say he might be challenging Morgan Freeman as the go-to god guy).

So - not the most awful movie. Pretty much what you'd expect. I take back what I said about Sam Worthington - the *kraken* is the single best thing about this movie. It's huge, but well designed. In fact, don't even bother watching this movie - just find the kraken clip on YouTube, watch that, and stop when you're done. That's all you really need.

P.S. What the heck is with the eyeless blind witches holding an eyeball in their hand? It's like Guillermo del Toro's creatures mixed with Aughra from The Dark Crystal.

Fixing "Error: script stack space quota is exhausted" in Firefox 3.x

Posted by james on Feb. 1, 2011

I've gotten the "Error: script stack space quota is exhausted" problem before. It usually happens when I'm doing an unreasonable amount of data manipulation in JavaScript, but it still needs fixing.

I recently found a fix for a specific circumstance. I narrowed down the culprit to this line:

var elem = $('<div>' + html + '</div>');

The problem here is that the "html" var is a 500k string containing HTML markup. jQuery seems to be inefficient for this purpose: running it through $('<div>' + html + '</div>') should simply create a jQuery node wrapping a new DOM element filled with the contents of "html", but jQuery's own parsing of that giant string is apparently what exhausts the stack.

Using this instead fixed the problem:

// Create an empty <div>, then let the browser parse the big string natively.
var elem = $('<div>');
elem[0].innerHTML = html;

I suspected that the browser's native innerHTML parsing would be more efficient than jQuery's HTML parsing, and I was right. So in this specific circumstance, I found a way to avoid the issue. The bigger problem is still there: why the heck am I pushing 500k worth of HTML text through JavaScript in the first place? No good reason, unfortunately.

Optimizing Django Views By Profiling

Posted by james on Jan. 15, 2011



Django is an amazing framework, my favorite so far. My Django apps can be a bit slow, but most of the time there are very simple optimizations that can speed things up a huge amount.

I'm a firm believer in programming simply first, then optimizing later if necessary. Most optimizations are premature and unneeded, and if they're not data-guided they can often be wrong. At best they're just guesses based on assumptions.

So we start out with naive, simple view code. For this example, we have a view that accepts an uploaded file and imports any missing data into the database. This app is data-heavy, so the import can take quite a while. Here's pseudocode:

for row in csv_file:
    field1, field2, field3, field4 = row

    # Reformat the fields.
    field2 = field2.lower()
    field3 = Widget.objects.get(id=field3).name
    # etc...

    if Resource.objects.filter(id=field1).count() > 0:
        # Duplicate, skip this resource.
        pass
    else:
        # New resource, handle it.
        resource = Resource()
        resource.name = field2
        resource.something = field3
        resource.foreign = AnotherModel.objects.get(name=field4)
        # etc...
        resource.save()


The code itself is unimportant. The gist is this: we parse a file, and each row contains fields. If the resource already exists, skip it. Otherwise, create a new resource with the reformatted fields.

The first step is to profile. Again, if you optimize without measuring, you're just taking shots in the dark. The most helpful profiler is line_profiler by Robert Kern. It runs the function you give it, then spits out your code with profiling info attached to each line. This is much more useful than profilers that only report at the per-function level. With those, you can only see which functions are slow; with line_profiler, you can see which line within your function is slow. Often that's exactly what you need.
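For reference, here's roughly how it's invoked (a sketch - "import_resources" and "resources.csv" are made-up names for this example). The kernprof script that ships with line_profiler injects the @profile decorator at runtime:

# import_script.py -- run with:  kernprof -l -v import_script.py
# kernprof injects the @profile decorator into builtins; -l turns on
# line-by-line profiling and -v prints the results when the script exits.
import csv

@profile
def import_resources(rows):
    for row in rows:
        pass  # ... the import loop from above ...

if __name__ == '__main__':
    with open('resources.csv') as f:
        import_resources(csv.reader(f))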

So we profile our code. There are two useful metrics here: overall time, and time taken per line (given as % of overall time). The first tells you whether your changes are speeding up the function; the second helps you hunt down which line is taking the most time. Profiling the view gives us something like this:

Time: 10s.

  9%  for row in csv_file:
          field1, field2, field3, field4 = row

          # Reformat the fields.
  1%      field2 = field2.lower()
 20%      field3 = Widget.objects.get(id=field3).name
          # etc...

 40%      if Resource.objects.filter(id=field1).count() > 0:
              # Duplicate, skip this resource.
              pass
          else:
              # New resource, handle it.
              resource = Resource()
              resource.name = field2
              resource.something = field3
 20%          resource.foreign = AnotherModel.objects.get(name=field4)
              # etc...
 10%          resource.save()


These numbers are completely made up, but they give the general proportions. Right away you can see that one line is taking 40% of our time: the .count() check. That's a good candidate for optimization.

I remembered that .exists() is preferred over .count() for this kind of check - it lets the database stop at the first matching row (roughly a SELECT ... LIMIT 1) instead of counting them all. So we'll try that:

Time: 9s.

 10%  for row in csv_file:
          field1, field2, field3, field4 = row

          # Reformat the fields.
  1%      field2 = field2.lower()
 24%      field3 = Widget.objects.get(id=field3).name
          # etc...

 30%      if Resource.objects.filter(id=field1).exists():
              # Duplicate, skip this resource.
              pass
          else:
              # New resource, handle it.
              resource = Resource()
              resource.name = field2
              resource.something = field3
 23%          resource.foreign = AnotherModel.objects.get(name=field4)
              # etc...
 12%          resource.save()


What's happened here? The line we changed has a lower percentage, but all the other lines have higher ones. That will always happen - the percentages always total 100%. What matters is that the overall time decreased by 1s. In absolute terms, the duplicate check went from 40% of 10s (4.0s) to 30% of 9s (2.7s). The other lines aren't taking any more time; they're just responsible for a bigger share of a smaller total. Make sense? Just remember: if the overall time decreases, you're doing well.

Next, 30% is still too high. We can do better. There are a ton of Resource objects in the DB, and querying them for every row takes a lot of time. Let's cache the resource ids up front and check against a plain Python collection instead, as sketched below.
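The idea, as a minimal sketch (using values_list, and a set so the membership test is O(1); the profiled version below keeps the original list):

# One query up front: pull every existing Resource id into memory...
resource_ids = set(Resource.objects.values_list('id', flat=True))

# ...then each row's duplicate check is a pure in-memory lookup.
for row in csv_file:
    field1 = row[0]
    if field1 in resource_ids:
        continue  # duplicate, skip this resource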

Time: 5s.

      # Cache resources
      resource_values = Resource.objects.all().values('id')
 27%  resource_ids = [resource_value['id'] for resource_value in resource_values]

 12%  for row in csv_file:
          field1, field2, field3, field4 = row

          # Reformat the fields.
  1%      field2 = field2.lower()
 24%      field3 = Widget.objects.get(id=field3).name
          # etc...

  1%      if field1 in resource_ids:
              # Duplicate, skip this resource.
              pass
          else:
              # New resource, handle it.
              resource = Resource()
              resource.name = field2
              resource.something = field3
 23%          resource.foreign = AnotherModel.objects.get(name=field4)
              # etc...
 12%          resource.save()


Great! In this case we fetch all the resources at once, which takes some time up front, but then every row checks the in-memory Python object without touching the DB. Access to your DB is often the slowest part of your view; the fewer queries you make, the better off you are. We can take that even further.

Time: 2s.

      # Cache resources
      resource_values = Resource.objects.all().values('id')
 27%  resource_ids = [resource_value['id'] for resource_value in resource_values]

      # Cache widget names by id
      widgets = {}
  5%  for widget in Widget.objects.all():
          widgets[widget.id] = widget.name

      # Cache AnotherModel instances by name
      another_models = {}
  6%  for another_model in AnotherModel.objects.all():
          another_models[another_model.name] = another_model

 30%  for row in csv_file:
          field1, field2, field3, field4 = row

          # Reformat the fields.
  1%      field2 = field2.lower()
          field3 = widgets[field3]
          # etc...

  1%      if field1 in resource_ids:
              # Duplicate, skip this resource.
              pass
          else:
              # New resource, handle it.
              resource = Resource()
              resource.name = field2
              resource.something = field3
              resource.foreign = another_models[field4]
              # etc...
 30%          resource.save()


In this case, caching the other models (Widget and AnotherModel) is worthwhile because there aren't many of those objects. If there were too many Widgets, building the cache could actually increase the time - that's why we measure. Even if there are hundreds of Widgets, there may be hundreds of thousands of rows in the file. You're trading memory (to hold the widgets in a Python dict) for CPU time and database round-trips, and profiling (i.e. measuring) tells you when that trade is appropriate.
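Pulled together, the final version reads something like this (a sketch; the model and field names are carried over from the pseudocode above, and the caches are rewritten with values_list and dict comprehensions):

def import_resources(csv_file):
    # One query per table up front; the loop below never touches the DB
    # except for the final save().
    resource_ids = set(Resource.objects.values_list('id', flat=True))
    widgets = {w.id: w.name for w in Widget.objects.all()}
    another_models = {m.name: m for m in AnotherModel.objects.all()}

    for field1, field2, field3, field4 in csv_file:
        if field1 in resource_ids:
            continue  # duplicate, skip this resource

        resource = Resource()
        resource.name = field2.lower()
        resource.something = widgets[field3]
        resource.foreign = another_models[field4]
        resource.save()

The set makes the duplicate check O(1), and the two dicts replace the two per-row .get() queries.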

At this point, I'm satisfied. You may think, "Wow, there are two 30% items... I should optimize those!" But realistically, those lines are taking very little time now. They accounted for a small share of the original function's time, and they haven't gotten any slower - we've just improved everything else to the point where they're the only things left. At this point you've hit the baseline cost of Python (and its libraries) itself. Think of it this way: the function has gone from taking 10s to taking 2s. That's a 5x speedup. Not bad for a little bit of optimization. We haven't had to get "tricky" (i.e. C-compiled extensions, bypassing the framework, etc.) - just a few simple changes for a lot more performance.

A last note: know when to stop optimizing. Get all the low-hanging fruit, then stop. If you absolutely need to optimize the CSV-parsing class, or if Django's .save() is genuinely too slow for you, then you probably shouldn't be using them at all - you probably shouldn't be using Python at that point. But most apps don't need anything close to that. For most small apps, Django and Python are more than fast enough. Put another way: most of the time, the language and tools are not the bottleneck - your code is.

