The News: Ruby on Rails performance is dominated by garbage collection. We present a set of patches to greatly improve Rails performance and show how to profile memory usage to get further performance gains.
What's at Stake: Rails is slow for many uses and did not lend itself well to optimization. Significant performance gains could only be achieved at application level at large development cost.
The Upside:
- 5x potential performance gains;
- easy way to identify whether GC is a bottleneck;
- deterministic process to fix memory bottlenecks;
- set of canned patches to solve the biggest problems;
- you can help
I don't have time, make my app 5 times faster!
- Apply the monkey patches below. Enjoy.
- Time your app with garbage collection disabled to figure out how much more performance you are leaving on the table. Enjoy.
- Come back for more patches. Enjoy.
What's going on here?
Ruby runtime uses garbage collection. Your code allocates memory as it runs. From time to time garbage collector kicks in, halts the whole system, and cleans up no-longer-referenced memory using mark-and-sweep algorithm.
GC gets triggered by multiple conditions. The one that matters the most is 8M trigger. Every time you allocate 8M of memory GC runs. Complex Rails request can allocate hundreds or even thousands of megabytes of memory, making GC runs dozens of times. Each GC pass takes 50-150ms. You do the math.
It's possible to patch ruby interpreter, increase the triggers and reduce frequency of garbage collection at the cost of additional memory use. This helps, but does not resolve the problem. If there is more garbage, each GC run takes longer. We'll devote a separate post to this later.
But, I am running ruby-prof already, you ignorant fool!?
And so are we. It does not really help:
- ruby-prof attributes full cost of garbage collection to the method where GC gets triggered, not the methods that allocate the memory. That's like blaming the last straw for breaking the camel's back.
- ruby-prof does not show that garbage collection took place and how much time it took. All you can tell is that some method took a long time.
If you go by the these result, and remove the straw that's breaking the camel's back his load does not get much lighter.
Statistics being on your side, things are not entirely hopeless. GC gets triggered within malloc, so methods doing a lot allocations are also likely to trigger GC. Further, simply allocating memory is relatively expensive, and that will also show up in profiler output.
You might get lucky, optimize ruby-prof hotspot and get better performance, but more often than not you will fail. The correlation is just too weak. We believe this is exactly why Rails resisted performance optimization. Up till now.
Profiling memory allocations
Ruby-prof is almost the right tool for memory profiling. "Almost" is because the free cheese is only found in the mousetrap. You'll have to do some work first.
You need to build patched ruby (patch
here)
and latest ruby-prof (patch
here). With
both tools patched, set RubyProf.measure_mode = RubyProf::MEMORY
before running the profiler.
Make sure to turn off GC when profiling for pure performance. You'll get get much more representative results.
The detailed HOWTO and description on the process is in our next blog post.
Will it help me?
In general your code will benefit most if you:
- Create or update a lot of rows using ActiveRecord
- Run large number of SQL statements of any kind using ActiveRecord, e.g. load many AR object through separate #find calls
- Use ActiveView helpers, especially in a loop, to generate complex HTML
- Use has_many AR association
It's easy to tell for sure if you have performance tests. All that "performance test" means is that you can run a single request and time how long it takes.
Just run the same test as usual, and then with GC disabled and compare the results. Most code will run much faster with GC disabled. E.g. for Acunote we test how long it takes to copy 120 tasks from one sprint from another.
Before optimizations with GC: 14 sec
Before optimizations w/out GC: 6 sec
You can't run ruby without GC in production. It will rapidly run out of memory and crash. You can, however, apply optimization, produce less garbage and get your code be even faster than original benchmark with GC turned off!
After optimizations with GC: 3 sec
The code that allocates little memory, but is algorithmically complex and is slow because of that, will see little improvement. For example Acunote's burndown and sprint completion date calculations fall squarely into this category and did not benefit. These aside, Acunote performance was dominated by the cost of memory allocation.
Give me the patches!
Problem | Patch |
---|---|
ActiveRecord::Base#create calls attributes_with_quotes twice, ActiveRecord::Base#update once. #attributes_with_quotes calls #attributes , which clones all attribute values. None of these clones are necessary.
Impact: When copying 120 tasks in Acunote this costs 650M. Patch improves performance from 14s to 6s. Notes: Helps when you do a lot of creates and updates through ActiveRecord. | Rails 1.2 Rails 2.0 |
Rails surrounds every SQL call and ActionView::Base#render with Benchmark#realtime . Benchmark#realtime allocates unnecessary 45k per call.
Impact: When copying 120 tasks in Acunote this costs 120M. Patch improves performance from 6s to 5s. Notes: Helps when you do read, create, update lots of AR records, or execute any other SQL, or call #render(:partial) in a loop. | |
Rails 1.2 tries to pretty-print and log every SQL statement even when logger is in Logger::INFO mode (production environment) and prints nothing.
Impact: When copying 120 tasks in Acunote this costs 17M. Patch improves performance by 1s. Notes: Helps when you do read, create, update lots of AR records, or execute any other SQL in a loop. Patch only applicable to Rails 1.2. Rails 2.0 has the correct log_info implementation. |
Rails 1.2
Rails 2.0 No patch necessary. |
Comparison of BigDecimal's (used by Rails for Numeric data types) to booleans is slow because it requires unnecessary method_missing call and exception catch.
Impact: When rendering 120 tasks in Acunote this costs 4M. Patch improves performance by 100-120ms. Notes: Helps when you have the imprudence to accidentally compare BigDecimal's with true or false. | |
ActionView::Helpers::JavaScriptHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 5M. This improves performance by 100-110ms. Notes: Helps when you have rich AJAX page with lots of dynamic elements. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. |
Hand-replace link_to_function with <a href="#" onclick="..."/> .
|
ActionView::Helpers::FormTagHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 2M. This improves performance by 40-50ms. Notes: Helps when you have page with lots of input boxes. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. |
Hand-replace text_field_tag with <input type="text"/> .
|
ActionView::Helpers::AssetTagHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 8M. This improves performance by 110-150ms. Notes: Helps when you have page with lots of images. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. image_tag itself can be optimized. Help wanted.
|
If you don't use asset hosts, hand-replace image_tag with <img src="..."/> .If you use asset hosts, there's no patch. |
Passing a block to a method on instance of ActiveRecord::Associations chews up memory.An example (from activerecord/lib association.select { |record| record.new_record? } line inside #add_multiple_associated method.
Impact: This is used all over ActiveRecord code, and often by application code. Notes: We have been unable to figure out the root cause. Help wanted. Separate article to come. |
None known. Sometimes it helps to convert association to array before using, for example association.to_a.select { |record| record.new_record? }
|
How can I help?
- Profile memory allocations for your Rails application. Figure out and fix hotspots. Post or link to patches and notes in the comments section. We'll be updating this post, as you and we figure out more.
- If you are writing a large commercial Rails application, have one of you engineers do some memory profiling. Fast is a feature, and it's a nice way to contribute to the community.
- Help us figure out the some of the issues we found, but haven't been able to figure out yet.
Submitted Patches
Benchmark#realtime
patch
Patch accepted. Matz committed it to Ruby 1.8 branch. Also, included in Rails trunk.ActiveRecord::Base#attributes_with_quotes
patch
Patch accepted. Michael Koziarski committed it to Rails trunk.
Thanks To
- Alexander Dymo for actually doing all the work
- Alexander Goldstein for directing us to look at GC in the first place
- DHH and rails-core for giving us something to optimize, and for making Rails code so easy to understand and modify
- Charlie Savage and Shugo Maeda for ruby-prof
- Sylvain Joyeux's for original memory allocations number measurement patch
- Stefan Kaes for all his performance work, railsbench and especially GC patch
Sightings
- Charlie Savage: Must Read Rails Performance Article
- Charlie Savage: ruby-prof 0.6.0 and Memory Profiling
- Ruby Inside: Fixing Ruby & Rails' slow performance with patches!
- Michael Klishin: Making Ruby on Rails applications 5 times faster: patches attached
- Wes Maldonado: Acunote/Nimble Method Patches Speeding up Ruby on Rails
- Accelerate HR: More notes on super-fast data insertion in Rails
- Chu Yeow Living on the edge (of Rails) #6 - better performance, Git support, and more
- Antares Trader: Collecting the Garbage in Ruby
- edhickey.com: Mongrel memory usage
- Mike Perham: Tuning ActiveRecord
Updates
- Update 1: noted that patches are against 1.2.4 Requested patches against 2.0
- Update 2: added patches against Rails 2.0
- Update 3: added "submitted patches" section with upstream patches
- Update 4: #realtime and #attributes_with_quotes patches accepted upstream
- Update 5: added Sightings section