Let’s start with two very important questions:
(1) Why should I care about benchmarking Rust when it’s already super fast?
That’s the question we’ll try to answer in this post!
(2) Will there be a 🍕 demo?
Of course, you know that I’m a true 🍕🍕🍕 lover!
What’s so special about Serverless?
Benchmarking is not specific to serverless. But in serverless components, such as AWS Lambda functions, performance really matters for two main reasons:
- Cold start duration (good news: Rust performs really well here, as you can see in my daily updated benchmark)
- Runtime duration, as AWS bills per millisecond.
Let’s see how we can measure and improve this runtime duration using benchmarks!
What are we going to benchmark?
Let’s take a very simple example:
a function which returns the pizza of the day.
🍕 (I told you)
We will compare two different implementations:
- one with HashMap (std::collections::HashMap)
- one with Vec (std::vec::Vec)
(If you don’t know much about Rust, you might want to take a look at some basic Rust code before we start, for example in the Rust book or on my Rust YouTube channel 😇)
First, let’s define our PizzaStore trait:
pub trait PizzaStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str;
}
And our first implementation with HashMap:
use std::collections::HashMap;

// let's make sure we don't initialize a new HashMap each time
pub struct PizzaHashMap<'a> {
    cache: HashMap<i32, &'a str>,
}

impl<'a> PizzaHashMap<'a> {
    pub fn new() -> Self {
        PizzaHashMap {
            cache: HashMap::from([
                (0, "margherita"),
                (1, "deluxe"),
                (2, "veggie"),
                (3, "mushrooms"),
                (4, "bacon"),
                (5, "four cheese"),
                (6, "pepperoni"),
                // what's your favorite?
            ]),
        }
    }
}

impl<'a> PizzaStore for PizzaHashMap<'a> {
    // let's get the pizza of the day from the cache (HashMap)
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        match self.cache.get(&day_index) {
            Some(&pizza) => pizza,
            None => panic!("could not find the pizza"),
        }
    }
}
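The comment at the top of the struct is about building the HashMap only once. If you want the language to enforce that for you, here is a minimal sketch using std::sync::OnceLock (assuming Rust 1.70 or newer). The names PIZZA_CACHE, pizza_cache and cached_pizza_of_the_day are hypothetical and not part of the code above, and returning an Option instead of panicking is my own choice for this sketch.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical process-wide cache (not in the original code): built on first use only.
static PIZZA_CACHE: OnceLock<HashMap<i32, &'static str>> = OnceLock::new();

/// Returns the shared cache, initializing it the first time it is requested.
pub fn pizza_cache() -> &'static HashMap<i32, &'static str> {
    // get_or_init guarantees that only one initializer ever runs, even across threads
    PIZZA_CACHE.get_or_init(|| {
        HashMap::from([
            (0, "margherita"),
            (1, "deluxe"),
            (2, "veggie"),
            (3, "mushrooms"),
            (4, "bacon"),
            (5, "four cheese"),
            (6, "pepperoni"),
        ])
    })
}

/// Looks up the pizza for `day_index`, returning None for an unknown day
/// (the trait implementation above panics instead).
pub fn cached_pizza_of_the_day(day_index: i32) -> Option<&'static str> {
    pizza_cache().get(&day_index).copied()
}
```

Every call to pizza_cache() hands back a reference to the same map, so a warm Lambda execution environment would reuse it across invocations instead of rebuilding it.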
Writing our first benchmark
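There are quite a few crates for writing benchmarks, but we’ll use criterion here.
Let’s start by creating a benches folder containing a benchmark.rs file (criterion and rand need to be declared as dependencies in Cargo.toml, and the bench target needs harness = false), then add our first Criterion benchmark function: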
use criterion::{criterion_group, criterion_main, Criterion};
use rand::Rng;
// PizzaHashMap and the PizzaStore trait also need to be imported from your own crate

fn criterion_hashmap(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    // we create the cache outside of the bench function so only one HashMap will be created
    let pizza_store_hashmap = PizzaHashMap::new();
    // we call get_pizza_of_the_day with a random day index
    c.bench_function("with hashmap", |b| {
        b.iter(|| pizza_store_hashmap.get_pizza_of_the_day(rng.gen_range(0..7)))
    });
}
We also need a bench group (as we will add more benchmark functions later):
criterion_group!(benches, criterion_hashmap);
criterion_main!(benches);
That’s it! Let’s run it with cargo bench.
By default, it runs our function for about 5 seconds (that’s 300M+ iterations):
Running benches/benchmark.rs (target/release/deps/benchmark-2f28819806c8c7c9)
with hashmap time: [13.069 ns 13.090 ns 13.114 ns]
The left and right values are the lower and upper bounds of the confidence interval.
The number in the middle is the best estimate of how long each iteration is likely to take.
Note that these three numbers are very close to each other. This won’t be the case if your code depends on networking, for instance.
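To build some intuition for what a benchmark harness measures, here is a rough, std-only sketch of a timing loop. It is not how Criterion works internally (Criterion adds warm-up, many samples and a proper statistical analysis), and naive_ns_per_iter and summarize are hypothetical names I made up for this illustration.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Average nanoseconds per call of `f` over `iterations` calls.
pub fn naive_ns_per_iter<T, F: FnMut() -> T>(iterations: u32, mut f: F) -> f64 {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

/// Returns (min, median, max) of the samples.
/// For an even number of samples, the upper of the two middle values is used.
pub fn summarize(mut samples: Vec<f64>) -> (f64, f64, f64) {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(|a, b| a.total_cmp(b));
    (samples[0], samples[samples.len() / 2], samples[samples.len() - 1])
}
```

Collecting a few dozen results of naive_ns_per_iter and passing them to summarize gives three numbers that look a bit like the output above, but keep in mind that min/median/max is only a crude stand-in for Criterion’s confidence interval.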
Second implementation: with Vec
Let’s create a second implementation that uses Vec instead of HashMap:
pub struct PizzaVec<'a> {
    cache: Vec<&'a str>,
}

impl<'a> PizzaVec<'a> {
    pub fn new() -> Self {
        PizzaVec {
            cache: vec![
                "margherita", "deluxe", "veggie", "mushrooms", "bacon", "four cheese", "pepperoni",
            ],
        }
    }
}

impl<'a> PizzaStore for PizzaVec<'a> {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        match self.cache.get(day_index as usize) {
            Some(&pizza) => pizza,
            None => panic!("could not find the pizza"),
        }
    }
}
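Before comparing speed, it is worth checking that both implementations return the same answers, otherwise the benchmark compares apples and oranges. Here is a small self-contained sketch of such a check. MapStore and VecStore are condensed stand-ins for PizzaHashMap and PizzaVec, and map_store, vec_store and stores_agree are hypothetical helper names.

```rust
use std::collections::HashMap;
use std::ops::Range;

const PIZZAS: [&str; 7] = [
    "margherita", "deluxe", "veggie", "mushrooms", "bacon", "four cheese", "pepperoni",
];

pub trait PizzaStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str;
}

// Condensed stand-ins for the two implementations (both panic on an unknown day)
pub struct MapStore(HashMap<i32, &'static str>);
pub struct VecStore(Vec<&'static str>);

impl PizzaStore for MapStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        self.0[&day_index]
    }
}

impl PizzaStore for VecStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        self.0[day_index as usize]
    }
}

pub fn map_store() -> MapStore {
    // key each pizza by its position in PIZZAS
    MapStore(PIZZAS.iter().enumerate().map(|(i, &p)| (i as i32, p)).collect())
}

pub fn vec_store() -> VecStore {
    VecStore(PIZZAS.to_vec())
}

/// True when both stores return the same pizza for every day in `days`.
pub fn stores_agree(a: &dyn PizzaStore, b: &dyn PizzaStore, days: Range<i32>) -> bool {
    for day in days {
        if a.get_pizza_of_the_day(day) != b.get_pizza_of_the_day(day) {
            return false;
        }
    }
    true
}
```

Because both stores sit behind the same PizzaStore trait, the check (and the calling code in general) does not need to know which data structure is used underneath.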
Let’s create a new benchmark function so we can compare the two (that’s the goal of benchmarks!). Back in benchmark.rs, we can add:
fn criterion_vec(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    let pizza_store_vec = PizzaVec::new();
    c.bench_function("with vector", |b| {
        b.iter(|| pizza_store_vec.get_pizza_of_the_day(rng.gen_range(0..7)))
    });
}
and update our group to include this new benchmark function:
criterion_group!(benches, criterion_hashmap, criterion_vec);
so we can re-run our benchmarks with cargo bench and check the result!
with hashmap time: [13.096 ns 13.117 ns 13.141 ns]
with vector time: [7.5832 ns 7.5958 ns 7.6097 ns]
By replacing our HashMap with a Vec, our benchmark now runs about 1.7 times as fast (7.6 ns instead of 13.1 ns per iteration)!
This is a great reminder of how much the choice of data structure matters for a given use case 😇
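If you want to turn two benchmark estimates into a single comparison number, the arithmetic is simple. Here is a tiny sketch using the middle estimates from the output above. The helper names speedup and time_saved are hypothetical.

```rust
/// How many times faster the new timing is compared to the old one
/// (both values must use the same unit).
pub fn speedup(old: f64, new: f64) -> f64 {
    old / new
}

/// Fraction of the old time that is saved by the new timing (0.25 means 25% less time).
pub fn time_saved(old: f64, new: f64) -> f64 {
    1.0 - new / old
}
```

With the numbers above, speedup(13.117, 7.5958) is roughly 1.73 and time_saved(13.117, 7.5958) is roughly 0.42. Note that each iteration also pays for rng.gen_range, so these figures describe the whole benchmarked closure, not the lookup alone.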
Ok great, but how does it translate to Serverless?
Two AWS Lambda functions have been deployed, each embedding one of the implementations.
Each Lambda function calls get_pizza_of_the_day 10_000_000 times.
In us-east-1 with 128MB of memory, here are the results:
Implementation | Runtime duration |
---|---|
HashMap | 267.92 ms |
Vec | 87.98 ms 🤯🤯 |
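Here is a quick back-of-the-envelope conversion of the table into per-call time and billed GB-seconds. This is only a sketch: it assumes the whole reported duration is spent in the 10_000_000 calls (the duration also includes handler overhead, so the per-call figures are upper bounds), and ns_per_call and gb_seconds are hypothetical helper names.

```rust
/// Average nanoseconds per call, assuming `duration_ms` was spent entirely in `calls` calls.
pub fn ns_per_call(duration_ms: f64, calls: u64) -> f64 {
    duration_ms * 1_000_000.0 / calls as f64
}

/// GB-seconds consumed by one invocation: duration in seconds times memory in GB.
/// (Lambda rounds the billed duration up to the next millisecond; this sketch does not.)
pub fn gb_seconds(duration_ms: f64, memory_mb: f64) -> f64 {
    (duration_ms / 1000.0) * (memory_mb / 1024.0)
}
```

For the table above, that gives about 26.8 ns per call for HashMap versus about 8.8 ns for Vec, and roughly 0.0335 versus 0.0110 GB-seconds per invocation at 128MB, so the Vec version uses about a third of the billed compute.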
That’s it!
Of course, this was a simple example, but I hope I’ve convinced you to use benchmarks to optimize your code!
Bonus question: where does this overhead come from?
Stay tuned for the next blog post about Rust profiling!