😱 Status quo stories: Barbara carefully dismisses embedded Future
🚧 Warning: Draft status 🚧
This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.
If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!
The story
Barbara is contributing to an OS that supports running multiple applications on a single microcontroller. These microcontrollers have as little as 10's of kilobytes of RAM and 100's of kilobytes of flash memory for code. Barbara is writing a library that is used by multiple applications -- and is linked into each application -- so the library is very resource constrained. The library should support asynchronous operation, so that multiple APIs can be used in parallel within each (single-threaded) application.
Barbara begins writing the library by trying to write a console interface, which allows byte sequences to be printed to the system console. Here is an example sequence of events for a console print:
- The interface gives the kernel a callback to call when the print finishes, and gives the kernel the buffer to print.
- The kernel prints the buffer in the background while the app is free to do other things.
- The print finishes.
- The app tells the kernel it is ready for the callback to be invoked, and the kernel invokes the callback.
Barbara tries to implement the API using
core::future::Future
so that the library can be compatible with the async Rust ecosystem. The OS
kernel does not expose a Future-based interface, so Barbara has to implement
Future
by hand rather than using async/await syntax. She starts with a
skeleton:
#![allow(unused)] fn main() { /// Passes `buffer` to the kernel, and prints it to the console. Returns a /// future that returns `buffer` when the print is complete. The caller must /// call kernel_ready_for_callbacks() when it is ready for the future to return. fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture { // TODO: Set the callback // TODO: Tell the kernel to print `buffer` } struct PrintFuture; impl core::future::Future for PrintFuture { type Output = &'static mut [u8]; fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> { // TODO: Detect when the print is done, retrieve `buffer`, and return // it. } } }
Note: All error handling is omitted to keep things understandable.
Barbara begins to implement print_buffer
:
#![allow(unused)] fn main() { fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture { kernel_set_print_callback(callback); kernel_start_print(buffer); PrintFuture {} } // New! The callback the kernel calls. extern fn callback() { // TODO: Wake up the currently-waiting PrintFuture. } }
So far so good. Barbara then works on poll
:
#![allow(unused)] fn main() { fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> { if kernel_is_print_done() { return Poll::Ready(kernel_get_buffer_back()); } Poll::Pending } }
Of course, there's something missing here. How does the callback wake the
PrintFuture
? She needs to store the
Waker
somewhere! Barbara puts the Waker
in a global variable so the callback can
find it (this is fine because the app is single threaded and callbacks do NOT
interrupt execution the way Unix signals do):
#![allow(unused)] fn main() { static mut PRINT_WAKER: Option<Waker> = None; extern fn callback() { if let Some(waker) = unsafe { PRINT_WAKER.as_ref() } { waker.wake_by_ref(); } } }
She then modifies poll
to set PRINT_WAKER
:
#![allow(unused)] fn main() { fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> { if kernel_is_print_done() { return Poll::Ready(kernel_get_buffer_back()); } unsafe { PRINT_WAKER = Some(cx.waker()); } Poll::Pending } }
PRINT_WAKER
is stored in .bss
, which occupies space in RAM but not flash. It
is two words in size. It points to a
RawWakerVTable
that is provided by the executor. RawWakerVTable
's design is a compromise that
supports environments both with and without alloc
. In no-alloc
environments,
drop
and clone
are generally no-ops, and wake
/wake_by_ref
seem like
duplicates. Looking at RawWakerVTable
makes Barbara realize that even though
Future
was designed to work in embedded contexts, it may have too much
overhead for her use case.
Barbara decides to do some benchmarking. She comes up with a sample application
-- an app that blinks a led and responds to button presses -- and implements it
twice. One implementation does not use Future
at all, the other does. Both
implementations have two asynchronous interfaces: a timer interface and a GPIO
interface, as well as an application component that uses the interfaces
concurrently. In the Future
-based app, the application component functions
like a future combinator, as it is a Future
that is almost always waiting for
a timer or GPIO future to finish.
To drive the application future, Barbara implements an executor. The executor
functions like a background thread. Because alloc
is not available, this
executor contains a single future. The executor has a spawn
function that
accepts a future and starts running that future (overwriting the existing future
in the executor if one is already present). Once started, the executor runs
entirely in kernel callbacks.
Barbara identifies several factors that add branching and error handling code to the executor:
spawn
should be a safe function, because it is called by high-level application code. However, that means it can be called by the future it contains. If handled naively, this would result in dropping the future while it executes. Barbara adds runtime checks to identify this situation.Waker
isSync
, so on a multithreaded system, a future could give another thread access to itsWaker
and the other thread could wake it up. This could happen while thepoll
is executing, beforepoll
returnsPoll::Pending
. Therefore, Barbara concludes that ifwake
is called while a future is being polled then the future should be re-polled, even if the currentpoll
returnsPoll::Pending
. This requires putting a retry loop into the executor.- A kernel callback may call
Waker::wake
after its future returnsPoll::Ready
. Afterpoll
returnsPoll::Ready
, the executor should notpoll
the future again, so Barbara adds code to ignore those wakeups. This duplicates the "ignore spurious wakeups" functionality that exists in the future itself.
Ultimately, this made the executor
logic
nontrivial, and it compiled into 96 bytes of code. The executor logic is
monomorphized for each future, which allows the compiler to make inlining
optimizations, but results in a significant amount of duplicate code.
Alternatively, it could be adapted to use function pointers or vtables to avoid
the code duplication, but then the compiler definitely cannot inline
Future::poll
into the kernel callbacks.
Barbara publishes an analysis of the relative sizes of the two app implementations, finding a large percentage increase in both code size and RAM usage (note: stack usage was not investigated). Most of the code size increase is from the future combinator code.
In the no-Future
version of the app, a kernel callback causes the following:
- The kernel callback calls the application logic's event-handling function for the specific event type.
- The application handles the event.
The call in step 1 is inlined, so the compiled kernel callback consists only of the application's event-handling logic.
In the Future
-based version of the app, a kernel callback causes the
following:
- The kernel callback updates some global state to indicate the event happened.
- The kernel callback invokes
Waker::wake
. Waker::wake
callspoll
on the application future.- The application future has to look at the state saved in step 1 to determine what event happened.
- The application future handles the event.
LLVM is unable to devirtualize the call in step 2, so the optimizer is unable to simplify the above steps. Steps 1-4 only exist in the future-based version of the code, and add over 200 bytes of code (note: Barbara believes this could be reduced to between 100 and 200 bytes at the expense of execution speed).
Barbara concludes that Future
is not suitable for
highly-resource-constrained environments due to the amount of code and RAM
required to implement executors and combinators.
Barbara redesigns the library she is building to use a different
concept
for implementing async APIs in Rust that are much lighter weight. She has moved
on from Future
and is refining her async
traits
instead. Here are some ways in which these APIs are lighter weight than a
Future
implementation:
- After monomorphization, kernel callbacks directly call application code. This allows the application code to be inlined into the kernel callback.
- The callback invocation is more precise: these APIs don't make spurious wakeups, so application code does not need to handle spurious wakeups.
- The async traits lack an equivalent of
Waker
. Instead, all callbacks are expected to be'static
(i.e. they modify global state) and passing pointers around is replaced by static dispatch.
🤔 Frequently Asked Questions
What are the morals of the story?
core::future::Future
isn't suitable for every asynchronous API in Rust.Future
has a lot of capabilities, such as the ability to spawn dynamically-allocated futures, that are unnecessary in embedded systems. These capabilities have a cost, which is unavoidable without backwards-incompatible changes to the trait.- We should look at embedded Rust's relationship with
Future
so we don't fragment the embedded Rust ecosystem. Other embedded crates useFuture
--Future
certainly has a lot of advantages over lighter-weight alternatives, if you have the space to use it.
Why did you choose Barbara to tell this story?
- This story is about someone who is an experienced systems programmer and an experienced Rust developer. All the other characters have "new to Rust" or "new to programming" as a key characteristic.
How would this story have played out differently for the other characters?
- Alan would have found the
#![no_std]
crate ecosystem lacking async support. He would have moved forward with aFuture
-based implementation, unaware of its impact on code size and RAM usage. - Grace would have handled the issue similarly to Barbara, but may not
have tried as hard to use
Future
. Barbara has been paying attention to Rust long enough to know how significant theFuture
trait is in the Rust community and ecosystem. - Niklaus would really have struggled. If he asked for help, he probably would've gotten conflicting advice from the community.
Future
has a lot of features that Barbara's traits don't have -- aren't those worth the cost?
Future
has many additional features that are nice-to-have:Future
works smoothly in a multithreaded environment. Futures can beSend
and/orSync
, and do not need to have interior mutability, which avoids the need for internal locking.- Manipulating arbitrary Rust types without locking allows
async fn
to be efficient.
- Manipulating arbitrary Rust types without locking allows
- Futures can be spawned and dropped in a dynamic manner: an executor
that supports dynamic allocation can manage an arbitrary number of
futures at runtime, and futures may easily be dropped to stop their
execution.
- Dropping a future will also drop futures it owns, conveniently providing good cancellation semantics.
- A future that creates other futures (e.g. an
async fn
that calls otherasync fn
s) can be spawned with only a single memory allocation, whereas callback-based approaches need to allocate for each asynchronous component.
- Community and ecosystem support. This isn't a feature of
Future
per se, but the Rust language has special support forFuture
(async
/await
) and practically the entire async Rust ecosystem is based onFuture
. The ability to use existing async crates is a very strong reason to useFuture
over any alternative async abstraction.
- However, the code size impact of
Future
is a deal-breaker, and no number of nice-to-have features can outweigh a deal-breaker. Barbara's traits have every feature she needs. - Using
Future
saves developer time relative to building your own async abstractions. Developers can use the time they saved to minimize code size elsewhere in the project. In some cases, this may result in a net decrease in code size for the same total effort. However, code size reduction efforts have diminishing returns, so projects that expect to optimize code size regardless likely won't find the tradeoff beneficial.
Is the code size impact of Future
fundamental, or can the design be tweaked in a way that eliminates the tradeoff?
Future
isolates the code that determines a future should wake up (the code that callsWaker::wake
) from the code that executes the future (the executor). The only information transferred viaWaker::wake
is "try waking up now" -- any other information has to be stored somewhere. When polled, a future has to run logic to identify how it can make progress -- in many cases this requires answering "who woke me up?" -- and retrieve the stored information. Most completion-driven async APIs allow information about the event to be transferred directly to the code that handles the event. According to Barbara's analysis, the code required to determine what event happened was the majority of the size impact ofFuture
.
I thought Future
was a zero-cost abstraction?
- Aaron Turon described futures as zero-cost abstractions. In the linked post, he elaborated on what he meant by zero-cost abstraction, and eliminating their impact on code size was not part of that definition. Since then, the statement that future is a zero-cost abstraction has been repeated many times, mostly without the context that Aaron provided. Rust has many zero-cost abstractions, most of which do not impact code size (assuming optimization is enabled), so it is easy for developers to see "futures are zero-cost" and assume that makes them lighter-weight than they are.
How does Barbara's code handle thread-safety? Is her executor unsound?
- The library Barbara is writing only works in Tock OS' userspace environment. This environment is single-threaded: the runtime does not provide a way to spawn another thread, hardware interrupts do not execute in userspace, and there are no interrupt-style callbacks like Unix signals. All kernel callbacks are invoked synchronously, using a method that is functionally equivalent to a function call.