Perhaps my favorite feature in the Rust 2018 edition is procedural macros. Procedural macros have had a long and storied history in Rust (and will continue to have a storied future!), and now is perhaps one of the best times to get involved with them because the 2018 edition has so dramatically improved the experience of both defining and using them.
Here I'd like to explore what procedural macros are, what they're capable of, notable new features, and some fun use cases of procedural macros. I might even convince you that this is Rust 2018's best feature as well!
What is a procedural macro?
First defined over two years ago in RFC 1566, procedural macros are, in layman's terms, a function that takes a piece of syntax at compile time and produces a new bit of syntax. Procedural macros in Rust 2018 come in one of three flavors:
- #[derive] mode macros have actually been stable since Rust 1.15 and bring all the goodness and ease of use of #[derive(Debug)] to user-defined traits as well, such as Serde's #[derive(Deserialize)].
- Function-like macros are newly stable to the 2018 edition and allow defining macros like env!("FOO") or format_args!("...") in a crates.io-based library. You can think of these as sort of "macro_rules! macros" on steroids.
- Attribute macros, my favorite, are also new in the 2018 edition and allow you to provide lightweight annotations on Rust functions which perform syntactical transformations over the code at compile time.
Each of these flavors of macros can be defined in a crate with proc-macro = true
specified in its manifest. When used, a procedural macro is
loaded by the Rust compiler and executed as the invocation is expanded. This
means that Cargo is in control of versioning for procedural macros and you can
use them with all the same ease of use you'd expect from other Cargo dependencies!
Defining a procedural macro
Each of the three types of procedural macros is defined in a slightly different
fashion, and here we'll single out attribute macros. First, we'll flag the crate
as a procedural macro in Cargo.toml:
[lib]
proc-macro = true
and then in src/lib.rs
we can write our macro:
extern crate proc_macro;
use proc_macro::TokenStream;
#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    // ...
}
We can then write some unit tests in tests/smoke.rs:
#[my_crate::hello]
fn wrapped_function() {}
#[test]
fn works() {
    wrapped_function();
}
... and that's it! When we execute cargo test, Cargo will compile our
procedural macro. Afterwards it will compile our unit test, which loads the macro
at compile time, executing the hello function and compiling the resulting
syntax.
Right off the bat we can see a few important properties of procedural macros:
- The input/output is this fancy TokenStream type we'll talk about more in a bit.
- We're executing arbitrary code at compile time, which means we can do just about anything!
- Procedural macros are incorporated with the module system, meaning they can be imported just like any other name.
Before we take a look at implementing a procedural macro, let's first dive into some of these points.
Macros and the module system
First stabilized in Rust 1.30 (noticing a trend with 1.15?) macros are now
integrated with the module system in Rust. This mainly means that you no longer
need the clunky #[macro_use]
attribute when importing macros! Instead of this:
#[macro_use]
extern crate log;
fn main() {
    debug!("hello, ");
    info!("world!");
}
you can do:
use log::info;
fn main() {
    log::debug!("hello, ");
    info!("world!");
}
Integration with the module system solves one of the most confusing parts about macros historically. They're now imported and namespaced just like any other item in Rust!
The benefits are not only limited to bang-style macro_rules
macros, as you can
now transform code that looks like this:
#[macro_use]
extern crate serde_derive;
#[derive(Deserialize)]
struct Foo {
    // ...
}
into
use serde::Deserialize;
#[derive(Deserialize)]
struct Foo {
    // ...
}
and you don't even need to explicitly depend on serde_derive in Cargo.toml!
All you need is:
[dependencies]
serde = { version = '1.0.82', features = ['derive'] }
What's inside a TokenStream?
This mysterious TokenStream type comes from the compiler-provided
proc_macro crate. When it was first added all you could do with a
TokenStream was convert it to or from a string using to_string()
or parse().
As of Rust 2018, you can act on the tokens in a TokenStream directly.
A TokenStream is effectively "just" an iterator over TokenTree values. All
syntax in Rust falls into one of these four categories, the four variants of
TokenTree:
- Ident is any identifier like foo or bar. This also contains keywords such as self and super.
- Literal includes things like 1, "foo", and 'b'. All literals are one token and represent constant values in a program.
- Punct represents some form of punctuation that's not a delimiter. For example, "." is a Punct token in the field access of foo.bar. Multi-character punctuation like "=>" is represented as two Punct tokens, one for "=" and one for ">", and the Spacing enum says that the "=" is adjacent to the ">".
- Group is where the term "tree" is most relevant, as Group represents a delimited sub-token-stream. For example, (a, b) is a Group with parentheses as delimiters, and the internal token stream is a, b.
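To make the tree shape concrete, here's a minimal sketch that models the four variants with plain Rust enums. These are simplified stand-ins I've made up for illustration: the names mirror proc_macro, but the real types are opaque and carry extra information such as spans, so treat this as a picture of the structure rather than the compiler's API.

```rust
/// Simplified stand-in for `proc_macro::Spacing` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Spacing {
    Alone, // the next token is not glued onto this punctuation
    Joint, // the next token is punctuation immediately following this one
}

/// Simplified stand-in for `proc_macro::Delimiter` (illustration only).
#[allow(dead_code)] // only `Parenthesis` is used in this sketch
#[derive(Debug, Clone, PartialEq)]
enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

/// Simplified stand-in for `proc_macro::TokenTree` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Ident(String),
    Literal(String),
    Punct(char, Spacing),
    Group(Delimiter, Vec<TokenTree>),
}

/// Counts leaf tokens, recursing into every `Group`.
fn count_leaves(tokens: &[TokenTree]) -> usize {
    tokens
        .iter()
        .map(|token| match token {
            TokenTree::Group(_, inner) => count_leaves(inner),
            _ => 1,
        })
        .sum()
}

/// The tokens for `foo => (a, 1)`, spelled out by hand.
fn sample() -> Vec<TokenTree> {
    vec![
        TokenTree::Ident("foo".to_string()),
        // `=>` is two `Punct` tokens; `Joint` marks `=` as adjacent to `>`.
        TokenTree::Punct('=', Spacing::Joint),
        TokenTree::Punct('>', Spacing::Alone),
        TokenTree::Group(
            Delimiter::Parenthesis,
            vec![
                TokenTree::Ident("a".to_string()),
                TokenTree::Punct(',', Spacing::Alone),
                TokenTree::Literal("1".to_string()),
            ],
        ),
    ]
}
```

Here sample() has four top-level trees, and count_leaves finds six leaf tokens once the parenthesized group is opened up.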
While this is conceptually simple, it may sound like there's not much we can
do with it! It's unclear, for example, how we might parse a function from a
TokenStream. The minimality of TokenTree is crucial, however, for
stabilization. It would be infeasible to stabilize the Rust AST because that
would mean we could never change it. (Imagine if we couldn't have added the ?
operator!)
By using TokenStream to communicate with procedural macros, the compiler is
able to add new language syntax while also being able to compile
and work with older procedural macros. Let's see now, though, how we can
actually get useful information out of a TokenStream.
Parsing a TokenStream
If TokenStream is just a simple iterator, then we've got a long way to go from
that to an actual parsed function. Although the code is already lexed for us
we still need to write a whole Rust parser! Thankfully though the community has
been hard at work to make sure writing procedural macros in Rust is as smooth as
can be, so you need look no further than the syn crate.
With the syn crate we can parse any Rust AST as a one-liner:
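#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::ItemFn);
    // Field layout as of syn 0.15; syn 1.0 and later expose these as
    // input.sig.ident and input.sig.abi.
    let name = &input.ident;
    let abi = &input.abi;
    // ...
}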
The syn
crate not only comes with the ability to parse built-in syntax
but you can also easily write a recursive descent parser for your own syntax.
The syn::parse
module has more information about this capability.
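If "recursive descent" is a new term, it just means one function per grammar rule, each free to call the others. The sketch below hand-rolls that idea with only the standard library for a made-up grammar of sums with parentheses. It doesn't use syn at all, and Parser and parse_sum are hypothetical names for this illustration; the same shape of parser is what you'd write over real tokens.

```rust
/// Hypothetical parser state: the input characters plus a cursor.
struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        // Whitespace is dropped up front to keep the sketch short
        // (so `1 2` would read as the number 12).
        let chars = src.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { chars, pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// expr := term ('+' term)*
    fn expr(&mut self) -> Result<u64, String> {
        let mut total = self.term()?;
        while self.peek() == Some('+') {
            self.pos += 1;
            total += self.term()?;
        }
        Ok(total)
    }

    /// term := number | '(' expr ')'
    fn term(&mut self) -> Result<u64, String> {
        match self.peek() {
            Some('(') => {
                self.pos += 1;
                // Recursion: a parenthesized term contains a whole expr.
                let value = self.expr()?;
                if self.peek() != Some(')') {
                    return Err(format!("expected `)` at offset {}", self.pos));
                }
                self.pos += 1;
                Ok(value)
            }
            Some(c) if c.is_ascii_digit() => {
                let mut value = 0u64;
                while let Some(digit) = self.peek().and_then(|c| c.to_digit(10)) {
                    value = value * 10 + u64::from(digit);
                    self.pos += 1;
                }
                Ok(value)
            }
            other => Err(format!("expected a number or `(`, found {:?}", other)),
        }
    }
}

/// Parses and evaluates a whole input, rejecting trailing characters.
fn parse_sum(src: &str) -> Result<u64, String> {
    let mut parser = Parser::new(src);
    let value = parser.expr()?;
    if parser.pos != parser.chars.len() {
        return Err(format!("unexpected trailing input at offset {}", parser.pos));
    }
    Ok(value)
}
```

Calling parse_sum("1 + (2 + 3)") walks expr, then term, then back into expr for the parentheses, and comes back with Ok(6).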
Producing a TokenStream
Not only do we take a TokenStream as input with a procedural macro, but we
also need to produce a TokenStream as output. This output is typically
required to be valid Rust syntax, but like the input it's just a list of tokens
that we need to build somehow.
Technically, the basic way to build a TokenStream out of individual tokens is to
collect them via its FromIterator implementation, which means we'd have to
create each token one-by-one and collect it into a TokenStream. This is quite
tedious, though, so let's take a look at syn's sibling crate: quote.
The quote
crate is a quasi-quoting implementation for Rust which primarily
provides a convenient macro for us to use:
use quote::quote;
#[proc_macro_attribute]
pub fn hello(attr: TokenStream, item: TokenStream) -> TokenStream {
    let input = syn::parse_macro_input!(item as syn::ItemFn);
    let name = &input.ident;
    // Our input function is always equivalent to returning 42, right?
    let result = quote! {
        fn #name() -> u32 { 42 }
    };
    result.into()
}
The quote! macro allows you to write mostly-Rust syntax and interpolate
variables quickly from the environment with #foo. This removes much of the
tedium of creating a TokenStream token-by-token and allows quickly cobbling
together various pieces of syntax into one return value.
Tokens and Span
Perhaps the greatest feature of procedural macros in Rust 2018 is the ability to
customize and use Span information on each token, which enables
amazing syntactical error messages from procedural macros:
error: expected `fn`
 --> src/main.rs:3:14
  |
3 | my_annotate!(not_fn foo() {});
  |              ^^^^^^
as well as completely custom error messages:
error: imported methods must have at least one argument
  --> invalid-imports.rs:12:5
   |
12 |     fn f1();
   |     ^^^^^^^^
A Span can be thought of as a pointer back into an original source file,
typically saying something like "the Ident token foo came from file
bar.rs, line 4, column 5, and was 3 bytes long". This information is
primarily used by the compiler's diagnostics with warnings and error messages.
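As a rough picture of how that "pointer" turns into the carets you see in diagnostics, here's a small standard-library-only sketch. The Span struct and underline function are hypothetical and much simpler than the compiler's real (opaque) Span; in particular I'm assuming a span that sits on a single line and counting columns in characters.

```rust
/// Hypothetical, simplified span: where a token sits within one source line.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    line: usize,   // 1-based line number
    column: usize, // 0-based column of the first character
    len: usize,    // number of characters covered
}

/// Returns the source line followed by a row of carets under the span,
/// or `None` if the span's line does not exist in `source`.
fn underline(source: &str, span: Span) -> Option<String> {
    let index = span.line.checked_sub(1)?;
    let text = source.lines().nth(index)?;
    let carets = format!("{}{}", " ".repeat(span.column), "^".repeat(span.len));
    Some(format!("{}\n{}", text, carets))
}
```

Given the my_annotate! line from the first error message above (line 3, column 13 counting from zero, length 6), underline puts six carets under not_fn.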
In Rust 2018 each TokenTree
has a Span
associated with it. This means that
if you preserve the Span
of all input tokens into the output then even
though you're producing brand new syntax the compiler's error messages are still
accurate!
For example, a small macro like:
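#[proc_macro]
pub fn make_pub(item: TokenStream) -> TokenStream {
    // Convert to proc_macro2::TokenStream, which quote! knows how to interpolate.
    let item = proc_macro2::TokenStream::from(item);
    let result = quote! {
        pub #item
    };
    result.into()
}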
when invoked as:
my_macro::make_pub!(static X: u32 = "foo");
is invalid because we're initializing a static declared as u32 with a string,
and the compiler will helpfully diagnose the problem as:
error[E0308]: mismatched types
 --> src/main.rs:1:37
  |
1 | my_macro::make_pub!(static X: u32 = "foo");
  |                                     ^^^^^ expected u32, found reference
  |
  = note: expected type `u32`
             found type `&'static str`
error: aborting due to previous error
And we can see here that although we're generating brand new syntax, the compiler can preserve span information to continue to provide targeted diagnostics about code that we've written.
Procedural Macros in the Wild
OK, up to this point we've got a pretty good idea of what procedural macros can do and the various capabilities they have in the 2018 edition. Because this feature was so long awaited, the ecosystem is already making use of these new capabilities! If you're interested, some projects to keep your eyes on are:
- syn, quote, and proc-macro2 are your go-to libraries for writing procedural macros. They make it easy to define custom parsers, parse existing syntax, create new syntax, work with older versions of Rust, and much more!
- Serde and its derive macros for Serialize and Deserialize are likely the most used macros in the ecosystem. They sport an impressive amount of configuration and are a great example of how small annotations can be so powerful.
- The wasm-bindgen project uses attribute macros to easily define interfaces in Rust and import interfaces from JS. The #[wasm_bindgen] lightweight annotation makes it easy to understand what's coming in and out, as well as removing lots of conversion boilerplate.
- The gobject_gen! macro is an experimental IDL for the GNOME project to define GObject objects safely in Rust, eschewing manually writing all the glue necessary to talk to C and interface with other GObject instances in Rust.
- The Rocket framework has recently switched over to procedural macros, and showcases some of the nightly-only features of procedural macros like custom diagnostics, custom span creation, and more. Expect to see these features stabilize in 2019!
That's just a taste of the power of procedural macros and some example usage throughout the ecosystem today. We're only 6 weeks out from the original release of procedural macros on stable, so we've surely only scratched the surface as well! I'm really excited to see where we can take Rust with procedural macros by empowering all kinds of lightweight additions and extensions to the language!