The Rust Diagnostics working group is leading an effort to add support for internationalization of error messages in the compiler, allowing the compiler to produce output in languages other than English.
For example, consider the following diagnostic where a user has used a colon to specify a function's return type instead of an arrow:
error: return types are denoted using `->`
--> src/main.rs:1:21
|
1 | fn meaning_of_life(): u32 { 42 }
| ^ help: use `->` instead
We could output that diagnostic in Chinese:
错误: 返回类型使用`->`表示
--> src/main.rs:1:21
|
1 | fn meaning_of_life(): u32 { 42 }
| ^ 帮助: 使用`->`来代替
or even in Spanish:
error: el tipo de retorno se debe indicar mediante `->`
--> src/main.rs:1:21
|
1 | fn meaning_of_life(): u32 { 42 }
| ^ ayuda: utilice `->` en su lugar
Translated error messages will allow non-native speakers of English to use Rust in their preferred language.
What's the current status?
Implementation on diagnostic translation has started, but we're looking for help!
The core infrastructure for diagnostic translation has been implemented in
rustc
; this makes it possible for Rust to emit a diagnostic with translated
messages. However, every diagnostic in rustc
has to be ported to use this new
infrastructure, otherwise they can't be translated. That's a lot of work, so
the diagnostics working group has chosen to combine the translation effort with
a transition to "diagnostic structs" (more on that later) and get both done at
once.
Once most diagnostic messages have been ported to the new infrastructure, then the diagnostics working group will start creating a workflow for translation teams to translate all of the diagnostic messages to different languages.
Every pull request related to diagnostic translation is tagged with
A-translation
.
Getting involved
There's a lot of work to do on diagnostic translation, but the good news is that
lots of the work can be done in parallel, and it doesn't require background in
compiler development or familiarity with rustc
to contribute!
If you are interested in getting involved, take a look at #100717 to find
out where to get started! You can ask for help in
#t-compiler/wg-diagnostics
or reach out to @davidtwco
.
Note: This post isn't going to be updated as the working group iterates on and improves the workflow for diagnostic translation, so always consult the developer guide for the most recent documentation on diagnostic structs or diagnostic translation.
1. Setting up a local development environment
Before helping with the diagnostic translation effort, you'll need to get your
development environment set up, so follow the instructions on the rustc
dev
guide.
2. Getting ready to port your first diagnostic
Almost all diagnostics in rustc
are implemented using the traditional
DiagnosticBuilder
APIs, which look something like this:
self.struct_span_err(self.prev_token.span, "return types are denoted using `->`")
.span_suggestion_short(
self.prev_token.span,
"use `->` instead",
"->".to_string(),
Applicability::MachineApplicable,
)
.emit();
struct_span_err
creates a new diagnostic given two things - a Span
and a
message. struct_span_err
isn't the only diagnostic function that you'll
encounter in the compiler's source, but the others are all pretty similar. You
can read more about rustc
's diagnostic infrastructure in the rustc
dev
guide.
Span
s just identify some location in the user's source code and you can find
them used throughout the compiler for diagnostic reporting (for example, the
location main.rs:1:21
from the earlier example would have been
self.prev_token.span
).
In this example, the message is just a string literal (a &'static str
) which
needs to be replaced by an identifier for the same message in whichever
language was requested.
There are two ways that a diagnostic will be ported to the new infrastructure:
- If it's a simple diagnostic, without any logic to decide whether or not to add suggestions or notes or helps or labels, like in the example above, then...
- Otherwise...
In both cases, diagnostics are represented as types. Representing diagnostics using types is a goal of the diagnostic working group as it helps separate diagnostic logic from the main code paths.
Every diagnostic type should implement SessionDiagnostic
(either manually or
automatically). In the SessionDiagnostic
trait, there's a member function
which converts the trait into a Diagnostic
to be emitted.
Using a diagnostic derive...
Diagnostic derives (either SessionDiagnostic
for whole diagnostics,
SessionSubdiagnostic
for parts of diagnostics, or DecorateLint
for lints)
can be used to automatically implement a diagnostic trait.
To start, create a new type in the errors
module of the current crate (e.g.
rustc_typeck::errors
or rustc_borrowck::errors
) named after your
diagnostic. In our example, that might look like:
struct ReturnTypeArrow {
}
Next, we'll need to add fields with all the information we need - that's just a
Span
for us:
struct ReturnTypeArrow {
span: Span,
}
In most cases, this will just be the Span
s that are used by the original
diagnostic emission logic and values that are interpolated into diagnostic
messages.
After that, we should add the derive, add our error attribute and annotate the
primary Span
(that was given to struct_span_err
).
#[derive(SessionDiagnostic)]
#[error(parser_return_type_arrow)]
struct ReturnTypeArrow {
#[primary_span]
span: Span,
}
Each diagnostic should have a unique slug. By convention, these always start
with the crate that the error is related to (parser
in this example). This
slug will be used to find the actual diagnostic message in our translation
resources, which we'll see shortly.
Finally, we need to add any labels, notes, helps or suggestions:
#[derive(SessionDiagnostic)]
#[error(parser_return_type_arrow)]
struct ReturnTypeArrow {
#[primary_span]
#[suggestion(applicability = "machine-applicable", code = "->")]
span: Span,
}
In this example, there's just a single suggestion - to replace the :
with
a ->
.
Before we're finished, we have to add the diagnostic messages to the translation resources..
For more documentation on diagnostic derives, see the diagnostic structs
chapter of the rustc
dev guide.
SessionDiagnostic
...
Manually implementing Some diagnostics are too complicated to be generated from a diagnostic type
using the diagnostic derive. In these cases, SessionDiagnostic
can be
implemented manually.
Using the same type as in "Using a diagnostic
derive...", we can implement SessionDiagnostic
as below:
use rustc_errors::{fluent, SessionDiagnostic};
struct ReturnTypeArrow { span: Span }
impl SessionDiagnostic for ReturnTypeArrow {
fn into_diagnostic(self, sess: &'_ rustc_session::Session) -> DiagnosticBuilder<'_> {
sess.struct_span_err(
self.span,
fluent::parser_return_type_arrow,
)
.span_suggestion_short(
self.span,
fluent::suggestion,
"->".to_string(),
Applicability::MachineApplicable,
)
}
}
Instead of using strings for the messages as in the original diagnostic emission logic, typed identifiers referring to translation resources are used. Now we just have to add the diagnostic messages to the translation resources...
Examples
For more examples of diagnostics ported to use the diagnostic derive or written manually, see the following pull requests:
For more examples, see the pull requests labelled A-translation
.
Adding translation resources...
Every slug in a diagnostic derive or typed identifier in a manual implementation needs to correspond to a message in a translation resource.
rustc
's translations use Fluent, an asymmetric translation system.
For each crate in the compiler which emits diagnostics, there is a
corresponding Fluent resource at
compiler/rustc_error_messages/locales/en-US/$crate.ftl
.
Error messages need to be added to this resource (a macro will then generate the typed identifier corresponding to the message).
For our example, we should add the following Fluent to
compiler/rustc_error_messages/locales/en-US/parser.ftl
:
parser_return_type_arrow = return types are denoted using `->`
.suggestion = use `->` instead
parser_return_type_arrow
will generate a parser::return_type_arrow
type (in
rustc_errors::fluent
) that can be used with diagnostic structs and the
diagnostic builder.
Subdiagnostics are "attributes" of the primary Fluent message - by convention, the name of attributes are the type of subdiagnostic, such as "suggestion", but this can be changed when there are multiple of one kind of subdiagnostic.
Now that the Fluent resource contains the message, our diagnostic is ported!
More complex messages with interpolation will be able to reference other fields
in a diagnostic type (when implemented manually, those are provided as
arguments). See the diagnostic translation documentation in the rustc
dev
guide for more examples.
3. Porting diagnostics
Now that you've got a rough idea what to do, you need to find some diagnostics to port. There's lots of diagnostics to port, so the diagnostic working group have split the work up to avoid anyone working on the same diagnostic as someone else - but right now, there aren't many people involved, so just pick a crate and start porting it :)
Please add the A-translation
label to any pull requests that
you make so we can keep track of who has made a contribution! You can use
rustbot
to label your PR (if it wasn't labelled automatically by
triagebot
):
@rustbot label +A-translation
You can also assign a member of the diagnostics working group to review your PR by posting a comment with the following content (or including this in the PR description):
r? rust-lang/diagnostics
Even if you aren't sure exactly how to proceed, give it a go and you can ask
for help in #t-compiler/wg-diagnostics
or reach out to @davidtwco
.
Check out #100717 for guidance on where to get started!
FAQ
Is this a feature that anyone wants?
Yes! Some language communities prefer native resources and some don't (and preferences will vary within those communities too). For example, Chinese-speaking communities have a mature ecosystem of programming language resources which don't require knowing any English.
Wouldn't translating X be more worthwhile?
There are many different areas within the Rust project where internationalization would be beneficial. Diagnostics aren't being prioritized over any other part of the project, it's just that there is interest within the compiler team in supporting this feature.
Couldn't compiler developer time be better spent elsewhere?
Compiler implementation isn't zero-sum: work on other parts of the compiler aren't impacted by these efforts and working on diagnostic translation doesn't prevent contributors working on anything else.
Will translations be opt-in?
Translations will be opt-in, you won't need to use them if you don't want to.
How will a user select the language?
Exactly how a user will choose to use translated error messages hasn't been decided yet.