Rust developers have repeatedly raised concerns about an unaddressed privacy issue over the last few years.
Rust has rapidly gained momentum among developers, for its focus on performance, safety, safe concurrency, and for having a similar syntax to C++.
StackOverflow’s 2020 developer survey ranked Rust first among the “most loved programming languages.”
However, for the longest time developers have been bothered by their production builds leaking potentially sensitive debug information.
I’ll leave this one for you folks to figure out, but from a layman’s perspective, it looks like a really dumb thing to keep paths from the developer’s machine like this in compiled binaries? At least after countless years, the Rust developers seem committed to fixing it, finally.
I’ve had my eye on Rust for a while. I’m looking forward to its continued maturation, and this looks like an important step along the way.
The engineering-community suffers from the same problem.
For us the culprit is STEP:
If you look at the first lines of https://www.parker.com/cadfiles/687513/10C46-12-6.stp
“FILE_NAME(‘D:\\Hpde\\Work requests\\10C46\\10C46-XX-X\\Output\\10C46-12-6.stp’,’2012-09-26T19:22:05′,(‘AS103222′),(”),’Autodesk Inventor 2010′,’Autodesk Inventor 2010’,”);”
Or https://www.cejn.com/globalassets/productdrawings/148100404-ad.stp:
“FILE_NAME(
/* name */ ‘C:\\VaultExport\\148100404-AD.stp’,
/* time_stamp */ ‘2018-05-18T14:29:54+02:00’,
/* author */ (‘PeOd’),
/* organization */ (”),
/* preprocessor_version */ ‘ST-DEVELOPER v16.1’,
/* originating_system */ ‘Autodesk Inventor 2016’,
/* authorisation */ ”);”
I’ve been subscribed to the relevant bug for a while, so here’s the gist.
1. Compiled binaries in many languages embed the paths of the source files they were generated from, for various reasons (e.g. so a debugger can display the source corresponding to the code that’s running, or so panic/die/assert/“clean crash” messages can report where the error occurred).
Go also does the same thing and I believe C and C++ also do it in certain circumstances.
2. Said paths often include your username because your source code is usually inside /home/username/ or C:\Users\Username
3. It’s possible to use relative paths or to mangle the paths, but just switching to that by default as a simple answer to the problem would break things.
This was implemented as the --remap-path-prefix switch after this problem was initially reported three or four years ago. There are claims that it doesn’t do enough but, last I checked, they were having trouble getting a clear answer and reproducer for the claimed problems.
4. The language team has *not* dismissed this and has triaged the remaining open bug as P-medium.
5. One or more people are either ignorant of the concept that language teams have limited resources and need to prioritize or are trying to use high-school level rumor-mongering tactics to bully the Rust team into reconsidering the priority of this bug.
There’s nothing special about Rust here. The proper headline is “People trying to single Rust out for following common conventions for embedding paths to source files in binaries”. (Possibly because of Rust’s ability to draw people to systems programming who never did it before, given that this design decision is more a C/C++/Go thing than a Perl/Python/Ruby/etc. thing.)
The person trying to throw their weight around just hit the jackpot with the BleepingComputer writer.
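To make point 1 concrete, here is a minimal sketch of two standard ways a Rust program ends up carrying its own source path: the file!() macro and std::panic::Location, which is what panic messages use to report where they happened. The function names source_path and caller_path are made up for illustration.

```rust
use std::panic::Location;

/// Returns the path of this source file as the compiler saw it.
/// `file!()` is expanded at compile time, so the string is baked into the binary.
fn source_path() -> &'static str {
    file!()
}

/// Returns the source file of whoever called this function.
/// `#[track_caller]` is the same mechanism panics use to report their location.
#[track_caller]
fn caller_path() -> &'static str {
    Location::caller().file()
}

fn main() {
    println!("file!() expands to: {}", source_path());
    println!("caller location:    {}", caller_path());
}
```

What gets printed depends on the path handed to rustc; with Cargo that is frequently an absolute path, which is where the username comes from. As I understand it, the --remap-path-prefix FROM=TO switch mentioned above rewrites these strings at compile time, e.g. something like --remap-path-prefix=/home/alice/project=/src (hypothetical paths).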
ssokolow,
I’d say it’s ok for debug builds to contain such information, but not release builds. Release builds shouldn’t contain this information even if it does help identify responsible source files.
On a different note, this reminds me of something else one doesn’t expect to find in rust executables: Rustc used to statically link in ~2k worth of quotes from American literature writer H. P. Lovecraft.
https://github.com/rust-lang/rust/issues/13871
https://github.com/rust-lang/rust/commit/29ad8e15a2b7e2024941d74ea4ce261cb501ded9
It was added in 2013, and putting it there was in bad taste IMHO. It’s one thing to add an easter egg to the rustc compiler itself, but another altogether to add it to everyone’s output executables. It got taken out of the dev branch in 2015, although it continued to affect the rustc version in Debian stable for some time, and as a result I actually discovered it after 2015.
PS.
Did anybody else notice Firefox replacing DuckDuckGo with Google (in addition to some other settings) with a recent Firefox update? If this is deliberate, come on Mozilla, don’t do this sort of thing.
AFAIK that is how C and C++ work by default. Debug builds have full source paths, but release builds do not (though they can under various circumstances)
Yes, if you do not give the “-g” flag, the stack traces will not contain the nice source information.
But with dynamic linking, it is still possible to get some information from DLL / .so files which publish module and function names. For static linking you would only get instruction addresses.
sukru,
I think it’s generally understood that binaries include names but not paths.
ssokolow makes a valid point about the __FILE__ macro. I would argue that if a source file contains __FILE__ within it, then having path information in the compiled output is completely expected. But it should not be output by the language itself (unless it’s a debug build of course).
Alfman,
Yes, in general regular binaries do include paths. To be sure, I decided to give it a try.
But that can be avoided with stripped statically linked binaries
I have *no experience* with Rust (which I think I should fix in some free time). So cannot make the same test there.
sukru,
I did this myself and was not able to get the compiler to embed a path. I even tried compiling/linking across a complex directory structure and so far I haven’t been able to get the C++ compiler to output a path anywhere, only the file name. So can I assume you meant filename instead of path?
I guess we could criticize GCC for that, it doesn’t need to be there and it bloats the binary. __FILE__ outputs the actual path though…
Ok. Here’s a simple example using the assert that ssokolow mentions…
Here is the output of a non-debug build.
Incidentally, by default the rust version is 6.7M on my machine and the C version is 17k, which is kind of insane. Using the following commands I’m able to get the rust version down to 159k.
strip is a GNU utility that gets rid of lots of metadata, but rust’s compile time path remains in the exception, which is obviously outside the scope of “strip”.
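For anyone who wants to check their own stripped binaries, here is a rough sketch of a scanner, written in Rust for the sake of the thread. It is not the test I ran above, just an approximation of running strings over the file and grepping for home directories. The function names and the list of prefixes are my own assumptions.

```rust
/// Collects runs of printable ASCII at least `min_len` bytes long,
/// roughly what the `strings` utility does.
fn printable_runs(data: &[u8], min_len: usize) -> Vec<String> {
    let mut runs = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for &b in data {
        if b.is_ascii_graphic() || b == b' ' {
            current.push(b);
        } else {
            if current.len() >= min_len {
                runs.push(String::from_utf8_lossy(&current).into_owned());
            }
            current.clear();
        }
    }
    // Flush a run that reaches the end of the data.
    if current.len() >= min_len {
        runs.push(String::from_utf8_lossy(&current).into_owned());
    }
    runs
}

/// Keeps only the runs that contain one of the given path prefixes.
fn leaked_paths(data: &[u8], prefixes: &[&str]) -> Vec<String> {
    printable_runs(data, 4)
        .into_iter()
        .filter(|s| prefixes.iter().any(|p| s.contains(p)))
        .collect()
}

fn main() -> std::io::Result<()> {
    // Scan the file given as the first argument, or this executable itself.
    let target = match std::env::args().nth(1) {
        Some(p) => std::path::PathBuf::from(p),
        None => std::env::current_exe()?,
    };
    let data = std::fs::read(&target)?;
    // Assumed prefixes; adjust for wherever your sources live.
    for hit in leaked_paths(&data, &["/home/", "/Users/", "C:\\Users\\"]) {
        println!("{}", hit);
    }
    Ok(())
}
```

Anything it prints after strip has run is a string the program itself carries, such as a panic location, rather than symbol or debug metadata.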
Alfman,
You are right, it did not have the path, but only the filenames.
(No need for strip / static link)
Even LLVM is similar
I would take your example for Rust, which really seems to include the paths.
As far as I know, things like __FILE__ and full paths in debug information have been obstacles to reproducible builds, because they make each build environment slightly different, and the usage of them in common C/C++ toolkits has been reduced in recent years. I don’t remember people advocating for removing them on the basis of privacy though.
As I remember, it’s because the same precompiled “release optimizations and debugging symbols” build of the standard library gets statically linked in both debug and release builds.
There’s an open bug tracking efforts to incorporate portable support for symbol stripping into Rust itself and, as I remember, automatically stripping std in release builds is considered blocked on that.
If you want more info on how to crunch down a Rust binary, this guide goes through various techniques.
(This blog post from 2016 includes stuff that’s now become the default, but actually shows a comparison between Rust, C, and C++ binary sizes at each step.)
sukru,
I’ve come across the same guide in trying to find a solution and as you can see I’ve used many of the same options. The base size has actually gotten worse since then, but at least once everything gets stripped it’s more reasonable and I can live with it.
Obviously if the compiler were able to analyze the code paths in detail, it could get rid of 99% of code in the final binary because it is unused. C’s code analysis isn’t any better, but it has a huge advantage, enabling its binaries to be much smaller. It has a standard ABI that allows libc to be used effectively system-wide, whereas rust libraries don’t have a stable ABI yet and consequently every rust program ends up needing its own copy with a far larger footprint. I hope this will improve though. One day rust could theoretically use deep code analysis to output minimalistic binaries out of the box.
Alfman,
(I think the previous answer was for someone else)
I checked the “hello.rs” build on my machines. And, yes it is huge. 6.5MB
The unstripped static C++ version (which links tons of stuff, including many templates) was only about 2MB in comparison.
And separating the linker as a standalone step was not an easy task. The best tutorial I found did not work:
https://medium.com/@squanderingtime/manually-linking-rust-binaries-to-support-out-of-tree-llvm-passes-8776b1d037a4
It needed some fixups:
And it failed with:
The first one means the standard libraries do not match. So the C version that rustc used vs llvm uses on the system are not the same (most probably).
The second one means, it is trying to link in the C startup code, which should not have happened.
And earlier, with the wrong libc path, it was complaining about over 1500 missing symbols, which is an indication of how much stuff is needed for a minimal program.
At this point I am giving up. (Late hour here, and to be fair my first experience with Rust did not make me more interested in the language).
As I understand it, it’s because Rust gives you ready-made equivalents to using the __FILE__ macro in an assert message in C or C++… something which googling shows to have caused similar problems.
…and Rust’s assert gets retained in release builds. (It’s debug_assert that gets removed, with assert being involved in common patterns like “If bounds checks on indexed array access don’t get elided, assert that the array is long enough before iterating over it”.)
To be fair, those versions were prior to the v1.0 compatibility freeze and followed in the tradition of what you get when you navigate to about:mozilla in Firefox.
“PS.
Did anybody else notice firefox replacing duckduckgo with google (in addition to some other settings) with a recent firefox update? If this is deliberate, comon mozilla, don’t do this sort of thing”
Seems like a deal renewal between Google and Mozilla to highlight the giant’s search engine…
More of a general commentary, but does anyone know why the full path is used so often in so many languages? Even when I did more C/C++ programming, you’d get these massive full paths in debug builds. I suppose it handles all cases, in case you’re linking things in various projects located in odd parts of the file system. Or possibly it’s easier to set up with debugging tools?
I just wonder why the default is not based on some ‘base’ directory and then maybe if files are not in that base directory, then the full path is used.
As someone who does path stuff often, I can say that it’s just a lot easier if you use the OS API to get an absolute path without having to plumb a base directory path down through your call stack to use in a “make relative to” function.
Take that and things like “is X a subdirectory of Y?” checks and it’s just a lot more to think about than canonicalizing all the paths and calling it a day.
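For what it’s worth, the “relative to a base directory, full path otherwise” idea from the question can be sketched in a few lines with the standard library. This is only an illustration, not how rustc does it; relative_or_full and the example paths are hypothetical. Note that it is purely lexical, so it sidesteps exactly the canonicalization questions that make the real thing harder.

```rust
use std::path::{Path, PathBuf};

/// Returns `file` relative to `base` when it lives under `base`,
/// otherwise returns `file` unchanged.
/// The comparison is lexical and component-wise: no symlinks are resolved
/// and the filesystem is never touched.
fn relative_or_full(base: &Path, file: &Path) -> PathBuf {
    match file.strip_prefix(base) {
        Ok(rel) => rel.to_path_buf(),
        Err(_) => file.to_path_buf(),
    }
}

fn main() {
    // Hypothetical project root and source files.
    let base = Path::new("/home/alice/proj");
    let files = ["/home/alice/proj/src/main.rs", "/usr/include/stdio.h"];
    for f in files.iter() {
        println!("{}", relative_or_full(base, Path::new(f)).display());
    }
}
```

The catch is the one described above: the base directory has to be plumbed through to every place a path gets recorded, and both paths have to be in the same form before the prefix check means anything.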
(There’s still an open bug (one of many) about build failures when run off a Windows RAM drive because the Win32 API’s canonicalization function breaks on RAM drives and they haven’t got a “make absolute without hitting the OS APIs” function in std yet, like Python’s os.path.abspath.)
This was a stupid rookie mistake. Honestly too tired to explain why. It’s the same with idiots unnecessarily hardcoding variables.