Checking for memory leaks with valgrind

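Besides the sanitizer tools introduced in the previous section, valgrind is also commonly used to check for memory leaks and similar problems. Compared with the sanitizers, however, a program may run some thirty times slower under valgrind, because the two work in different ways.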

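The sanitizers work at compile time: while the LLVM compiler generates code, it inserts memory-checking code and links in the matching runtime, and all of it is compiled directly into machine code. This approach therefore depends on building the program from source.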

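valgrind, by contrast, is essentially a virtual machine. It parses the executable (ELF format), reads every machine instruction in it, disassembles each one into the VEX IR intermediate representation, inserts some of valgrind's own runtime code (the so-called instrumentation step), and then compiles the VEX IR back into machine code. All of this happens on the fly while the program is running, so it is easy to imagine how slow it gets. The upside is that valgrind does not depend on the program's source code: as long as a program can run, valgrind can be used to check it.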
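
The instrumentation idea can be illustrated with a toy model. The sketch below is not VEX IR and not how valgrind is implemented; the `Op` instruction set and the `instrument` and `run` functions are hypothetical names made up for this illustration. It only shows the general shape of the technique: a pass walks the program, inserts a check in front of every memory access, and the checks report invalid addresses when the program runs.

```rust
/// A toy "guest" instruction set; nothing like real machine code or VEX IR.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Store `value` at heap index `addr`.
    Store { addr: usize, value: u8 },
    /// Load the byte at heap index `addr`.
    Load { addr: usize },
    /// Inserted by the instrumentation pass: validate `addr` before the access.
    Check { addr: usize },
}

/// Instrumentation pass: put a `Check` in front of every memory access.
fn instrument(program: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(program.len() * 2);
    for op in program {
        match op {
            Op::Store { addr, .. } | Op::Load { addr } => {
                out.push(Op::Check { addr: *addr });
            }
            Op::Check { .. } => {}
        }
        out.push(op.clone());
    }
    out
}

/// Runs the program on a heap of `heap_size` bytes and returns the
/// addresses that the inserted checks flagged as invalid.
fn run(program: &[Op], heap_size: usize) -> Vec<usize> {
    let mut heap = vec![0u8; heap_size];
    let mut invalid = Vec::new();
    for op in program {
        match op {
            Op::Check { addr } => {
                if *addr >= heap.len() {
                    invalid.push(*addr);
                }
            }
            Op::Store { addr, value } => {
                // The toy machine silently ignores out-of-range stores.
                if let Some(slot) = heap.get_mut(*addr) {
                    *slot = *value;
                }
            }
            Op::Load { addr } => {
                let _ = heap.get(*addr);
            }
        }
    }
    invalid
}

fn main() {
    let program = vec![
        Op::Store { addr: 0, value: 0xf1 },
        Op::Store { addr: 3, value: 0xf1 }, // one past a 3-byte heap
        Op::Load { addr: 4 },
    ];
    let instrumented = instrument(&program);
    println!("invalid accesses: {:?}", run(&instrumented, 3));
}
```

Without the `instrument` pass the same program runs with no report at all, which is the point of the technique: the checks are added to the program, not written by its author.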

Detecting memory leaks

For example, the sample code below contains two leak sites:

use std::mem;

fn main() {
    let msg = String::from("Hello, Rust");
    assert_eq!(msg.chars().count(), 11);
    // mem::forget wraps msg in ManuallyDrop, so the String's destructor never runs.
    mem::forget(msg);

    let numbers = vec![1, 2, 3, 5, 8, 13];
    let slice = numbers.into_boxed_slice();
    // Leak the box and keep only a raw pointer; the destructor of Box<[i32]> never runs.
    let _ptr: *mut [i32] = Box::leak(slice);
}

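First build it as a debug binary with cargo build --bin san-memory-leak. The executable then contains debug information in DWARF format, which makes errors easier to trace.

Start by checking it for general memory errors: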

valgrind ./san-memory-leak

valgrind now reports a memory leak. Add --leak-check=full to trace the exact leak locations:

valgrind --leak-check=full ./san-memory-leak

The complete log of the run is as follows:

==25294== Memcheck, a memory error detector
==25294== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==25294== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==25294== Command: ./san-memory-leak
==25294== Parent PID: 24939
==25294== 
==25294== 
==25294== HEAP SUMMARY:
==25294==     in use at exit: 35 bytes in 2 blocks
==25294==   total heap usage: 11 allocs, 9 frees, 2,195 bytes allocated
==25294== 
==25294== 11 bytes in 1 blocks are definitely lost in loss record 1 of 2
==25294==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25294==    by 0x11D6AA: alloc::alloc::alloc (alloc.rs:100)
==25294==    by 0x11D7B6: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==25294==    by 0x11DF78: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:243)
==25294==    by 0x11EEE0: alloc::raw_vec::RawVec<T,A>::try_allocate_in (raw_vec.rs:230)
==25294==    by 0x11CCED: with_capacity_in<u8, alloc::alloc::Global> (raw_vec.rs:158)
==25294==    by 0x11CCED: with_capacity_in<u8, alloc::alloc::Global> (mod.rs:699)
==25294==    by 0x11CCED: <T as alloc::slice::hack::ConvertVec>::to_vec (slice.rs:162)
==25294==    by 0x11D48B: to_vec<u8, alloc::alloc::Global> (slice.rs:111)
==25294==    by 0x11D48B: to_vec_in<u8, alloc::alloc::Global> (slice.rs:441)
==25294==    by 0x11D48B: to_vec<u8> (slice.rs:416)
==25294==    by 0x11D48B: to_owned<u8> (slice.rs:823)
==25294==    by 0x11D48B: to_owned (str.rs:211)
==25294==    by 0x11D48B: <alloc::string::String as core::convert::From<&str>>::from (string.rs:2711)
==25294==    by 0x11E011: san_memory_leak::main (san-memory-leak.rs:8)
==25294==    by 0x11E89A: core::ops::function::FnOnce::call_once (function.rs:250)
==25294==    by 0x11CF4D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==25294==    by 0x11E2F0: std::rt::lang_start::{{closure}} (rt.rs:159)
==25294==    by 0x135BFF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==25294==    by 0x135BFF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==25294==    by 0x135BFF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==25294==    by 0x135BFF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==25294==    by 0x135BFF: {closure#2} (rt.rs:141)
==25294==    by 0x135BFF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==25294==    by 0x135BFF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==25294==    by 0x135BFF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==25294==    by 0x135BFF: std::rt::lang_start_internal (rt.rs:141)
==25294== 
==25294== 24 bytes in 1 blocks are definitely lost in loss record 2 of 2
==25294==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25294==    by 0x11D6AA: alloc::alloc::alloc (alloc.rs:100)
==25294==    by 0x11D7B6: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==25294==    by 0x11D5E7: allocate (alloc.rs:243)
==25294==    by 0x11D5E7: alloc::alloc::exchange_malloc (alloc.rs:332)
==25294==    by 0x11E15D: san_memory_leak::main (san-memory-leak.rs:13)
==25294==    by 0x11E89A: core::ops::function::FnOnce::call_once (function.rs:250)
==25294==    by 0x11CF4D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==25294==    by 0x11E2F0: std::rt::lang_start::{{closure}} (rt.rs:159)
==25294==    by 0x135BFF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==25294==    by 0x135BFF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==25294==    by 0x135BFF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==25294==    by 0x135BFF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==25294==    by 0x135BFF: {closure#2} (rt.rs:141)
==25294==    by 0x135BFF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==25294==    by 0x135BFF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==25294==    by 0x135BFF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==25294==    by 0x135BFF: std::rt::lang_start_internal (rt.rs:141)
==25294==    by 0x11E2C9: std::rt::lang_start (rt.rs:158)
==25294==    by 0x11E28D: main (in /tmp/intro-to-rust/target/debug/san-memory-leak)
==25294== 
==25294== LEAK SUMMARY:
==25294==    definitely lost: 35 bytes in 2 blocks
==25294==    indirectly lost: 0 bytes in 0 blocks
==25294==      possibly lost: 0 bytes in 0 blocks
==25294==    still reachable: 0 bytes in 0 blocks
==25294==         suppressed: 0 bytes in 0 blocks
==25294== 
==25294== For lists of detected and suppressed errors, rerun with: -s
==25294== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

It contains two key pieces of information:

  • ==25294== by 0x11E011: san_memory_leak::main (san-memory-leak.rs:8)
  • ==25294== by 0x11E15D: san_memory_leak::main (san-memory-leak.rs:13)

These lines state exactly where the leaked memory was allocated, and with that information the problem is easy to locate.
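
Once the allocation sites are known, one possible fix is to give ownership back to a value that will be dropped. The following is a minimal sketch under that assumption; `leak_numbers` and `reclaim_numbers` are hypothetical helper names, not part of the original sample. It uses `Box::from_raw` to take back the pointer produced by `Box::leak`, and `drop` in place of `mem::forget`.

```rust
/// Leaks a boxed slice on purpose and hands back only a raw pointer,
/// like the `Box::leak` call in the sample.
fn leak_numbers() -> *mut [i32] {
    let slice: Box<[i32]> = vec![1, 2, 3, 5, 8, 13].into_boxed_slice();
    Box::leak(slice)
}

/// Takes ownership back so the allocation is freed when the box is dropped.
///
/// # Safety
/// `ptr` must come from `Box::leak` or `Box::into_raw`, and must not be
/// reclaimed more than once.
unsafe fn reclaim_numbers(ptr: *mut [i32]) -> Box<[i32]> {
    unsafe { Box::from_raw(ptr) }
}

fn main() {
    let ptr = leak_numbers();
    let numbers = unsafe { reclaim_numbers(ptr) };
    println!("reclaimed {} numbers", numbers.len());
    // `numbers` is dropped at the end of main, which frees its 24 bytes.

    let msg = String::from("Hello, Rust");
    // Dropping instead of forgetting releases the 11-byte buffer.
    drop(msg);
}
```

Run under valgrind, a program written this way would be expected to show no "definitely lost" records for these two allocations.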

Detecting out-of-bounds memory access

The code sample below contains three out-of-bounds accesses:

use std::ptr;

fn main() {
    // numbers has only 3 bytes allocated on the heap.
    let mut numbers: Vec<u8> = vec![0, 1, 2];

    // Out-of-bounds write
    unsafe {
        let numbers_ptr = numbers.as_mut_ptr();
        // Write 4 consecutive bytes into the heap buffer of numbers; the last byte is out of bounds.
        ptr::write_bytes(numbers_ptr, 0xf1, 4);
    }

    // Out-of-bounds read
    let _off_last_byte: u8 = unsafe {
        // Read the byte at offset 4 (the fifth byte) of the heap buffer of numbers
        *numbers.as_ptr().offset(4)
    };

    let mut numbers2: [i32; 3] = [0, 1, 2];
    unsafe {
        let numbers2_ptr = ptr::addr_of_mut!(numbers2);
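        // Out-of-bounds write on the stack: numbers2_ptr is *mut [i32; 3],
        // so a count of 2 writes 24 bytes into a 12-byte array.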
        ptr::write_bytes(numbers2_ptr, 0x1f, 2);
    }
    assert_eq!(numbers2[0], 0x1f1f1f1f);
}

Checking it with valgrind ./san-out-of-bounds produces the following report:

==41511== Memcheck, a memory error detector
==41511== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==41511== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==41511== Command: ./san-out-of-bounds
==41511== Parent PID: 24939
==41511== 
==41511== Invalid write of size 1
==41511==    at 0x484AD2E: memset (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41511==    by 0x11CB89: write_bytes<u8> (intrinsics.rs:3153)
==41511==    by 0x11CB89: san_out_of_bounds::main (san-out-of-bounds.rs:15)
==41511==    by 0x11C46A: core::ops::function::FnOnce::call_once (function.rs:250)
==41511==    by 0x11C5DD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==41511==    by 0x11D030: std::rt::lang_start::{{closure}} (rt.rs:159)
==41511==    by 0x13374F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==41511==    by 0x13374F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==41511==    by 0x13374F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==41511==    by 0x13374F: {closure#2} (rt.rs:141)
==41511==    by 0x13374F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==41511==    by 0x13374F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==41511==    by 0x13374F: std::rt::lang_start_internal (rt.rs:141)
==41511==    by 0x11D009: std::rt::lang_start (rt.rs:158)
==41511==    by 0x11CCFD: main (in /home/shaohua/dev/rust/intro-to-rust/target/debug/san-out-of-bounds)
==41511==  Address 0x4aa4b13 is 0 bytes after a block of size 3 alloc'd
==41511==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41511==    by 0x11C73A: alloc::alloc::alloc (alloc.rs:100)
==41511==    by 0x11C846: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==41511==    by 0x11C677: allocate (alloc.rs:243)
==41511==    by 0x11C677: alloc::alloc::exchange_malloc (alloc.rs:332)
==41511==    by 0x11CAEA: san_out_of_bounds::main (san-out-of-bounds.rs:9)
==41511==    by 0x11C46A: core::ops::function::FnOnce::call_once (function.rs:250)
==41511==    by 0x11C5DD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==41511==    by 0x11D030: std::rt::lang_start::{{closure}} (rt.rs:159)
==41511==    by 0x13374F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==41511==    by 0x13374F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==41511==    by 0x13374F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==41511==    by 0x13374F: {closure#2} (rt.rs:141)
==41511==    by 0x13374F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==41511==    by 0x13374F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==41511==    by 0x13374F: std::rt::lang_start_internal (rt.rs:141)
==41511==    by 0x11D009: std::rt::lang_start (rt.rs:158)
==41511==    by 0x11CCFD: main (in /home/shaohua/dev/rust/intro-to-rust/target/debug/san-out-of-bounds)
==41511== 
==41511== Invalid read of size 1
==41511==    at 0x11CBC2: san_out_of_bounds::main (san-out-of-bounds.rs:21)
==41511==    by 0x11C46A: core::ops::function::FnOnce::call_once (function.rs:250)
==41511==    by 0x11C5DD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==41511==    by 0x11D030: std::rt::lang_start::{{closure}} (rt.rs:159)
==41511==    by 0x13374F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==41511==    by 0x13374F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==41511==    by 0x13374F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==41511==    by 0x13374F: {closure#2} (rt.rs:141)
==41511==    by 0x13374F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==41511==    by 0x13374F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==41511==    by 0x13374F: std::rt::lang_start_internal (rt.rs:141)
==41511==    by 0x11D009: std::rt::lang_start (rt.rs:158)
==41511==    by 0x11CCFD: main (in /home/shaohua/dev/rust/intro-to-rust/target/debug/san-out-of-bounds)
==41511==  Address 0x4aa4b14 is 1 bytes after a block of size 3 alloc'd
==41511==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41511==    by 0x11C73A: alloc::alloc::alloc (alloc.rs:100)
==41511==    by 0x11C846: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==41511==    by 0x11C677: allocate (alloc.rs:243)
==41511==    by 0x11C677: alloc::alloc::exchange_malloc (alloc.rs:332)
==41511==    by 0x11CAEA: san_out_of_bounds::main (san-out-of-bounds.rs:9)
==41511==    by 0x11C46A: core::ops::function::FnOnce::call_once (function.rs:250)
==41511==    by 0x11C5DD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==41511==    by 0x11D030: std::rt::lang_start::{{closure}} (rt.rs:159)
==41511==    by 0x13374F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==41511==    by 0x13374F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==41511==    by 0x13374F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==41511==    by 0x13374F: {closure#2} (rt.rs:141)
==41511==    by 0x13374F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==41511==    by 0x13374F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==41511==    by 0x13374F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==41511==    by 0x13374F: std::rt::lang_start_internal (rt.rs:141)
==41511==    by 0x11D009: std::rt::lang_start (rt.rs:158)
==41511==    by 0x11CCFD: main (in /tmp/san-out-of-bounds)
==41511== 
==41511== 
==41511== HEAP SUMMARY:
==41511==     in use at exit: 0 bytes in 0 blocks
==41511==   total heap usage: 10 allocs, 10 frees, 2,163 bytes allocated
==41511== 
==41511== All heap blocks were freed -- no leaks are possible
==41511== 
==41511== For lists of detected and suppressed errors, rerun with: -s
==41511== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

As the report shows, valgrind detected only the two heap read/write errors listed below; it did not catch the out-of-bounds write on the stack:

  • ==41511== by 0x11CB89: san_out_of_bounds::main (san-out-of-bounds.rs:15)
  • ==41511== at 0x11CBC2: san_out_of_bounds::main (san-out-of-bounds.rs:21)
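
The stack write is easy to miss because `ptr::write_bytes` counts in units of the pointee type, not in bytes, and Memcheck does not do bounds checking on stack arrays. The sketch below works out the arithmetic and shows one in-bounds alternative; `bytes_written` and `fill_in_bounds` are hypothetical helper names introduced only for this illustration.

```rust
use std::mem;
use std::ptr;

/// Number of bytes that `ptr::write_bytes::<T>(dst, val, count)` touches.
fn bytes_written<T>(count: usize) -> usize {
    count * mem::size_of::<T>()
}

/// Fills the array through a pointer to its first element, so the count
/// is measured in `i32` elements and stays within the 12-byte array.
fn fill_in_bounds(numbers: &mut [i32; 3], byte: u8) {
    unsafe {
        ptr::write_bytes(numbers.as_mut_ptr(), byte, numbers.len());
    }
}

fn main() {
    // The sample passes a `*mut [i32; 3]` with a count of 2.
    println!("bytes written: {}", bytes_written::<[i32; 3]>(2));
    println!("array size:    {}", mem::size_of::<[i32; 3]>());

    let mut numbers2 = [0, 1, 2];
    fill_in_bounds(&mut numbers2, 0x1f);
    println!("{:x?}", numbers2);
}
```

In safe code, `numbers2.fill(0x1f1f1f1f)` would achieve the same result without any raw pointer.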

Accessing freed memory (use after free)

The code sample below incorrectly accesses heap memory that has already been freed:

use std::ptr;

fn main() {
    let mut msg = String::from("Hello, Rust");
    let msg_ptr = msg.as_mut_ptr();
    // Free the heap memory of msg
    drop(msg);
    unsafe {
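        // Overwrite the byte at offset 8 of msg's buffer (the `u` in "Rust";
        // `R` is at offset 7) with b'r'. The buffer has already been freed.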
        ptr::write_bytes(msg_ptr.offset(8), b'r', 1);
    }
}

Now check it with valgrind by running valgrind ./san-use-after-free, which prints the following log:

==48059== Memcheck, a memory error detector
==48059== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==48059== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==48059== Command: ./san-use-after-free
==48059== Parent PID: 24939
==48059== 
==48059== Invalid write of size 1
==48059==    at 0x484AD19: memset (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==48059==    by 0x11CFCB: write_bytes<u8> (intrinsics.rs:3153)
==48059==    by 0x11CFCB: san_use_after_free::main (san-use-after-free.rs:13)
==48059==    by 0x11D08A: core::ops::function::FnOnce::call_once (function.rs:250)
==48059==    by 0x11DA4D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==48059==    by 0x11CB60: std::rt::lang_start::{{closure}} (rt.rs:159)
==48059==    by 0x1340BF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==48059==    by 0x1340BF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==48059==    by 0x1340BF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==48059==    by 0x1340BF: {closure#2} (rt.rs:141)
==48059==    by 0x1340BF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==48059==    by 0x1340BF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==48059==    by 0x1340BF: std::rt::lang_start_internal (rt.rs:141)
==48059==    by 0x11CB39: std::rt::lang_start (rt.rs:158)
==48059==    by 0x11D01D: main (in /tmp/san-use-after-free)
==48059==  Address 0x4aa4b18 is 8 bytes inside a block of size 11 free'd
==48059==    at 0x48431EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==48059==    by 0x11CA6F: dealloc (alloc.rs:119)
==48059==    by 0x11CA6F: <alloc::alloc::Global as core::alloc::Allocator>::deallocate (alloc.rs:256)
==48059==    by 0x11D1AD: <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop (raw_vec.rs:583)
==48059==    by 0x11D109: core::ptr::drop_in_place<alloc::raw_vec::RawVec<u8>> (mod.rs:514)
==48059==    by 0x11D0DA: core::ptr::drop_in_place<alloc::vec::Vec<u8>> (mod.rs:514)
==48059==    by 0x11D099: core::ptr::drop_in_place<alloc::string::String> (mod.rs:514)
==48059==    by 0x11CEB5: core::mem::drop (mod.rs:938)
==48059==    by 0x11CF68: san_use_after_free::main (san-use-after-free.rs:10)
==48059==    by 0x11D08A: core::ops::function::FnOnce::call_once (function.rs:250)
==48059==    by 0x11DA4D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==48059==    by 0x11CB60: std::rt::lang_start::{{closure}} (rt.rs:159)
==48059==    by 0x1340BF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==48059==    by 0x1340BF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==48059==    by 0x1340BF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==48059==    by 0x1340BF: {closure#2} (rt.rs:141)
==48059==    by 0x1340BF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==48059==    by 0x1340BF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==48059==    by 0x1340BF: std::rt::lang_start_internal (rt.rs:141)
==48059==  Block was alloc'd at
==48059==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==48059==    by 0x11C73A: alloc::alloc::alloc (alloc.rs:100)
==48059==    by 0x11C846: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==48059==    by 0x11CAC8: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:243)
==48059==    by 0x11D6C0: alloc::raw_vec::RawVec<T,A>::try_allocate_in (raw_vec.rs:230)
==48059==    by 0x11D91D: with_capacity_in<u8, alloc::alloc::Global> (raw_vec.rs:158)
==48059==    by 0x11D91D: with_capacity_in<u8, alloc::alloc::Global> (mod.rs:699)
==48059==    by 0x11D91D: <T as alloc::slice::hack::ConvertVec>::to_vec (slice.rs:162)
==48059==    by 0x11CD5B: to_vec<u8, alloc::alloc::Global> (slice.rs:111)
==48059==    by 0x11CD5B: to_vec_in<u8, alloc::alloc::Global> (slice.rs:441)
==48059==    by 0x11CD5B: to_vec<u8> (slice.rs:416)
==48059==    by 0x11CD5B: to_owned<u8> (slice.rs:823)
==48059==    by 0x11CD5B: to_owned (str.rs:211)
==48059==    by 0x11CD5B: <alloc::string::String as core::convert::From<&str>>::from (string.rs:2711)
==48059==    by 0x11CEEB: san_use_after_free::main (san-use-after-free.rs:8)
==48059==    by 0x11D08A: core::ops::function::FnOnce::call_once (function.rs:250)
==48059==    by 0x11DA4D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==48059==    by 0x11CB60: std::rt::lang_start::{{closure}} (rt.rs:159)
==48059==    by 0x1340BF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==48059==    by 0x1340BF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==48059==    by 0x1340BF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==48059==    by 0x1340BF: {closure#2} (rt.rs:141)
==48059==    by 0x1340BF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==48059==    by 0x1340BF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==48059==    by 0x1340BF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==48059==    by 0x1340BF: std::rt::lang_start_internal (rt.rs:141)
==48059== 
==48059== 
==48059== HEAP SUMMARY:
==48059==     in use at exit: 0 bytes in 0 blocks
==48059==   total heap usage: 10 allocs, 10 frees, 2,171 bytes allocated
==48059== 
==48059== All heap blocks were freed -- no leaks are possible
==48059== 
==48059== For lists of detected and suppressed errors, rerun with: -s
==48059== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

As the log shows, valgrind did detect the use-after-free error and pinpointed its location:

  • ==48059== by 0x11CFCB: san_use_after_free::main (san-use-after-free.rs:13)
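
For comparison, here is a minimal sketch of the same kind of modification done while the string is still alive, using only safe APIs. `lowercase_ascii_at` is a hypothetical helper written for this illustration; it relies on `str::get_mut` and `str::make_ascii_lowercase` from the standard library.

```rust
/// Replaces the ASCII byte at `index` with its lowercase form while the
/// string is still alive. Out-of-range indices leave the string unchanged.
fn lowercase_ascii_at(mut msg: String, index: usize) -> String {
    if let Some(ch) = msg.get_mut(index..index + 1) {
        ch.make_ascii_lowercase();
    }
    msg
}

fn main() {
    let msg = String::from("Hello, Rust");
    // `R` sits at byte offset 7.
    let msg = lowercase_ascii_at(msg, 7);
    println!("{}", msg);
    // Only now is the buffer released; no pointer to it outlives this call.
    drop(msg);
}
```

Because the borrow checker ties every access to the lifetime of `msg`, a write after `drop(msg)` would not compile in this form.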

Accessing uninitialized memory (uninit)

The code snippet below contains an uninitialized-memory error:

use std::mem;

fn main() {
    let x_uninit = mem::MaybeUninit::<i32>::uninit();
    let x = unsafe {
        x_uninit.assume_init()
    };
    if x == 2 {
        println!("x is 2");
    }
}

Checking it with valgrind ./san-memory-uninit yields the following log:

==57348== Memcheck, a memory error detector
==57348== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==57348== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==57348== Command: ./san-memory-uninit
==57348== Parent PID: 24939
==57348== 
==57348== Conditional jump or move depends on uninitialised value(s)
==57348==    at 0x11C50B: san_memory_uninit::main (san-memory-uninit.rs:12)
==57348==    by 0x11C65A: core::ops::function::FnOnce::call_once (function.rs:250)
==57348==    by 0x11C4DD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==57348==    by 0x11C6D0: std::rt::lang_start::{{closure}} (rt.rs:159)
==57348==    by 0x132D1F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==57348==    by 0x132D1F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==57348==    by 0x132D1F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==57348==    by 0x132D1F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==57348==    by 0x132D1F: {closure#2} (rt.rs:141)
==57348==    by 0x132D1F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==57348==    by 0x132D1F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==57348==    by 0x132D1F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==57348==    by 0x132D1F: std::rt::lang_start_internal (rt.rs:141)
==57348==    by 0x11C6A9: std::rt::lang_start (rt.rs:158)
==57348==    by 0x11C54D: main (in /tmp/san-memory-uninit)
==57348== 
==57348== 
==57348== HEAP SUMMARY:
==57348==     in use at exit: 0 bytes in 0 blocks
==57348==   total heap usage: 10 allocs, 10 frees, 3,184 bytes allocated
==57348== 
==57348== All heap blocks were freed -- no leaks are possible
==57348== 
==57348== Use --track-origins=yes to see where uninitialised values come from
==57348== For lists of detected and suppressed errors, rerun with: -s
==57348== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The log shows the complete error:

  • ==57348== Conditional jump or move depends on uninitialised value(s)
  • ==57348== at 0x11C50B: san_memory_uninit::main (san-memory-uninit.rs:12)
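
As the log itself suggests, rerunning with --track-origins=yes shows where the uninitialized value came from. The fix on the Rust side is to initialize the slot before calling `assume_init`. The following is a minimal sketch; `init_then_read` is a hypothetical function name used only here.

```rust
use std::mem::MaybeUninit;

/// Initializes the slot before reading it, so `assume_init` is sound.
fn init_then_read(value: i32) -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();
    // `write` stores the value without reading the old, uninitialized contents.
    slot.write(value);
    // Safety: the slot was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    let x = init_then_read(2);
    if x == 2 {
        println!("x is 2");
    }
}
```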

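Detecting cyclic references

Cyclic references commonly appear where objects are managed by reference counting, such as with Rc/Arc. The following example shows a cyclic reference in a binary tree: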

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Default)]
struct TreeNode {
    left: Option<Rc<RefCell<TreeNode>>>,
    right: Option<Rc<RefCell<TreeNode>>>,
    val: i32,
}

impl TreeNode {
    #[must_use]
    #[inline]
    pub const fn is_leaf(&self) -> bool {
        self.left.is_none() && self.right.is_none()
    }
}

impl Drop for TreeNode {
    fn drop(&mut self) {
        println!("Will drop node with value: {}", self.val);
    }
}

fn main() {
    let leaf_node = Rc::new(RefCell::new(TreeNode::default()));
    assert!(leaf_node.borrow().is_leaf());
    
    let node1 = Rc::new(RefCell::new(TreeNode {
        left: None,
        right: Some(leaf_node.clone()),
        val: 42,
    }));
    let node2 = Rc::new(RefCell::new(TreeNode {
        left: Some(leaf_node.clone()),
        right: Some(node1.clone()),
        val: 12,
    }));
    // Create a reference cycle
    node1.borrow_mut().left = Some(node2.clone());

    // When the program exits, neither node1 nor node2 is freed properly,
    // and the leaf node they both reference leaks with them.
}

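A reference cycle prevents the node objects from being released normally; their memory is never reclaimed, which results in a memory leak.

Check it with valgrind by running valgrind --leak-check=full ./san-cyclic-references, which yields the following log: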

==165066== Memcheck, a memory error detector
==165066== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==165066== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==165066== Command: ./san-cyclic-references
==165066== Parent PID: 24939
==165066== 
==165066== 
==165066== HEAP SUMMARY:
==165066==     in use at exit: 144 bytes in 3 blocks
==165066==   total heap usage: 12 allocs, 9 frees, 2,304 bytes allocated
==165066== 
==165066== 144 (48 direct, 96 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==165066==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==165066==    by 0x11E32A: alloc::alloc::alloc (alloc.rs:100)
==165066==    by 0x11E436: alloc::alloc::Global::alloc_impl (alloc.rs:183)
==165066==    by 0x11E267: allocate (alloc.rs:243)
==165066==    by 0x11E267: alloc::alloc::exchange_malloc (alloc.rs:332)
==165066==    by 0x11D56C: new<alloc::rc::RcBox<core::cell::RefCell<san_cyclic_references::TreeNode>>> (boxed.rs:218)
==165066==    by 0x11D56C: alloc::rc::Rc<T>::new (rc.rs:398)
==165066==    by 0x11DA0A: san_cyclic_references::main (san-cyclic-references.rs:33)
==165066==    by 0x11CF3A: core::ops::function::FnOnce::call_once (function.rs:250)
==165066==    by 0x11E67D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==165066==    by 0x11E6F0: std::rt::lang_start::{{closure}} (rt.rs:159)
==165066==    by 0x134E5F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==165066==    by 0x134E5F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==165066==    by 0x134E5F: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==165066==    by 0x134E5F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==165066==    by 0x134E5F: {closure#2} (rt.rs:141)
==165066==    by 0x134E5F: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==165066==    by 0x134E5F: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==165066==    by 0x134E5F: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==165066==    by 0x134E5F: std::rt::lang_start_internal (rt.rs:141)
==165066==    by 0x11E6C9: std::rt::lang_start (rt.rs:158)
==165066==    by 0x11DDBD: main (in /tmp/san-cyclic-references)
==165066== 
==165066== LEAK SUMMARY:
==165066==    definitely lost: 48 bytes in 1 blocks
==165066==    indirectly lost: 96 bytes in 2 blocks
==165066==      possibly lost: 0 bytes in 0 blocks
==165066==    still reachable: 0 bytes in 0 blocks
==165066==         suppressed: 0 bytes in 0 blocks
==165066== 
==165066== For lists of detected and suppressed errors, rerun with: -s
==165066== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

valgrind did indeed detect the heap memory leak:

  • ==165066== by 0x11DA0A: san_cyclic_references::main (san-cyclic-references.rs:33)

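The other leak site (san-cyclic-references.rs:38), however, gets no stack trace of its own; its memory is only counted among the 96 bytes in 2 blocks reported as indirectly lost.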
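
A common way to avoid this kind of leak is to make the back edge a `Weak` reference, so it does not keep its target alive. The sketch below is not the tree from the sample; `Node`, `new_node`, and `build_and_drop` are hypothetical names, and the shared `drops` counter exists only to make the destructor calls observable.

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

/// Hypothetical node type: the child is owned, the back edge is weak.
struct Node {
    child: RefCell<Option<Rc<Node>>>,
    parent: RefCell<Weak<Node>>,
    drops: Rc<Cell<u32>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        // Count every node that is actually destroyed.
        self.drops.set(self.drops.get() + 1);
    }
}

fn new_node(drops: &Rc<Cell<u32>>) -> Rc<Node> {
    Rc::new(Node {
        child: RefCell::new(None),
        parent: RefCell::new(Weak::new()),
        drops: Rc::clone(drops),
    })
}

/// Builds parent -> child (strong) and child -> parent (weak), drops both
/// handles, and returns how many nodes were destroyed.
fn build_and_drop() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let parent = new_node(&drops);
        let child = new_node(&drops);
        *parent.child.borrow_mut() = Some(Rc::clone(&child));
        // The back edge does not increase the strong count of `parent`.
        *child.parent.borrow_mut() = Rc::downgrade(&parent);
        assert_eq!(Rc::strong_count(&parent), 1);
        assert_eq!(Rc::strong_count(&child), 2);
    }
    drops.get()
}

fn main() {
    println!("nodes dropped: {}", build_and_drop());
}
```

Had the back edge been an `Rc` as well, both strong counts would stay above zero after the handles go out of scope and neither destructor would run.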

Detecting data races

When multiple threads access the same memory, they should use a mutex or a similar mechanism to ensure that no data race occurs.
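
Two such mechanisms from the standard library are `Mutex` and the atomic integer types. The following is a minimal sketch of both; `count_in_parallel` is a hypothetical function written for this illustration.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers; each increments both counters `per_thread` times.
/// Returns the final (mutex-protected, atomic) values.
fn count_in_parallel(threads: usize, per_thread: i32) -> (i32, i32) {
    let locked = Arc::new(Mutex::new(0_i32));
    let atomic = Arc::new(AtomicI32::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let locked = Arc::clone(&locked);
            let atomic = Arc::clone(&atomic);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock serializes the read-modify-write.
                    *locked.lock().unwrap() += 1;
                    // An atomic read-modify-write; no lock is needed.
                    atomic.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let locked_total = *locked.lock().unwrap();
    (locked_total, atomic.load(Ordering::Relaxed))
}

fn main() {
    let (locked, atomic) = count_in_parallel(4, 1000);
    println!("mutex: {}, atomic: {}", locked, atomic);
}
```

Both counters end up with the same total regardless of how the threads interleave, which is exactly what an unprotected shared variable cannot guarantee.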

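In addition, a variable in thread-local storage (TLS) is kept as a separate copy in every thread, and each thread only accesses its own copy, so no data race is possible there.

Consider the following example: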

use std::cell::Cell;
use std::thread;

// Initialized to 1.
thread_local!(static TLS_COUNTER: Cell<i32> = const { Cell::new(1) });

// Global variable; it lives in the data segment.
static mut SHARED_COUNTER: i32 = 1;

fn main() {
    // Set the main thread's instance of TLS_COUNTER to 2.
    TLS_COUNTER.set(2);

    let t1 = thread::spawn(move || {
        // When the thread starts, its TLS_COUNTER is 1.
        assert_eq!(TLS_COUNTER.get(), 1);
        // Modify this thread's own instance of TLS_COUNTER.
        TLS_COUNTER.set(3);
    });
    TLS_COUNTER.set(4);
    t1.join().unwrap();
    // Read the main thread's instance of TLS_COUNTER.
    assert_eq!(TLS_COUNTER.get(), 4);

    // Access the global variable directly, without any protection.
    unsafe { SHARED_COUNTER = 2; }
    let t2 = thread::spawn(|| {
        unsafe {
            // A data race may occur here
            SHARED_COUNTER = 3;
        }
    });
    // A data race may occur here
    unsafe { SHARED_COUNTER = 4; }
    t2.join().unwrap();

    // The final value of SHARED_COUNTER cannot be determined
    unsafe { assert!(SHARED_COUNTER == 3 || SHARED_COUNTER == 4); }

    let _x = 11;
}

Use valgrind's Helgrind tool to detect thread-related problems. Running valgrind --tool=helgrind ./san-data-race yields the following log:

==174647== Helgrind, a thread error detector
==174647== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==174647== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==174647== Command: ./san-data-race
==174647== Parent PID: 24939
==174647== 
==174647== ---Thread-Announcement------------------------------------------
==174647== 
==174647== Thread #1 is the program's root thread
==174647== 
==174647== ---Thread-Announcement------------------------------------------
==174647== 
==174647== Thread #3 was created
==174647==    at 0x49D086F: clone (clone.S:76)
==174647==    by 0x49D09C0: __clone_internal_fallback (clone-internal.c:71)
==174647==    by 0x49D09C0: __clone_internal (clone-internal.c:117)
==174647==    by 0x494E9EF: create_thread (pthread_create.c:297)
==174647==    by 0x494F49D: pthread_create@@GLIBC_2.34 (pthread_create.c:833)
==174647==    by 0x484BDD5: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==174647==    by 0x146971: std::sys::pal::unix::thread::Thread::new (thread.rs:87)
==174647==    by 0x126FCC: std::thread::Builder::spawn_unchecked_ (mod.rs:580)
==174647==    by 0x12657E: std::thread::Builder::spawn_unchecked (mod.rs:456)
==174647==    by 0x1264A1: spawn<san_data_race::main::{closure_env#1}, ()> (mod.rs:388)
==174647==    by 0x1264A1: std::thread::spawn (mod.rs:697)
==174647==    by 0x124D32: san_data_race::main (san-data-race.rs:27)
==174647==    by 0x1234FA: core::ops::function::FnOnce::call_once (function.rs:250)
==174647==    by 0x12033D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==174647== 
==174647== ----------------------------------------------------------------
==174647== 
==174647== Possible data race during write of size 4 at 0x16ADD0 by thread #1
==174647== Locks held: none
==174647==    at 0x124D33: san_data_race::main (san-data-race.rs:34)
==174647==    by 0x1234FA: core::ops::function::FnOnce::call_once (function.rs:250)
==174647==    by 0x12033D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==174647==    by 0x1226C0: std::rt::lang_start::{{closure}} (rt.rs:159)
==174647==    by 0x13F0AF: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284)
==174647==    by 0x13F0AF: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:559)
==174647==    by 0x13F0AF: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:523)
==174647==    by 0x13F0AF: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:149)
==174647==    by 0x13F0AF: {closure#2} (rt.rs:141)
==174647==    by 0x13F0AF: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:559)
==174647==    by 0x13F0AF: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:523)
==174647==    by 0x13F0AF: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:149)
==174647==    by 0x13F0AF: std::rt::lang_start_internal (rt.rs:141)
==174647==    by 0x122699: std::rt::lang_start (rt.rs:158)
==174647==    by 0x124F3D: main (in /tmp/san-data-race)
==174647== 
==174647== This conflicts with a previous write of size 4 by thread #3
==174647== Locks held: none
==174647==    at 0x122A30: san_data_race::main::{{closure}} (san-data-race.rs:30)
==174647==    by 0x120365: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==174647==    by 0x128685: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} (mod.rs:542)
==174647==    by 0x121025: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (unwind_safe.rs:272)
==174647==    by 0x12054B: std::panicking::try::do_call (panicking.rs:559)
==174647==    by 0x1206FA: __rust_try (in /tmp/san-data-race)
==174647==    by 0x120485: std::panicking::try (panicking.rs:523)
==174647==    by 0x127D46: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#2}::{closure_env#0}<san_data_race::main::{closure_env#1}, ()>>, ()> (panic.rs:149)
==174647==    by 0x127D46: std::thread::Builder::spawn_unchecked_::{{closure}} (mod.rs:541)
==174647==  Address 0x16add0 is 0 bytes inside data symbol "_ZN13san_data_race14SHARED_COUNTER17h75b4b0961c850d6dE"
==174647== 
==174647== 
==174647== Use --history-level=approx or =none to gain increased speed, at
==174647== the cost of reduced accuracy of conflicting-access information
==174647== For lists of detected and suppressed errors, rerun with: -s
==174647== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The log shows that Helgrind did detect the data race on SHARED_COUNTER:

  • Possible data race during write of size 4 at 0x16ADD0 by thread #1
  • at 0x124D33: san_data_race::main (san-data-race.rs:34)
  • This conflicts with a previous write of size 4 by thread #3
  • at 0x122A30: san_data_race::main::{{closure}} (san-data-race.rs:30)
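One way to remove this race is to put the counter behind a mutex, as suggested at the start of this section. The sketch below is a variant of the example, not part of the original program; `run_with_mutex` is a hypothetical name. It repeats the same three writes, but every access goes through `Mutex::lock`, so no `unsafe` is needed.

```rust
use std::sync::Mutex;
use std::thread;

// Replaces `static mut SHARED_COUNTER: i32`; every access must take the lock.
static SHARED_COUNTER: Mutex<i32> = Mutex::new(1);

/// Performs the same writes as the racy example and returns the final value.
fn run_with_mutex() -> i32 {
    *SHARED_COUNTER.lock().unwrap() = 2;
    let t2 = thread::spawn(|| {
        // The guard is released at the end of this statement.
        *SHARED_COUNTER.lock().unwrap() = 3;
    });
    *SHARED_COUNTER.lock().unwrap() = 4;
    t2.join().unwrap();

    let value = *SHARED_COUNTER.lock().unwrap();
    value
}

fn main() {
    let value = run_with_mutex();
    // The order of the two competing writes is still unspecified,
    // but each write is now properly synchronized.
    assert!(value == 3 || value == 4);
    println!("final value: {value}");
}
```

The final value is still either 3 or 4, because the mutex removes the data race but does not impose an order on the two writes. Helgrind's output for this variant is not shown here. It may depend on how the standard library implements its locks on the target platform.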

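Other valgrind tools

In addition to the features described above, valgrind can profile how a program uses the CPU caches and the branch predictor, reporting hit and miss counts: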

valgrind --tool=cachegrind your-app
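To give cachegrind something to measure, here is a small hypothetical test program (not from the original text) that sums the same matrix twice, once in memory order and once with a large stride. The matrix size `N` and the function names are arbitrary choices made for illustration.

```rust
// Hypothetical benchmark program; the matrix size is an arbitrary choice.
const N: usize = 1024;

/// Visits the elements in the order they are laid out in memory.
fn sum_row_major(m: &[u32]) -> u64 {
    let mut total = 0u64;
    for row in 0..N {
        for col in 0..N {
            total += u64::from(m[row * N + col]);
        }
    }
    total
}

/// Jumps N elements between consecutive accesses, so each access
/// lands on a different cache line.
fn sum_col_major(m: &[u32]) -> u64 {
    let mut total = 0u64;
    for col in 0..N {
        for row in 0..N {
            total += u64::from(m[row * N + col]);
        }
    }
    total
}

fn main() {
    // A flat N x N matrix stored row by row.
    let m: Vec<u32> = (0..(N * N) as u32).collect();
    let a = sum_row_major(&m);
    let b = sum_col_major(&m);
    // Both traversals compute the same sum; only the access pattern differs.
    assert_eq!(a, b);
    println!("sum = {a}");
}
```

Depending on the valgrind version, cache and branch simulation may have to be enabled explicitly with `--cache-sim=yes` and `--branch-sim=yes`. Cachegrind writes its results to a `cachegrind.out.<pid>` file, which `cg_annotate` can break down per function. The strided traversal would be expected to show more data-cache read misses than the row-major one.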