Mercurial > hg
changeset 44870:9f96beb9bafe
rust: remove support for `re2`
With the performance issues with `regex` figured out and fixed in previous
patches and `regex` newly gaining support for empty alternations, there is no
reason to keep `re2` around anymore. It's only *marginally* faster at creating
the regex which saves at most a couple of ms, but gets beaten by `regex` in
every other aspect.
This removes the Rust/C/C++ bridge (hooray!), the `with-re2` feature, the
conditional code that goes with it, the documentation and relevant part of the
debug/module output.
Differential Revision: https://phab.mercurial-scm.org/D8594
author | Raphaël Gomès <rgomes@octobus.net> |
---|---|
date | Fri, 29 May 2020 12:17:59 +0200 |
parents | 4313a0d7540d |
children | 17d928f8abaf |
files | mercurial/debugcommands.py rust/Cargo.lock rust/README.rst rust/hg-core/Cargo.toml rust/hg-core/build.rs rust/hg-core/src/lib.rs rust/hg-core/src/matchers.rs rust/hg-core/src/re2/mod.rs rust/hg-core/src/re2/re2.rs rust/hg-core/src/re2/rust_re2.cpp rust/hg-cpython/Cargo.toml rust/hg-cpython/src/debug.rs tests/test-install.t |
diffstat | 13 files changed, 18 insertions(+), 322 deletions(-) [+] |
line wrap: on
line diff
--- a/mercurial/debugcommands.py Fri May 29 12:12:16 2020 +0200 +++ b/mercurial/debugcommands.py Fri May 29 12:17:59 2020 +0200 @@ -1650,13 +1650,6 @@ fm.plain(_(b'checking "re2" regexp engine (%s)\n') % re2) fm.data(re2=bool(util._re2)) - rust_debug_mod = policy.importrust("debug") - if rust_debug_mod is not None: - re2_rust = b'installed' if rust_debug_mod.re2_installed else b'missing' - - msg = b'checking "re2" regexp engine Rust bindings (%s)\n' - fm.plain(_(msg % re2_rust)) - # templates p = templater.templatepaths() fm.write(b'templatedirs', b'checking templates (%s)...\n', b' '.join(p))
--- a/rust/Cargo.lock Fri May 29 12:12:16 2020 +0200 +++ b/rust/Cargo.lock Fri May 29 12:17:59 2020 +0200 @@ -42,11 +42,6 @@ source = "registry+https://github.com/rust-lang/crates.io-index" [[package]] -name = "cc" -version = "1.0.50" -source = "registry+https://github.com/rust-lang/crates.io-index" - -[[package]] name = "cfg-if" version = "0.1.10" source = "registry+https://github.com/rust-lang/crates.io-index" @@ -208,12 +203,10 @@ version = "0.1.0" dependencies = [ "byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)", - "cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)", "clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)", "crossbeam 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)", "hex 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)", "lazy_static 1.4.0 (registry+https://github.com/rust-lang/crates.io-index)", - "libc 0.2.67 (registry+https://github.com/rust-lang/crates.io-index)", "log 0.4.8 (registry+https://github.com/rust-lang/crates.io-index)", "memchr 2.3.3 (registry+https://github.com/rust-lang/crates.io-index)", "memmap 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)", @@ -655,7 +648,6 @@ "checksum autocfg 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "f8aac770f1885fd7e387acedd76065302551364496e46b3dd00860b2f8359b9d" "checksum bitflags 1.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693" "checksum byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "08c48aae112d48ed9f069b33538ea9e3e90aa263cfa3d1c24309612b1f7472de" -"checksum cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)" = "95e28fa049fda1c330bcf9d723be7663a899c4679724b34c81e9f5a326aab8cd" "checksum cfg-if 0.1.10 (registry+https://github.com/rust-lang/crates.io-index)" = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822" "checksum chrono 0.4.11 (registry+https://github.com/rust-lang/crates.io-index)" = "80094f509cf8b5ae86a4966a39b3ff66cd7e2a3e594accec3743ff3fabeab5b2" "checksum clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)" = "5067f5bb2d80ef5d68b4c87db81601f0b75bca627bc2ef76b141d7b846a3c6d9"
--- a/rust/README.rst Fri May 29 12:12:16 2020 +0200 +++ b/rust/README.rst Fri May 29 12:17:59 2020 +0200 @@ -36,36 +36,6 @@ One day we may use this environment variable to switch to new experimental binding crates like a hypothetical ``HGWITHRUSTEXT=hpy``. -Using the fastest ``hg status`` -------------------------------- - -The code for ``hg status`` needs to conform to ``.hgignore`` rules, which are -all translated into regex. - -In the first version, for compatibility and ease of development reasons, the -Re2 regex engine was chosen until we figured out if the ``regex`` crate had -similar enough behavior. - -Now that that work has been done, the default behavior is to use the ``regex`` -crate, that provides a significant performance boost compared to the standard -Python + C path in many commands such as ``status``, ``diff`` and ``commit``, - -However, the ``Re2`` path remains slightly faster for our use cases and remains -a better option for getting the most speed out of your Mercurial. - -If you want to use ``Re2``, you need to install ``Re2`` following Google's -guidelines: https://github.com/google/re2/wiki/Install. -Then, use ``HG_RUST_FEATURES=with-re2`` and -``HG_RE2_PATH=system|<path to your re2 install>`` when building ``hg`` to -signal the use of Re2. Using the local path instead of the "system" RE2 links -it statically. - -For example:: - - $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=system make PURE=--rust - $ # OR - $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=/path/to/re2 make PURE=--rust - Developing Rust =============== @@ -114,14 +84,3 @@ $ cargo +nightly fmt This requires you to have the nightly toolchain installed. - -Additional features -------------------- - -As mentioned in the section about ``hg status``, code paths using ``re2`` are -opt-in. - -For example:: - - $ cargo check --features with-re2 -
--- a/rust/hg-core/Cargo.toml Fri May 29 12:12:16 2020 +0200 +++ b/rust/hg-core/Cargo.toml Fri May 29 12:17:59 2020 +0200 @@ -4,7 +4,6 @@ authors = ["Georges Racinet <gracinet@anybox.fr>"] description = "Mercurial pure Rust core library, with no assumption on Python bindings (FFI)" edition = "2018" -build = "build.rs" [lib] name = "hg" @@ -13,7 +12,6 @@ byteorder = "1.3.4" hex = "0.4.2" lazy_static = "1.4.0" -libc = { version = "0.2.66", optional = true } memchr = "2.3.3" rand = "0.7.3" rand_pcg = "0.2.1" @@ -31,10 +29,3 @@ memmap = "0.7.0" pretty_assertions = "0.6.1" tempfile = "3.1.0" - -[build-dependencies] -cc = { version = "1.0.48", optional = true } - -[features] -default = [] -with-re2 = ["cc", "libc"]
--- a/rust/hg-core/build.rs Fri May 29 12:12:16 2020 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,61 +0,0 @@ -// build.rs -// -// Copyright 2020 Raphaël Gomès <rgomes@octobus.net> -// -// This software may be used and distributed according to the terms of the -// GNU General Public License version 2 or any later version. - -#[cfg(feature = "with-re2")] -use cc; - -/// Uses either the system Re2 install as a dynamic library or the provided -/// build as a static library -#[cfg(feature = "with-re2")] -fn compile_re2() { - use cc; - use std::path::Path; - use std::process::exit; - - let msg = r"HG_RE2_PATH must be one of `system|<path to build source clone of Re2>`"; - let re2 = match std::env::var_os("HG_RE2_PATH") { - None => { - eprintln!("{}", msg); - exit(1) - } - Some(v) => { - if v == "system" { - None - } else { - Some(v) - } - } - }; - - let mut options = cc::Build::new(); - options - .cpp(true) - .flag("-std=c++11") - .file("src/re2/rust_re2.cpp"); - - if let Some(ref source) = re2 { - options.include(Path::new(source)); - }; - - options.compile("librustre.a"); - - if let Some(ref source) = &re2 { - // Link the local source statically - println!( - "cargo:rustc-link-search=native={}", - Path::new(source).join(Path::new("obj")).display() - ); - println!("cargo:rustc-link-lib=static=re2"); - } else { - println!("cargo:rustc-link-lib=re2"); - } -} - -fn main() { - #[cfg(feature = "with-re2")] - compile_re2(); -}
--- a/rust/hg-core/src/lib.rs Fri May 29 12:12:16 2020 +0200 +++ b/rust/hg-core/src/lib.rs Fri May 29 12:17:59 2020 +0200 @@ -23,8 +23,6 @@ pub mod matchers; pub mod revlog; pub use revlog::*; -#[cfg(feature = "with-re2")] -pub mod re2; pub mod utils; // Remove this to see (potential) non-artificial compile failures. MacOS @@ -141,9 +139,6 @@ /// Needed a pattern that can be turned into a regex but got one that /// can't. This should only happen through programmer error. NonRegexPattern(IgnorePattern), - /// This is temporary, see `re2/mod.rs`. - /// This will cause a fallback to Python. - Re2NotInstalled, } impl ToString for PatternError { @@ -166,10 +161,6 @@ PatternError::NonRegexPattern(pattern) => { format!("'{:?}' cannot be turned into a regex", pattern) } - PatternError::Re2NotInstalled => { - "Re2 is not installed, cannot use regex functionality." - .to_string() - } } } }
--- a/rust/hg-core/src/matchers.rs Fri May 29 12:12:16 2020 +0200 +++ b/rust/hg-core/src/matchers.rs Fri May 29 12:17:59 2020 +0200 @@ -7,8 +7,6 @@ //! Structs and types for matching files and directories. -#[cfg(feature = "with-re2")] -use crate::re2::Re2; use crate::{ dirstate::dirs_multiset::DirsChildrenMultiset, filepatterns::{ @@ -239,29 +237,24 @@ } /// Matches files that are included in the ignore rules. -#[cfg_attr( - feature = "with-re2", - doc = r##" -``` -use hg::{ - matchers::{IncludeMatcher, Matcher}, - IgnorePattern, - PatternSyntax, - utils::hg_path::HgPath -}; -use std::path::Path; -/// -let ignore_patterns = -vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))]; -let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap(); -/// -assert_eq!(matcher.matches(HgPath::new(b"testing")), false); -assert_eq!(matcher.matches(HgPath::new(b"this should work")), true); -assert_eq!(matcher.matches(HgPath::new(b"this also")), true); -assert_eq!(matcher.matches(HgPath::new(b"but not this")), false); -``` -"## -)] +/// ``` +/// use hg::{ +/// matchers::{IncludeMatcher, Matcher}, +/// IgnorePattern, +/// PatternSyntax, +/// utils::hg_path::HgPath +/// }; +/// use std::path::Path; +/// /// +/// let ignore_patterns = +/// vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))]; +/// let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap(); +/// /// +/// assert_eq!(matcher.matches(HgPath::new(b"testing")), false); +/// assert_eq!(matcher.matches(HgPath::new(b"this should work")), true); +/// assert_eq!(matcher.matches(HgPath::new(b"this also")), true); +/// assert_eq!(matcher.matches(HgPath::new(b"but not this")), false); +/// ``` pub struct IncludeMatcher<'a> { patterns: Vec<u8>, match_fn: Box<dyn for<'r> Fn(&'r HgPath) -> bool + 'a + Sync>, @@ -319,22 +312,6 @@ } } -#[cfg(feature = "with-re2")] -/// Returns a function that matches an `HgPath` against the given regex -/// pattern. -/// -/// This can fail when the pattern is invalid or not supported by the -/// underlying engine `Re2`, for instance anything with back-references. -#[timed] -fn re_matcher( - pattern: &[u8], -) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> { - let regex = Re2::new(pattern); - let regex = regex.map_err(|e| PatternError::UnsupportedSyntax(e))?; - Ok(move |path: &HgPath| regex.is_match(path.as_bytes())) -} - -#[cfg(not(feature = "with-re2"))] /// Returns a function that matches an `HgPath` against the given regex /// pattern. /// @@ -844,7 +821,6 @@ ); } - #[cfg(feature = "with-re2")] #[test] fn test_includematcher() { // VisitchildrensetPrefix
--- a/rust/hg-core/src/re2/mod.rs Fri May 29 12:12:16 2020 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,21 +0,0 @@ -/// re2 module -/// -/// The Python implementation of Mercurial uses the Re2 regex engine when -/// possible and if the bindings are installed, falling back to Python's `re` -/// in case of unsupported syntax (Re2 is a non-backtracking engine). -/// -/// Using it from Rust is not ideal. We need C++ bindings, a C++ compiler, -/// Re2 needs to be installed... why not just use the `regex` crate? -/// -/// Using Re2 from the Rust implementation guarantees backwards compatibility. -/// We know it will work out of the box without needing to figure out the -/// subtle differences in syntax. For example, `regex` currently does not -/// support empty alternations (regex like `a||b`) which happens more often -/// than we might think. Old benchmarks also showed worse performance from -/// regex than with Re2, but the methodology and results were lost, so take -/// this with a grain of salt. -/// -/// The idea is to use Re2 for now as a temporary phase and then investigate -/// how much work would be needed to use `regex`. -mod re2; -pub use re2::Re2;
--- a/rust/hg-core/src/re2/re2.rs Fri May 29 12:12:16 2020 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,66 +0,0 @@ -/* -re2.rs - -Rust FFI bindings to Re2. - -Copyright 2020 Valentin Gatien-Baron - -This software may be used and distributed according to the terms of the -GNU General Public License version 2 or any later version. -*/ -use libc::{c_int, c_void}; - -type Re2Ptr = *const c_void; - -pub struct Re2(Re2Ptr); - -/// `re2.h` says: -/// "An "RE2" object is safe for concurrent use by multiple threads." -unsafe impl Sync for Re2 {} - -/// These bind to the C ABI in `rust_re2.cpp`. -extern "C" { - fn rust_re2_create(data: *const u8, len: usize) -> Re2Ptr; - fn rust_re2_destroy(re2: Re2Ptr); - fn rust_re2_ok(re2: Re2Ptr) -> bool; - fn rust_re2_error( - re2: Re2Ptr, - outdata: *mut *const u8, - outlen: *mut usize, - ) -> bool; - fn rust_re2_match( - re2: Re2Ptr, - data: *const u8, - len: usize, - anchor: c_int, - ) -> bool; -} - -impl Re2 { - pub fn new(pattern: &[u8]) -> Result<Re2, String> { - unsafe { - let re2 = rust_re2_create(pattern.as_ptr(), pattern.len()); - if rust_re2_ok(re2) { - Ok(Re2(re2)) - } else { - let mut data: *const u8 = std::ptr::null(); - let mut len: usize = 0; - rust_re2_error(re2, &mut data, &mut len); - Err(String::from_utf8_lossy(std::slice::from_raw_parts( - data, len, - )) - .to_string()) - } - } - } - - pub fn is_match(&self, data: &[u8]) -> bool { - unsafe { rust_re2_match(self.0, data.as_ptr(), data.len(), 1) } - } -} - -impl Drop for Re2 { - fn drop(&mut self) { - unsafe { rust_re2_destroy(self.0) } - } -}
--- a/rust/hg-core/src/re2/rust_re2.cpp Fri May 29 12:12:16 2020 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,49 +0,0 @@ -/* -rust_re2.cpp - -C ABI export of Re2's C++ interface for Rust FFI. - -Copyright 2020 Valentin Gatien-Baron - -This software may be used and distributed according to the terms of the -GNU General Public License version 2 or any later version. -*/ - -#include <re2/re2.h> -using namespace re2; - -extern "C" { - RE2* rust_re2_create(const char* data, size_t len) { - RE2::Options o; - o.set_encoding(RE2::Options::Encoding::EncodingLatin1); - o.set_log_errors(false); - o.set_max_mem(50000000); - - return new RE2(StringPiece(data, len), o); - } - - void rust_re2_destroy(RE2* re) { - delete re; - } - - bool rust_re2_ok(RE2* re) { - return re->ok(); - } - - void rust_re2_error(RE2* re, const char** outdata, size_t* outlen) { - const std::string& e = re->error(); - *outdata = e.data(); - *outlen = e.length(); - } - - bool rust_re2_match(RE2* re, char* data, size_t len, int ianchor) { - const StringPiece sp = StringPiece(data, len); - - RE2::Anchor anchor = - ianchor == 0 ? RE2::Anchor::UNANCHORED : - (ianchor == 1 ? RE2::Anchor::ANCHOR_START : - RE2::Anchor::ANCHOR_BOTH); - - return re->Match(sp, 0, len, anchor, NULL, 0); - } -}
--- a/rust/hg-cpython/Cargo.toml Fri May 29 12:12:16 2020 +0200 +++ b/rust/hg-cpython/Cargo.toml Fri May 29 12:17:59 2020 +0200 @@ -10,7 +10,6 @@ [features] default = ["python27"] -with-re2 = ["hg-core/with-re2"] # Features to build an extension module: python27 = ["cpython/python27-sys", "cpython/extension-module-2-7"]
--- a/rust/hg-cpython/src/debug.rs Fri May 29 12:12:16 2020 +0200 +++ b/rust/hg-cpython/src/debug.rs Fri May 29 12:17:59 2020 +0200 @@ -16,8 +16,6 @@ m.add(py, "__package__", package)?; m.add(py, "__doc__", "Rust debugging information")?; - m.add(py, "re2_installed", cfg!(feature = "with-re2"))?; - let sys = PyModule::import(py, "sys")?; let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?; sys_modules.set_item(py, dotted_name, &m)?;
--- a/tests/test-install.t Fri May 29 12:12:16 2020 +0200 +++ b/tests/test-install.t Fri May 29 12:17:59 2020 +0200 @@ -18,7 +18,6 @@ checking available compression engines (*zlib*) (glob) checking available compression engines for wire protocol (*zlib*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates (*mercurial?templates)... (glob) checking default template (*mercurial?templates?map-cmdline.default) (glob) checking commit editor... (*) (glob) @@ -78,7 +77,6 @@ checking available compression engines (*zlib*) (glob) checking available compression engines for wire protocol (*zlib*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates (*mercurial?templates)... (glob) checking default template (*mercurial?templates?map-cmdline.default) (glob) checking commit editor... (*) (glob) @@ -126,7 +124,6 @@ checking available compression engines (*zlib*) (glob) checking available compression engines for wire protocol (*zlib*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates (*mercurial?templates)... (glob) checking default template (*mercurial?templates?map-cmdline.default) (glob) checking commit editor... ($TESTTMP/tools/testeditor.exe) @@ -154,7 +151,6 @@ checking available compression engines (*zlib*) (glob) checking available compression engines for wire protocol (*zlib*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates (*mercurial?templates)... (glob) checking default template (*mercurial?templates?map-cmdline.default) (glob) checking commit editor... (c:\foo\bar\baz.exe) (windows !) @@ -211,7 +207,6 @@ checking available compression engines (*) (glob) checking available compression engines for wire protocol (*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob) checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob) checking commit editor... (*) (glob) @@ -252,7 +247,6 @@ checking available compression engines (*) (glob) checking available compression engines for wire protocol (*) (glob) checking "re2" regexp engine \((available|missing)\) (re) - checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !) checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob) checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob) checking commit editor... (*) (glob)