rust: remove support for `re2`
authorRaphaël Gomès <rgomes@octobus.net>
Fri, 29 May 2020 12:17:59 +0200
changeset 44870 9f96beb9bafe
parent 44869 4313a0d7540d
child 44871 17d928f8abaf
rust: remove support for `re2` With the performance issues with `regex` figured out and fixed in previous patches and `regex` newly gaining support for empty alternations, there is no reason to keep `re2` around anymore. It's only *marginally* faster at creating the regex which saves at most a couple of ms, but gets beaten by `regex` in every other aspect. This removes the Rust/C/C++ bridge (hooray!), the `with-re2` feature, the conditional code that goes with it, the documentation and relevant part of the debug/module output. Differential Revision: https://phab.mercurial-scm.org/D8594
mercurial/debugcommands.py
rust/Cargo.lock
rust/README.rst
rust/hg-core/Cargo.toml
rust/hg-core/build.rs
rust/hg-core/src/lib.rs
rust/hg-core/src/matchers.rs
rust/hg-core/src/re2/mod.rs
rust/hg-core/src/re2/re2.rs
rust/hg-core/src/re2/rust_re2.cpp
rust/hg-cpython/Cargo.toml
rust/hg-cpython/src/debug.rs
tests/test-install.t
--- a/mercurial/debugcommands.py	Fri May 29 12:12:16 2020 +0200
+++ b/mercurial/debugcommands.py	Fri May 29 12:17:59 2020 +0200
@@ -1650,13 +1650,6 @@
     fm.plain(_(b'checking "re2" regexp engine (%s)\n') % re2)
     fm.data(re2=bool(util._re2))
 
-    rust_debug_mod = policy.importrust("debug")
-    if rust_debug_mod is not None:
-        re2_rust = b'installed' if rust_debug_mod.re2_installed else b'missing'
-
-        msg = b'checking "re2" regexp engine Rust bindings (%s)\n'
-        fm.plain(_(msg % re2_rust))
-
     # templates
     p = templater.templatepaths()
     fm.write(b'templatedirs', b'checking templates (%s)...\n', b' '.join(p))
--- a/rust/Cargo.lock	Fri May 29 12:12:16 2020 +0200
+++ b/rust/Cargo.lock	Fri May 29 12:17:59 2020 +0200
@@ -42,11 +42,6 @@
 source = "registry+https://github.com/rust-lang/crates.io-index"
 
 [[package]]
-name = "cc"
-version = "1.0.50"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-
-[[package]]
 name = "cfg-if"
 version = "0.1.10"
 source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -208,12 +203,10 @@
 version = "0.1.0"
 dependencies = [
  "byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
- "cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)",
  "clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)",
  "crossbeam 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)",
  "hex 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
  "lazy_static 1.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
- "libc 0.2.67 (registry+https://github.com/rust-lang/crates.io-index)",
  "log 0.4.8 (registry+https://github.com/rust-lang/crates.io-index)",
  "memchr 2.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
  "memmap 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -655,7 +648,6 @@
 "checksum autocfg 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "f8aac770f1885fd7e387acedd76065302551364496e46b3dd00860b2f8359b9d"
 "checksum bitflags 1.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
 "checksum byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "08c48aae112d48ed9f069b33538ea9e3e90aa263cfa3d1c24309612b1f7472de"
-"checksum cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)" = "95e28fa049fda1c330bcf9d723be7663a899c4679724b34c81e9f5a326aab8cd"
 "checksum cfg-if 0.1.10 (registry+https://github.com/rust-lang/crates.io-index)" = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
 "checksum chrono 0.4.11 (registry+https://github.com/rust-lang/crates.io-index)" = "80094f509cf8b5ae86a4966a39b3ff66cd7e2a3e594accec3743ff3fabeab5b2"
 "checksum clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)" = "5067f5bb2d80ef5d68b4c87db81601f0b75bca627bc2ef76b141d7b846a3c6d9"
--- a/rust/README.rst	Fri May 29 12:12:16 2020 +0200
+++ b/rust/README.rst	Fri May 29 12:17:59 2020 +0200
@@ -36,36 +36,6 @@
 One day we may use this environment variable to switch to new experimental
 binding crates like a hypothetical ``HGWITHRUSTEXT=hpy``.
 
-Using the fastest ``hg status``
--------------------------------
-
-The code for ``hg status`` needs to conform to ``.hgignore`` rules, which are
-all translated into regex. 
-
-In the first version, for compatibility and ease of development reasons, the 
-Re2 regex engine was chosen until we figured out if the ``regex`` crate had
-similar enough behavior.
-
-Now that that work has been done, the default behavior is to use the ``regex``
-crate, that provides a significant performance boost compared to the standard 
-Python + C path in many commands such as ``status``, ``diff`` and ``commit``,
-
-However, the ``Re2`` path remains slightly faster for our use cases and remains
-a better option for getting the most speed out of your Mercurial. 
-
-If you want to use ``Re2``, you need to install ``Re2`` following Google's 
-guidelines: https://github.com/google/re2/wiki/Install.
-Then, use ``HG_RUST_FEATURES=with-re2`` and 
-``HG_RE2_PATH=system|<path to your re2 install>`` when building ``hg`` to 
-signal the use of Re2. Using the local path instead of the "system" RE2 links
-it statically.
-
-For example::
-
-  $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=system make PURE=--rust
-  $ # OR
-  $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=/path/to/re2 make PURE=--rust
-
 Developing Rust
 ===============
 
@@ -114,14 +84,3 @@
   $ cargo +nightly fmt
 
 This requires you to have the nightly toolchain installed.
-
-Additional features
--------------------
-
-As mentioned in the section about ``hg status``, code paths using ``re2`` are
-opt-in.
-
-For example::
-
-  $ cargo check --features with-re2
-
--- a/rust/hg-core/Cargo.toml	Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/Cargo.toml	Fri May 29 12:17:59 2020 +0200
@@ -4,7 +4,6 @@
 authors = ["Georges Racinet <gracinet@anybox.fr>"]
 description = "Mercurial pure Rust core library, with no assumption on Python bindings (FFI)"
 edition = "2018"
-build = "build.rs"
 
 [lib]
 name = "hg"
@@ -13,7 +12,6 @@
 byteorder = "1.3.4"
 hex = "0.4.2"
 lazy_static = "1.4.0"
-libc = { version = "0.2.66", optional = true }
 memchr = "2.3.3"
 rand = "0.7.3"
 rand_pcg = "0.2.1"
@@ -31,10 +29,3 @@
 memmap = "0.7.0"
 pretty_assertions = "0.6.1"
 tempfile = "3.1.0"
-
-[build-dependencies]
-cc = { version = "1.0.48", optional = true }
-
-[features]
-default = []
-with-re2 = ["cc", "libc"]
--- a/rust/hg-core/build.rs	Fri May 29 12:12:16 2020 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,61 +0,0 @@
-// build.rs
-//
-// Copyright 2020 Raphaël Gomès <rgomes@octobus.net>
-//
-// This software may be used and distributed according to the terms of the
-// GNU General Public License version 2 or any later version.
-
-#[cfg(feature = "with-re2")]
-use cc;
-
-/// Uses either the system Re2 install as a dynamic library or the provided
-/// build as a static library
-#[cfg(feature = "with-re2")]
-fn compile_re2() {
-    use cc;
-    use std::path::Path;
-    use std::process::exit;
-
-    let msg = r"HG_RE2_PATH must be one of `system|<path to build source clone of Re2>`";
-    let re2 = match std::env::var_os("HG_RE2_PATH") {
-        None => {
-            eprintln!("{}", msg);
-            exit(1)
-        }
-        Some(v) => {
-            if v == "system" {
-                None
-            } else {
-                Some(v)
-            }
-        }
-    };
-
-    let mut options = cc::Build::new();
-    options
-        .cpp(true)
-        .flag("-std=c++11")
-        .file("src/re2/rust_re2.cpp");
-
-    if let Some(ref source) = re2 {
-        options.include(Path::new(source));
-    };
-
-    options.compile("librustre.a");
-
-    if let Some(ref source) = &re2 {
-        // Link the local source statically
-        println!(
-            "cargo:rustc-link-search=native={}",
-            Path::new(source).join(Path::new("obj")).display()
-        );
-        println!("cargo:rustc-link-lib=static=re2");
-    } else {
-        println!("cargo:rustc-link-lib=re2");
-    }
-}
-
-fn main() {
-    #[cfg(feature = "with-re2")]
-    compile_re2();
-}
--- a/rust/hg-core/src/lib.rs	Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/src/lib.rs	Fri May 29 12:17:59 2020 +0200
@@ -23,8 +23,6 @@
 pub mod matchers;
 pub mod revlog;
 pub use revlog::*;
-#[cfg(feature = "with-re2")]
-pub mod re2;
 pub mod utils;
 
 // Remove this to see (potential) non-artificial compile failures. MacOS
@@ -141,9 +139,6 @@
     /// Needed a pattern that can be turned into a regex but got one that
     /// can't. This should only happen through programmer error.
     NonRegexPattern(IgnorePattern),
-    /// This is temporary, see `re2/mod.rs`.
-    /// This will cause a fallback to Python.
-    Re2NotInstalled,
 }
 
 impl ToString for PatternError {
@@ -166,10 +161,6 @@
             PatternError::NonRegexPattern(pattern) => {
                 format!("'{:?}' cannot be turned into a regex", pattern)
             }
-            PatternError::Re2NotInstalled => {
-                "Re2 is not installed, cannot use regex functionality."
-                    .to_string()
-            }
         }
     }
 }
--- a/rust/hg-core/src/matchers.rs	Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/src/matchers.rs	Fri May 29 12:17:59 2020 +0200
@@ -7,8 +7,6 @@
 
 //! Structs and types for matching files and directories.
 
-#[cfg(feature = "with-re2")]
-use crate::re2::Re2;
 use crate::{
     dirstate::dirs_multiset::DirsChildrenMultiset,
     filepatterns::{
@@ -239,29 +237,24 @@
 }
 
 /// Matches files that are included in the ignore rules.
-#[cfg_attr(
-    feature = "with-re2",
-    doc = r##"
-```
-use hg::{
-    matchers::{IncludeMatcher, Matcher},
-    IgnorePattern,
-    PatternSyntax,
-    utils::hg_path::HgPath
-};
-use std::path::Path;
-///
-let ignore_patterns =
-vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))];
-let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap();
-///
-assert_eq!(matcher.matches(HgPath::new(b"testing")), false);
-assert_eq!(matcher.matches(HgPath::new(b"this should work")), true);
-assert_eq!(matcher.matches(HgPath::new(b"this also")), true);
-assert_eq!(matcher.matches(HgPath::new(b"but not this")), false);
-```
-"##
-)]
+/// ```
+/// use hg::{
+///     matchers::{IncludeMatcher, Matcher},
+///     IgnorePattern,
+///     PatternSyntax,
+///     utils::hg_path::HgPath
+/// };
+/// use std::path::Path;
+/// ///
+/// let ignore_patterns =
+/// vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))];
+/// let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap();
+/// ///
+/// assert_eq!(matcher.matches(HgPath::new(b"testing")), false);
+/// assert_eq!(matcher.matches(HgPath::new(b"this should work")), true);
+/// assert_eq!(matcher.matches(HgPath::new(b"this also")), true);
+/// assert_eq!(matcher.matches(HgPath::new(b"but not this")), false);
+/// ```
 pub struct IncludeMatcher<'a> {
     patterns: Vec<u8>,
     match_fn: Box<dyn for<'r> Fn(&'r HgPath) -> bool + 'a + Sync>,
@@ -319,22 +312,6 @@
     }
 }
 
-#[cfg(feature = "with-re2")]
-/// Returns a function that matches an `HgPath` against the given regex
-/// pattern.
-///
-/// This can fail when the pattern is invalid or not supported by the
-/// underlying engine `Re2`, for instance anything with back-references.
-#[timed]
-fn re_matcher(
-    pattern: &[u8],
-) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
-    let regex = Re2::new(pattern);
-    let regex = regex.map_err(|e| PatternError::UnsupportedSyntax(e))?;
-    Ok(move |path: &HgPath| regex.is_match(path.as_bytes()))
-}
-
-#[cfg(not(feature = "with-re2"))]
 /// Returns a function that matches an `HgPath` against the given regex
 /// pattern.
 ///
@@ -844,7 +821,6 @@
         );
     }
 
-    #[cfg(feature = "with-re2")]
     #[test]
     fn test_includematcher() {
         // VisitchildrensetPrefix
--- a/rust/hg-core/src/re2/mod.rs	Fri May 29 12:12:16 2020 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,21 +0,0 @@
-/// re2 module
-///
-/// The Python implementation of Mercurial uses the Re2 regex engine when
-/// possible and if the bindings are installed, falling back to Python's `re`
-/// in case of unsupported syntax (Re2 is a non-backtracking engine).
-///
-/// Using it from Rust is not ideal. We need C++ bindings, a C++ compiler,
-/// Re2 needs to be installed... why not just use the `regex` crate?
-///
-/// Using Re2 from the Rust implementation guarantees backwards compatibility.
-/// We know it will work out of the box without needing to figure out the
-/// subtle differences in syntax. For example, `regex` currently does not
-/// support empty alternations (regex like `a||b`) which happens more often
-/// than we might think. Old benchmarks also showed worse performance from
-/// regex than with Re2, but the methodology and results were lost, so take
-/// this with a grain of salt.
-///
-/// The idea is to use Re2 for now as a temporary phase and then investigate
-/// how much work would be needed to use `regex`.
-mod re2;
-pub use re2::Re2;
--- a/rust/hg-core/src/re2/re2.rs	Fri May 29 12:12:16 2020 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,66 +0,0 @@
-/*
-re2.rs
-
-Rust FFI bindings to Re2.
-
-Copyright 2020 Valentin Gatien-Baron
-
-This software may be used and distributed according to the terms of the
-GNU General Public License version 2 or any later version.
-*/
-use libc::{c_int, c_void};
-
-type Re2Ptr = *const c_void;
-
-pub struct Re2(Re2Ptr);
-
-/// `re2.h` says:
-/// "An "RE2" object is safe for concurrent use by multiple threads."
-unsafe impl Sync for Re2 {}
-
-/// These bind to the C ABI in `rust_re2.cpp`.
-extern "C" {
-    fn rust_re2_create(data: *const u8, len: usize) -> Re2Ptr;
-    fn rust_re2_destroy(re2: Re2Ptr);
-    fn rust_re2_ok(re2: Re2Ptr) -> bool;
-    fn rust_re2_error(
-        re2: Re2Ptr,
-        outdata: *mut *const u8,
-        outlen: *mut usize,
-    ) -> bool;
-    fn rust_re2_match(
-        re2: Re2Ptr,
-        data: *const u8,
-        len: usize,
-        anchor: c_int,
-    ) -> bool;
-}
-
-impl Re2 {
-    pub fn new(pattern: &[u8]) -> Result<Re2, String> {
-        unsafe {
-            let re2 = rust_re2_create(pattern.as_ptr(), pattern.len());
-            if rust_re2_ok(re2) {
-                Ok(Re2(re2))
-            } else {
-                let mut data: *const u8 = std::ptr::null();
-                let mut len: usize = 0;
-                rust_re2_error(re2, &mut data, &mut len);
-                Err(String::from_utf8_lossy(std::slice::from_raw_parts(
-                    data, len,
-                ))
-                .to_string())
-            }
-        }
-    }
-
-    pub fn is_match(&self, data: &[u8]) -> bool {
-        unsafe { rust_re2_match(self.0, data.as_ptr(), data.len(), 1) }
-    }
-}
-
-impl Drop for Re2 {
-    fn drop(&mut self) {
-        unsafe { rust_re2_destroy(self.0) }
-    }
-}
--- a/rust/hg-core/src/re2/rust_re2.cpp	Fri May 29 12:12:16 2020 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,49 +0,0 @@
-/*
-rust_re2.cpp
-
-C ABI export of Re2's C++ interface for Rust FFI.
-
-Copyright 2020 Valentin Gatien-Baron
-
-This software may be used and distributed according to the terms of the
-GNU General Public License version 2 or any later version.
-*/
-
-#include <re2/re2.h>
-using namespace re2;
-
-extern "C" {
-	RE2* rust_re2_create(const char* data, size_t len) {
-		RE2::Options o;
-		o.set_encoding(RE2::Options::Encoding::EncodingLatin1);
-		o.set_log_errors(false);
-		o.set_max_mem(50000000);
-
-		return new RE2(StringPiece(data, len), o);
-	}
-
-	void rust_re2_destroy(RE2* re) {
-		delete re;
-	}
-
-	bool rust_re2_ok(RE2* re) {
-		return re->ok();
-	}
-
-	void rust_re2_error(RE2* re, const char** outdata, size_t* outlen) {
-		const std::string& e = re->error();
-		*outdata = e.data();
-		*outlen = e.length();
-	}
-
-	bool rust_re2_match(RE2* re, char* data, size_t len, int ianchor) {
-		const StringPiece sp = StringPiece(data, len);
-
-		RE2::Anchor anchor =
-			ianchor == 0 ? RE2::Anchor::UNANCHORED :
-			(ianchor == 1 ? RE2::Anchor::ANCHOR_START :
-			 RE2::Anchor::ANCHOR_BOTH);
-
-		return re->Match(sp, 0, len, anchor, NULL, 0);
-	}
-}
--- a/rust/hg-cpython/Cargo.toml	Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-cpython/Cargo.toml	Fri May 29 12:17:59 2020 +0200
@@ -10,7 +10,6 @@
 
 [features]
 default = ["python27"]
-with-re2 = ["hg-core/with-re2"]
 
 # Features to build an extension module:
 python27 = ["cpython/python27-sys", "cpython/extension-module-2-7"]
--- a/rust/hg-cpython/src/debug.rs	Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-cpython/src/debug.rs	Fri May 29 12:17:59 2020 +0200
@@ -16,8 +16,6 @@
     m.add(py, "__package__", package)?;
     m.add(py, "__doc__", "Rust debugging information")?;
 
-    m.add(py, "re2_installed", cfg!(feature = "with-re2"))?;
-
     let sys = PyModule::import(py, "sys")?;
     let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?;
     sys_modules.set_item(py, dotted_name, &m)?;
--- a/tests/test-install.t	Fri May 29 12:12:16 2020 +0200
+++ b/tests/test-install.t	Fri May 29 12:17:59 2020 +0200
@@ -18,7 +18,6 @@
   checking available compression engines (*zlib*) (glob)
   checking available compression engines for wire protocol (*zlib*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates (*mercurial?templates)... (glob)
   checking default template (*mercurial?templates?map-cmdline.default) (glob)
   checking commit editor... (*) (glob)
@@ -78,7 +77,6 @@
   checking available compression engines (*zlib*) (glob)
   checking available compression engines for wire protocol (*zlib*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates (*mercurial?templates)... (glob)
   checking default template (*mercurial?templates?map-cmdline.default) (glob)
   checking commit editor... (*) (glob)
@@ -126,7 +124,6 @@
   checking available compression engines (*zlib*) (glob)
   checking available compression engines for wire protocol (*zlib*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates (*mercurial?templates)... (glob)
   checking default template (*mercurial?templates?map-cmdline.default) (glob)
   checking commit editor... ($TESTTMP/tools/testeditor.exe)
@@ -154,7 +151,6 @@
   checking available compression engines (*zlib*) (glob)
   checking available compression engines for wire protocol (*zlib*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates (*mercurial?templates)... (glob)
   checking default template (*mercurial?templates?map-cmdline.default) (glob)
   checking commit editor... (c:\foo\bar\baz.exe) (windows !)
@@ -211,7 +207,6 @@
   checking available compression engines (*) (glob)
   checking available compression engines for wire protocol (*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob)
   checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob)
   checking commit editor... (*) (glob)
@@ -252,7 +247,6 @@
   checking available compression engines (*) (glob)
   checking available compression engines for wire protocol (*) (glob)
   checking "re2" regexp engine \((available|missing)\) (re)
-  checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
   checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob)
   checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob)
   checking commit editor... (*) (glob)