rust: remove support for `re2`
With the performance issues with `regex` figured out and fixed in previous
patches and `regex` newly gaining support for empty alternations, there is no
reason to keep `re2` around anymore. It's only *marginally* faster at creating
the regex which saves at most a couple of ms, but gets beaten by `regex` in
every other aspect.
This removes the Rust/C/C++ bridge (hooray!), the `with-re2` feature, the
conditional code that goes with it, the documentation and relevant part of the
debug/module output.
Differential Revision: https://phab.mercurial-scm.org/D8594
--- a/mercurial/debugcommands.py Fri May 29 12:12:16 2020 +0200
+++ b/mercurial/debugcommands.py Fri May 29 12:17:59 2020 +0200
@@ -1650,13 +1650,6 @@
fm.plain(_(b'checking "re2" regexp engine (%s)\n') % re2)
fm.data(re2=bool(util._re2))
- rust_debug_mod = policy.importrust("debug")
- if rust_debug_mod is not None:
- re2_rust = b'installed' if rust_debug_mod.re2_installed else b'missing'
-
- msg = b'checking "re2" regexp engine Rust bindings (%s)\n'
- fm.plain(_(msg % re2_rust))
-
# templates
p = templater.templatepaths()
fm.write(b'templatedirs', b'checking templates (%s)...\n', b' '.join(p))
--- a/rust/Cargo.lock Fri May 29 12:12:16 2020 +0200
+++ b/rust/Cargo.lock Fri May 29 12:17:59 2020 +0200
@@ -42,11 +42,6 @@
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
-name = "cc"
-version = "1.0.50"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-
-[[package]]
name = "cfg-if"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -208,12 +203,10 @@
version = "0.1.0"
dependencies = [
"byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
- "cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)",
"clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)",
"crossbeam 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)",
"hex 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 1.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
- "libc 0.2.67 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.8 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memmap 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -655,7 +648,6 @@
"checksum autocfg 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "f8aac770f1885fd7e387acedd76065302551364496e46b3dd00860b2f8359b9d"
"checksum bitflags 1.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
"checksum byteorder 1.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "08c48aae112d48ed9f069b33538ea9e3e90aa263cfa3d1c24309612b1f7472de"
-"checksum cc 1.0.50 (registry+https://github.com/rust-lang/crates.io-index)" = "95e28fa049fda1c330bcf9d723be7663a899c4679724b34c81e9f5a326aab8cd"
"checksum cfg-if 0.1.10 (registry+https://github.com/rust-lang/crates.io-index)" = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
"checksum chrono 0.4.11 (registry+https://github.com/rust-lang/crates.io-index)" = "80094f509cf8b5ae86a4966a39b3ff66cd7e2a3e594accec3743ff3fabeab5b2"
"checksum clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)" = "5067f5bb2d80ef5d68b4c87db81601f0b75bca627bc2ef76b141d7b846a3c6d9"
--- a/rust/README.rst Fri May 29 12:12:16 2020 +0200
+++ b/rust/README.rst Fri May 29 12:17:59 2020 +0200
@@ -36,36 +36,6 @@
One day we may use this environment variable to switch to new experimental
binding crates like a hypothetical ``HGWITHRUSTEXT=hpy``.
-Using the fastest ``hg status``
--------------------------------
-
-The code for ``hg status`` needs to conform to ``.hgignore`` rules, which are
-all translated into regex.
-
-In the first version, for compatibility and ease of development reasons, the
-Re2 regex engine was chosen until we figured out if the ``regex`` crate had
-similar enough behavior.
-
-Now that that work has been done, the default behavior is to use the ``regex``
-crate, that provides a significant performance boost compared to the standard
-Python + C path in many commands such as ``status``, ``diff`` and ``commit``,
-
-However, the ``Re2`` path remains slightly faster for our use cases and remains
-a better option for getting the most speed out of your Mercurial.
-
-If you want to use ``Re2``, you need to install ``Re2`` following Google's
-guidelines: https://github.com/google/re2/wiki/Install.
-Then, use ``HG_RUST_FEATURES=with-re2`` and
-``HG_RE2_PATH=system|<path to your re2 install>`` when building ``hg`` to
-signal the use of Re2. Using the local path instead of the "system" RE2 links
-it statically.
-
-For example::
-
- $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=system make PURE=--rust
- $ # OR
- $ HG_RUST_FEATURES=with-re2 HG_RE2_PATH=/path/to/re2 make PURE=--rust
-
Developing Rust
===============
@@ -114,14 +84,3 @@
$ cargo +nightly fmt
This requires you to have the nightly toolchain installed.
-
-Additional features
--------------------
-
-As mentioned in the section about ``hg status``, code paths using ``re2`` are
-opt-in.
-
-For example::
-
- $ cargo check --features with-re2
-
--- a/rust/hg-core/Cargo.toml Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/Cargo.toml Fri May 29 12:17:59 2020 +0200
@@ -4,7 +4,6 @@
authors = ["Georges Racinet <gracinet@anybox.fr>"]
description = "Mercurial pure Rust core library, with no assumption on Python bindings (FFI)"
edition = "2018"
-build = "build.rs"
[lib]
name = "hg"
@@ -13,7 +12,6 @@
byteorder = "1.3.4"
hex = "0.4.2"
lazy_static = "1.4.0"
-libc = { version = "0.2.66", optional = true }
memchr = "2.3.3"
rand = "0.7.3"
rand_pcg = "0.2.1"
@@ -31,10 +29,3 @@
memmap = "0.7.0"
pretty_assertions = "0.6.1"
tempfile = "3.1.0"
-
-[build-dependencies]
-cc = { version = "1.0.48", optional = true }
-
-[features]
-default = []
-with-re2 = ["cc", "libc"]
--- a/rust/hg-core/build.rs Fri May 29 12:12:16 2020 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,61 +0,0 @@
-// build.rs
-//
-// Copyright 2020 Raphaël Gomès <rgomes@octobus.net>
-//
-// This software may be used and distributed according to the terms of the
-// GNU General Public License version 2 or any later version.
-
-#[cfg(feature = "with-re2")]
-use cc;
-
-/// Uses either the system Re2 install as a dynamic library or the provided
-/// build as a static library
-#[cfg(feature = "with-re2")]
-fn compile_re2() {
- use cc;
- use std::path::Path;
- use std::process::exit;
-
- let msg = r"HG_RE2_PATH must be one of `system|<path to build source clone of Re2>`";
- let re2 = match std::env::var_os("HG_RE2_PATH") {
- None => {
- eprintln!("{}", msg);
- exit(1)
- }
- Some(v) => {
- if v == "system" {
- None
- } else {
- Some(v)
- }
- }
- };
-
- let mut options = cc::Build::new();
- options
- .cpp(true)
- .flag("-std=c++11")
- .file("src/re2/rust_re2.cpp");
-
- if let Some(ref source) = re2 {
- options.include(Path::new(source));
- };
-
- options.compile("librustre.a");
-
- if let Some(ref source) = &re2 {
- // Link the local source statically
- println!(
- "cargo:rustc-link-search=native={}",
- Path::new(source).join(Path::new("obj")).display()
- );
- println!("cargo:rustc-link-lib=static=re2");
- } else {
- println!("cargo:rustc-link-lib=re2");
- }
-}
-
-fn main() {
- #[cfg(feature = "with-re2")]
- compile_re2();
-}
--- a/rust/hg-core/src/lib.rs Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/src/lib.rs Fri May 29 12:17:59 2020 +0200
@@ -23,8 +23,6 @@
pub mod matchers;
pub mod revlog;
pub use revlog::*;
-#[cfg(feature = "with-re2")]
-pub mod re2;
pub mod utils;
// Remove this to see (potential) non-artificial compile failures. MacOS
@@ -141,9 +139,6 @@
/// Needed a pattern that can be turned into a regex but got one that
/// can't. This should only happen through programmer error.
NonRegexPattern(IgnorePattern),
- /// This is temporary, see `re2/mod.rs`.
- /// This will cause a fallback to Python.
- Re2NotInstalled,
}
impl ToString for PatternError {
@@ -166,10 +161,6 @@
PatternError::NonRegexPattern(pattern) => {
format!("'{:?}' cannot be turned into a regex", pattern)
}
- PatternError::Re2NotInstalled => {
- "Re2 is not installed, cannot use regex functionality."
- .to_string()
- }
}
}
}
--- a/rust/hg-core/src/matchers.rs Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-core/src/matchers.rs Fri May 29 12:17:59 2020 +0200
@@ -7,8 +7,6 @@
//! Structs and types for matching files and directories.
-#[cfg(feature = "with-re2")]
-use crate::re2::Re2;
use crate::{
dirstate::dirs_multiset::DirsChildrenMultiset,
filepatterns::{
@@ -239,29 +237,24 @@
}
/// Matches files that are included in the ignore rules.
-#[cfg_attr(
- feature = "with-re2",
- doc = r##"
-```
-use hg::{
- matchers::{IncludeMatcher, Matcher},
- IgnorePattern,
- PatternSyntax,
- utils::hg_path::HgPath
-};
-use std::path::Path;
-///
-let ignore_patterns =
-vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))];
-let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap();
-///
-assert_eq!(matcher.matches(HgPath::new(b"testing")), false);
-assert_eq!(matcher.matches(HgPath::new(b"this should work")), true);
-assert_eq!(matcher.matches(HgPath::new(b"this also")), true);
-assert_eq!(matcher.matches(HgPath::new(b"but not this")), false);
-```
-"##
-)]
+/// ```
+/// use hg::{
+/// matchers::{IncludeMatcher, Matcher},
+/// IgnorePattern,
+/// PatternSyntax,
+/// utils::hg_path::HgPath
+/// };
+/// use std::path::Path;
+/// ///
+/// let ignore_patterns =
+/// vec![IgnorePattern::new(PatternSyntax::RootGlob, b"this*", Path::new(""))];
+/// let (matcher, _) = IncludeMatcher::new(ignore_patterns, "").unwrap();
+/// ///
+/// assert_eq!(matcher.matches(HgPath::new(b"testing")), false);
+/// assert_eq!(matcher.matches(HgPath::new(b"this should work")), true);
+/// assert_eq!(matcher.matches(HgPath::new(b"this also")), true);
+/// assert_eq!(matcher.matches(HgPath::new(b"but not this")), false);
+/// ```
pub struct IncludeMatcher<'a> {
patterns: Vec<u8>,
match_fn: Box<dyn for<'r> Fn(&'r HgPath) -> bool + 'a + Sync>,
@@ -319,22 +312,6 @@
}
}
-#[cfg(feature = "with-re2")]
-/// Returns a function that matches an `HgPath` against the given regex
-/// pattern.
-///
-/// This can fail when the pattern is invalid or not supported by the
-/// underlying engine `Re2`, for instance anything with back-references.
-#[timed]
-fn re_matcher(
- pattern: &[u8],
-) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
- let regex = Re2::new(pattern);
- let regex = regex.map_err(|e| PatternError::UnsupportedSyntax(e))?;
- Ok(move |path: &HgPath| regex.is_match(path.as_bytes()))
-}
-
-#[cfg(not(feature = "with-re2"))]
/// Returns a function that matches an `HgPath` against the given regex
/// pattern.
///
@@ -844,7 +821,6 @@
);
}
- #[cfg(feature = "with-re2")]
#[test]
fn test_includematcher() {
// VisitchildrensetPrefix
--- a/rust/hg-core/src/re2/mod.rs Fri May 29 12:12:16 2020 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,21 +0,0 @@
-/// re2 module
-///
-/// The Python implementation of Mercurial uses the Re2 regex engine when
-/// possible and if the bindings are installed, falling back to Python's `re`
-/// in case of unsupported syntax (Re2 is a non-backtracking engine).
-///
-/// Using it from Rust is not ideal. We need C++ bindings, a C++ compiler,
-/// Re2 needs to be installed... why not just use the `regex` crate?
-///
-/// Using Re2 from the Rust implementation guarantees backwards compatibility.
-/// We know it will work out of the box without needing to figure out the
-/// subtle differences in syntax. For example, `regex` currently does not
-/// support empty alternations (regex like `a||b`) which happens more often
-/// than we might think. Old benchmarks also showed worse performance from
-/// regex than with Re2, but the methodology and results were lost, so take
-/// this with a grain of salt.
-///
-/// The idea is to use Re2 for now as a temporary phase and then investigate
-/// how much work would be needed to use `regex`.
-mod re2;
-pub use re2::Re2;
--- a/rust/hg-core/src/re2/re2.rs Fri May 29 12:12:16 2020 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,66 +0,0 @@
-/*
-re2.rs
-
-Rust FFI bindings to Re2.
-
-Copyright 2020 Valentin Gatien-Baron
-
-This software may be used and distributed according to the terms of the
-GNU General Public License version 2 or any later version.
-*/
-use libc::{c_int, c_void};
-
-type Re2Ptr = *const c_void;
-
-pub struct Re2(Re2Ptr);
-
-/// `re2.h` says:
-/// "An "RE2" object is safe for concurrent use by multiple threads."
-unsafe impl Sync for Re2 {}
-
-/// These bind to the C ABI in `rust_re2.cpp`.
-extern "C" {
- fn rust_re2_create(data: *const u8, len: usize) -> Re2Ptr;
- fn rust_re2_destroy(re2: Re2Ptr);
- fn rust_re2_ok(re2: Re2Ptr) -> bool;
- fn rust_re2_error(
- re2: Re2Ptr,
- outdata: *mut *const u8,
- outlen: *mut usize,
- ) -> bool;
- fn rust_re2_match(
- re2: Re2Ptr,
- data: *const u8,
- len: usize,
- anchor: c_int,
- ) -> bool;
-}
-
-impl Re2 {
- pub fn new(pattern: &[u8]) -> Result<Re2, String> {
- unsafe {
- let re2 = rust_re2_create(pattern.as_ptr(), pattern.len());
- if rust_re2_ok(re2) {
- Ok(Re2(re2))
- } else {
- let mut data: *const u8 = std::ptr::null();
- let mut len: usize = 0;
- rust_re2_error(re2, &mut data, &mut len);
- Err(String::from_utf8_lossy(std::slice::from_raw_parts(
- data, len,
- ))
- .to_string())
- }
- }
- }
-
- pub fn is_match(&self, data: &[u8]) -> bool {
- unsafe { rust_re2_match(self.0, data.as_ptr(), data.len(), 1) }
- }
-}
-
-impl Drop for Re2 {
- fn drop(&mut self) {
- unsafe { rust_re2_destroy(self.0) }
- }
-}
--- a/rust/hg-core/src/re2/rust_re2.cpp Fri May 29 12:12:16 2020 +0200
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,49 +0,0 @@
-/*
-rust_re2.cpp
-
-C ABI export of Re2's C++ interface for Rust FFI.
-
-Copyright 2020 Valentin Gatien-Baron
-
-This software may be used and distributed according to the terms of the
-GNU General Public License version 2 or any later version.
-*/
-
-#include <re2/re2.h>
-using namespace re2;
-
-extern "C" {
- RE2* rust_re2_create(const char* data, size_t len) {
- RE2::Options o;
- o.set_encoding(RE2::Options::Encoding::EncodingLatin1);
- o.set_log_errors(false);
- o.set_max_mem(50000000);
-
- return new RE2(StringPiece(data, len), o);
- }
-
- void rust_re2_destroy(RE2* re) {
- delete re;
- }
-
- bool rust_re2_ok(RE2* re) {
- return re->ok();
- }
-
- void rust_re2_error(RE2* re, const char** outdata, size_t* outlen) {
- const std::string& e = re->error();
- *outdata = e.data();
- *outlen = e.length();
- }
-
- bool rust_re2_match(RE2* re, char* data, size_t len, int ianchor) {
- const StringPiece sp = StringPiece(data, len);
-
- RE2::Anchor anchor =
- ianchor == 0 ? RE2::Anchor::UNANCHORED :
- (ianchor == 1 ? RE2::Anchor::ANCHOR_START :
- RE2::Anchor::ANCHOR_BOTH);
-
- return re->Match(sp, 0, len, anchor, NULL, 0);
- }
-}
--- a/rust/hg-cpython/Cargo.toml Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-cpython/Cargo.toml Fri May 29 12:17:59 2020 +0200
@@ -10,7 +10,6 @@
[features]
default = ["python27"]
-with-re2 = ["hg-core/with-re2"]
# Features to build an extension module:
python27 = ["cpython/python27-sys", "cpython/extension-module-2-7"]
--- a/rust/hg-cpython/src/debug.rs Fri May 29 12:12:16 2020 +0200
+++ b/rust/hg-cpython/src/debug.rs Fri May 29 12:17:59 2020 +0200
@@ -16,8 +16,6 @@
m.add(py, "__package__", package)?;
m.add(py, "__doc__", "Rust debugging information")?;
- m.add(py, "re2_installed", cfg!(feature = "with-re2"))?;
-
let sys = PyModule::import(py, "sys")?;
let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?;
sys_modules.set_item(py, dotted_name, &m)?;
--- a/tests/test-install.t Fri May 29 12:12:16 2020 +0200
+++ b/tests/test-install.t Fri May 29 12:17:59 2020 +0200
@@ -18,7 +18,6 @@
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (*) (glob)
@@ -78,7 +77,6 @@
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (*) (glob)
@@ -126,7 +124,6 @@
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... ($TESTTMP/tools/testeditor.exe)
@@ -154,7 +151,6 @@
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (c:\foo\bar\baz.exe) (windows !)
@@ -211,7 +207,6 @@
checking available compression engines (*) (glob)
checking available compression engines for wire protocol (*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob)
checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob)
checking commit editor... (*) (glob)
@@ -252,7 +247,6 @@
checking available compression engines (*) (glob)
checking available compression engines for wire protocol (*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
- checking "re2" regexp engine Rust bindings \((installed|missing)\) (re) (rust !)
checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob)
checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob)
checking commit editor... (*) (glob)