diff --git a/.gitignore b/.gitignore index 5dd6bb4..2d3a52b 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,6 @@ *.egg-info .*-cache +.mxdev_cache/ .coverage .coverage.* !.coveragerc diff --git a/CHANGES.md b/CHANGES.md index 2dd0b7b..bedcbd7 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -2,7 +2,10 @@ ## 5.0.2 (unreleased) -- Nothing yet. +- Fix #70: HTTP-referenced requirements/constraints files are now cached and respected in offline mode. Previously, `-o/--offline` only skipped VCS operations and still fetched HTTP URLs. Now mxdev caches all HTTP content in `.mxdev_cache/` during online mode and reuses it in offline mode, enabling true offline operation. + [jensens] +- Improve the help text for the `-n/--no-fetch`, `-f/--fetch-only`, and `-o/--offline` command-line options to better explain their differences and when to use each one. + [jensens] ## 5.0.1 (2025-10-23) diff --git a/README.md b/README.md index c1edecc..e4fa05e 100644 --- a/README.md +++ b/README.md @@ -85,7 +85,7 @@ The **main section** must be called `[settings]`, even if kept empty. 
| `default-target` | Target directory for VCS checkouts | `./sources` | | `threads` | Number of parallel threads for fetching sources | `4` | | `smart-threading` | Process HTTPS packages serially to avoid overlapping credential prompts (see below) | `True` | -| `offline` | Skip all VCS fetch operations (handy for offline work) | `False` | +| `offline` | Skip all VCS and HTTP fetches; use cached HTTP content from `.mxdev_cache/` (see below) | `False` | | `default-install-mode` | Default `install-mode` for packages: `editable`, `fixed`, or `skip` (see below) | `editable` | | `default-update` | Default update behavior: `yes` or `no` | `yes` | | `default-use` | Default use behavior (when false, sources not checked out) | `True` | @@ -103,6 +103,34 @@ This solves the problem where parallel git operations would cause multiple crede **When to disable**: Set `smart-threading = false` if you have git credential helpers configured (e.g., credential cache, credential store) and never see prompts. +##### Offline Mode and HTTP Caching + +When `offline` mode is enabled (or via `-o/--offline` flag), mxdev operates without any network access: + +1. **HTTP Caching**: HTTP-referenced requirements/constraints files are automatically cached in `.mxdev_cache/` during online mode +2. **Offline Usage**: In offline mode, mxdev reads from the cache instead of fetching from the network +3. 
**Cache Miss**: If a referenced HTTP file is not in the cache, mxdev raises an error asking you to run in online mode first + +**Example workflow:** +```bash +# First run in online mode to populate cache +mxdev + +# Subsequent runs can be offline (e.g., on an airplane or a restricted network) +mxdev -o + +# Cache persists across runs, enabling true offline development +``` + +**Cache location**: `.mxdev_cache/` (add this directory to your project's `.gitignore`) + +**When to use offline mode**: +- Working without internet access (airplanes, restricted networks) +- Testing configuration changes without re-fetching +- Faster iterations when VCS sources are already checked out + +**Note**: Offline mode tolerates missing source directories (logs warnings), while non-offline mode treats missing sources as fatal errors. + #### Package Overrides ##### `version-overrides` diff --git a/src/mxdev/main.py b/src/mxdev/main.py index 9d3dd3c..eb591e4 100644 --- a/src/mxdev/main.py +++ b/src/mxdev/main.py @@ -27,9 +27,19 @@ type=str, default="mx.ini", ) -parser.add_argument("-n", "--no-fetch", help="Do not fetch sources", action="store_true") -parser.add_argument("-f", "--fetch-only", help="Only fetch sources", action="store_true") -parser.add_argument("-o", "--offline", help="Do not fetch sources, work offline", action="store_true") +parser.add_argument( + "-n", + "--no-fetch", + help="Skip VCS checkout/update; regenerate files from existing sources (error if missing)", + action="store_true", +) +parser.add_argument("-f", "--fetch-only", help="Only perform VCS operations; skip file generation", action="store_true") +parser.add_argument( + "-o", + "--offline", + help="Work offline; skip VCS and HTTP fetches; use cached files (tolerate missing)", + action="store_true", +) parser.add_argument( "-t", "--threads", diff --git a/src/mxdev/processing.py b/src/mxdev/processing.py index 70509dc..0b72a0d 100644 --- a/src/mxdev/processing.py +++ b/src/mxdev/processing.py @@ -7,16 +7,83 @@ from urllib import 
request from urllib.error import URLError +import hashlib import os import typing +def _get_cache_key(url: str) -> str: + """Generate a deterministic cache key from a URL. + + Uses SHA256 hash of the URL, truncated to 16 hex characters for readability + while maintaining low collision probability. + + Args: + url: The URL to generate a cache key for + + Returns: + 16-character hex string (cache key) + + """ + hash_obj = hashlib.sha256(url.encode("utf-8")) + return hash_obj.hexdigest()[:16] + + +def _cache_http_content(url: str, content: str, cache_dir: Path) -> None: + """Cache HTTP content to disk. + + Args: + url: The URL being cached + content: The content to cache + cache_dir: Directory to store cache files + + """ + cache_dir.mkdir(parents=True, exist_ok=True) + cache_key = _get_cache_key(url) + + # Write content + cache_file = cache_dir / cache_key + cache_file.write_text(content, encoding="utf-8") + + # Write URL metadata for debugging + url_file = cache_dir / f"{cache_key}.url" + url_file.write_text(url, encoding="utf-8") + + logger.debug(f"Cached {url} to {cache_file}") + + +def _read_from_cache(url: str, cache_dir: Path) -> str | None: + """Read cached HTTP content from disk. + + Args: + url: The URL to look up in cache + cache_dir: Directory containing cache files + + Returns: + Cached content if found, None otherwise + + """ + if not cache_dir.exists(): + return None + + cache_key = _get_cache_key(url) + cache_file = cache_dir / cache_key + + if cache_file.exists(): + logger.debug(f"Cache hit for {url} from {cache_file}") + return cache_file.read_text(encoding="utf-8") + + return None + + def process_line( line: str, package_keys: list[str], override_keys: list[str], ignore_keys: list[str], variety: str, + offline: bool = False, + cache_dir: Path | None = None, ) -> tuple[list[str], list[str]]: """Take line from a constraints or requirements file and process it recursively. 
@@ -41,6 +108,8 @@ def process_line( override_keys=override_keys, ignore_keys=ignore_keys, variety="c", + offline=offline, + cache_dir=cache_dir, ) elif line.startswith("-r"): return resolve_dependencies( @@ -49,6 +118,8 @@ def process_line( override_keys=override_keys, ignore_keys=ignore_keys, variety="r", + offline=offline, + cache_dir=cache_dir, ) try: parsed = Requirement(line) @@ -78,6 +149,8 @@ def process_io( override_keys: list[str], ignore_keys: list[str], variety: str, + offline: bool = False, + cache_dir: Path | None = None, ) -> None: """Read lines from an open file and trigger processing of each line @@ -85,7 +158,9 @@ def process_io( and constraint lists. """ for line in fio: - new_requirements, new_constraints = process_line(line, package_keys, override_keys, ignore_keys, variety) + new_requirements, new_constraints = process_line( + line, package_keys, override_keys, ignore_keys, variety, offline, cache_dir + ) requirements += new_requirements constraints += new_constraints @@ -96,10 +171,23 @@ def resolve_dependencies( override_keys: list[str], ignore_keys: list[str], variety: str = "r", + offline: bool = False, + cache_dir: Path | None = None, ) -> tuple[list[str], list[str]]: """Takes a file or url, loads it and trigger to recursivly processes its content. 
- returns tuple of requirements and constraints + Args: + file_or_url: Path to local file or HTTP(S) URL + package_keys: List of package names being developed from source + override_keys: List of package names with version overrides + ignore_keys: List of package names to ignore + variety: "r" for requirements, "c" for constraints + offline: If True, use cached HTTP content and don't make network requests + cache_dir: Directory for caching HTTP content (default: ./.mxdev_cache) + + Returns: + Tuple of (requirements, constraints) as lists of strings + """ requirements: list[str] = [] constraints: list[str] = [] @@ -113,6 +201,10 @@ def resolve_dependencies( # Windows drive letters are single characters, URL schemes are longer is_url = parsed.scheme and len(parsed.scheme) > 1 + # Default cache directory + if cache_dir is None: + cache_dir = Path(".mxdev_cache") + if not is_url: requirements_in_file = Path(file_or_url) if requirements_in_file.exists(): @@ -125,25 +217,51 @@ def resolve_dependencies( override_keys, ignore_keys, variety, + offline, + cache_dir, ) else: logger.info( f"Can not read {variety_verbose} file '{file_or_url}', " "it does not exist. Empty file assumed." ) else: - try: - with request.urlopen(file_or_url) as fio: - process_io( - fio, - requirements, - constraints, - package_keys, - override_keys, - ignore_keys, - variety, + # HTTP(S) URL handling with caching + content: str + if offline: + # Offline mode: try to read from cache + cached_content = _read_from_cache(file_or_url, cache_dir) + if cached_content is None: + raise RuntimeError( + f"Offline mode: HTTP reference '{file_or_url}' not found in cache. 
" + f"Run mxdev in online mode first to populate the cache at {cache_dir}" ) - except URLError as e: - raise Exception(f"Failed to fetch '{file_or_url}': {e}") + content = cached_content + logger.info(f"Using cached content for {file_or_url}") + else: + # Online mode: fetch from HTTP and cache it + try: + with request.urlopen(file_or_url) as fio: + content = fio.read().decode("utf-8") + # Cache the content for future offline use + _cache_http_content(file_or_url, content, cache_dir) + except URLError as e: + raise Exception(f"Failed to fetch '{file_or_url}': {e}") + + # Process the content (either from cache or fresh from HTTP) + from io import StringIO + + with StringIO(content) as fio: + process_io( + fio, + requirements, + constraints, + package_keys, + override_keys, + ignore_keys, + variety, + offline, + cache_dir, + ) if requirements and variety == "r": requirements = ( @@ -172,12 +290,16 @@ def read(state: State) -> None: The result is stored on the state object """ + from .config import to_bool + cfg = state.configuration + offline = to_bool(cfg.settings.get("offline", False)) state.requirements, state.constraints = resolve_dependencies( file_or_url=cfg.infile, package_keys=cfg.package_keys, override_keys=cfg.override_keys, ignore_keys=cfg.ignore_keys, + offline=offline, ) diff --git a/tests/test_processing.py b/tests/test_processing.py index e39c70e..3b7916a 100644 --- a/tests/test_processing.py +++ b/tests/test_processing.py @@ -824,3 +824,188 @@ def test_write_dev_sources_missing_directories_offline_mode(tmp_path, caplog): assert "Source directory does not exist" in caplog.text assert "This is expected in offline mode" in caplog.text assert "Run mxdev without -n and --offline flags" in caplog.text + + +def test_http_cache_online_mode(tmp_path): + """Test HTTP URLs are cached in online mode.""" + from mxdev.processing import resolve_dependencies + + import httpretty + + cache_dir = tmp_path / ".mxdev-cache" + + # Mock HTTP response + httpretty.enable() + 
try: + httpretty.register_uri( + httpretty.GET, + "http://example.com/requirements.txt", + body="requests>=2.28.0\nurllib3==1.26.9\n", + ) + + requirements, constraints = resolve_dependencies( + "http://example.com/requirements.txt", + package_keys=[], + override_keys=[], + ignore_keys=[], + variety="r", + offline=False, + cache_dir=cache_dir, + ) + + # Should have requirements + assert any("requests" in line for line in requirements) + assert any("urllib3" in line for line in requirements) + + # Cache directory should be created + assert cache_dir.exists() + + # Cache file should exist (check for any file in cache) + cache_files = list(cache_dir.glob("*")) + cache_content_files = [f for f in cache_files if not f.suffix] + assert len(cache_content_files) >= 1, "Expected at least one cache file" + + # Read cache file and verify content + cache_file = cache_content_files[0] + cached_content = cache_file.read_text() + assert "requests>=2.28.0" in cached_content + assert "urllib3==1.26.9" in cached_content + + # Check .url metadata file exists + url_files = list(cache_dir.glob("*.url")) + assert len(url_files) >= 1, "Expected at least one .url metadata file" + + finally: + httpretty.disable() + httpretty.reset() + + +def test_http_cache_offline_mode_hit(tmp_path): + """Test HTTP URLs are read from cache in offline mode (cache hit).""" + from mxdev.processing import _get_cache_key + from mxdev.processing import resolve_dependencies + + cache_dir = tmp_path / ".mxdev-cache" + cache_dir.mkdir() + + url = "http://example.com/requirements.txt" + cache_key = _get_cache_key(url) + + # Pre-populate cache + cache_file = cache_dir / cache_key + cache_file.write_text("cached-package>=1.0.0\n") + + # Also write .url metadata + url_file = cache_dir / f"{cache_key}.url" + url_file.write_text(url) + + # Don't enable httpretty - we shouldn't make any HTTP requests + requirements, constraints = resolve_dependencies( + url, + package_keys=[], + override_keys=[], + ignore_keys=[], + 
variety="r", + offline=True, + cache_dir=cache_dir, + ) + + # Should use cached content + assert any("cached-package" in line for line in requirements) + + # httpretty is intentionally not enabled here: offline mode must never call urlopen + + +def test_http_cache_offline_mode_miss(tmp_path): + """Test HTTP URLs error in offline mode when not cached (cache miss).""" + from mxdev.processing import resolve_dependencies + + cache_dir = tmp_path / ".mxdev-cache" + cache_dir.mkdir() + + # Cache is empty, should raise error + with pytest.raises(RuntimeError) as exc_info: + resolve_dependencies( + "http://example.com/requirements.txt", + package_keys=[], + override_keys=[], + ignore_keys=[], + variety="r", + offline=True, + cache_dir=cache_dir, + ) + + error_msg = str(exc_info.value) + assert "offline mode" in error_msg.lower() + assert "not found in cache" in error_msg.lower() + assert "http://example.com/requirements.txt" in error_msg + + +def test_cache_key_generation(): + """Test cache key generation is deterministic and collision-resistant.""" + from mxdev.processing import _get_cache_key + + # Same URL should produce same cache key + url1 = "http://example.com/requirements.txt" + key1 = _get_cache_key(url1) + key2 = _get_cache_key(url1) + assert key1 == key2 + + # Different URLs should produce different cache keys + url2 = "http://example.com/constraints.txt" + key3 = _get_cache_key(url2) + assert key1 != key3 + + # Cache key should be reasonable length (16 hex chars) + assert len(key1) == 16 + assert all(c in "0123456789abcdef" for c in key1) + + +def test_http_cache_revalidates_in_online_mode(tmp_path): + """Test HTTP cache is updated in online mode (not just read).""" + from mxdev.processing import _get_cache_key + from mxdev.processing import resolve_dependencies + + import httpretty + + cache_dir = tmp_path / ".mxdev-cache" + cache_dir.mkdir() + + url = "http://example.com/requirements.txt" + cache_key = _get_cache_key(url) + + # Pre-populate cache with old content + 
cache_file = cache_dir / cache_key + cache_file.write_text("old-package==1.0.0\n") + + # Mock HTTP response with new content + httpretty.enable() + try: + httpretty.register_uri( + httpretty.GET, + url, + body="new-package>=2.0.0\n", + ) + + requirements, constraints = resolve_dependencies( + url, + package_keys=[], + override_keys=[], + ignore_keys=[], + variety="r", + offline=False, + cache_dir=cache_dir, + ) + + # Should use NEW content from HTTP, not old cache + assert any("new-package" in line for line in requirements) + assert not any("old-package" in line for line in requirements) + + # Cache should be updated + cached_content = cache_file.read_text() + assert "new-package>=2.0.0" in cached_content + assert "old-package" not in cached_content + + finally: + httpretty.disable() + httpretty.reset()
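
The on-disk cache scheme introduced by this patch (a content file named by the first 16 hex characters of SHA256(url), plus a sibling `<key>.url` metadata file) can be exercised in isolation. The sketch below mirrors the patch's `_get_cache_key`/`_cache_http_content`/`_read_from_cache` helpers but is self-contained and independent of mxdev; the function names here are local stand-ins, not mxdev's public API:

```python
# Minimal sketch of the URL-keyed cache scheme from the patch:
# SHA256(url) truncated to 16 hex chars names the content file,
# and a sibling "<key>.url" file records the original URL for debugging.
import hashlib
import tempfile
from pathlib import Path


def cache_key(url: str) -> str:
    """Deterministic 16-hex-char key for a URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def cache_write(url: str, content: str, cache_dir: Path) -> Path:
    """Store content under its URL key; returns the content file path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = cache_key(url)
    (cache_dir / f"{key}.url").write_text(url, encoding="utf-8")
    path = cache_dir / key
    path.write_text(content, encoding="utf-8")
    return path


def cache_read(url: str, cache_dir: Path) -> "str | None":
    """Return cached content for the URL, or None on a cache miss."""
    path = cache_dir / cache_key(url)
    return path.read_text(encoding="utf-8") if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    cdir = Path(tmp) / ".mxdev_cache"
    url = "http://example.com/requirements.txt"
    assert cache_read(url, cdir) is None                   # cold cache: miss
    cache_write(url, "requests>=2.28.0\n", cdir)
    assert cache_read(url, cdir) == "requests>=2.28.0\n"   # warm cache: hit
    assert len(cache_key(url)) == 16                       # stable key length
```

Writing the key deterministically from the URL is what lets online and offline runs agree on the file name without any index: a later offline run recomputes the same key and finds the same file.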