Skip to content

Conversation

@UladzislauK-Writer
Copy link
Collaborator

@UladzislauK-Writer UladzislauK-Writer commented Dec 22, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced project file monitoring: computes per-file hashes, derives a combined project hash, and logs when the project hash changes (active in debug logging).
  • Tests

    • Added comprehensive tests for initial hashing, order-independence, and updates on file create/modify/move/delete events, including handling of missing files.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 22, 2025

Walkthrough

Adds ProjectHashLogHandler to compute and maintain per-file MD5s and a combined project hash, update hashes on filesystem events (create/modify/move/delete), handle missing files/errors, and register the watcher in AppRunner’s observer when DEBUG logging is enabled.

Changes

Cohort / File(s) Summary
ProjectHashLogHandler Implementation
src/writer/app_runner.py
Adds ProjectHashLogHandler class; initializes per-file MD5s for files under app_path, updates hashes on on_modified, on_created, on_moved, on_deleted, provides _calculate_project_hash(), _process_file(), _log_hash(), handles FileNotFoundError and logs hashing errors; integrates watcher into AppRunner observer in DEBUG mode; adds hashlib, pathlib, and typing updates.
ProjectHashLogHandler Tests
tests/backend/test_app_runner.py
New test suite verifying initial hashing, order-independence of project hash, event-driven updates (modify/create/delete/move), and missing-file handling; introduces fixtures (wf_project_context, sample_app) and uses watchdog.events to simulate filesystem events.

Sequence Diagram(s)

sequenceDiagram
    participant FS as Filesystem
    participant OBS as Watchdog Observer
    participant PHH as ProjectHashLogHandler
    participant CTX as WfProjectContext
    participant LOG as Logger

    Note over OBS,PHH: Observer registered in AppRunner when DEBUG
    FS->>OBS: file event (created/modified/moved/deleted)
    OBS->>PHH: dispatch event -> on_created/modified/moved/deleted

    alt file exists
        PHH->>PHH: _process_file(file_path)
        PHH->>FS: read file bytes
        PHH->>PHH: compute MD5 for file
        PHH->>CTX: update file_hashes[file_path]
    else file missing
        PHH->>LOG: log FileNotFoundError (handled)
    end

    PHH->>PHH: _calculate_project_hash() (combine per-file hashes)
    alt project_hash changed
        PHH->>LOG: log project hash change
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.39% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: log project files hash on save' directly matches the main implementation: a ProjectHashLogHandler class that logs project file hashes on file events during save operations.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch vlad/log-file-hash

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f9b509f and fe74af5.

📒 Files selected for processing (2)
  • src/writer/app_runner.py
  • tests/backend/test_app_runner.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (7)
  • ProjectHashLogHandler (696-764)
  • _calculate_project_hash (716-721)
  • on_modified (742-744)
  • on_created (746-751)
  • on_deleted (761-764)
  • on_moved (753-759)
  • _process_file (723-734)
src/writer/app_runner.py (2)
tests/backend/test_app_runner.py (1)
  • wf_project_context (355-356)
src/writer/wf_project.py (1)
  • WfProjectContext (32-37)
🪛 Ruff (0.14.10)
tests/backend/test_app_runner.py

382-382: Probable use of insecure hash functions in hashlib: md5

(S324)

src/writer/app_runner.py

704-704: Probable use of insecure hash functions in hashlib: md5

(S324)


710-710: Do not catch blind exception: Exception

(BLE001)


717-717: Probable use of insecure hash functions in hashlib: md5

(S324)


725-725: Probable use of insecure hash functions in hashlib: md5

(S324)


733-733: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
src/writer/app_runner.py (1)

746-759: Verify necessity of recursive directory event handling before suggesting changes.

The code includes recursive handling for directory creation and move events. However, your earlier observation that directory deletion causes individual file events may not apply to creation/move operations, which are fundamentally different filesystem operations. Watchdog provides explicit generate_sub_created_events and generate_sub_moved_events APIs, suggesting they serve a specific purpose. Before treating this as redundant, confirm whether watchdog fires individual file events for created/moved directories, or whether recursive handling is necessary for proper file tracking.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pullrequest
Copy link

pullrequest bot commented Dec 22, 2025

HackerOne Code Security Review

🟢 Scan Complete: 2 Issue(s)
🟢 Validation Complete: Any Issues detected were validated by one of our engineers. None were determined to require immediate action.

Here's how the code changes were interpreted and info about the tools used for scanning.

📖 Summary of Changes The changes involve adding a project hash calculation in the write_files function of the wf_project.py file. This new functionality generates a unique identifier for the project by computing a hash based on the sorted file hashes, potentially improving project state tracking and identification.
File Summary
src/writer/wf_project.py Added a project hash calculation in the write_files function, which computes a hash based on the sorted file hashes to provide a unique identifier for the project's current state.
ℹ️ Issues Detected

NOTE: These may not require action!

Below are unvalidated results from the Analysis Tools that ran during the latest scan for transparency. We investigate each of these for accuracy and relevance before surfacing them as a potential problem.

How will I know if something is a problem?
When validation completes, any concerns that warrant attention prior to merge will be posted as inline comments. These will show up in 2 ways:

  • Expert review (most cases): Issues will be posted by experts who manually reviewed and validated them. These are real HackerOne engineers (not bots) reviewing through an integrated IDE-like tool. You can communicate with them like any other reviewer. They'll stay assigned and get notified with commit & comment updates.
  • Automatically: In cases where our validation checks have highest confidence the problem is legitimate and urgent. These will include a description of contextual reasoning why & actionable next steps.
File & Line Issue
src/writer/wf_project.py Line 85 The code introduces a project hash calculation that uses MD5, which is cryptographically broken. While this appears to be used for change detection rather than security purposes, MD5 is vulnerable to collision attacks. If this hash is used for any security-related verification, it could be exploited.
src/writer/wf_project.py Line 85 Use of weak MD5 hash for security. Consider usedforsecurity=False
🧰 Analysis tools

⏱️ Latest scan covered changes up to commit e9684b2 (latest)

@pullrequest
Copy link

pullrequest bot commented Dec 22, 2025

✅ Graham C reviewed all the included code changes and associated automation findings and determined that there were no immediately actionable security flaws. Note that they will continue to be notified of any new commits or comments and follow up as needed throughout the duration of this pull request's lifecycle.

Image of Graham C Graham C


Reviewed with ❤️ by PullRequest

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/writer/wf_project.py (1)

85-85: Optional: Consider SHA-256 for the project hash.

The static analysis tool flags MD5 as insecure. While MD5 is acceptable for checksumming (non-cryptographic use), some teams prefer to avoid it entirely. If preferred, consider using SHA-256 instead.

🔎 Optional refactor to use SHA-256
-    project_hash = hashlib.md5()
+    project_hash = hashlib.sha256()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5ee36de and e9684b2.

📒 Files selected for processing (1)
  • src/writer/wf_project.py
🧰 Additional context used
🪛 Ruff (0.14.8)
src/writer/wf_project.py

85-85: Probable use of insecure hash functions in hashlib: md5

(S324)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: tests (chromium)
  • GitHub Check: tests (firefox)
  • GitHub Check: tests (webkit)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)

Comment on lines +758 to +764
def on_deleted(self, event: Union[watchdog.events.DirDeletedEvent, watchdog.events.FileDeletedEvent]):
if not event.is_directory:
self.wf_project_context.file_hashes.pop(event.src_path, None)
self._log_hash()
Copy link
Collaborator Author

@UladzislauK-Writer UladzislauK-Writer Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked, deleting a directory (at least from UI) causes everything inside it to fire an event. So we don't need to handle directory events recursively

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/writer/app_runner.py (1)

701-708: Missing exception handling during initial file scan.

The constructor opens files without exception handling. If a file is deleted after rglob() finds it, or if there are permission errors, this will raise an unhandled exception. Consider wrapping in try/except similar to _process_file.

🔎 Proposed fix
         for file in pathlib.Path(app_path).rglob("*"):
             if file.is_dir():
                 continue
-            file_hash = hashlib.md5()
-            with open(file, 'rb') as f:
-                while chunk := f.read(8192):
-                    file_hash.update(chunk)
-            self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest()
+            try:
+                file_hash = hashlib.md5()
+                with open(file, 'rb') as f:
+                    while chunk := f.read(8192):
+                        file_hash.update(chunk)
+                self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest()
+            except FileNotFoundError:
+                continue
+            except Exception as e:
+                logging.warning(f"Failed to hash {file}: {e}")
tests/backend/test_app_runner.py (2)

420-433: Path handling may mask the consistency issue in implementation.

The tests use str(file_path) (line 433) where file_path is already absolute from tmp_path. Since pytest's tmp_path provides absolute paths, and watchdog's FileModifiedEvent would receive the same absolute path in tests, this might accidentally pass even if the implementation has path consistency issues with relative paths.

Consider adding a test that verifies path normalization explicitly, or document that the tests assume absolute paths.


379-500: Good test coverage for the core functionality.

The test suite covers all file event types, initialization, order independence, and error handling.

Minor gap: Directory events (on_created and on_moved with is_directory=True) trigger recursive sub-event generation but aren't directly tested. Consider adding tests for directory operations if this becomes important.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e9684b2 and 0c99fd6.

📒 Files selected for processing (2)
  • src/writer/app_runner.py
  • tests/backend/test_app_runner.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (8)
  • AppRunner (836-1435)
  • ProjectHashLogHandler (696-761)
  • _calculate_project_hash (713-718)
  • on_modified (739-741)
  • on_created (743-748)
  • on_deleted (758-761)
  • on_moved (750-756)
  • _process_file (720-731)
src/writer/app_runner.py (2)
tests/backend/test_app_runner.py (1)
  • wf_project_context (355-356)
src/writer/wf_project.py (1)
  • WfProjectContext (32-37)
🪛 Ruff (0.14.10)
tests/backend/test_app_runner.py

382-382: Probable use of insecure hash functions in hashlib: md5

(S324)

src/writer/app_runner.py

704-704: Probable use of insecure hash functions in hashlib: md5

(S324)


714-714: Probable use of insecure hash functions in hashlib: md5

(S324)


722-722: Probable use of insecure hash functions in hashlib: md5

(S324)


729-729: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: tests (chromium)
  • GitHub Check: build (3.13)
  • GitHub Check: tests (firefox)
  • GitHub Check: tests (webkit)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
🔇 Additional comments (2)
src/writer/app_runner.py (1)

896-901: LGTM!

Good design decision to only enable the hash handler in DEBUG mode to avoid performance overhead in production. The check logging.getLogger().isEnabledFor(logging.DEBUG) appropriately gates this feature.

tests/backend/test_app_runner.py (1)

354-376: LGTM!

The fixtures are well-designed. Using SimpleNamespace for the project context mock and tmp_path for the sample app structure provides clean isolation for tests.

@UladzislauK-Writer UladzislauK-Writer force-pushed the vlad/log-file-hash branch 2 times, most recently from 5dbbca3 to f9b509f Compare December 23, 2025 11:07
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/writer/app_runner.py (1)

739-761: Normalize paths consistently to avoid lookup failures.

Line 708 stores file paths as str(file.absolute()), but event handlers use event.src_path and event.dest_path directly. Watchdog may provide these paths as relative or in different absolute formats depending on the platform, causing dictionary lookups and removals to fail.

🔎 Proposed fix
+    def _normalize_path(self, path: str) -> str:
+        """Normalize path to absolute string format matching initialization."""
+        return str(pathlib.Path(path).absolute())
+
     def on_modified(self, event: Union[watchdog.events.DirModifiedEvent, watchdog.events.FileModifiedEvent]):
         if not event.is_directory:
-            self._process_file(event.src_path)
+            self._process_file(self._normalize_path(event.src_path))

     def on_created(self, event: Union[watchdog.events.DirCreatedEvent, watchdog.events.FileCreatedEvent]):
         if not event.is_directory:
-            self._process_file(event.src_path)
+            self._process_file(self._normalize_path(event.src_path))
         else:
             for sub_event in watchdog.events.generate_sub_created_events(event.src_path):
                 self.on_created(sub_event)

     def on_moved(self, event: Union[watchdog.events.DirMovedEvent, watchdog.events.FileMovedEvent]):
         if not event.is_directory:
-            self.wf_project_context.file_hashes.pop(event.src_path, None)
-            self._process_file(event.dest_path)
+            self.wf_project_context.file_hashes.pop(self._normalize_path(event.src_path), None)
+            self._process_file(self._normalize_path(event.dest_path))
         else:
             for sub_event in watchdog.events.generate_sub_moved_events(event.src_path, event.dest_path):
                 self.on_moved(sub_event)

     def on_deleted(self, event: Union[watchdog.events.DirDeletedEvent, watchdog.events.FileDeletedEvent]):
         if not event.is_directory:
-            self.wf_project_context.file_hashes.pop(event.src_path, None)
+            self.wf_project_context.file_hashes.pop(self._normalize_path(event.src_path), None)
             self._log_hash()
🧹 Nitpick comments (2)
tests/backend/test_app_runner.py (1)

354-356: Consider using the actual WfProjectContext class.

The fixture uses SimpleNamespace instead of the real WfProjectContext dataclass. While this works for the current tests, it could miss issues if ProjectHashLogHandler later accesses other WfProjectContext fields (like app_path, write_files_async_queue, etc.).

🔎 Alternative using the real class
+from writer.wf_project import WfProjectContext
+
 @pytest.fixture
 def wf_project_context():
-    return SimpleNamespace(file_hashes={})
+    return WfProjectContext(app_path="")
src/writer/app_runner.py (1)

697-711: Consider error handling during initialization.

The initialization hashes all files without error handling. If any file cannot be read (permissions, I/O error), the handler initialization will fail and crash the observer. Consider wrapping the file reading in a try-except block similar to _process_file.

🔎 Proposed fix
     def __init__(self, app_path: str, wf_project_context: WfProjectContext, patterns: List[str]):
         super().__init__(patterns=patterns)
         self.wf_project_context = wf_project_context

         for file in pathlib.Path(app_path).rglob("*"):
             if file.is_dir():
                 continue
-            file_hash = hashlib.md5()
-            with open(file, 'rb') as f:
-                while chunk := f.read(8192):
-                    file_hash.update(chunk)
-            self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest()
+            try:
+                file_hash = hashlib.md5()
+                with open(file, 'rb') as f:
+                    while chunk := f.read(8192):
+                        file_hash.update(chunk)
+                self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest()
+            except Exception as e:
+                logging.warning(f"Failed to hash {file} during initialization: {e}")
         
         self.project_hash = ""
         self._log_hash()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0c99fd6 and f9b509f.

📒 Files selected for processing (2)
  • src/writer/app_runner.py
  • tests/backend/test_app_runner.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/writer/app_runner.py (2)
tests/backend/test_app_runner.py (1)
  • wf_project_context (355-356)
src/writer/wf_project.py (1)
  • WfProjectContext (32-37)
tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (7)
  • ProjectHashLogHandler (696-761)
  • _calculate_project_hash (713-718)
  • on_modified (739-741)
  • on_created (743-748)
  • on_deleted (758-761)
  • on_moved (750-756)
  • _process_file (720-731)
🪛 Ruff (0.14.10)
src/writer/app_runner.py

704-704: Probable use of insecure hash functions in hashlib: md5

(S324)


714-714: Probable use of insecure hash functions in hashlib: md5

(S324)


722-722: Probable use of insecure hash functions in hashlib: md5

(S324)


730-730: Do not catch blind exception: Exception

(BLE001)

tests/backend/test_app_runner.py

382-382: Probable use of insecure hash functions in hashlib: md5

(S324)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.13)
  • GitHub Check: tests (chromium)
  • GitHub Check: tests (firefox)
  • GitHub Check: tests (webkit)
🔇 Additional comments (15)
tests/backend/test_app_runner.py (10)

2-8: LGTM! Imports are appropriate.

The imports support the new test functionality for ProjectHashLogHandler.


360-376: LGTM! Clean fixture design.

The sample_app fixture creates a clear, minimal directory structure for testing.


380-384: LGTM! Helper method is clear and simple.

The static MD5 helper is appropriate for test code, matching the implementation's use of MD5 for non-cryptographic file integrity checking.


386-399: LGTM! Comprehensive initial hashing test.

The test verifies that both per-file hashes and the project hash are correctly computed during initialization.


402-417: LGTM! Good test for order independence.

This test ensures the combined project hash is deterministic regardless of insertion order in the dictionary.


420-433: LGTM! File modification event handling is tested.

The test verifies that file hash updates correctly when a file is modified.


436-450: LGTM! File creation event handling is tested.

The test verifies that new files are added to the hash map with correct hashes.


453-465: LGTM! File deletion event handling is tested.

The test verifies that deleted files are removed from the hash map.


468-488: LGTM! File move event handling is tested.

The test verifies that moved files update the hash map correctly (removing old path, adding new path).


491-501: LGTM! Missing file handling is tested.

The test verifies graceful handling of FileNotFoundError.

src/writer/app_runner.py (5)

3-3: LGTM! Necessary imports for the new functionality.

The imports support MD5 hashing, path operations, and type hints in ProjectHashLogHandler.

Also applies to: 12-12, 21-21


713-718: LGTM! Order-independent project hash calculation.

The sorted iteration ensures deterministic project hash computation regardless of dictionary order.


720-732: LGTM! Proper error handling for file processing.

The method correctly handles both FileNotFoundError (returns silently) and other exceptions (logs warning). The _log_hash() call on line 727 is inside the try block, so it only executes on successful hashing.

Note: The past review comment about _log_hash() being called after exceptions is not applicable to the current code.


733-737: LGTM! Efficient logging with change detection.

Only logs when the project hash actually changes, reducing log noise.


896-901: LGTM! Appropriate DEBUG-only integration.

The handler is only scheduled when DEBUG logging is enabled, which is appropriate for a hash logging feature. Using patterns=["*"] ensures all files are tracked for the project hash.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants