-
Notifications
You must be signed in to change notification settings - Fork 97
feat: log project files hash on save #1248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
WalkthroughAdds ProjectHashLogHandler to compute and maintain per-file MD5s and a combined project hash, update hashes on filesystem events (create/modify/move/delete), handle missing files/errors, and register the watcher in AppRunner’s observer when DEBUG logging is enabled. Changes
Sequence Diagram(s)sequenceDiagram
participant FS as Filesystem
participant OBS as Watchdog Observer
participant PHH as ProjectHashLogHandler
participant CTX as WfProjectContext
participant LOG as Logger
Note over OBS,PHH: Observer registered in AppRunner when DEBUG
FS->>OBS: file event (created/modified/moved/deleted)
OBS->>PHH: dispatch event -> on_created/modified/moved/deleted
alt file exists
PHH->>PHH: _process_file(file_path)
PHH->>FS: read file bytes
PHH->>PHH: compute MD5 for file
PHH->>CTX: update file_hashes[file_path]
else file missing
PHH->>LOG: log FileNotFoundError (handled)
end
PHH->>PHH: _calculate_project_hash() (combine per-file hashes)
alt project_hash changed
PHH->>LOG: log project hash change
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (2)tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (2)
🪛 Ruff (0.14.10)tests/backend/test_app_runner.py382-382: Probable use of insecure hash functions in (S324) src/writer/app_runner.py704-704: Probable use of insecure hash functions in (S324) 710-710: Do not catch blind exception: (BLE001) 717-717: Probable use of insecure hash functions in (S324) 725-725: Probable use of insecure hash functions in (S324) 733-733: Do not catch blind exception: (BLE001) 🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
HackerOne Code Security Review🟢 Scan Complete: 2 Issue(s) Here's how the code changes were interpreted and info about the tools used for scanning. 📖 Summary of ChangesThe changes involve adding a project hash calculation in the write_files function of the wf_project.py file. This new functionality generates a unique identifier for the project by computing a hash based on the sorted file hashes, potentially improving project state tracking and identification.
ℹ️ Issues DetectedNOTE: These may not require action! Below are unvalidated results from the Analysis Tools that ran during the latest scan for transparency. We investigate each of these for accuracy and relevance before surfacing them as a potential problem. How will I know if something is a problem?
🧰 Analysis tools
⏱️ Latest scan covered changes up to commit e9684b2 (latest) |
|
✅ Graham C reviewed all the included code changes and associated automation findings and determined that there were no immediately actionable security flaws. Note that they will continue to be notified of any new commits or comments and follow up as needed throughout the duration of this pull request's lifecycle. Reviewed with ❤️ by PullRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/writer/wf_project.py (1)
85-85: Optional: Consider SHA-256 for the project hash.The static analysis tool flags MD5 as insecure. While MD5 is acceptable for checksumming (non-cryptographic use), some teams prefer to avoid it entirely. If preferred, consider using SHA-256 instead.
🔎 Optional refactor to use SHA-256
- project_hash = hashlib.md5() + project_hash = hashlib.sha256()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/writer/wf_project.py
🧰 Additional context used
🪛 Ruff (0.14.8)
src/writer/wf_project.py
85-85: Probable use of insecure hash functions in hashlib: md5
(S324)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: tests (chromium)
- GitHub Check: tests (firefox)
- GitHub Check: tests (webkit)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: build (3.13)
- GitHub Check: build (3.12)
- GitHub Check: build (3.9)
e9684b2 to
0c99fd6
Compare
| def on_deleted(self, event: Union[watchdog.events.DirDeletedEvent, watchdog.events.FileDeletedEvent]): | ||
| if not event.is_directory: | ||
| self.wf_project_context.file_hashes.pop(event.src_path, None) | ||
| self._log_hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked, deleting a directory (at least from UI) causes everything inside it to fire an event. So we don't need to handle directory events recursively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/writer/app_runner.py (1)
701-708: Missing exception handling during initial file scan.The constructor opens files without exception handling. If a file is deleted after
rglob()finds it, or if there are permission errors, this will raise an unhandled exception. Consider wrapping in try/except similar to_process_file.🔎 Proposed fix
for file in pathlib.Path(app_path).rglob("*"): if file.is_dir(): continue - file_hash = hashlib.md5() - with open(file, 'rb') as f: - while chunk := f.read(8192): - file_hash.update(chunk) - self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest() + try: + file_hash = hashlib.md5() + with open(file, 'rb') as f: + while chunk := f.read(8192): + file_hash.update(chunk) + self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest() + except FileNotFoundError: + continue + except Exception as e: + logging.warning(f"Failed to hash {file}: {e}")tests/backend/test_app_runner.py (2)
420-433: Path handling may mask the consistency issue in implementation.The tests use
str(file_path)(line 433) wherefile_pathis already absolute fromtmp_path. Since pytest'stmp_pathprovides absolute paths, and watchdog'sFileModifiedEventwould receive the same absolute path in tests, this might accidentally pass even if the implementation has path consistency issues with relative paths.Consider adding a test that verifies path normalization explicitly, or document that the tests assume absolute paths.
379-500: Good test coverage for the core functionality.The test suite covers all file event types, initialization, order independence, and error handling.
Minor gap: Directory events (
on_createdandon_movedwithis_directory=True) trigger recursive sub-event generation but aren't directly tested. Consider adding tests for directory operations if this becomes important.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/writer/app_runner.pytests/backend/test_app_runner.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (8)
AppRunner(836-1435)ProjectHashLogHandler(696-761)_calculate_project_hash(713-718)on_modified(739-741)on_created(743-748)on_deleted(758-761)on_moved(750-756)_process_file(720-731)
src/writer/app_runner.py (2)
tests/backend/test_app_runner.py (1)
wf_project_context(355-356)src/writer/wf_project.py (1)
WfProjectContext(32-37)
🪛 Ruff (0.14.10)
tests/backend/test_app_runner.py
382-382: Probable use of insecure hash functions in hashlib: md5
(S324)
src/writer/app_runner.py
704-704: Probable use of insecure hash functions in hashlib: md5
(S324)
714-714: Probable use of insecure hash functions in hashlib: md5
(S324)
722-722: Probable use of insecure hash functions in hashlib: md5
(S324)
729-729: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: tests (chromium)
- GitHub Check: build (3.13)
- GitHub Check: tests (firefox)
- GitHub Check: tests (webkit)
- GitHub Check: build (3.12)
- GitHub Check: build (3.9)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
🔇 Additional comments (2)
src/writer/app_runner.py (1)
896-901: LGTM!Good design decision to only enable the hash handler in DEBUG mode to avoid performance overhead in production. The check
logging.getLogger().isEnabledFor(logging.DEBUG)appropriately gates this feature.tests/backend/test_app_runner.py (1)
354-376: LGTM!The fixtures are well-designed. Using
SimpleNamespacefor the project context mock andtmp_pathfor the sample app structure provides clean isolation for tests.
5dbbca3 to
f9b509f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/writer/app_runner.py (1)
739-761: Normalize paths consistently to avoid lookup failures.Line 708 stores file paths as
str(file.absolute()), but event handlers useevent.src_pathandevent.dest_pathdirectly. Watchdog may provide these paths as relative or in different absolute formats depending on the platform, causing dictionary lookups and removals to fail.🔎 Proposed fix
+ def _normalize_path(self, path: str) -> str: + """Normalize path to absolute string format matching initialization.""" + return str(pathlib.Path(path).absolute()) + def on_modified(self, event: Union[watchdog.events.DirModifiedEvent, watchdog.events.FileModifiedEvent]): if not event.is_directory: - self._process_file(event.src_path) + self._process_file(self._normalize_path(event.src_path)) def on_created(self, event: Union[watchdog.events.DirCreatedEvent, watchdog.events.FileCreatedEvent]): if not event.is_directory: - self._process_file(event.src_path) + self._process_file(self._normalize_path(event.src_path)) else: for sub_event in watchdog.events.generate_sub_created_events(event.src_path): self.on_created(sub_event) def on_moved(self, event: Union[watchdog.events.DirMovedEvent, watchdog.events.FileMovedEvent]): if not event.is_directory: - self.wf_project_context.file_hashes.pop(event.src_path, None) - self._process_file(event.dest_path) + self.wf_project_context.file_hashes.pop(self._normalize_path(event.src_path), None) + self._process_file(self._normalize_path(event.dest_path)) else: for sub_event in watchdog.events.generate_sub_moved_events(event.src_path, event.dest_path): self.on_moved(sub_event) def on_deleted(self, event: Union[watchdog.events.DirDeletedEvent, watchdog.events.FileDeletedEvent]): if not event.is_directory: - self.wf_project_context.file_hashes.pop(event.src_path, None) + self.wf_project_context.file_hashes.pop(self._normalize_path(event.src_path), None) self._log_hash()
🧹 Nitpick comments (2)
tests/backend/test_app_runner.py (1)
354-356: Consider using the actual WfProjectContext class.The fixture uses
SimpleNamespaceinstead of the realWfProjectContextdataclass. While this works for the current tests, it could miss issues ifProjectHashLogHandlerlater accesses otherWfProjectContextfields (likeapp_path,write_files_async_queue, etc.).🔎 Alternative using the real class
+from writer.wf_project import WfProjectContext + @pytest.fixture def wf_project_context(): - return SimpleNamespace(file_hashes={}) + return WfProjectContext(app_path="")src/writer/app_runner.py (1)
697-711: Consider error handling during initialization.The initialization hashes all files without error handling. If any file cannot be read (permissions, I/O error), the handler initialization will fail and crash the observer. Consider wrapping the file reading in a try-except block similar to
_process_file.🔎 Proposed fix
def __init__(self, app_path: str, wf_project_context: WfProjectContext, patterns: List[str]): super().__init__(patterns=patterns) self.wf_project_context = wf_project_context for file in pathlib.Path(app_path).rglob("*"): if file.is_dir(): continue - file_hash = hashlib.md5() - with open(file, 'rb') as f: - while chunk := f.read(8192): - file_hash.update(chunk) - self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest() + try: + file_hash = hashlib.md5() + with open(file, 'rb') as f: + while chunk := f.read(8192): + file_hash.update(chunk) + self.wf_project_context.file_hashes[str(file.absolute())] = file_hash.hexdigest() + except Exception as e: + logging.warning(f"Failed to hash {file} during initialization: {e}") self.project_hash = "" self._log_hash()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/writer/app_runner.pytests/backend/test_app_runner.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/writer/app_runner.py (2)
tests/backend/test_app_runner.py (1)
wf_project_context(355-356)src/writer/wf_project.py (1)
WfProjectContext(32-37)
tests/backend/test_app_runner.py (1)
src/writer/app_runner.py (7)
ProjectHashLogHandler(696-761)_calculate_project_hash(713-718)on_modified(739-741)on_created(743-748)on_deleted(758-761)on_moved(750-756)_process_file(720-731)
🪛 Ruff (0.14.10)
src/writer/app_runner.py
704-704: Probable use of insecure hash functions in hashlib: md5
(S324)
714-714: Probable use of insecure hash functions in hashlib: md5
(S324)
722-722: Probable use of insecure hash functions in hashlib: md5
(S324)
730-730: Do not catch blind exception: Exception
(BLE001)
tests/backend/test_app_runner.py
382-382: Probable use of insecure hash functions in hashlib: md5
(S324)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
- GitHub Check: build (3.13)
- GitHub Check: tests (chromium)
- GitHub Check: tests (firefox)
- GitHub Check: tests (webkit)
🔇 Additional comments (15)
tests/backend/test_app_runner.py (10)
2-8: LGTM! Imports are appropriate.The imports support the new test functionality for ProjectHashLogHandler.
360-376: LGTM! Clean fixture design.The
sample_appfixture creates a clear, minimal directory structure for testing.
380-384: LGTM! Helper method is clear and simple.The static MD5 helper is appropriate for test code, matching the implementation's use of MD5 for non-cryptographic file integrity checking.
386-399: LGTM! Comprehensive initial hashing test.The test verifies that both per-file hashes and the project hash are correctly computed during initialization.
402-417: LGTM! Good test for order independence.This test ensures the combined project hash is deterministic regardless of insertion order in the dictionary.
420-433: LGTM! File modification event handling is tested.The test verifies that file hash updates correctly when a file is modified.
436-450: LGTM! File creation event handling is tested.The test verifies that new files are added to the hash map with correct hashes.
453-465: LGTM! File deletion event handling is tested.The test verifies that deleted files are removed from the hash map.
468-488: LGTM! File move event handling is tested.The test verifies that moved files update the hash map correctly (removing old path, adding new path).
491-501: LGTM! Missing file handling is tested.The test verifies graceful handling of FileNotFoundError.
src/writer/app_runner.py (5)
3-3: LGTM! Necessary imports for the new functionality.The imports support MD5 hashing, path operations, and type hints in ProjectHashLogHandler.
Also applies to: 12-12, 21-21
713-718: LGTM! Order-independent project hash calculation.The sorted iteration ensures deterministic project hash computation regardless of dictionary order.
720-732: LGTM! Proper error handling for file processing.The method correctly handles both FileNotFoundError (returns silently) and other exceptions (logs warning). The
_log_hash()call on line 727 is inside the try block, so it only executes on successful hashing.Note: The past review comment about
_log_hash()being called after exceptions is not applicable to the current code.
733-737: LGTM! Efficient logging with change detection.Only logs when the project hash actually changes, reducing log noise.
896-901: LGTM! Appropriate DEBUG-only integration.The handler is only scheduled when DEBUG logging is enabled, which is appropriate for a hash logging feature. Using
patterns=["*"]ensures all files are tracked for the project hash.
f9b509f to
fe74af5
Compare
✅ Actions performedReview triggered.
|
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.