ouspg/refhandler

refhandler

Refhandler is a full chain of tools for writers and advisors of scientific articles and theses, from managing the corpus of works and references to interfacing with LLMs for evaluating the use of references and citations.

Deployment

  1. Install Docker and Docker Compose if you are using Linux, or install Docker Desktop (required on Windows and macOS)
  2. Clone the Refhandler repository
  3. Change POSTGRES_PASSWORD, SECRET_KEY, ADMIN_PASSWORD and, optionally, VIRUSTOTAL_API_KEY in the .env file
  4. Open a terminal in the project root folder
  5. Start the compose stack with the command docker compose up (docker-compose up on older standalone Compose installations)
  6. Navigate to https://localhost:8443/ or http://localhost:8000 in your browser
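
Step 3 asks for new secrets in the .env file. A minimal sketch, assuming the values only need to be long random strings (the project may impose other requirements), that generates them with Python's standard secrets module. The variable names come from step 3; the 32-byte length and the function names are illustrative assumptions.

```python
import secrets

# Variable names come from step 3; the 32-byte length is an assumption.
REQUIRED_KEYS = ("POSTGRES_PASSWORD", "SECRET_KEY", "ADMIN_PASSWORD")


def generate_env_values(keys=REQUIRED_KEYS, nbytes=32):
    """Return a dict mapping each key to a fresh URL-safe random token."""
    return {key: secrets.token_urlsafe(nbytes) for key in keys}


def render_env_lines(values):
    """Format the values as KEY=value lines suitable for pasting into .env."""
    return "\n".join(f"{key}={value}" for key, value in values.items())


if __name__ == "__main__":
    print(render_env_lines(generate_env_values()))
```

Paste the printed lines over the matching entries in .env before starting the stack for the first time.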

Refhandler components

  • Frontend: React frontend and Nginx proxy
  • Backend: FastAPI microservice hosting API services for the frontend
  • Postgres: Containerized SQL database, configured with periodic backups
  • Adminer: Lightweight database management dashboard
  • ClamAV-rest: ClamAV virus scanner with a REST API, used by Backend to scan PDF documents coming from the frontend.
  • Deck-chores: Job scheduler for Docker containers, used to run Postgres backup script
  • compose.yaml: Configuration and deployment of the project's docker containers
  • .env: Environment variables injected into the containers during startup. Holds user-configurable settings, such as ports, passwords and the backend secret key.

Container start order

  1. postgres, clamav-rest
  2. backend, adminer
  3. frontend

Container start order is defined in the compose.yaml file using healthcheck and depends_on attributes. A container that depends on another container waits for it to pass its healthcheck before starting, which avoids startup race conditions where the compose stack fails to start because the containers started in the wrong order.
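
The tiers above are what a topological sort of the dependency graph produces. The sketch below illustrates this with Python's graphlib. The DEPENDS_ON map is a hypothetical reading of the three tiers, not a copy of the depends_on entries in compose.yaml, and Docker Compose does not use this code.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: the tiers come from the list above, but the
# exact depends_on edges are an assumption, not copied from compose.yaml.
DEPENDS_ON = {
    "postgres": set(),
    "clamav-rest": set(),
    "backend": {"postgres", "clamav-rest"},
    "adminer": {"postgres"},
    "frontend": {"backend"},
}


def start_tiers(depends_on):
    """Group services into tiers; a tier is ready once all earlier tiers are done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        tiers.append(ready)
        sorter.done(*ready)
    return tiers


if __name__ == "__main__":
    for number, tier in enumerate(start_tiers(DEPENDS_ON), start=1):
        print(number, ", ".join(tier))
```

With this map, the function returns three tiers matching the list above: the database and virus scanner, then backend and adminer, then frontend.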

Troubleshooting

Windows line endings and Linux

On Windows, Docker containers run in Windows Subsystem for Linux (WSL), a Linux-based virtual machine that expects UNIX-style LF line endings. Windows uses CRLF line endings by default, and on Windows, Git is installed with a default setting that converts LF line endings into CRLF. This can cause issues: for example, a bash script written on Windows and copied into a Docker container will be parsed incorrectly and fail to run.

One sign of CRLF auto-conversion is version control suddenly reporting many changed files with no visible changes inside them.

Remediation

To avoid these issues, run git config --global core.autocrlf false to disable Git's LF to CRLF auto-conversion, and make sure your IDE is set to use LF line endings (bottom right corner in VS Code). If you have already converted line endings by accident, discard your changes and pull the original files again from the Git repository.
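
To see which files in a working copy still contain CRLF line endings, a small script along these lines may help. It is a sketch, not part of the project. The function names are illustrative, and the default of checking only .sh files is an assumption based on the bash script example above.

```python
from pathlib import Path


def has_crlf(path):
    """Return True if the file at path contains at least one CRLF sequence."""
    return b"\r\n" in Path(path).read_bytes()


def find_crlf_files(root, suffixes=(".sh",)):
    """List files under root with one of the given suffixes that contain CRLF."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and has_crlf(p)
    )


if __name__ == "__main__":
    # Run from the project root; prints one path per affected file.
    for path in find_crlf_files("."):
        print(path)
```

Pass a wider suffixes tuple to cover other text files that end up inside the containers.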

Some of the containers do not start

Make sure old versions of the containers aren't still running: run docker compose down before starting the compose stack again.

Containers Postgres, Backend, Frontend do not start

Sometimes older instances of the Postgres container don't free the Postgres port, preventing newer instances from starting.

  • Docker Desktop: Try restarting Docker Desktop to close hanging ports
  • Fix on Windows: Restart the NAT driver with the commands net stop winnat; net start winnat (requires an administrator PowerShell or Command Prompt)
  • Fix on Linux: #TODO if the problem isn't Windows-only
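
Before restarting anything, it can help to confirm that the port really is still held. The sketch below tries to bind the port on the local host and reports whether that succeeds. The function name is illustrative, and port 5432 is an assumption (the PostgreSQL default); use the port configured in your .env file.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something still holds the port.
            return False
        return True


if __name__ == "__main__":
    # 5432 is the PostgreSQL default and only an assumption here.
    port = 5432
    print(f"port {port} is", "free" if port_is_free(port) else "in use")
```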

Backend and Frontend aren't starting, docker logs show errors related to Alembic or database operations

Backend failed to apply Alembic database migrations, leaving the database tables and the SQLModel tables in /backend/app/models.py in an incompatible state.

  • Not all SQLModel changes are compatible with Alembic autogeneration. Try adjusting the autogenerated migration scripts in the folder /backend/alembic. See the Alembic documentation for details.
  • Delete the Alembic migration scripts and drop the Alembic table from the database using the Adminer dashboard. If the errors persist, database migrations are still needed.
  • If all else fails, delete the database volume refhandler_postgres_data, restart the compose stack and restore from backups (inside the volume refhandler_postgres_backups on the host, or /backups inside the postgres container).

Plan:

Main

  • Query CorpusManager for jobs
  • Start processing jobs

DatabaseWrapper

  • Keep all database-related code in one place so the database can be migrated to another engine if needed
  • Start with SQLite?
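
One possible shape for such a wrapper, sketched with Python's built-in sqlite3 module. The method names are illustrative assumptions, not an agreed design. Callers still pass SQL strings, so dialect differences would need handling when moving to another engine.

```python
import sqlite3


class DatabaseWrapper:
    """Thin wrapper so callers never touch the database driver directly."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.row_factory = sqlite3.Row

    def execute(self, sql, params=()):
        """Run a write statement, commit it, and return the last row ID."""
        with self._conn:  # commits on success, rolls back on error
            cursor = self._conn.execute(sql, params)
        return cursor.lastrowid

    def query(self, sql, params=()):
        """Run a read statement and return the rows as plain dicts."""
        return [dict(row) for row in self._conn.execute(sql, params)]

    def close(self):
        self._conn.close()


if __name__ == "__main__":
    db = DatabaseWrapper()
    db.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT)")
    db.execute("INSERT INTO works (title) VALUES (?)", ("Example thesis",))
    print(db.query("SELECT * FROM works"))
    db.close()
```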

CorpusManager

  • Manage a database including:
    • WORKS: the assessed works in the work directory, including year of publication, institution, faculty, etc.
    • REFS: the referenced works, including whether each work is available and has been downloaded to the references directory
    • REFTEXT: reference and citation texts, including a 1:1 relation to WORKS
    • REFTAXONOMY: a taxonomy of reference types, including a description of the taxonomy
    • TYPE: reference types (N:1) related to a specific REFTAXONOMY
    • REFREF: N:N relation table between REFTEXT and REFS
    • REFTYPE: TYPE classification of a REFTEXT given by an LLM for a specific REFTAXONOMY
    • LLM: LLM models and versions available
    • ASSESSMENT: assessments of REFTEXT by a specific LLM
  • Provide a job list for subsequent actions
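
The tables above could be sketched as an SQLite schema as follows. The table names follow the plan, while every column name, key and constraint is an assumption made for illustration. REFTEXT references WORKS through a plain foreign key here, which may be looser than the relation the plan intends.

```python
import sqlite3

# Table names follow the plan above; every column name is an assumption.
SCHEMA = """
CREATE TABLE works (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER,
    institution TEXT,
    faculty TEXT
);
CREATE TABLE refs (
    id INTEGER PRIMARY KEY,
    citation TEXT NOT NULL,
    available INTEGER NOT NULL DEFAULT 0,
    downloaded INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE reftext (
    id INTEGER PRIMARY KEY,
    work_id INTEGER NOT NULL REFERENCES works(id),
    body TEXT NOT NULL
);
CREATE TABLE refref (
    reftext_id INTEGER NOT NULL REFERENCES reftext(id),
    ref_id INTEGER NOT NULL REFERENCES refs(id),
    PRIMARY KEY (reftext_id, ref_id)
);
CREATE TABLE reftaxonomy (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE type (
    id INTEGER PRIMARY KEY,
    taxonomy_id INTEGER NOT NULL REFERENCES reftaxonomy(id),
    name TEXT NOT NULL
);
CREATE TABLE llm (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE reftype (
    reftext_id INTEGER NOT NULL REFERENCES reftext(id),
    type_id INTEGER NOT NULL REFERENCES type(id),
    llm_id INTEGER NOT NULL REFERENCES llm(id),
    PRIMARY KEY (reftext_id, type_id, llm_id)
);
CREATE TABLE assessment (
    id INTEGER PRIMARY KEY,
    reftext_id INTEGER NOT NULL REFERENCES reftext(id),
    llm_id INTEGER NOT NULL REFERENCES llm(id),
    result TEXT
);
"""


def create_corpus_db(path=":memory:"):
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```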

LLMinterface

  • Provide interface to pose prompts to LLM models via API or to locally run models
  • Prompt injection recognition :P
  • Provide a list of LLM models with version info
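
A rough sketch of how such an interface might look, using a typing.Protocol so that API-backed and locally run models can share one shape. All class and method names here are hypothetical, EchoBackend is only a stand-in for a real model, and prompt injection recognition is not covered.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMBackend(Protocol):
    """Anything that can answer a prompt, via an API or a locally run model."""

    name: str
    version: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for testing; a real one would call an actual model."""

    name: str = "echo"
    version: str = "0"

    def complete(self, prompt: str) -> str:
        return prompt


class LLMInterface:
    """Registry that routes prompts to a named model and version."""

    def __init__(self):
        self._backends = {}

    def register(self, backend: LLMBackend) -> None:
        self._backends[(backend.name, backend.version)] = backend

    def list_models(self):
        """Return the registered (name, version) pairs in sorted order."""
        return sorted(self._backends)

    def ask(self, name, version, prompt):
        """Send the prompt to the chosen backend and return its reply."""
        return self._backends[(name, version)].complete(prompt)
```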

ReferenceExtractor

  • Process through a given work and extract:
    • Necessary data to table WORKS
    • The list of references and add to table REFS
    • Each reference and citation to tables REFTEXT and REFREF, adding an empty PDF annotation tagged with the REFTEXT row ID so that the annotation can be filled in later

ReferenceClassifier

  • Given a reference text, query available LLMs to classify the reference into a TYPE under each available REFTAXONOMY
  • Annotate the WORKS PDF with the outcome

ReferenceFetcher

  • Given a reference, try to obtain original PDF text and update REFS table

ReferenceAssessment

  • Given a REFTEXT, REFTYPE and REFS entry with an available PDF, query available LLMs on the accuracy of the reference
  • Annotate the WORKS PDF with the outcome
  • Update the ASSESSMENT entry with the results

Statistics

  • Provide statistics and export CSV results
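
As an illustration of the CSV export, the sketch below counts how often each reference type occurs and writes the counts with Python's csv module. The function name, the input shape (pairs of REFTEXT row ID and type name) and the column headers are assumptions. The plan does not yet define which statistics are needed.

```python
import csv
import io
from collections import Counter


def type_counts_csv(classifications):
    """Count reference types and return the result as CSV text.

    classifications is an iterable of (reftext_id, type_name) pairs.
    """
    counts = Counter(type_name for _, type_name in classifications)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "count"])
    for type_name, count in sorted(counts.items()):
        writer.writerow([type_name, count])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical type names, used only to show the output shape.
    rows = [(1, "background"), (2, "method"), (3, "background")]
    print(type_counts_csv(rows), end="")
```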
