FlowKeys is a Python-based typing practice desktop application inspired by MonkeyType but enhanced with machine learning and fully offline AI. The aim is to help users analyze their typing behavior, improve accuracy, and personalize their learning curve.
# Clone the repository
git clone https://github.com/codingsanji/FlowKeys.git
cd FlowKeys
# Create a virtual environment
python -m venv venv
# Activate it:
# Windows
.\venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

a. On Windows, cd to the project directory and run:
$env:PYTHONPATH = "src"
python src/app/main.py

b. On Mac/Linux/Git Bash, run:
export PYTHONPATH=$PYTHONPATH:$(pwd)/src
python src/app/main.py

| Feature | Description | Dataset Used | ML/AI Type |
|---|---|---|---|
| Randomized Typing Tests | Generates word combos or quotes using multiple datasets. Ensures variety and progressive challenge. | English Word Frequency Dataset + Quotes 500K | None (preprocessing / data-driven logic) |
| Keystroke Error Classification | Classifies typing errors (missed letters, wrong case, slow/fast presses, fatigue slips) using supervised classification models. Provides insight into typing weaknesses. | User-generated typing logs (CSV/JSON) + Augmented public keystroke datasets to increase training data size. | Classification (Logistic Regression, Random Forest, SVM, Gradient Boosting) |
| Typing Speed Prediction | Predicts future WPM and error rate based on recent session patterns. Helps anticipate performance trends. | User-generated typing logs (CSV/JSON) + Augmented public keystroke datasets to improve model robustness. | Regression (Linear Regression, Random Forest Regressor, XGBoost Regressor) |
| Typing Fatigue Estimator | Estimates a fatigue score (0–100) based on timing irregularities, error spikes, and speed drop. Guides adaptive difficulty and practice breaks. | Local session logs + Augmented keystroke data to enhance model accuracy. | Regression (Linear Regression, Random Forest Regressor, XGBoost Regressor) |
| Progress Charts & Graphs | Visualizes WPM, accuracy, errors, and fatigue trends. Supports analysis of improvement over time. | Local session logs | None (Matplotlib/Tkinter visualization) |
These ML tasks fulfill the classification/regression requirement of the assignment.
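Separately from the ML components, the Randomized Typing Tests row in the table above describes data-driven test generation. Below is a minimal sketch of frequency-weighted word sampling; it assumes a hypothetical `word_frequency.csv` with `word` and `count` columns, which is not necessarily the project's actual file or schema.

```python
# Minimal sketch: frequency-weighted word sampling for a typing test.
# "word_frequency.csv" and its column names are illustrative assumptions.
import random

import pandas as pd

def generate_test(n_words: int = 25, csv_path: str = "word_frequency.csv") -> str:
    """Return a space-separated practice string of n_words random words."""
    df = pd.read_csv(csv_path)
    words = random.choices(
        population=df["word"].tolist(),
        weights=df["count"].tolist(),   # common words appear more often
        k=n_words,
    )
    return " ".join(words)

if __name__ == "__main__":
    print(generate_test(10))
```

Quote-based tests could be drawn in the same way from the quotes dataset.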
Number 1: Keystroke Error Classification
Goal: Predict what type of typing error occurred during the session.
- Missed Letters
  The user intended to press a key but missed or mistyped it (e.g., typing “hte” instead of “the”).
- Incorrect Case
  The user typed the correct letter but in the wrong case (uppercase instead of lowercase, or vice versa).
- Slow or Hesitant Keypress
  The key was pressed much more slowly than the user's usual speed, often indicating confusion or searching for the key.
- Fast or Rushed Keypress
  The key was pressed significantly faster than normal, usually due to rushing or careless typing.
- Fatigue-Related Slip
  A general typing anomaly caused by tiredness, inconsistent timings, uneven pressure, or unusually long pauses.
(Not directly in standard datasets; usually derived using timing thresholds.)
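To illustrate the parenthetical above, here is one possible way timing thresholds could derive the slow, fast, and fatigue-related labels from a user's own baseline. The multipliers, cut-offs, and function name are assumptions, not FlowKeys' actual rules.

```python
# Sketch: derive timing-based error labels from a user's recent baseline.
# The 1.5x / 0.5x multipliers and the 3-sigma cut-off are illustrative only.
from statistics import mean, pstdev

def label_keypress(latency: float, baseline_latencies: list[float]) -> str | None:
    """Classify one keypress latency against the user's recent history."""
    base_mean = mean(baseline_latencies)
    base_std = pstdev(baseline_latencies)

    if latency > base_mean * 1.5:
        return "slow_keypress"    # hesitant / searching for the key
    if latency < base_mean * 0.5:
        return "fast_keypress"    # rushed / careless press
    if base_std > 0 and abs(latency - base_mean) > 3 * base_std:
        return "fatigue_slip"     # unusually irregular timing
    return None                   # timing looks normal

# Example: a 0.40 s latency against a ~0.10 s baseline is flagged as slow.
print(label_keypress(0.40, [0.09, 0.11, 0.10, 0.12, 0.08]))
```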
This project uses a self-generated dataset collected from typing sessions and stored as CSV or JSON, supplemented with augmented public keystroke datasets to increase the amount of training data.
Example CSV:
user_id, key, press_time, release_time, hold_time, latency, is_error, error_type
01, a, 0.458, 0.612, 0.154, 0.102, 1, missed_letter
01, Shift, 0.900, 1.020, 0.120, 0.200, 1, incorrect_case
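A minimal sketch of turning a log like the example above into classifier inputs; the file name `keystrokes.csv` is hypothetical.

```python
# Sketch: load a keystroke log like the example above and build X / y
# for error-type classification. "keystrokes.csv" is an assumed file name.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("keystrokes.csv", skipinitialspace=True)

# Numeric timing features; rows flagged as errors carry the labels to learn.
errors = df[df["is_error"] == 1]
X = errors[["press_time", "release_time", "hold_time", "latency"]]

encoder = LabelEncoder()
y = encoder.fit_transform(errors["error_type"])   # e.g. "missed_letter" -> 2

print(X.shape, list(encoder.classes_))
```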
Models used:
- Logistic Regression
- Random Forest Classifier
- Support Vector Machine (SVM)
- Gradient Boosting Classifier
Evaluation outputs (see the sketch after this list):
- Confusion Matrix
- Classification Report (accuracy, precision, recall, F1-score)
- ROC Curves
- Model Comparison Chart (Accuracy & F1 Score)
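Continuing from the loading sketch above, a hedged example of training and comparing the listed classifiers with default hyperparameters; ROC curves and the comparison chart are omitted here for brevity.

```python
# Sketch: train the four listed classifiers on X / y from the previous
# snippet and print the confusion matrix and classification report for each.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"\n{name}  (F1 = {f1_score(y_test, preds, average='weighted'):.3f})")
    print(confusion_matrix(y_test, preds))
    print(classification_report(y_test, preds, zero_division=0))
```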
Number 2: Typing Speed Prediction
Goal:
Predict your future typing performance based on your recent session data.
The model forecasts:
- Words Per Minute (WPM)
- Error Rate
- Performance Trend
Dataset used:
- User-generated typing logs (CSV/JSON) and the augmented datasets used in the first component.
Example CSV:
user_id, session_id, avg_hold_time, avg_latency, total_errors, wpm, fatigue_score
01, 001, 0.152, 0.108, 5, 42, 10
01, 002, 0.145, 0.095, 3, 48, 8
Features explained:
- avg_hold_time → average key hold duration
- avg_latency → average delay between key presses
- total_errors → errors per session
- wpm → target variable
- fatigue_score → optional, derived from timing variance
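As a rough sketch of predicting future performance from recent sessions, each session's stats can serve as inputs for the next session's WPM. The shift-based target construction and the `sessions.csv` file name are assumptions, not the project's exact pipeline.

```python
# Sketch: use each session's stats to predict the *next* session's WPM.
# "sessions.csv" is an assumed file name with the columns shown above.
import pandas as pd

df = pd.read_csv("sessions.csv", skipinitialspace=True)
df = df.sort_values(["user_id", "session_id"])

features = ["avg_hold_time", "avg_latency", "total_errors", "wpm", "fatigue_score"]
df["next_wpm"] = df.groupby("user_id")["wpm"].shift(-1)   # target: next session
df = df.dropna(subset=["next_wpm"])                        # last session has no target

X = df[features]
y = df["next_wpm"]
print(X.shape, y.head())
```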
Models used:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor
Evaluation outputs (see the sketch after this list):
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- R² Score
- Prediction vs. Actual Graph
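A minimal sketch of fitting and scoring the listed regressors on the X and y built above, assuming xgboost is installed and using default hyperparameters.

```python
# Sketch: compare the listed regressors with MSE, RMSE and R²,
# then plot predicted vs. actual WPM for the last model fitted.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
}.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    print(f"{name}: MSE={mse:.2f}  RMSE={np.sqrt(mse):.2f}  R2={r2_score(y_test, preds):.3f}")

plt.scatter(y_test, preds)                         # prediction vs. actual
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")
plt.xlabel("Actual WPM")
plt.ylabel("Predicted WPM")
plt.show()
```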
Number 3: Typing Fatigue Estimator
Goal:
Estimate a “fatigue score” (0-100) for a user during a typing session.
Performance-based metric, useful for guiding adaptive difficulty and practice breaks.
Dataset used:
- Locally generated user typing logs (CSV/JSON)
Example CSV:
user_id, session_id, avg_hold_time, avg_latency, total_errors, wpm, fatigue_score
01, 001, 0.152, 0.108, 5, 42, 10
01, 002, 0.145, 0.095, 6, 40, 15
01, 003, 0.160, 0.120, 8, 35, 25
Features explained:
- avg_hold_time → average key hold time
- avg_latency → average delay between key presses
- total_errors → total errors in the session
- wpm → typing speed
- fatigue_score → target variable
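Because fatigue_score is the training target, the app needs some way to derive it from raw session behavior. Below is one possible heuristic combining timing irregularity, error spikes, and speed drop as described in the feature table; the weights and scaling are illustrative assumptions, not the project's actual formula.

```python
# Sketch: derive a 0-100 fatigue score from one session, as a heuristic.
# The 40/30/30 weights and saturation points are illustrative assumptions.
import numpy as np

def fatigue_score(latencies: list[float], errors: int, keystrokes: int,
                  wpm: float, baseline_wpm: float) -> float:
    """Higher score = more fatigue; clipped to the 0-100 range."""
    timing_irregularity = np.std(latencies) / (np.mean(latencies) + 1e-9)
    error_rate = errors / max(keystrokes, 1)
    speed_drop = max(0.0, (baseline_wpm - wpm) / max(baseline_wpm, 1e-9))

    raw = (
        40 * min(1.0, timing_irregularity)   # timing irregularities
        + 30 * min(1.0, error_rate * 10)     # error spikes (10% rate saturates)
        + 30 * min(1.0, speed_drop)          # drop below the user's usual WPM
    )
    return float(np.clip(raw, 0, 100))

# Example: irregular timing, a few errors, and a dip below the usual 45 WPM.
print(fatigue_score([0.10, 0.25, 0.09, 0.40], errors=6, keystrokes=150,
                    wpm=38, baseline_wpm=45))
```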
Models used:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor
Evaluation outputs:
- Fatigue Trend Line
- Regression Metrics (MSE, RMSE, R²)
- Practical Insights for adaptive difficulty and practice guidance
| Component | Technology |
|---|---|
| GUI | Tkinter + ttk for modern widgets |
| Machine Learning | Scikit-Learn, XGBoost |
| Backend | Python 3.10+ |
| Data Storage | Local CSV / JSON logs |
| Visualization | Matplotlib, Seaborn, Tkinter Canvas |
| App Packaging | PyInstaller (standalone executables per platform) |
| Pre-processing | Pandas & NumPy |
| Model Serialization | joblib |
These may come in handy later, but they are not needed at the moment...
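As a small illustration of the Model Serialization row above, a minimal joblib save/load sketch; the path and the tiny stand-in model are hypothetical.

```python
# Sketch: persist a trained model with joblib and load it back later.
# "models/wpm_regressor.joblib" is an assumed path, not the app's layout.
from pathlib import Path

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[1], [2], [3]], [40, 44, 47])   # stand-in fit

Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/wpm_regressor.joblib")

restored = joblib.load("models/wpm_regressor.joblib")
print(restored.predict([[4]]))   # reuse the model without retraining
```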