@ngarana ngarana commented Dec 31, 2025

Switch to Hybrid Google/Wiktionary Data Source

Summary

This PR implements a hybrid fallback strategy for dictionary definitions. The API now tries the original Google Dictionary endpoint first (preserving rich data like phonetics, origin, synonyms, antonyms), and falls back to Wiktionary when Google fails.

Motivation

The original Google Dictionary API endpoint started returning 400 Bad Request errors for some words, causing the API to fail. Rather than replacing Google with Wiktionary entirely, this hybrid approach:

  • Preserves original functionality when Google works
  • Provides reliable fallback when Google fails
  • Maintains backward compatibility with existing API consumers

Strategy

Request → Try Google API → Success? → Return rich data (phonetics, origin, synonyms, antonyms)
                ↓
              Failure
                ↓
         Try Wiktionary → Success? → Return definitions with empty optional fields
                ↓
              Failure
                ↓
         Return 404 "No Definitions Found"
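
The flow above can be sketched as a loop over data sources. This is a minimal illustration, not the code in modules/dictionary.js: the function name findDefinitionsWithFallback, the NoDefinitionsFound error class, the injected sources array, and the { source, entries } return shape are all assumptions made so the fallback order is easy to follow and test.

```javascript
// Hypothetical sketch of the Google -> Wiktionary -> 404 fallback chain.
// Each source is { name, query }, where query is an async function that
// takes a word and resolves to an array of entries (or throws on failure).
class NoDefinitionsFound extends Error {
  constructor(word) {
    super(`No Definitions Found for "${word}"`);
    this.name = "NoDefinitionsFound";
    this.status = 404; // assumed mapping to the HTTP status in the diagram
  }
}

async function findDefinitionsWithFallback(word, sources) {
  for (const { name, query } of sources) {
    try {
      const entries = await query(word);
      if (Array.isArray(entries) && entries.length > 0) {
        return { source: name, entries };
      }
      // An empty result is treated like a failure: try the next source.
    } catch (err) {
      // Any error (for example an HTTP 400 from Google) moves on to the next source.
    }
  }
  throw new NoDefinitionsFound(word);
}
```

Passing the sources in order, Google first and Wiktionary second, reproduces the diagram; injecting them also lets tests simulate a failing Google endpoint without network access.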

Changes

Core Changes

  • modules/dictionary.js:
    • Restored original transformGoogle() and queryGoogle() functions
    • Added new transformWiktionary() and queryWiktionary() functions
    • findDefinitions() now tries Google first, falls back to Wiktionary
    • Wiktionary responses include empty placeholder fields for missing data
    • Added case-insensitive word lookup for Wiktionary (handles proper nouns)
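
One way a case-insensitive lookup could work is to try several casings of the input in turn, since Wiktionary page titles are case-sensitive. The helper below is a hypothetical sketch rather than the PR's implementation; the name caseVariants and the candidate order are assumptions.

```javascript
// Hypothetical helper: candidate spellings to try against Wiktionary,
// in order, with duplicates removed.
function caseVariants(word) {
  const asTyped = word.trim();
  const lower = asTyped.toLowerCase();
  const capitalized = lower.charAt(0).toUpperCase() + lower.slice(1);
  // A Set keeps insertion order, so the input as typed is always tried first.
  return [...new Set([asTyped, lower, capitalized])];
}
```

With this sketch, a request for "london" would also try "London", which is how a proper noun could be found even when the caller types it in lowercase.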

Supporting Changes

  • app.js:
    • Removed forced lowercase conversion of input words (preserves casing for proper nouns)
    • Enhanced cleanText() to strip <style> and <script> tags from HTML content
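
For illustration, stripping <style> and <script> blocks before removing the remaining tags might look like the sketch below. It is not the cleanText() from app.js: the name cleanTextSketch and the regex-based approach are assumptions, and a regex pass like this is a text cleanup step, not an HTML sanitizer.

```javascript
// Hypothetical sketch of a text cleaner for HTML fragments.
function cleanTextSketch(html) {
  return String(html)
    // Drop <style>/<script> elements together with their contents.
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop any remaining tags but keep their inner text.
    .replace(/<[^>]+>/g, "")
    // Collapse runs of whitespace left behind by removed markup.
    .replace(/\s+/g, " ")
    .trim();
}
```

Removing the style and script elements first matters: stripping only the tags would leave CSS rules or script source behind as if it were part of the definition.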

Documentation

  • README.md:
    • Added "Data Source & License" section with Wiktionary attribution
    • Added proper CC-BY-SA 4.0 attribution as required by Wiktionary license
    • Updated example JSON responses
    • Added note about data source fallback behavior

API Response Comparison

| Field | Google (when available) | Wiktionary (fallback) |
| --- | --- | --- |
| word | ✅ Present | ✅ Present |
| phonetic | ✅ IPA text | ⚪ Empty string |
| phonetics | ✅ With audio URLs | ⚪ Empty array |
| origin | ✅ Etymology | ⚪ Empty string |
| meanings | ✅ Present | ✅ Present |
| definitions[].synonyms | ✅ Array of words | ⚪ Empty array |
| definitions[].antonyms | ✅ Array of words | ⚪ Empty array |

Key: ✅ = Data available, ⚪ = Empty placeholder (structure preserved)
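
As a rough sketch of how the placeholders in the table might be filled in, the function below pads a Wiktionary-derived entry with the empty fields. The name withPlaceholders is hypothetical, and the input shape (an object with word and a meanings array whose items carry a definitions array) is an assumption based on the fields listed above.

```javascript
// Hypothetical sketch: give a Wiktionary-derived entry the same top-level
// shape as a Google-derived one by adding empty placeholder fields.
function withPlaceholders(entry) {
  return {
    word: entry.word,
    phonetic: "",  // no IPA text from the fallback
    phonetics: [], // no audio URLs from the fallback
    origin: "",    // no etymology from the fallback
    meanings: (entry.meanings || []).map((meaning) => ({
      ...meaning,
      definitions: (meaning.definitions || []).map((def) => ({
        ...def,
        synonyms: [],
        antonyms: [],
      })),
    })),
  };
}
```

Consumers that read entry.phonetics.length or def.synonyms.join(", ") keep working against fallback responses because the keys always exist with the expected types.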

Backward Compatibility

This change is fully backward compatible. All API endpoints and the response structure remain unchanged:

  • /api/v2/entries/en/{word}
  • /api/v1/entries/en/{word}
  • All existing fields present (may be empty when using Wiktionary fallback)

Testing

| Test Case | Google | Wiktionary | Result |
| --- | --- | --- | --- |
| hello | ❌ 400 | | Falls back to Wiktionary |
| hell | ❌ 400 | | Falls back to Wiktionary |
| London | ❌ 400 | | Falls back to Wiktionary |
| computer | ❌ 400 | | Falls back to Wiktionary |
| Nonexistent word | | | Returns proper 404 error |

Note: Google is currently returning 400 for all requests, so every query falls back to Wiktionary. If Google starts working again, the API will automatically use the richer Google data.

License Compliance

Added proper attribution for Wiktionary content as required by CC-BY-SA 4.0:

  • Source attribution to Wiktionary and Wikimedia Foundation
  • Link to CC-BY-SA 4.0 license
  • Requirements for downstream users documented

Future Improvements

A follow-up PR to enhance Wiktionary data extraction could:

  1. Parse Wiktionary HTML for IPA pronunciation
  2. Extract etymology from Wiktionary entries
  3. Add Datamuse API integration for synonyms/antonyms
  4. Cache responses to reduce external API calls
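
As an illustration of item 4, response caching could be layered on top of any query function with a small in-memory wrapper. This is only a sketch of one possible approach: the name withCache, the TTL parameter, and the injectable clock are assumptions, and a production version would also need a size limit or eviction policy.

```javascript
// Hypothetical sketch: wrap an async query function with an in-memory TTL cache.
// `now` is injectable so expiry can be tested without waiting.
function withCache(query, ttlMs, now = Date.now) {
  const cache = new Map();
  return async function cachedQuery(word) {
    const hit = cache.get(word);
    if (hit && hit.expires > now()) {
      return hit.value; // still fresh: skip the external call
    }
    const value = await query(word);
    cache.set(word, { value, expires: now() + ttlMs });
    return value;
  };
}
```

Because failed lookups throw before cache.set runs, only successful responses are stored in this sketch.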
