This is the code for the paper *Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs*.
The project is pip-installable. To install, run the following command in the root directory of the project:

```bash
pip install -e .
```

For a faster experience, we recommend using uv, an extremely fast drop-in replacement for pip.
For running non-API LLMs (i.e., the Gemma and OLMo models), we use the vllm library. As described in their docs, they recommend installing with uv or conda. Since we are already using uv, you can install vllm with:

```bash
uv sync --group=cuda
```

This will add vllm to the virtual environment. To get responses from the open-source models, activate this environment and run:

```bash
python scripts/vllm_batch_responses.py
```

We release our dataset on huggingface 🤗 (see top of readme for link). This includes a detailed datasheet (Gebru et al., 2021).
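For reference, the core of a vLLM batch-generation script looks roughly like the sketch below. The model name and sampling settings are illustrative assumptions, not the actual configuration of `scripts/vllm_batch_responses.py`:

```python
# Hypothetical sketch of batched generation with vLLM; the model name and
# sampling settings are illustrative, not the script's actual configuration.

def batch_generate(prompts, model_name="google/gemma-2-9b-it", max_tokens=256):
    """Generate one completion per prompt with vLLM (requires CUDA)."""
    # Imported inside the function so the sketch parses without vllm installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each output carries the generated candidates; take the first one.
    return [out.outputs[0].text for out in outputs]
```

Greedy decoding (`temperature=0.0`) is shown here purely as an example of making responses reproducible across runs.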
- Create WVS ground truth: Calculates the "ground truth" pro score for each chosen country and question.
- Translate prompts: Automatically translates the prompts to Danish, Portuguese, and Dutch using `gpt-3.5-turbo`.
- Get responses from OpenAI: Generates responses from the OpenAI models. Note that `gpt-4o` was run separately here; future runs can be done with the same script.
- Get responses from Open Source: Same as above, but using vLLM for the open-source models. Note that running this requires CUDA; see here for installation instructions.
- Categorize responses: Categorizes the responses into pro and con using function calling and `gpt-4.1`.
- Merge results with scores: Merges all the results and calculates the pro-score.
- Analyze hypotheses: Finally, this analyzes and plots the results, which can be found in the `plots` folder.
- Plots and regressions: To get all the plots from the paper, you need to run the following scripts: WVS plot, Multilingual regression, US-centric bias, and Self-consistency. Running these scripts will also print the regression tables where relevant.
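The categorization step uses OpenAI function calling to constrain the model's answer to a fixed label. A minimal sketch of such a tool schema is below; the function name and fields are hypothetical, not the exact schema used by the script:

```python
import json

# Hypothetical tool schema for labelling a response as "pro" or "con";
# the actual schema used by the categorization script may differ.
CATEGORIZE_TOOL = {
    "type": "function",
    "function": {
        "name": "categorize_response",
        "description": "Classify a model response as pro or con.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "enum": ["pro", "con"]},
            },
            "required": ["label"],
        },
    },
}

# Passed to the chat completions API roughly as:
#   client.chat.completions.create(model="gpt-4.1", tools=[CATEGORIZE_TOOL], ...)
schema_json = json.dumps(CATEGORIZE_TOOL)
```

The `enum` constraint is what forces the model to pick exactly one of the two labels rather than free text.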
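The merge step combines the categorized responses into a per-country, per-question pro score. A minimal, self-contained sketch, assuming the pro score is simply the fraction of responses categorized as "pro" (the field names and exact score definition here are illustrative assumptions):

```python
from collections import defaultdict

def pro_score(categorized):
    """Fraction of responses labelled 'pro' per (country, question).

    `categorized` is a list of dicts with keys 'country', 'question',
    and 'label' ('pro' or 'con'). Field names are illustrative.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [pro, total]
    for row in categorized:
        key = (row["country"], row["question"])
        counts[key][1] += 1
        if row["label"] == "pro":
            counts[key][0] += 1
    return {k: pro / total for k, (pro, total) in counts.items()}

rows = [
    {"country": "DK", "question": "Q1", "label": "pro"},
    {"country": "DK", "question": "Q1", "label": "con"},
    {"country": "NL", "question": "Q1", "label": "pro"},
]
scores = pro_score(rows)  # {("DK", "Q1"): 0.5, ("NL", "Q1"): 1.0}
```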
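The regression scripts relate the LLM pro scores to the WVS ground truth. As a conceptual illustration only (the paper's actual regressions are richer than this), a closed-form simple OLS fit can be written with no dependencies:

```python
def ols_line(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    slope = cov / var
    return slope, my - slope * mx

# Toy data only: regressing hypothetical model pro scores on ground truth.
truth = [0.2, 0.4, 0.6, 0.8]
model = [0.3, 0.5, 0.7, 0.9]
slope, intercept = ols_line(truth, model)  # slope 1.0, intercept 0.1
```

A slope near 1 and intercept near 0 would indicate close agreement between the model's scores and the ground truth.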