This is the code for the paper *Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs*.
The project is pip-installable. To install, run the following command in the root directory of the project:

```bash
pip install -e .
```

For a faster experience, we recommend using uv, an extremely fast drop-in replacement for pip.
For running non-API LLMs (i.e., the Gemma and OLMo models), we use the vllm library. As described in their docs, they recommend installing with uv or conda. Since we are already using uv, you can install vllm with:

```bash
uv sync --group=cuda
```

This will add vllm to the virtual environment. To get responses from the open-source models, activate this environment and run:

```bash
python scripts/vllm_batch_responses.py
```

We release our dataset on huggingface 🤗 (see top of readme for link). This includes a detailed datasheet (Gebru et al., 2021).
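For reference, the core of a vLLM batch-generation script looks roughly like the sketch below. The model name and sampling settings are illustrative assumptions, not the actual configuration of `scripts/vllm_batch_responses.py`:

```python
# Hypothetical sketch of batched generation with vLLM; the model name and
# sampling settings are illustrative, not the script's actual configuration.

def batch_generate(prompts, model_name="google/gemma-2-9b-it", max_tokens=256):
    """Generate one completion per prompt with vLLM (requires CUDA)."""
    # Imported inside the function so the sketch parses without vllm installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each output carries the generated candidates; take the first one.
    return [out.outputs[0].text for out in outputs]
```

Greedy decoding (`temperature=0.0`) is shown here purely as an example of making responses reproducible across runs.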
- Create WVS ground truth: Calculates the "ground truth" pro score for each chosen country and question.
- Translate prompts: Automatically translates the prompts to Danish, Portuguese, and Dutch using `gpt-3.5-turbo`.
- Get responses from OpenAI: Generates responses from the OpenAI models. Note that `gpt-4o` was run separately here; future runs can be done with the same script.
- Get responses from Open Source: Same as above, but using vLLM for the open-source models. Note that running this requires CUDA; see here for installation instructions.
- Categorize responses: Categorizes the responses into pro and con using function calling and `gpt-4.1`.
- Merge results with scores: Merges all the results and calculates the pro-score.
- Analyze hypotheses: Finally, this analyzes and plots the results, which can be found in the `plots` folder.
- Plots and regressions: To get all the plots from the paper, you need to run the following scripts: WVS plot, Multilingual regression, US-centric bias, and Self-consistency. Running these scripts will also print the regression tables where relevant.
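The categorization step uses OpenAI function calling to constrain the model's answer to a fixed label. A minimal sketch of such a tool schema is below; the function name and fields are hypothetical, not the exact schema used by the script:

```python
import json

# Hypothetical tool schema for labelling a response as "pro" or "con";
# the actual schema used by the categorization script may differ.
CATEGORIZE_TOOL = {
    "type": "function",
    "function": {
        "name": "categorize_response",
        "description": "Classify a model response as pro or con.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "enum": ["pro", "con"]},
            },
            "required": ["label"],
        },
    },
}

# Passed to the chat completions API roughly as:
#   client.chat.completions.create(model="gpt-4.1", tools=[CATEGORIZE_TOOL], ...)
schema_json = json.dumps(CATEGORIZE_TOOL)
```

The `enum` constraint is what forces the model to pick exactly one of the two labels rather than free text.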
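The merge step combines the categorized responses into a per-country, per-question pro score. A minimal, self-contained sketch, assuming the pro score is simply the fraction of responses categorized as "pro" (the field names and exact score definition here are illustrative assumptions):

```python
from collections import defaultdict

def pro_score(categorized):
    """Fraction of responses labelled 'pro' per (country, question).

    `categorized` is a list of dicts with keys 'country', 'question',
    and 'label' ('pro' or 'con'). Field names are illustrative.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [pro, total]
    for row in categorized:
        key = (row["country"], row["question"])
        counts[key][1] += 1
        if row["label"] == "pro":
            counts[key][0] += 1
    return {k: pro / total for k, (pro, total) in counts.items()}

rows = [
    {"country": "DK", "question": "Q1", "label": "pro"},
    {"country": "DK", "question": "Q1", "label": "con"},
    {"country": "NL", "question": "Q1", "label": "pro"},
]
scores = pro_score(rows)  # {("DK", "Q1"): 0.5, ("NL", "Q1"): 1.0}
```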
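The regression scripts relate the LLM pro scores to the WVS ground truth. As a conceptual illustration only (the paper's actual regressions are richer than this), a closed-form simple OLS fit can be written with no dependencies:

```python
def ols_line(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    slope = cov / var
    return slope, my - slope * mx

# Toy data only: regressing hypothetical model pro scores on ground truth.
truth = [0.2, 0.4, 0.6, 0.8]
model = [0.3, 0.5, 0.7, 0.9]
slope, intercept = ols_line(truth, model)  # slope 1.0, intercept 0.1
```

A slope near 1 and intercept near 0 would indicate close agreement between the model's scores and the ground truth.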