This repository contains the official implementation of the paper:
Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework
Andrea Di Pierno, Luca Guarnera, Dario Allegra, Sebastiano Battiato
In Proceedings of the DFF Workshop at ACM MM 2025, Dublin, Ireland.
LAVA is a multi-level framework for audio deepfake attribution and model recognition, combining a shared convolutional autoencoder with attention-based classifiers.
The architecture is composed of:
- An encoder trained exclusively on fake audio
- A Level 1 classifier (ADA) for dataset-level attribution
- A Level 2 classifier (ADMR) for fine-grained model recognition
- A rejection mechanism based on prediction confidence to handle open-set inputs
The framework is evaluated across in-domain, cross-domain, and open-set scenarios using the CodecFake, ASVspoof2021, FakeOrReal, and ASVspoof2019LA datasets described below.
If you use this code, please cite our paper:
```bibtex
@inproceedings{10.1145/3746265.3759668,
  author    = {Di Pierno, Andrea and Guarnera, Luca and Allegra, Dario and Battiato, Sebastiano},
  title     = {Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework},
  year      = {2025},
  isbn      = {9798400720475},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3746265.3759668},
  doi       = {10.1145/3746265.3759668},
  pages     = {101--109},
  numpages  = {9},
  location  = {Dublin, Ireland},
  series    = {DFF '25}
}
```
```
├── models/                           # Pretrained models
│   ├── autoencoder.pt                # Pretrained convolutional autoencoder
│   ├── ADA_model.pt                  # Audio Deepfake Attribution model
│   └── ADMR_model.pt                 # Audio Deepfake Model Recognition model
├── img/                              # Project assets
│   └── LAVA-logo.png                 # LAVA framework logo
├── ablationStudies.py                # Ablation studies without attention mechanism
├── audioDeepfakeAttribution.py       # ADA model implementation with attention
├── audioDeepfakeModelRecognition.py  # ADMR model for generation model recognition
├── autoencoder.py                    # Deep convolutional autoencoder architecture
├── fakeDataset.py                    # CodecFake dataset class with automatic labeling
├── pipeline.py                       # Complete LAVA framework pipeline
├── split_datasets.py                 # Dataset splitting utilities for ADA and ADMR
├── testDataset.py                    # Error propagation and generalization testing datasets
├── tester.py                         # Testing script with error propagation analysis
├── .gitignore                        # Git ignore file
├── LICENSE                           # MIT License
├── README.md                         # This file
└── requirements.txt                  # Python dependencies
```

The LAVA framework consists of three main components:
**Autoencoder (shared encoder)**
- Purpose: Feature extraction from audio waveforms
- Training: Exclusively on fake audio samples for optimal representation learning
- Architecture: 1D convolutional layers with progressive compression (1 → 32 → 64 → 128 → 256 channels); see the sketch after this list
- Output: Compressed latent representations for downstream tasks
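For intuition, here is a minimal sketch of such a progressively compressing 1D convolutional encoder. The layer hyperparameters (kernel size 7, stride 2) and the class name `SketchEncoder` are illustrative assumptions, not the exact architecture in `autoencoder.py`:

```python
import torch.nn as nn

# Illustrative encoder only: each Conv1d roughly halves the temporal length
# while widening channels along the 1 -> 32 -> 64 -> 128 -> 256 path.
class SketchEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        channels = [1, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv1d(c_in, c_out, kernel_size=7, stride=2, padding=3),
                nn.ReLU(),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):   # x: (batch, 1, samples)
        return self.net(x)  # (batch, 256, ~samples / 16)
```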
**Level 1: ADA (Audio Deepfake Attribution)**
- Purpose: Technology-level attribution to identify the generation source of audio samples
- Classes: 3-way classification (CodecFake, ASVspoof2021, FakeOrReal)
- Architecture: Pretrained encoder + attention mechanism + classification head
- Features: Confidence-based rejection for uncertain predictions
**Level 2: ADMR (Audio Deepfake Model Recognition)**
- Purpose: Fine-grained model recognition for CodecFake samples
- Classes: 6-way classification for different generation models
- Architecture: Shared encoder + attention + specialized classification head
- Pipeline: Only invoked for samples classified as CodecFake by ADA; the combined decision logic is sketched below
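Conceptually, the two-level decision with confidence-based rejection works as in the sketch below. The threshold defaults, the CodecFake class index, and the return format are assumptions for illustration; the actual logic lives in `pipeline.py`:

```python
import torch

CODECFAKE_IDX = 0  # assumed ADA class index for CodecFake (illustrative)

def two_level_decision(ada_logits, admr_logits, ada_thr=0.85, admr_thr=0.90):
    """Sketch for a single sample: reject low-confidence predictions at either
    level, and only run ADMR when ADA attributes the sample to CodecFake."""
    ada_conf, ada_pred = torch.softmax(ada_logits, dim=-1).max(dim=-1)
    if ada_conf.item() < ada_thr:
        return {"ADA": "rejected", "ADMR": None}           # open-set rejection
    if ada_pred.item() != CODECFAKE_IDX:
        return {"ADA": ada_pred.item(), "ADMR": None}      # ADMR not invoked
    admr_conf, admr_pred = torch.softmax(admr_logits, dim=-1).max(dim=-1)
    if admr_conf.item() < admr_thr:
        return {"ADA": ada_pred.item(), "ADMR": "rejected"}
    return {"ADA": ada_pred.item(), "ADMR": admr_pred.item()}
```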
Requirements:
- Python 3.10+
- CUDA-compatible GPU (recommended)
- Audio data preprocessed as PyTorch tensors (.pt files)
- Clone the repository:

```bash
git clone https://github.com/adipiz99/lava-framework.git
cd lava-framework
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Run the full LAVA pipeline for audio deepfake analysis:
```python
import torch

from pipeline import load_model, run_pipeline
from audioDeepfakeAttribution import AudioDeepfakeAttributionModel
from audioDeepfakeModelRecognition import ADMR_model

# Load pretrained models
autoencoder_path = "models/autoencoder.pt"
ada_model, device = load_model(
    model_class=AudioDeepfakeAttributionModel,
    model_path="models/ADA_model.pt",
    autoencoder_path=autoencoder_path
)
admr_model, _ = load_model(
    model_class=ADMR_model,
    model_path="models/ADMR_model.pt",
    autoencoder_path=autoencoder_path
)

# Run pipeline on an audio tensor
audio_tensor = torch.load("path/to/audio.pt")
thresholds = [0.85, 0.90]  # [ADA_threshold, ADMR_threshold] (example values)
result = run_pipeline(audio_tensor, ada_model, admr_model, device, thresholds)

print(f"Dataset Attribution: {result['ADA']}")
print(f"Model Recognition: {result['ADMR']}")
```

Analyze how errors propagate through the pipeline:
```python
from torch.utils.data import DataLoader

from tester import error_propagation_test, ErrorPropagationDataset

# Create the mixed test dataset
test_dataset = ErrorPropagationDataset(
    for_fake_dir="/path/to/FOR/fake",
    for_real_dir="/path/to/FOR/real",
    avs_fake_dir="/path/to/ASVspoof2021/fake",
    avs_real_dir="/path/to/ASVspoof2021/real",
    codec_real_dir="/path/to/CodecFake/real",
    codec_csvfile="/path/to/csv/codecfake/test.csv"
)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

# Run error propagation analysis
ada_true, ada_pred, admr_true, admr_pred, valid_mask = error_propagation_test(
    ada_model, admr_model, test_loader, device
)
```
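From these outputs you can compute summary metrics directly. A sketch using scikit-learn, assuming the returned values behave like 1-D label arrays and `valid_mask` selects the samples that actually reached the ADMR stage:

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

valid = np.asarray(valid_mask, dtype=bool)

print("ADA accuracy:", accuracy_score(ada_true, ada_pred))
print(classification_report(ada_true, ada_pred))

# ADMR metrics only over samples that passed the ADA stage
print("ADMR accuracy:",
      accuracy_score(np.asarray(admr_true)[valid], np.asarray(admr_pred)[valid]))
```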
Test model performance on unseen datasets:

```python
from torch.utils.data import DataLoader

from tester import generalization_test, GeneralizationDataset

# Create the generalization dataset (ASVspoof2019LA)
gen_dataset = GeneralizationDataset(
    avs19LA_fake_dir="/path/to/ASVspoof2019LA/fake"
)
gen_loader = DataLoader(gen_dataset, batch_size=16, shuffle=False)

# Run generalization test
generalization_test(gen_loader, ada_model, admr_model, device)
```

The LAVA framework expects audio data to be preprocessed as PyTorch tensors (.pt files). Each tensor should contain a normalized audio waveform ready for model input.
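A minimal preprocessing sketch for producing such tensors; the 16 kHz resampling rate, peak normalization, and file names are illustrative choices rather than the paper's exact protocol:

```python
import torch
import torchaudio

# Load, downmix to mono, resample, peak-normalize, and save as a .pt tensor.
waveform, sr = torchaudio.load("sample.wav")       # (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)      # mono: (1, samples)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
waveform = waveform / waveform.abs().max().clamp(min=1e-8)  # peak-normalize to [-1, 1]
torch.save(waveform, "sample.pt")
```

The framework then expects the directory layout below.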
```
/path/to/data/
├── CodecFake/
│   ├── real/            # Real CodecFake samples (.pt files)
│   └── fake/            # Fake CodecFake samples (.pt files)
├── ASVspoof2021/
│   ├── real/            # Real ASVspoof2021 samples (.pt files)
│   └── fake/            # Fake ASVspoof2021 samples (.pt files)
├── FakeOrReal/
│   ├── real/            # Real FakeOrReal samples (.pt files)
│   └── fake/            # Fake FakeOrReal samples (.pt files)
└── ASVspoof2019LA/      # For generalization testing
    └── fake/            # Unseen ASVspoof2019LA samples (.pt files)
```
For the CodecFake dataset, you'll also need CSV files containing paths and labels:

```csv
path,label
/path/to/sample1.pt,1
/path/to/sample2.pt,2
/path/to/sample3.pt,6
```
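A hypothetical snippet for reading this CSV. The shift from the file's 1-based labels to the 0-based classes ADMR predicts (0-5, per the pretrained-models table below) is an assumption for illustration:

```python
import csv

# Read (path, label) pairs, shifting labels from 1-based (CSV) to 0-based (model).
with open("/path/to/csv/codecfake/test.csv") as f:
    samples = [(row["path"], int(row["label"]) - 1) for row in csv.DictReader(f)]
```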
Prepare balanced train/validation/test splits:

```bash
# For the ADA task (3-dataset classification):
python split_datasets.py
# This creates CSV files for ADA training with balanced sampling from all datasets.
# For the ADMR task (6-model classification), splits are handled automatically
# within the script for the CodecFake dataset.
```

Train individual components:
```python
# Train the autoencoder
from autoencoder import train, DeepAutoencoder

autoencoder = DeepAutoencoder()
train(autoencoder, train_loader, val_loader, device, epochs=50)

# Train the ADA model
from audioDeepfakeAttribution import train_model, AudioDeepfakeAttributionModel

ada_model = AudioDeepfakeAttributionModel(pretrained_autoencoder)
train_model(ada_model, train_loader, val_loader, device, epochs=20)

# Train the ADMR model
from audioDeepfakeModelRecognition import train_ADMR_model, ADMR_model

admr_model = ADMR_model(pretrained_autoencoder)
train_ADMR_model(admr_model, train_loader, val_loader, device, epochs=10)
```

Compare performance with and without the attention mechanism:
```python
from ablationStudies import run_task, ModelWithoutAttention

# Run ADA without attention
run_task(
    task_name="ADA_no_attention",
    num_classes=3,
    csv_dir="/path/to/csv/ADA_split/",
    model_path="/path/to/models/ADA_no_attention.pt",
    output_dir="/path/to/results/",
    label_to_zero_base=False
)
```

Run the complete test suite:
```bash
python tester.py
```

The test script provides:
- Error Propagation Analysis: How ADA errors affect ADMR performance
- Confusion Matrices: Detailed classification performance visualization
- Generalization Testing: Performance on unseen ASVspoof2019LA data
- Metrics: Accuracy, Precision, Recall, F1-score for both ADA and ADMR
| Component | Metrics | Description |
|---|---|---|
| ADA | Accuracy, F1-score, Confusion Matrix | 3-way dataset attribution performance |
| ADMR | Accuracy, F1-score, ROC-AUC | 6-way model recognition performance |
| Pipeline | Error Propagation Rate, End-to-End Accuracy | Complete system evaluation |
| Generalization | Zero-shot Accuracy on ASVspoof2019LA | Cross-dataset robustness |
We provide pretrained checkpoints in the models/ folder:
| Model | Description | Input | Output | Usage |
|---|---|---|---|---|
| autoencoder.pt | Deep convolutional autoencoder trained on fake audio | Audio tensors | Latent features (256-dim) | Feature extraction backbone |
| ADA_model.pt | Audio Deepfake Attribution classifier | Audio tensors | Dataset labels (0-2) | Technology source identification |
| ADMR_model.pt | Audio Deepfake Model Recognition classifier | Audio tensors | Model labels (0-5) | Generation model identification |
| Dataset | Usage in LAVA |
|---|---|
| CodecFake | ADMR & ADA training, Error propagation testing, Ablation studies |
| ASVspoof2021 | ADA training, Error propagation testing, Ablation studies |
| FakeOrReal | ADA training, Error propagation testing, Ablation studies |
| ASVspoof2019LA | Generalization testing |
- Training: CodecFake (both ADA & ADMR), ASVspoof2021 (ADA), FakeOrReal (ADA)
- Testing: Balanced samples from all training datasets for in-domain evaluation
- Ablation studies: Balanced samples from all training datasets to validate the impact of the attention mechanism
- Error propagation: Balanced samples from all training datasets to analyze how ADA errors affect ADMR performance
- Generalization: ASVspoof2019LA for cross-dataset evaluation
Please refer to the paper for detailed split information and rejection protocol.
This project is licensed under the MIT License. See the LICENSE file for details.
If you have questions or find this project useful, feel free to contact us:
- Andrea Di Pierno – andrea.dipierno@imtlucca.it
This study has been partially supported by SERICS (PE00000014), including its vertical project FF4ALL, under the MUR National Recovery and Resilience Plan funded by the European Union – NextGenerationEU.
