Question about evidence embedding file #11

@ISSCA-ZED

The precomputed evidence embedding file is only 19 GB when I download it from Google, and loading it then fails with the following error message:

Unpickling BlockData: /disk2/qby/Desktop/emdr2-main/embedding-path/emdr2-finetuning-embedding/psgs_w100-retriever-nq-emdr2-finetuning-base-topk50-epochs10-bsize64-async-indexer.pkl
Traceback (most recent call last):
  File "tasks/run.py", line 67, in <module>
    main()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 72, in main
    open_retrieval_generative_qa(dataset_cls)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 60, in open_retrieval_generative_qa
    end_of_training_callback_provider=distributed_metrics_func_provider)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/train_e2eqa.py", line 583, in train
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 134, in setup_model_and_optimizer
    model = get_model(model_provider_func)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 43, in get_model
    model = model_provider_func()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 36, in model_provider
    evidence_retriever = PreComputedEvidenceDocsRetriever()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 387, in __init__
    self.precomputed_index_wrapper()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 417, in precomputed_index_wrapper
    self.get_evidence_embedding(args.embedding_path)
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 412, in get_evidence_embedding
    load_from_path=True)
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 28, in __init__
    self.load_from_file()
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 50, in load_from_file
    state_dict = pickle.load(open(self.embedding_path, 'rb'))
_pickle.UnpicklingError: pickle data was truncated
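
For anyone hitting the same error: `pickle data was truncated` generally means the unpickler ran out of bytes partway through the file, which would be consistent with an incomplete download. Below is a minimal sketch for checking that outside the training script. The helper names `check_download` and `try_load_pickle` are hypothetical and not part of the emdr2 code, and the expected size is not stated anywhere in this issue, so it has to come from the download page.

```python
import os
import pickle


def check_download(path, expected_size_bytes):
    """Return True if the file on disk has exactly the expected size.

    expected_size_bytes is an assumption: take it from the download page,
    it is not given in this issue.
    """
    return os.path.getsize(path) == expected_size_bytes


def try_load_pickle(path):
    """Try to unpickle `path`.

    Returns (object, None) on success or (None, error message) if the
    pickle stream ends early. Note that this reads the whole object into
    memory, so for a multi-GB file the size check above is the cheaper test.
    """
    try:
        with open(path, "rb") as f:
            return pickle.load(f), None
    except (pickle.UnpicklingError, EOFError) as exc:
        # Both exception types can be raised when the data stops too soon.
        return None, f"{type(exc).__name__}: {exc}"
```

If the size check fails, re-downloading the file (ideally with a tool that can resume interrupted transfers) is probably the first thing to try before debugging the loader itself.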
