Utility scripts to extract entities from sentences in few-shot datasets using TagMe.

Requirements:
- python3
- tagme==0.1.3
To use the TagMe API, you need to register an account on the TagMe website and write the token as TAGME_TOKEN in config.py.
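For orientation, a minimal config.py and a quick token check might look like the sketch below. TAGME_TOKEN is the name the scripts expect; MAX_WORKERS is only an assumed name for the worker count mentioned at the end of this README; GCUBE_TOKEN is the global that the tagme package itself reads.

```python
# config.py -- a minimal sketch; TAGME_TOKEN is what the scripts expect,
# MAX_WORKERS is an assumed name for the worker count described below.
TAGME_TOKEN = "your-registered-token"
MAX_WORKERS = 64
```

```python
# Quick check that the token works, using the tagme package's standard API.
import tagme
from config import TAGME_TOKEN

tagme.GCUBE_TOKEN = TAGME_TOKEN  # the tagme library reads the token from this global

response = tagme.annotate("try them if you don't believe me")
for ann in response.get_annotations():
    print(ann.begin, ann.end, ann.entity_id, ann.score)
```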
- Data source: https://github.com/zxlzr/FewShotNLP/tree/master/data
- Command: python tag_fewshot.py
- Output: original file named A.train(dev/test) -> A.train(dev/test).json
- Data format (see the sketch after the sample data):
  - pos_begin: index of the mention's starting character in the sentence
  - pos_end: index of its ending character (exclusive, i.e. one past the last character)
  - entity_id
  - score
- Sample data:
[{
"sentence": "lasts only 2 weeks ! try them if you don't believe me",
"class": "-1",
"entities": [
{
"pos_begin": 13,
"pos_end": 18,
"entity_id": 27493154,
"score": 0.0007660030387341976
}, {
"pos_begin": 21,
"pos_end": 24,
"entity_id": 3276812,
"score": 0.009006991051137447
}, {
"pos_begin": 30,
"pos_end": 32,
"entity_id": 1685851,
"score": 0.07183314114809036
}, {
"pos_begin": 33,
"pos_end": 36,
"entity_id": 14148802,
"score": 0.19438420236110687
}, {
"pos_begin": 37,
"pos_end": 40,
"entity_id": 294015,
"score": 0.010517369955778122
}, {
"pos_begin": 37,
"pos_end": 50,
"entity_id": 27690196,
"score": 0.005518087185919285
}, {
"pos_begin": 43,
"pos_end": 53,
"entity_id": 38740213,
"score": 0.0672566369175911
}
]
}]
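For orientation, the record shape above can be produced from tagme annotations roughly as follows. This is a sketch, not the actual tag_fewshot.py; it assumes the token has already been set as shown earlier.

```python
import tagme

def extract_entities(sentence):
    """Annotate one sentence and turn each tagme Annotation into the
    character-offset dict used in the sample data above."""
    response = tagme.annotate(sentence)
    return [
        {
            "pos_begin": ann.begin,      # first character of the mention
            "pos_end": ann.end,          # one past its last character
            "entity_id": ann.entity_id,  # entity id returned by TagMe
            "score": ann.score,          # TagMe's confidence score
        }
        for ann in response.get_annotations()
    ]

record = {"sentence": "lasts only 2 weeks ! try them if you don't believe me",
          "class": "-1"}
record["entities"] = extract_entities(record["sentence"])
```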
- Data source: https://github.com/thunlp/FewRel/tree/master/data
- Command: python tag_fewrel.py
- Output: train(val).json -> train(val)_entity.json
- Data format (see the mapping sketch after the sample data):
  - index_begin: index of the mention's starting word in the sentence
  - index_end: index of its ending word (exclusive)
  - entity_id
  - score
- Sample data:
[{
"tokens": ["In", "June", "1987", ",", "the", "Missouri", "Highway", "and", "Transportation", "Department", "approved", "design", "location", "of", "a", "new", "four", "-", "lane", "Mississippi", "River", "bridge", "to", "replace", "the", "deteriorating", "Cape", "Girardeau", "Bridge", "."],
"h": ["cape girardeau bridge", "Q5034838", [[26, 27, 28]]],
"t": ["mississippi river", "Q1497", [[19, 20]]],
"entities": [
{
"index_begin": 5,
"index_end": 6,
"entity_id": 19591,
"score": 0.35695624351501465
}, {
"index_begin": 6,
"index_end": 7,
"entity_id": 48519,
"score": 0.22454741597175598
}, {
"index_begin": 8,
"index_end": 10,
"entity_id": 58235,
"score": 0.015105740167200565
}, {
"index_begin": 11,
"index_end": 12,
"entity_id": 631159,
"score": 0.08602561801671982
}, {
"index_begin": 12,
"index_end": 13,
"entity_id": 2272383,
"score": 0.09302778542041779
}, {
"index_begin": 18,
"index_end": 19,
"entity_id": 95699,
"score": 0.0887017548084259
}, {
"index_begin": 19,
"index_end": 21,
"entity_id": 19579,
"score": 0.6045866012573242
}, {
"index_begin": 26,
"index_end": 29,
"entity_id": 4910093,
"score": 0.5
}
]
}]
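tagme reports character offsets, but FewRel sentences are stored as token lists, so the character spans have to be mapped back onto word indices. The sketch below shows one way to do it, assuming tokens are joined with single spaces before annotation; it is not necessarily what tag_fewrel.py does.

```python
import tagme

def annotate_tokens(tokens):
    """Join tokens with spaces, annotate the text, then map tagme's
    character offsets back onto token indices (index_end is exclusive)."""
    text = " ".join(tokens)
    starts, ends, pos = [], [], 0
    for tok in tokens:  # starts[i]..ends[i]: character span of tokens[i] in text
        starts.append(pos)
        ends.append(pos + len(tok))
        pos += len(tok) + 1  # +1 for the joining space
    entities = []
    for ann in tagme.annotate(text).get_annotations():
        # last token starting at or before the annotation start ...
        index_begin = max(i for i, s in enumerate(starts) if s <= ann.begin)
        # ... through the first token ending at or after the annotation end
        index_end = min(i for i, e in enumerate(ends) if e >= ann.end) + 1
        entities.append({
            "index_begin": index_begin,
            "index_end": index_end,
            "entity_id": ann.entity_id,
            "score": ann.score,
        })
    return entities
```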
- Max workers: specified in config.py; the default is 64. Larger values can improve throughput, but not always (see the sketch below).
- Termination: the program terminates when all jobs are done. You may stop it with Ctrl+C at any time, but please make sure all workers have exited and the program has printed "Saved data.", which means the extracted entities have been written to the output files. If you restart, the program resumes from where it stopped last time.
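Putting the pieces together, the worker pool and restart behaviour can be approximated as below. This is a sketch only: MAX_WORKERS is the assumed config name, extract_entities is the helper sketched earlier, and keying resumed work by sentence text is an assumption about the checkpoint format, not the scripts' actual internals.

```python
import concurrent.futures
import json
import os

from config import MAX_WORKERS  # assumed name for the worker count in config.py
# extract_entities: the per-sentence annotation helper sketched earlier

def tag_all(records, out_path):
    """Annotate records in parallel; safe to stop with Ctrl+C and resume."""
    done = {}
    if os.path.exists(out_path):  # resume: reload results saved by a previous run
        with open(out_path) as f:
            done = {r["sentence"]: r for r in json.load(f)}
    todo = [r for r in records if r["sentence"] not in done]
    try:
        with concurrent.futures.ThreadPoolExecutor(MAX_WORKERS) as pool:
            futures = {pool.submit(extract_entities, r["sentence"]): r for r in todo}
            for fut in concurrent.futures.as_completed(futures):
                record = futures[fut]
                record["entities"] = fut.result()
                done[record["sentence"]] = record
    finally:
        # Reached on normal completion and after Ctrl+C, once the pool has shut down.
        with open(out_path, "w") as f:
            json.dump(list(done.values()), f)
        print("Saved data.")
```

Threads fit here because each job is an HTTP call to the TagMe API, which is also why raising the worker count stops helping once the API becomes the bottleneck.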