@cthoyt cthoyt commented Jun 30, 2025

Alternative to #80

The prioritization algorithm already assumed that a set of mappings had been processed with both inversion and chain inference.

Old algorithm:

  1. Make an undirected graph
  2. Get all connected components
  3. In each connected component, get the highest priority node
  4. Make mappings from each node to that one

This algorithm produced incorrect results when a connected component lacked a pre-existing mapping from a given node to the component's highest priority node: that node was skipped entirely. This should not have happened in practice, because the function's precondition is that inversion and chain inference have already been run.
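
To make the failure mode concrete, here is a minimal sketch of the old approach. It is not the code from this repository: mappings are simplified to `(subject, object)` pairs of CURIE strings, `prioritize_old` and its `priority` argument (a list of prefixes, highest priority first) are hypothetical names, and a hand-rolled traversal stands in for the networkx connected components call so that the example has no dependencies.

```python
from collections import defaultdict


def prioritize_old(mappings, priority):
    """Sketch of the old algorithm over (subject, object) CURIE pairs.

    Assumption: ``priority`` lists prefixes from highest to lowest priority.
    """
    existing = set(mappings)
    rank = {prefix: i for i, prefix in enumerate(priority)}

    def key(node):
        # Lower rank means higher priority; the CURIE itself breaks ties.
        return rank[node.split(":", 1)[0]], node

    # 1. Make an undirected graph (as an adjacency index).
    adjacency = defaultdict(set)
    for subject, obj in existing:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)

    result = []
    seen = set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # 2. Collect the connected component containing ``start``.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        # 3. Get the highest priority node in the component.
        target = min(component, key=key)
        # 4. Keep mappings from each node to that one, but only if they
        #    already exist. Nodes without such a mapping are skipped.
        for node in sorted(component):
            if node != target and (node, target) in existing:
                result.append((node, target))
    return result


# A chain a:1 -> b:1 -> c:1 on which inference has NOT been run.
chain = [("a:1", "b:1"), ("b:1", "c:1")]
print(prioritize_old(chain, ["c", "b", "a"]))  # [('b:1', 'c:1')]
```

In this input, `a:1` belongs to the same component as `c:1` but has no direct mapping to it, so it is silently dropped from the output.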

New algorithm:

  1. Still assume that inference has been run and that, in each connected component, there are exact match mappings between all nodes (in both directions). Further, assume that assemble_evidences() has been run, i.e., there is only one exact match mapping for each subject/object pair
  2. Make a subject-object-mapping index
  3. For each subject
    1. Assume that its objects comprise all other nodes in the connected component that the subject belongs to.
    2. Choose the highest priority object from that list, along with the mapping associated with that object
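
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: mappings are simplified to `(subject, object)` pairs of CURIE strings, `prioritize` and `priority` (a list of prefixes, highest priority first) are hypothetical names, and the sketch additionally assumes that a subject which already outranks all of its objects should produce no mapping.

```python
from collections import defaultdict


def prioritize(mappings, priority):
    """Sketch of the index-based algorithm over (subject, object) CURIE pairs.

    Assumptions: inference and evidence assembly have already been run, so
    every subject maps directly to every other node in its component, with
    exactly one mapping per subject/object pair.
    """
    rank = {prefix: i for i, prefix in enumerate(priority)}

    def key(node):
        # Lower rank means higher priority; the CURIE itself breaks ties.
        return rank[node.split(":", 1)[0]], node

    # Build the subject -> object -> mapping index in a single pass.
    index = defaultdict(dict)
    for mapping in mappings:
        subject, obj = mapping
        index[subject][obj] = mapping

    result = []
    for subject, objects in index.items():
        # The objects are assumed to be the rest of the subject's component.
        best = min(objects, key=key)
        # Assumption of this sketch: skip subjects that are already the
        # highest priority node in their component.
        if key(subject) <= key(best):
            continue
        result.append(objects[best])
    return result


nodes = ["a:1", "b:1", "c:1"]
closure = [(s, o) for s in nodes for o in nodes if s != o]
print(prioritize(closure, ["c", "b", "a"]))  # [('a:1', 'c:1'), ('b:1', 'c:1')]
```

No graph is built and no component search is run; correctness rests entirely on the precondition in step 1.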

Benefits:

  • The new algorithm only has to loop through the mappings once
  • It doesn't have to create a networkx graph data structure or run the connected components algorithm

Still todo:

  • document/harden behavior for mapping sets that don't induce fully connected components
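
One possible way to harden this, sketched here only as an idea and not as something this PR implements, is to detect inputs that violate the precondition before prioritizing. The hypothetical helper `find_incomplete_subjects` below works on simplified `(subject, object)` CURIE pairs and flags every subject whose neighborhood is not shared by all of its objects, which is what happens when a component is not fully connected.

```python
from collections import defaultdict


def find_incomplete_subjects(mappings):
    """Return subjects whose component is not fully connected (a sketch).

    In a fully connected component, a node together with its objects forms
    the same set for every member. Any mismatch means that inversion or
    chain inference was not run to completion.
    """
    neighbors = defaultdict(set)
    for subject, obj in mappings:
        neighbors[subject].add(obj)

    incomplete = set()
    for subject, objects in neighbors.items():
        closed = objects | {subject}
        for obj in objects:
            # .get() avoids inserting keys into the defaultdict mid-iteration.
            if neighbors.get(obj, set()) | {obj} != closed:
                incomplete.add(subject)
                break
    return incomplete


# A chain without inference: both subjects are flagged.
print(find_incomplete_subjects([("a:1", "b:1"), ("b:1", "c:1")]))
```

A caller could raise an error, log a warning, or fall back to a graph-based approach when the returned set is non-empty; which of these is appropriate is left open here.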
