ggfm.data.higpt_prompt_generation

class ggfm.data.higpt_prompt_generation[source]

Bases:

Main function to execute all dataset processing steps.

Performs three main steps: 1. Assigns paper labels based on L2 level field connections 2. Generates node and edge type embeddings using BERT 3. Prepares training data by:

  • Sampling subgraphs

  • Creating conversations

  • Saving in required format

The processed data is saved in the following structure: - Labeled graph: ggfm/datasets/labeled_field_hg.bin - Label mapping: ggfm/datasets/label_to_field.json - Type embeddings: ggfm/models/meta_hgt/meta_dict/oag/ - Training data: ggfm/datasets/stage2_data/OAG-all/