ggfm.data.higpt_prompt_generation
- class ggfm.data.higpt_prompt_generation
Bases:
Main function to execute all dataset processing steps.
Performs three main steps:
1. Assigns paper labels based on L2-level field connections
2. Generates node and edge type embeddings using BERT
3. Prepares training data by:
   - Sampling subgraphs
   - Creating conversations
   - Saving in required format
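As a rough illustration of step 1, the sketch below assigns each paper a label from its field connections. It is not the library's implementation: the function name `assign_paper_labels`, the `(paper_id, field_name)` input format, and the majority-vote rule with alphabetical tie-breaking are all assumptions made for this example.

```python
from collections import Counter


def assign_paper_labels(paper_field_edges):
    """Toy labeling: map each paper to an integer label from its L2 field links.

    paper_field_edges: iterable of (paper_id, field_name) pairs.
    This input format is hypothetical, not the ggfm graph format.
    Returns (paper_to_label, label_to_field).
    """
    # Collect the fields connected to each paper.
    fields_per_paper = {}
    for paper, field in paper_field_edges:
        fields_per_paper.setdefault(paper, []).append(field)

    # Assumption: pick the most frequent field; break ties alphabetically.
    chosen = {}
    for paper, fields in fields_per_paper.items():
        counts = Counter(fields)
        chosen[paper] = min(counts, key=lambda f: (-counts[f], f))

    # Build a stable integer id for every field that was actually used.
    label_to_field = dict(enumerate(sorted(set(chosen.values()))))
    field_to_label = {field: label for label, field in label_to_field.items()}
    paper_to_label = {p: field_to_label[f] for p, f in chosen.items()}
    return paper_to_label, label_to_field
```

The second return value plays the role of a label mapping: it lets integer labels be translated back to field names after training.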
The processed data is saved in the following structure:
- Labeled graph: ggfm/datasets/labeled_field_hg.bin
- Label mapping: ggfm/datasets/label_to_field.json
- Type embeddings: ggfm/models/meta_hgt/meta_dict/oag/
- Training data: ggfm/datasets/stage2_data/OAG-all/
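To check that a run produced everything listed above, a small helper along these lines may be useful. The paths are copied from the list; the names `OUTPUTS` and `missing_outputs`, and the assumption that the paths are relative to a repository root, are illustrative only.

```python
from pathlib import Path

# Output locations as documented above, assumed relative to the repo root.
OUTPUTS = {
    "labeled_graph": Path("ggfm/datasets/labeled_field_hg.bin"),
    "label_mapping": Path("ggfm/datasets/label_to_field.json"),
    "type_embeddings": Path("ggfm/models/meta_hgt/meta_dict/oag"),
    "training_data": Path("ggfm/datasets/stage2_data/OAG-all"),
}


def missing_outputs(root="."):
    """Return the names of documented outputs that do not exist under root."""
    base = Path(root)
    return [name for name, rel in OUTPUTS.items() if not (base / rel).exists()]
```

Calling `missing_outputs()` from the repository root after processing should return an empty list if all four outputs were written.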