ggfm.models.GraphLlamaModel

class ggfm.models.GraphLlamaModel(config: LlamaConfig)[source]

Graph Llama model

forward(input_ids: LongTensor | None = None, attention_mask: Tensor | None = None, past_key_values: List[FloatTensor] | None = None, inputs_embeds: FloatTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, graph_data: Data | None = None, return_dict: bool | None = None)[source]

The [LlamaModel] forward method, overrides the __call__ special method.

<Tip>

Although the recipe for the forward pass is defined within this function, call the [Module] instance itself rather than forward() directly: calling the instance runs the pre- and post-processing steps, whereas calling forward() directly silently skips them.

</Tip>
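The difference described in the tip can be illustrated with a small, dependency-free stand-in. `TinyModule` below is a hypothetical class written for this sketch, not part of ggfm or PyTorch; it only mimics, in simplified form, how calling an instance wraps forward with registered hooks.

```python
class TinyModule:
    """Hypothetical, simplified stand-in for the call protocol of a Module."""

    def __init__(self):
        self._pre_hooks = []
        self._post_hooks = []

    def register_forward_pre_hook(self, hook):
        self._pre_hooks.append(hook)

    def register_forward_hook(self, hook):
        self._post_hooks.append(hook)

    def forward(self, x):
        # The "recipe" of the forward pass lives here.
        return x * 2

    def __call__(self, x):
        # Calling the instance runs the hooks around forward().
        for hook in self._pre_hooks:
            hook(self, (x,))
        out = self.forward(x)
        for hook in self._post_hooks:
            hook(self, (x,), out)
        return out


log = []
module = TinyModule()
module.register_forward_pre_hook(lambda mod, args: log.append("pre"))
module.register_forward_hook(lambda mod, args, out: log.append("post"))

called = module(3)          # hooks run, log receives "pre" and "post"
direct = module.forward(3)  # same result, but no hook runs
```

Both calls return the same value, but only the first one triggers the hooks. With a real model the same reasoning suggests writing `model(...)` rather than `model.forward(...)`.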

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –

    Indices of input sequence tokens in the vocabulary. Padding is ignored by default if you provide it.

    Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

    [What are input IDs?](../glossary#input-ids)

  • attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) –

    Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

    [What are attention masks?](../glossary#attention-mask)

    The mask can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.__call__] for details.

    If past_key_values is used, only the last input_ids need to be passed (see past_key_values).

    If you want to change the padding behavior, read [modeling_opt._prepare_decoder_attention_mask] and modify it to suit your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more information on the default strategy.

  • position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –

    Indices of the position of each input sequence token in the position embeddings. Selected in the range [0, config.n_positions - 1].

    [What are position IDs?](../glossary#position-ids)

  • past_key_values (Cache or tuple(tuple(torch.FloatTensor)), optional) –

    Pre-computed hidden states (keys and values in the self-attention blocks and in the cross-attention blocks) that can be used to speed up sequential decoding. This typically consists of the past_key_values returned by the model at a previous stage of decoding, when use_cache=True or config.use_cache=True.

    Two formats are allowed:

    • a [~cache_utils.Cache] instance, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache);

    • a tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head). This is also known as the legacy cache format.

    The model will output the same cache format that is fed as input. If no past_key_values are passed, the legacy cache format will be returned.

    If past_key_values are used, the user can optionally input only the last input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all input_ids of shape (batch_size, sequence_length).

  • inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

  • use_cache (bool, optional) – If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

  • output_attentions (bool, optional) – Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.

  • output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

  • return_dict (bool, optional) – Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

  • cache_position (torch.LongTensor of shape (sequence_length), optional) – Indices depicting the position of the input sequence tokens in the sequence. Unlike position_ids, this tensor is not affected by padding. It is used to update the cache at the correct position and to infer the complete sequence length.
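The bookkeeping behind attention_mask, position_ids, cache_position, and incremental decoding with past_key_values can be sketched without any tensor library. The snippet below uses plain Python lists in place of tensors. `PAD_ID` and all four helper functions are hypothetical names introduced for illustration, and the placeholder position 0 for padded slots is a convention of this sketch only. It illustrates the conventions described in the parameter list, not the internals of this class.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen only for this sketch


def pad_batch(
    seqs: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id lists and build the matching attention mask."""
    max_len = max(len(s) for s in seqs)
    input_ids = [list(s) + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask


def position_ids_from_mask(attention_mask: List[List[int]]) -> List[List[int]]:
    """Number real tokens 0, 1, 2, ...; padded slots get a placeholder 0."""
    result = []
    for row in attention_mask:
        seen = 0
        positions = []
        for bit in row:
            positions.append(seen if bit else 0)
            seen += bit
        result.append(positions)
    return result


def cache_position(past_len: int, new_len: int) -> List[int]:
    """Absolute cache slots of the new tokens; padding plays no role here."""
    return list(range(past_len, past_len + new_len))


def uncached_tokens(input_ids: List[List[int]], past_len: int) -> List[List[int]]:
    """Keep only the tokens whose key/value states are not cached yet."""
    return [row[past_len:] for row in input_ids]


input_ids, attention_mask = pad_batch([[5, 6, 7], [8, 9]])
position_ids = position_ids_from_mask(attention_mask)

# First pass: nothing cached, three positions are written to the cache.
prefill_positions = cache_position(past_len=0, new_len=3)

# Next decoding step: one new token per sequence, three already cached.
step_ids = uncached_tokens([[5, 6, 7, 4], [8, 9, 0, 3]], past_len=3)
decode_positions = cache_position(past_len=3, new_len=1)
```

In the decoding step only the last column of token IDs is passed on, which corresponds to an input of shape (batch_size, 1), while the cache positions continue from where the first pass stopped.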

get_graph_tower()[source]
initialize_graph_modules(graph_tower, graph_select_layer, pretrain_graph_mlp_adapter=None, fsdp=None)[source]
training: bool