Questions about training

Hi, it's a great work!

We have three inputs designated as `i1`, `i2`, and `i3`, which are to be processed by the llama-7b. For input `i1`, I will extract two hidden states at two distinct locations and label them `p11` and `p12`, respectively. Regarding the remaining inputs, `i2` and `i3`, I will select a single hidden state for each, which will be denoted as `n21` and `n31` correspondingly.

In this setup, `p11` paired with `n21` constitutes a positive pair, whereas `p11` coupled with `n22` forms a negative pair. Meanwhile, `p12` paired with `n22` constitutes a positive pair, whereas `p12` coupled with `n21` forms a negative pair. My objective is to compute the InfoNCE loss between these pairs.

So I set the `get_rep_fn` in the class `GradCache` to handle the different situations. Here is a sample snippet or a piece of example code:
```python
def get_rep_fn(x):
    if x.label == 2:
        return [x.e1, x.e2]
    else:
        return [x.e1]
``` 
In the same time, I changed the following code from `append` to `extend`:
https://github.com/luyug/GradCache/blob/0c33638cb27c2519ad09c476824d550589a8ec38/src/grad_cache/grad_cache.py#L187
https://github.com/luyug/GradCache/blob/0c33638cb27c2519ad09c476824d550589a8ec38/src/grad_cache/grad_cache.py#L270
I'd like to inquire about the correctness of the gradient computation. Could you please confirm if it's being done accurately?

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Questions about training #28

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Questions about training #28

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions