| update expected results due to numerical changes in swiglu |
|
5 days ago
|
| fast reciprocal for swiglu |
|
5 days ago
|
| refactor swiglu with utility function |
|
5 days ago
|
| explicit stream arg |
|
5 days ago
|
| update test targets due to new loss summation numerics |
|
9 days ago
|
| Fixes |
|
9 days ago
|
| Fixes |
|
9 days ago
|
| fix python interface |
|
9 days ago
|
| make python interface return full loss for now |
|
9 days ago
|
| logging loss@1k |
|
9 days ago
|
| allow inspecting the loss over a subset of sequence positions |
|
9 days ago
|
| grouped loss sum kernel |
|
9 days ago
|
| handle unknown devices safely in multi-GPU setup |
|
9 days ago
|
| check that TMA kernel has been compiled for fused_classifier_dispatch |
|
9 days ago
|
| add B200 to SOL list |
|
9 days ago
|
| fix wandb watcher end condition |
|
10 days ago
|
| stricter tracking of data-loader state in checkpoints |
|
about 1 month ago
|
| add option to re-initialize dataloader when continuing a training run, e.g., for mid-training when the dataset changes |
|
about 1 month ago
|
| adjust tests |
|
about 1 month ago
|
| better error when safetensor loading fails: show the file name |
|
about 1 month ago
|
| bugfix for sharded optimizer states |
|
about 1 month ago
|
| more flexible training continuation |
|
about 1 month ago
|
| remove epoch logging -> log total progress instead |
|
about 1 month ago
|
| use fp16 for rope frequencies to reduce rounding errors |
|
about 1 month ago
|
| added llama3 shapes |
|
about 2 months ago
|
| more optimizer generalization |
|
about 2 months ago
|
| also handle non-block weights |
|
about 2 months ago
|
| move buffer allocation to generic optimizer |
|
about 2 months ago
|
| add a generic TensorContainer implementation |
|
about 2 months ago
|
| fixes |
|
about 2 months ago
|