Microblog


7/16/2024

Using HuggingFace Datasets Offline

# ml/ai

This is pretty simple, but quite helpful if you're running jobs on a compute node that doesn't have internet access.

On the login node or another machine with internet access, run the following Python code:

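import datasets

# Downloads the dataset from the Hub (this step needs internet access)
x = datasets.load_dataset("my_dataset")

# Writes the Arrow data files plus metadata to a local directory
x.save_to_disk("./my_dataset_local")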

Then, if needed, copy the whole my_dataset_local directory to the machine running your job. Now, from that offline machine, loading the dataset is simple!

import datasets
# No network needed: reads the directory written by save_to_disk
y = datasets.load_from_disk("./my_dataset_local")
y # DatasetDict({...})

7/16/2024

Tips #1

# random
# ml/ai
  • You can turn on Markdown detection in Google Docs! Just go to Tools > Preferences and check "Automatically detect Markdown".
  • Brave Browser for iOS supports swiping left and right between tabs; you just have to swipe below the URL bar.
  • If you need to run a .ts file from the command-line, use Bun instead of ts-node! It's much simpler and doesn't have all the weird issues with the package.json "type" field.