gpu-uark

Logging on the cluster

There are many ways to interact with the cluster.

From the portal

Access https://hpc-portal2.hpc.uark.edu/pun/sys/dashboard and enter your credentials. For example, my username is igorf (without the @uark.edu).

From the terminal

Windows users

  1. Open Windows PowerShell
  2. Run:
ssh USERNAME@hpc-portal2.hpc.uark.edu

where USERNAME is your username.

Mac and Linux users

  1. Open the Terminal
  2. Run:
ssh USERNAME@hpc-portal2.hpc.uark.edu

where USERNAME is your username.

Note: in both cases, if you are off campus, you first need to run ssh USERNAME@hpc-portal2.hpc.uark.edu -p 2022 to authenticate, and then run ssh USERNAME@hpc-portal2.hpc.uark.edu.
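
In other words, the off-campus login is a two-step sequence (the same two commands from the note above):

    ssh USERNAME@hpc-portal2.hpc.uark.edu -p 2022
    ssh USERNAME@hpc-portal2.hpc.uark.edu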

Copying files to the cluster

From the portal

  1. Download the repository code: https://github.com/igorkf/gpu-uark/archive/refs/heads/main.zip
  2. Access the portal
  3. Go to Files > Home Directory
  4. Click Upload
  5. Upload the file gpu-uark-main.zip

From the terminal

  1. Open a new terminal (PowerShell for Windows or Terminal for Mac or Linux). Do not log in to the cluster yet.
  2. Go to the folder where you downloaded the zip file (e.g., Downloads):
    cd Downloads
    
  3. Copy the repository code to the cluster (replace USERNAME with your username):
    scp gpu-uark-main.zip USERNAME@hpc-portal2.hpc.uark.edu:/home/USERNAME
    
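Tip: if you prefer to copy an unzipped folder instead of the zip file, scp can copy directories recursively with -r (the folder name below is just the unzipped repository):

    scp -r gpu-uark-main USERNAME@hpc-portal2.hpc.uark.edu:/home/USERNAME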

Unzipping folders

  1. Login on the cluster:

    ssh USERNAME@hpc-portal2.hpc.uark.edu
    
  2. Unzip the main file and then enter the folder:

    unzip gpu-uark-main.zip
    cd gpu-uark-main
    
  3. Unzip the data:

    unzip data.zip
    
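You can then list the folder to confirm everything was extracted (the exact file names depend on what data.zip contains):

    ls -lht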

Running jobs on the cluster

Now, we are ready to run the tasks we need.

Preprocessing the data

  1. To preprocess the data, run:
    sbatch 1-preprocess.sh
    
  2. To check job status, run:
    squeue -u USERNAME
    

Some files will be written to the logs and output folders.
Check the log:

cat logs/prep_geno.log

Check the genotypic data:

head output/geno_ok.csv
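
For reference, the files submitted with sbatch are regular SLURM batch scripts. Below is a minimal sketch of what a script like 1-preprocess.sh might contain; the partition, resources, and final command are assumptions for illustration, not the actual file:

    #!/bin/bash
    #SBATCH --job-name=prep_geno
    #SBATCH --partition=cloud72          # assumed partition
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4
    #SBATCH --time=01:00:00
    #SBATCH --output=logs/prep_geno.log

    # hypothetical command; the real script runs the preprocessing step
    python preprocess_geno.py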

Bonus (optional):
Dr. Fernandes has two private nodes with more computational power. To preprocess the data using these private nodes, we have to access a different operating system (OS) and then run the task:

  1. Connect to the other OS:

    ssh pinnacle-l12
    
  2. Change to the code's directory:

    cd gpu-uark-main
    
  3. Run the job:

    sbatch 1-preprocess-example-condo.sh
    

The output should be the same.

Configuring Python environment with GPU dependencies

  1. Run the configuration task:
    sbatch 2-config_python_env.sh
    

Check the logs:

cat logs/config_python_env.out

See that a new Python environment was created. It is packed in a single folder called myenv:

ls -lht myenv

It has all the libraries (dependencies) we need to train the models using Python.
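
If you want to use the environment interactively, activation typically looks like this; this is a sketch assuming a standard venv layout, and the exact command depends on how 2-config_python_env.sh builds the environment:

    source myenv/bin/activate
    python --version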

Performing feature engineering and other steps

Now that we have the genotypic data cleaned (output/geno_ok.csv), we can perform other steps such as feature engineering to generate the final data for training and evaluating the models.

  1. Run:
    sbatch 3-create-datasets.sh
    

Check the output files:

ls -lht output

In this case, the training data comprises trials from 2021, whereas the validation data comprises trials from 2022. Thus, this is a case of CV0-Year, where environments from a future year are untested. You could test other cross-validation scenarios, but this would require some changes in the code.
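
As a quick sanity check, you can compare the sizes of the generated files; row counts give a rough idea of how many observations ended up in the training and validation sets:

    wc -l output/*.csv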

Train a LightGBM model

With all the data available, we can now fit and evaluate the models.

  1. Fit and evaluate the model:
    sbatch 4-train_lgbm.sh 
    

Check the logs:

cat logs/train_lgbm.log

Check mean predictive ability across environments:

cat logs/train_lgbm.log | grep "mean env corr:"

Predictions are stored at output/pred_lgbm.csv.
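
You can preview the first few predictions with:

    head output/pred_lgbm.csv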

Train a neural network with PyTorch

As the cluster has GPUs available, we can fit neural networks on the GPU.

  1. Run:
    # for CPU
    sbatch 5-train_nn_cpu.sh
    
    # for GPU
    sbatch 5-train_nn_gpu.sh
    

It will take a few minutes. We can check the job status. For example:

squeue -u igorf

gives:

(myenv) c1601:igorf:~/repos/gpu-uark$ squeue -u igorf
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            716165   cloud72   nn_cpu    igorf  R       7:07      1 c1601
            716169     gpu06   nn_gpu    igorf  R       4:59      1 c1702

One model is using CPUs (partition=cloud72) and the other is using a GPU (partition=gpu06).
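
While a job is running, you can also follow its log as it is being written (the log file name below is an assumption; use whatever name the script writes to):

    tail -f logs/train_nn_gpu.log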

Checking GPU metrics

  1. Open a new tab in your terminal and connect to the node displayed above (in this example, the job ran on c1702):

    ssh c1702
    
  2. Show GPU metrics every 0.1 seconds:

    watch -n 0.1 nvidia-smi
    

See the usage (%), temperature (°C), etc.
In this case, as the data is not that big, the usage is only about 20%.
The results for the CPU and GPU runs should be similar, although some GPU operations can be non-deterministic.
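
If you prefer a compact view of just the numbers, nvidia-smi can also print selected metrics in CSV format on a loop (every second here):

    nvidia-smi --query-gpu=utilization.gpu,temperature.gpu,memory.used --format=csv -l 1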

  1. Logout from the node:
    logout
    

Predictions are stored at output/pred_nn_*.csv.

Coding in the cluster using an Integrated Development Environment (IDE)

Two IDEs are pretty good for coding in Python on the cluster: Visual Studio Code and Cursor.

Cursor is a newer IDE with built-in LLMs. You can code and use ChatGPT-like tools inside Cursor to help you code better. You can even use your own OpenAI API key and choose different LLM models (GPT, Claude, Gemini, etc.).

Connecting to the cluster via SSH

After downloading and installing the IDE, we need to configure a few settings.

  1. Hit Ctrl + Shift + P, type Connect to Host... and hit Enter
  2. Choose Configure SSH Hosts...
  3. Choose the Users option (e.g., on my Windows machine it is C:\Users\igork\.ssh\config)
  4. Type the following (replace USERNAME with your username):
    Host c1602
      HostName c1602
      ProxyJump hpc-portal2.hpc.uark.edu
      User USERNAME
    
    Host hpc-portal2.hpc.uark.edu
      HostName hpc-portal2.hpc.uark.edu
      User USERNAME
    

The configuration above connects to the login node and "jumps" to the given node (in this case, c1602). This means the IDE will connect to the node c1602, which uses the cloud72 partition. If you want to connect to another node, change the Host NODE and HostName NODE parts.
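
With this configuration saved, you can also test the jump from a plain local terminal before involving the IDE:

    ssh c1602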

  1. Hit Ctrl + Shift + P again, type Connect to Host... and hit Enter
  2. Choose the node you typed above.

A new window will pop up and, if the connection was successful, you are now connected to the cluster. Open a terminal inside your IDE and you should see something like this:

c1602:igorf:~$

This means we are using the node c1602. From now on, you can code and run computational tasks in the terminal as needed.

Potential problems

Connection unauthorized

We tried to connect to a specific node, but the node doesn't "know" who is trying to connect. In this case, we need to add our public key to the node's list of authorized keys.

  1. Open a new terminal on your local machine (not connected to the cluster)

  2. Create a public key (if you don't have one):

    ssh-keygen
    
  3. Follow the steps shown on the screen

  4. Print your public key and copy it (replace FILENAME with your key's filename):

    # in Windows
    type C:\Users\igork\.ssh\FILENAME.pub
    
    # Mac and Linux
    cat ~/.ssh/FILENAME.pub
    
  5. Login on the cluster (change USERNAME to your username):

    ssh USERNAME@hpc-portal2.hpc.uark.edu
    
  6. Connect to the node chosen above (in this case, c1602):

    ssh c1602
    
  7. Open the authorized keys file:

    nano ~/.ssh/authorized_keys
    
  8. Go to the last line and paste the public key you copied in step 4

  9. Hit Ctrl + X, then Y, then Enter
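
Alternatively, if your local machine has ssh-copy-id and the cluster shares home directories across nodes (often the case), a single command from your local machine can append the key for you; this is an alternative sketch, not part of the original steps:

    ssh-copy-id -i ~/.ssh/FILENAME.pub USERNAME@hpc-portal2.hpc.uark.edu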

Running R

  1. Load the R modules:

    module load intel/21.2.0 mkl/21.3.0 R/4.3.0 gcc/11.2.1
    
  2. Run:

    R
    
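To run an R script non-interactively (for example, inside an sbatch script) after loading the same modules, you can use Rscript instead of the interactive prompt; the script name below is a placeholder:

    Rscript my_analysis.R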

About

Using UARK HPCC's GPUs, with examples in Genomic Prediction
