There are many ways to interact with the cluster.
Access https://hpc-portal2.hpc.uark.edu/pun/sys/dashboard and enter your credentials. For example, my username is igorf (without the @uark.edu).
- Open Windows PowerShell
- Run:
ssh USERNAME@hpc-portal2.hpc.uark.edu
where USERNAME is your username.
- Open the Terminal (Mac or Linux)
- Run:
ssh USERNAME@hpc-portal2.hpc.uark.edu
where USERNAME is your username.
Note: in both cases, if you are outside campus, you first need to run ssh USERNAME@hpc-portal2.hpc.uark.edu -p 2022 to authenticate, and then run ssh USERNAME@hpc-portal2.hpc.uark.edu.
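For example, from outside campus the full sequence is:
# authenticate first (only needed off campus)
ssh USERNAME@hpc-portal2.hpc.uark.edu -p 2022
# then open the regular session
ssh USERNAME@hpc-portal2.hpc.uark.edu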
- Download the repository code: https://github.com/igorkf/gpu-uark/archive/refs/heads/main.zip
- Access the portal
- Go to Files > Home Directory
- Click Upload
- Upload the file
gpu-uark-main.zip
- Open a new terminal (PowerShell for Windows or Terminal for Mac or Linux). Do not log in to the cluster yet.
- Go to the folder where you downloaded the zip file (e.g., Downloads):
cd Downloads
- Copy the repository code to the cluster (change USERNAME to your username):
scp gpu-uark-main.zip USERNAME@hpc-portal2.hpc.uark.edu:/home/USERNAME
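If you ever need to copy a whole folder instead of a single file, scp accepts the -r (recursive) flag; rsync is an alternative if it is installed on your machine (the folder name below is just an example):
# copy a folder recursively
scp -r myfolder USERNAME@hpc-portal2.hpc.uark.edu:/home/USERNAME
# or with rsync
rsync -av myfolder USERNAME@hpc-portal2.hpc.uark.edu:/home/USERNAME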
- Log in to the cluster:
ssh USERNAME@hpc-portal2.hpc.uark.edu
- Unzip the main file and then enter the folder:
unzip gpu-uark-main.zip
cd gpu-uark-main
- Unzip the data:
unzip data.zip
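To confirm everything was extracted, you can list the folder contents:
ls -lht
You should see the numbered job scripts (1-preprocess.sh, 2-config_python_env.sh, etc.) and the extracted data.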
Now, we are ready to run the tasks we need.
- To preprocess the data, run:
sbatch 1-preprocess.sh
- To check job status, run:
squeue -u USERNAME
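Two other standard SLURM commands are often useful (their availability and output depend on how the cluster is configured): scancel cancels a job, and sacct lists jobs that have already finished.
# cancel a job using the JOBID shown by squeue
scancel JOBID
# list your jobs from today, including finished ones
sacct -u USERNAME --starttime today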
Some files will be written to the logs and output folders.
Check the log:
cat logs/prep_geno.log
Check the genotypic data:
head output/geno_ok.csv
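The numbered .sh files are SLURM batch scripts: a few #SBATCH header lines describing the resources requested, followed by the commands to run. The sketch below is a generic, hypothetical example (the resource values and the body are placeholders), not the repository's actual 1-preprocess.sh:
#!/bin/bash
#SBATCH --job-name=prep_geno          # job name shown by squeue
#SBATCH --partition=cloud72           # partition (queue); e.g., the private-node scripts would request a different one
#SBATCH --nodes=1                     # number of nodes
#SBATCH --ntasks-per-node=1           # tasks per node
#SBATCH --time=01:00:00               # wall-time limit (hh:mm:ss)
#SBATCH --output=logs/prep_geno.log   # file where the job's output is written

# the commands below run on the compute node once the job starts
echo "preprocessing commands would go here"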
Bonus (optional):
Dr. Fernandes has two private nodes with more computational power. To preprocess the data using the private nodes, we have to access a different operating system (OS) and then run the task:
- Connect to the other OS:
ssh pinnacle-l12
- Change to the code's directory:
cd gpu-uark-main
- Run the job:
sbatch 1-preprocess-example-condo.sh
The output should be the same.
- Run the configuration task:
sbatch 2-config_python_env.sh
Check the logs:
cat logs/config_python_env.out
Note that a new Python environment was created. It is packed in a single folder called myenv:
ls -lht myenv
It has all the libraries (dependencies) we need to train the models using Python.
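For reference, creating an environment like this by hand usually follows the pattern below. This is a hypothetical sketch (the module name and the package list are assumptions); the repository's 2-config_python_env.sh may do it differently:
# load a Python module (the exact module name depends on the cluster)
module load python
# create the environment in a folder called myenv and activate it
python -m venv myenv
source myenv/bin/activate
# install the libraries used for training (package list is an assumption)
pip install pandas scikit-learn lightgbm torch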
Now that we have the genotypic data cleaned (output/geno_ok.csv), we can perform other steps such as feature engineering to generate the final data for training and evaluating the models.
- Run:
sbatch 3-create-datasets.sh
Check the output files:
ls -lht output
In this case, the training data comprises trials from 2021, whereas the validation data comprises trials from 2022. Thus, this is a case of CV0-Year, where environments from a future year are untested. You could test other cross-validation scenarios, but this would require some changes in the code.
With all the data available, we can now fit and evaluate the models.
- Fit and evaluate the model:
sbatch 4-train_lgbm.sh
Check the logs:
cat logs/train_lgbm.log
Check mean predictive ability across environments:
cat logs/train_lgbm.log | grep "mean env corr:"
Predictions are stored at output/pred_lgbm.csv.
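You can take a quick look at the first lines of the predictions file:
head output/pred_lgbm.csv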
As the cluster has GPU available, we can fit neural networks using the GPU.
- Run:
# for CPU
sbatch 5-train_nn_cpu.sh
# for GPU
sbatch 5-train_nn_gpu.sh
It will take a few minutes. We can check the job status. For example:
squeue -u igorf
gives:
(myenv) c1601:igorf:~/repos/gpu-uark$ squeue -u igorf
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
716165 cloud72 nn_cpu igorf R 7:07 1 c1601
716169 gpu06 nn_gpu igorf R 4:59 1 c1702
One model is using CPUs (partition=cloud72) and the other is using GPU (partition=gpu06).
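To see which partitions exist and whether their nodes are idle or busy, you can use sinfo, another standard SLURM command (the exact output depends on the cluster):
# list all partitions and their nodes
sinfo
# or only a specific partition, e.g., the GPU one
sinfo -p gpu06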
- Open a new tab of your terminal and connect to the node displayed above (in this example, the job ran on c1702):
ssh c1702
- Show GPU metrics every 0.1 seconds:
watch -n 0.1 nvidia-smi
See the usage (%), temperature (C), etc.
In this case, as the data is not that big, the usage is about 20% only.
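If you prefer a compact, machine-readable stream of just a few metrics, nvidia-smi also accepts query flags (the exact fields available can vary with the driver version):
# print GPU utilization, temperature, and memory use every second
nvidia-smi --query-gpu=utilization.gpu,temperature.gpu,memory.used --format=csv -l 1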
The results for both cases (CPU and GPU) should be similar, although some GPU operations can be non-deterministic.
- Logout from the node:
logout
Predictions are stored at output/pred_nn_*.csv.
Two IDEs are pretty good for coding in Python:
Cursor is a new IDE with built-in LLMs. You can code and use ChatGPT-like tools inside Cursor to help you code better. You can even use your own OpenAI API key and choose different LLMs (GPT, Claude, Gemini, etc.).
After downloading and installing the IDE, we need to set some configurations.
- Hit Ctrl + Shift + P, type Connect to Host..., and hit Enter
- Choose Configure SSH Hosts...
- Choose the Users option (e.g., on my Windows machine it is C:\Users\igork\.ssh\config)
- Type the following (changing USERNAME to your username):
Host c1602
    HostName c1602
    ProxyJump hpc-portal2.hpc.uark.edu
    User USERNAME

Host hpc-portal2.hpc.uark.edu
    HostName hpc-portal2.hpc.uark.edu
    User USERNAME
The configuration above connects to the login node and "jumps" to the given node (in this case, c1602). This means the IDE will connect to the node c1602, which uses the cloud72 partition. If you want to connect to another node, change the Host NODE and HostName NODE parts.
- Hit Ctrl + Shift + P again, type Connect to Host..., and hit Enter
- Choose the node you typed above.
A new window will pop up and, if the connection was successful, you are now connected to the cluster. Open a terminal inside your IDE and you should see something like this:
c1602:igorf:~$
This means we are using the node c1602. From now on, you can code and run computational tasks in the terminal as needed.
We tried to connect to a specific node, but this node doesn't "know" who is trying to connect. In this case, we need to add our public key to the node's list of authorized keys.
- Open a new terminal on your local machine (not connected to the cluster)
- Create a public key (if you don't have one):
ssh-keygen
- Follow the steps shown on the screen
- Print your public key and copy it (change FILENAME to your filename):
# Windows
type C:\Users\igork\.ssh\FILENAME.pub
# Mac and Linux
cat ~/.ssh/FILENAME.pub
- Log in to the cluster (change USERNAME to your username):
ssh USERNAME@hpc-portal2.hpc.uark.edu
- Connect to the node chosen above (in this case, c1602):
ssh c1602
- Open the authorized keys file:
nano ~/.ssh/authorized_keys
- Go to the last line and paste the public key you copied earlier
- Hit Ctrl + X, then Y, then Enter
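As an alternative, if ssh-copy-id is available on your local machine, it can append the key for you; since the home directory appears to be shared across the cluster's nodes (the same files are visible on every node), copying the key to the login node should be enough. Also make sure the .ssh permissions are strict, otherwise SSH ignores the file:
# run on your local machine: copy your public key to the cluster
ssh-copy-id -i ~/.ssh/FILENAME.pub USERNAME@hpc-portal2.hpc.uark.edu
# run on the cluster: tighten permissions if needed
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys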
- Load the R modules:
module load intel/21.2.0 mkl/21.3.0 R/4.3.0 gcc/11.2.1
- Run:
R
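If you later want to run an R script non-interactively (for example from inside an sbatch script), Rscript becomes available once the same modules are loaded; the script name below is just a placeholder:
# run an R script in batch mode (my_script.R is a hypothetical file name)
Rscript my_script.R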