CryoSPARC setup

CryoSPARC is set up to run on the CURC clusters using VMs built in CUmulus. The setup is fairly complex and requires special permissions to implement.

Express setup

  1. Make instance from cryosparc-base snapshot (Create a Cryosparc VM)

  2. Edit fstab on VM (Mount lab PetaLibrary)

  3. Delete and reinstall master (Install ‘master’ Cryosparc)

  4. Install worker (Install ‘worker’ Cryosparc)

  5. Set aliases (Create CURC aliases)

Full setup

Create a Cryosparc VM

We will spin up a small VM on CURC’s CUmulus cloud service to run the ‘master’ instance of Cryosparc. Currently, only the BioKEM IT admin has access to this allocation. We will follow these instructions.

  1. Go to OpenStack

  2. Instances > Launch Instance

    • Details > Add name

    • Source > Ubuntu 20.04 LTS

    • Set volume to 16GB

    • Flavor > m5.large

    • Networks > projectnet2023-private

    • Security Groups > hpc-ssh, default, ssh-restricted, icmp, rfc-1918

    • Key Pair > add BioKEM global user’s RSA key

  3. Associate Floating IP

    • +

    • Pool > scinet-internal

    • Allocate IP

    • Associate
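
The dashboard steps above can also be sketched with the OpenStack command-line client. This is only a sketch under several assumptions: that the `openstack` client is installed and authenticated against the CUmulus allocation, that the image is registered under the exact name "Ubuntu 20.04 LTS", and that the client is recent enough to support `--boot-from-volume`. `VM_NAME`, `KEY_NAME`, `FLOATING_IP`, and the `run` helper are hypothetical names introduced for illustration. By default the commands are only printed, so they can be reviewed before anything is created.

```shell
#!/usr/bin/env bash
# Sketch only: CLI equivalent of the dashboard steps above.
# VM_NAME and KEY_NAME are hypothetical placeholders, not values from this guide.
VM_NAME=${VM_NAME:-cryosparc-lab}
KEY_NAME=${KEY_NAME:-biokem-key}

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

launch_cryosparc_vm() {
  # Instance settings mirror the Launch Instance dialog (16 GB boot volume).
  run openstack server create \
    --image "Ubuntu 20.04 LTS" \
    --boot-from-volume 16 \
    --flavor m5.large \
    --network projectnet2023-private \
    --security-group hpc-ssh \
    --security-group default \
    --security-group ssh-restricted \
    --security-group icmp \
    --security-group rfc-1918 \
    --key-name "$KEY_NAME" \
    "$VM_NAME"

  # Allocate a floating IP from the scinet-internal pool.
  run openstack floating ip create scinet-internal

  # Attach it; set FLOATING_IP to the address reported by the previous command.
  run openstack server add floating ip "$VM_NAME" "${FLOATING_IP:-ALLOCATED_IP}"
}
```

Calling `launch_cryosparc_vm` prints the three commands; setting `RUN=1` (and `FLOATING_IP` for the last step) would execute them.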

Integrate SLURM

To submit jobs to Alpine’s SLURM environment, we need to install the right version of SLURM, import Alpine’s SLURM config, and set up a user that has permission to submit jobs. We will be using a variation of this.

  1. Log on to the VM (ssh -o KexAlgorithms=ecdh-sha2-nistp521 ubuntu@<IP>) and install the build dependencies:

    sudo apt-get update
    sudo apt install -y libmysqlclient-dev libjwt-dev munge gcc make
    
  2. Check SLURM version (on RC):

    ml slurm/alpine
    sbatch --version
    
  3. On the VM (make sure to clone the branch that matches the SLURM version found on RC):

    cd /opt
    sudo git clone -b slurm-22.05 https://github.com/SchedMD/slurm.git
    cd slurm
    sudo ./configure --with-jwt --disable-dependency-tracking
    sudo make && sudo make install
    sudo mkdir -p /etc/slurm
    cd /etc/slurm
    
    sudo scp <user>@login.rc.colorado.edu:/curc/slurm/alpine/etc/slurm.conf .
    sudo nano slurm.conf
    
    # In slurm.conf, set:
    ControlMachine=alpine-slurmctl1.rc.int.colorado.edu
    BackupController=alpine-slurmctl2.rc.int.colorado.edu
    
  4. Edit /etc/default/useradd: change SHELL=/bin/sh to SHELL=/bin/bash

  5. Make slurm user and group

    sudo groupadd -g 515 slurm
    sudo useradd -u 515 -g 515 slurm
    
  6. Make biokem user and group:

    sudo groupadd -g 2004664 biokempgrp
    sudo useradd -u 2004664 -g 2004664 biokem
    sudo mkdir /home/biokem
    sudo chown -R biokem /home/biokem
    sudo su biokem
    cd
    cp ../ubuntu/.profile .
    cp ../ubuntu/.bashrc .
    source .profile
    mkdir .ssh
    cd .ssh
    touch authorized_keys
    
  7. In future, let’s add the specific user group (also will need to edit fstab)

  8. Copy over curc.pub key

  9. Update /projects/biokem/software/biokem/users/src/lab_specific/cryosparc_vms.src
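
Steps 2 and 3 only work if the branch cloned on the VM matches the SLURM release running on Alpine. A minimal sketch of deriving the branch name from the version string follows. It assumes `sbatch --version` prints something of the form `slurm 22.05.8` and that the release branches keep the `slurm-<major>.<minor>` naming used in step 3; `slurm_branch_for` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `sbatch --version` output to a git branch name.
# Example input: "slurm 22.05.8"
slurm_branch_for() {
  local version major rest minor
  version=${1#slurm }     # drop the leading "slurm " if present
  major=${version%%.*}    # text before the first dot
  rest=${version#*.}      # text after the first dot
  minor=${rest%%.*}       # text before the next dot
  printf 'slurm-%s.%s\n' "$major" "$minor"
}
```

On the VM, the version string reported on RC could then be passed in directly, for example `sudo git clone -b "$(slurm_branch_for 'slurm 22.05.8')" https://github.com/SchedMD/slurm.git`.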

Mount lab PetaLibrary

Now we need to mount the lab’s PetaLibrary to the VM, according to CURC’s instructions.

  1. Set up directories

    exit
    sudo apt-get install sshfs
    sudo mkdir -p /pl/active/<lab's PL>
    sudo mkdir -p /pl/active/BioKEM/software/cryosparc/<lab>
    sudo chmod -R o+w /pl
    
  2. Make key pair on VM

    ssh-keygen -t ed25519
    
  3. Add key to biokem on RC

  4. Mount directories through fstab

    #User lab PL
    biokem@dtn.rc.int.colorado.edu:/pl/active/<lab> /pl/active/<lab> fuse.sshfs defaults,_netdev,allow_other,default_permissions,identityfile=/home/ubuntu/.ssh/cryo,uid=biokem,gid=biokempgrp,reconnect 0 0
    
  5. If you want to mount manually:

    sudo sshfs -o allow_other,IdentityFile=/home/ubuntu/.ssh/cryo biokem@dtn.rc.int.colorado.edu:/pl/active/<lab> /pl/active/<lab>
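
Because the fstab entry in step 4 is long and easy to mistype, it can help to generate it per lab and to add it only once. The following is a sketch, not part of CURC's instructions: `pl_fstab_line` and `add_pl_mount` are hypothetical helper names, and the key path defaults to the one used in the entry above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sshfs fstab entry for one lab's PetaLibrary.
# $1 = lab directory name under /pl/active, $2 = private key path (optional)
pl_fstab_line() {
  local lab=$1
  local key=${2:-/home/ubuntu/.ssh/cryo}
  printf '%s %s fuse.sshfs %s 0 0\n' \
    "biokem@dtn.rc.int.colorado.edu:/pl/active/${lab}" \
    "/pl/active/${lab}" \
    "defaults,_netdev,allow_other,default_permissions,identityfile=${key},uid=biokem,gid=biokempgrp,reconnect"
}

# Append the entry to an fstab file unless the identical line is already there.
# $1 = lab directory name, $2 = fstab path (defaults to /etc/fstab)
add_pl_mount() {
  local lab=$1
  local fstab=${2:-/etc/fstab}
  local line
  line=$(pl_fstab_line "$lab")
  grep -qxF -- "$line" "$fstab" 2>/dev/null || printf '%s\n' "$line" >> "$fstab"
}
```

Writing to /etc/fstab requires root, so one option is to run `add_pl_mount <lab> /tmp/fstab.new` against a copy first, inspect the result, and then install it with sudo.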
    

Install ‘master’ Cryosparc

Install the ‘master’ Cryosparc on the VM using their instructions, with a few important changes that are needed for this setup to work.

  1. Bring in presets

    sudo su biokem
    cd
    git clone https://github.com/CU-BioKEM/cryosparc_setup.git
    cd cryosparc_setup
    nano license.src    # set the license: export LICENSE_ID=" "
    mkdir ~/cryosparc
    cd ~/cryosparc
    
  2. Follow instructions

    source ../cryosparc_setup/license.src
    curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
    tar -xf *gz
    cd ../cryosparc_setup
    
  3. Edit run_installer.sh and run

  4. Edit fix_cluster.sh to correct IP and run

  5. Start cryosparc

    source ~/.bashrc
    cryosparcm restart
    
  6. Connect cluster

    cd alpine
    nano cluster_info.json    # edit to correct worker bin path
    nano cluster_script.sh    # edit job names to cs-<lab>...
    cryosparcm cluster connect
    
  7. Edit run_first_user.sh and run

  8. The last thing to do is set up automatic restarting of the instance in the event of a reboot:

    crontab -e
    # append these lines to the end:
    @reboot rm /tmp/cryo*
    @reboot sleep 60 && /home/biokem/cryosparc/cryosparc_master/bin/cryosparcm restart
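
When this has to be repeated across several lab VMs, the crontab edit in step 8 can be scripted instead of done by hand. The sketch below assumes the master lives at the path shown above; `merge_reboot_entries` is a hypothetical helper that reads an existing crontab on standard input and prints it with the two `@reboot` lines added only if they are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a crontab on stdin, print it with the two
# @reboot entries from step 8 appended when they are not already present.
merge_reboot_entries() {
  local existing entry
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  for entry in \
    '@reboot rm /tmp/cryo*' \
    '@reboot sleep 60 && /home/biokem/cryosparc/cryosparc_master/bin/cryosparcm restart'
  do
    # -x matches whole lines, -F treats the entry as a literal string
    printf '%s\n' "$existing" | grep -qxF -- "$entry" || printf '%s\n' "$entry"
  done
}
```

As the biokem user this could be applied with `crontab -l 2>/dev/null | merge_reboot_entries | crontab -`; running it a second time leaves the crontab unchanged.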
    

Install ‘worker’ Cryosparc

Now that we’ve installed the ‘master’ instance, we can install the worker on Alpine.

  1. Log onto RC

    ssh login10
    cd /pl/active/BioKEM/software/cryosparc
    
  2. Make a new directory for each lab

    sudo -u biokem mkdir <labname>
    cd <labname>
    
    git clone https://github.com/CU-BioKEM/cryosparc_setup.git
    cd cryosparc_setup
    
  3. Edit license.src to add correct CryoSPARC license

    nano license.src
    
    cd ..
    source cryosparc_setup/license.src
    curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
    tar -xf *gz
    
    ssh login10
    ml slurm/alpine
    ainteractive
    ml cuda/11.4
    cd cryosparc_setup
    
  4. Edit run_worker_install.sh

    ./run_worker_install.sh
    
  5. Open new terminal

    cryosparc
    

    Log in and test it out. Make sure you create all projects in the PL.
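
Both the master and the worker download depend on LICENSE_ID being set by license.src, and a blank value would only show up as a failed or unusable download. A minimal sketch of a guard that builds the download URL and refuses to continue when the license is empty; `cryosparc_download_url` is a hypothetical helper, and the URL pattern is copied from the curl commands in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the download URL for "master" or "worker",
# failing early if LICENSE_ID is unset or contains only whitespace.
cryosparc_download_url() {
  local component=$1
  case $component in
    master|worker) ;;
    *)
      echo "usage: cryosparc_download_url master|worker" >&2
      return 2
      ;;
  esac
  if [ -z "${LICENSE_ID//[[:space:]]/}" ]; then
    echo "LICENSE_ID is empty; edit and source license.src first" >&2
    return 1
  fi
  printf 'https://get.cryosparc.com/download/%s-latest/%s\n' "$component" "$LICENSE_ID"
}
```

After sourcing license.src, the worker download could then be written as `url=$(cryosparc_download_url worker) && curl -L "$url" -o cryosparc_worker.tar.gz`, which skips curl entirely when the license is missing.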

Create CURC aliases

To keep everything as simple as possible for the end user, I have made lab-specific aliases in /projects/biokem/software/biokem/users/src/lab_specific. These give users from each lab access to their specific Cryosparc builds.

  1. Edit cryosparc_vms.src to add easy access to the VM: alias <lab>-cryosparc-vm="ssh -o KexAlgorithms=ecdh-sha2-nistp521 ubuntu@<IP>" (only gives access to BioKEM IT)

  2. Update /projects/biokem/software/biokem/users/src/lab_specific/labs.src with new lab group

  3. Make lab specific functions: touch <lab>lab.src

    #cryosparc
    alias cryosparc='firefox http://<IP>:<base_port>'
    
  4. Make admin functions (may enable later, but not now)

    # Loop variable renamed from USER so the shell's own $USER is not overwritten
    for u in $(users)
      do
      if [ "$u" = "<admin>" ]; then
        alias cryosparcm='ssh -o KexAlgorithms=ecdh-sha2-nistp521 <user>@<ip> "/home/<user>/cryosparc/cryosparc_master/bin/cryosparcm ${1}"'
      fi
      done
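
Since aliases do not take positional parameters, the admin wrapper in step 4 may be easier to maintain as a shell function if it is enabled later. The sketch below is one possible shape, not the current setup: `cryosparcm_remote`, `CRYOSPARC_VM_USER`, `CRYOSPARC_VM_IP`, and `SSH_BIN` are hypothetical names, and the remote path follows the master install location used earlier.

```shell
#!/usr/bin/env bash
# Hypothetical admin wrapper: forward cryosparcm subcommands to the lab VM.
# CRYOSPARC_VM_USER and CRYOSPARC_VM_IP must be set by the lab's .src file.
# SSH_BIN can be overridden (for example, for a dry run); it defaults to ssh.
cryosparcm_remote() {
  if [ -z "${CRYOSPARC_VM_USER:-}" ] || [ -z "${CRYOSPARC_VM_IP:-}" ]; then
    echo "set CRYOSPARC_VM_USER and CRYOSPARC_VM_IP first" >&2
    return 1
  fi
  # All arguments are passed through to cryosparcm on the VM.
  "${SSH_BIN:-ssh}" -o KexAlgorithms=ecdh-sha2-nistp521 \
    "${CRYOSPARC_VM_USER}@${CRYOSPARC_VM_IP}" \
    "/home/${CRYOSPARC_VM_USER}/cryosparc/cryosparc_master/bin/cryosparcm $*"
}
```

With the two variables set, `cryosparcm_remote status` would run `cryosparcm status` on the VM over ssh.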