These models were trained for ~1 Angstrom data collected on Pilatus 6M or Eiger 16M detectors:
# for predicting whether diffraction contains overlapping lattices
wget https://smb.slac.stanford.edu/~resonet/overlapping.nn # archstring=res34
# for predicting per-shot resolution estimates
wget https://smb.slac.stanford.edu/~resonet/resolution.nn # archstring=res50
Currently the models are loaded by resonet according to their resonet arch strings. Unfortunately, these are not currently written to the .nn model files, so the arch strings must be supplied manually before loading (hence the res34 and res50 that accompany each of the above models).
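Because the arch string lives outside the .nn file, it is easy to lose track of. One lightweight workaround (purely illustrative; these helpers are not part of resonet) is to keep a small JSON sidecar next to each downloaded model that records its arch string:

```python
import json
from pathlib import Path

# Illustrative helpers (NOT part of resonet): store the arch string in a
# JSON sidecar next to the .nn file so it travels with the model weights.
def write_arch_sidecar(model_path, archstring):
    sidecar = Path(str(model_path) + ".json")
    sidecar.write_text(json.dumps({"archstring": archstring}))
    return sidecar

def read_arch_sidecar(model_path):
    sidecar = Path(str(model_path) + ".json")
    return json.loads(sidecar.read_text())["archstring"]
```

For example, after downloading resolution.nn one could call `write_arch_sidecar("resolution.nn", "res50")` and later recover the string with `read_arch_sidecar`.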
For questions about running the code, or about results, submit an issue here. For installation questions, submit an issue here.
These installation instructions are for those who just wish to download the resonet models and test them on synchrotron image files. Multi-panel detectors are technically supported, but no official script exists [yet] to handle them.
Resonet has many options for processing image data. It can use fabio or dxtbx to read images from disk, or it can handle arrays already in memory (for highest throughput). As an example, and to test a resonet install on your system, we provide a DXTBX-based script for processing images currently on disk.
To install and test this script, start by downloading and installing a conda environment:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -u -p $PWD/miniforge
source miniforge/etc/profile.d/conda.sh
With conda, install a DXTBX environment, which includes a Python build:
conda create -n reso
conda activate reso
conda install -c conda-forge dxtbx
With the dxtbx environment active, install resonet. Clone the GitHub repository and install it locally:
# get the source code
git clone --recurse-submodules https://github.com/ssrl-px/resonet.git
cd resonet
# get the build module to build the software
python -m pip install build
# build and install with pip
python -m build
python -m pip install dist/resonet-0.1.tar.gz
With that, the build should be ready. If you have some CBF files lying around, test it as follows:
# launch the Pyro4 nameserver
python -m Pyro4.naming &
# launch the resonet image Eater
resonet-imgeater resolution.nn res50 &
#Rank 0 Initializing predictor
#Rank 0 is ready to consume images... (followed by a URI)
And then, with the imgeater actively waiting, pass some images to it using the script resonet-imgfeeder, which sends glob strings to the image eater. Use the argument --maxProc 5 to process only 5 images from the glob (for testing purposes; otherwise all images in the glob will be processed):
resonet-imgfeeder "/path/to/somewhere/*cbf" 1 --maxProc 5
# Example output:
#Rank0 T34DpactE2_2_00001.cbf Resolution estimate: 4.164 Angstrom. (1/1800)
#Rank0 T34DpactE2_2_00002.cbf Resolution estimate: 3.319 Angstrom. (2/1800)
#Rank0 T34DpactE2_2_00003.cbf Resolution estimate: 2.897 Angstrom. (3/1800)
#Rank0 T34DpactE2_2_00004.cbf Resolution estimate: 3.569 Angstrom. (4/1800)
#Rank0 T34DpactE2_2_00005.cbf Resolution estimate: 3.679 Angstrom. (5/1800)
#Done. Processed 5 / 1800 shots in total.
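The --maxProc behavior can be pictured with a short sketch (an assumption about the feeder's logic, not resonet's actual code): expand the glob pattern, then truncate the list of matches:

```python
import glob
from itertools import islice

# Sketch (assumed, not resonet's actual implementation) of how a feeder
# could expand a glob pattern and honor a --maxProc style cap.
def select_images(pattern, max_proc=None):
    paths = sorted(glob.glob(pattern))
    if max_proc is None:
        return paths                       # no cap: process every match
    return list(islice(paths, max_proc))   # cap: only the first max_proc
```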
The second argument to resonet-imgfeeder is the number of processes running resonet-imgeater. For this tutorial it is just 1 process; however, resonet-imgeater can be called with mpirun (e.g., mpirun -n 8 resonet-imgeater), provided mpirun is available (via OpenMPI, for example), along with an accompanying install of mpi4py. The simulation-ready build below comes with MPI and mpi4py installed and ready to go.
The real benefit of resonet comes from running it in parallel (using e.g., MPI) and evaluating the resonet models on GPU devices, which requires a proper CUDA installation. You might need to fine-tune the PyTorch installation to match the CUDA version. In our experience, multiple processes can share the GPU device(s) in a combined MPI-GPU environment to achieve very high image throughput! Resonet is currently in use at the SSRL macromolecular crystallography beamlines through the Python-based monitoring software, interceptor.
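One common way for MPI ranks to share GPUs is round-robin assignment of ranks to devices. The sketch below illustrates the idea (a hypothetical helper, not resonet's actual scheme; with mpi4py the rank would come from comm.Get_rank(), and the returned string would be passed to torch.device):

```python
# Hypothetical round-robin mapping of MPI ranks onto GPU devices.
# resonet's actual device-selection logic may differ.
def device_for_rank(rank, num_gpus):
    if num_gpus <= 0:
        return "cpu"                     # no GPUs available: fall back to CPU
    return "cuda:%d" % (rank % num_gpus)
```

With 8 ranks and 2 GPUs, ranks 0, 2, 4, 6 would land on cuda:0 and ranks 1, 3, 5, 7 on cuda:1.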
Note: this install is only necessary if one wishes to synthesize training data using CCTBX.
Create a conda environment for doing simulations:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget https://raw.githubusercontent.com/dials/dials/main/.conda-envs/linux.txt
bash ./Miniconda3-latest-Linux-x86_64.sh -b -u -p $PWD/miniforge
source miniforge/etc/profile.d/conda.sh
conda install -y -c conda-forge mamba
mamba create -y -n py39 --file linux.txt python=3.9
Then, install CUDA, version 12+ (older versions work, but require different conda environments).
With the conda env and CUDA installed, one can build CCTBX with CUDA support. Verify CUDA is indeed installed correctly, and then set the standard CUDA environment variables:
export PATH=/usr/local/cuda-12.1/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64
Check you can see the GPU(s) using nvidia-smi
. Then, activate the conda env:
conda activate py39
Now create a subfolder for building CCTBX and get the CCTBX bootstrap script:
mkdir ~/xtal
cd ~/xtal
wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py
We will first use bootstrap to download all the sources into the modules folder:
python bootstrap.py hot update --builder=dials --python=39 --use-conda=$CONDA_PREFIX
Now, you can run the build step:
python bootstrap.py build --builder=dials --python=39 --use-conda=$CONDA_PREFIX \
--nproc=22 --config-flags="--enable_openmp_if_possible=True" --config-flags="--use_environment_flags" \
--config-flags="--enable_cuda" --config-flags="--enable_cxx11"
source build/setpaths.sh
The build step creates a build subfolder that contains a shell script to launch the CCTBX environment. It's now a good idea to create a single startup script that launches the CCTBX+CUDA environment at startup:
# example startup script:
export PATH=/usr/local/cuda-12.1/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64
source ~/xtal/build/setpaths.sh
Finally, install PyTorch, a few useful tools like IPython, and then resonet:
# after sourcing setpaths.sh, libtbx.python will be in path
libtbx.python -m pip install torch torchvision torchmetrics jupyter ipython
libtbx.refresh
# Note, migration to PyPI is in progress; for now one will need to build manually
# libtbx.python -m pip install resonet
git clone --recurse-submodules https://github.com/ssrl-px/resonet.git
cd resonet
libtbx.python -m build
libtbx.python -m pip install dist/resonet-0.1.tar.gz
libtbx.python patch_shebangs.py # only necessary if using the CCTBX build for simulation functionality
With the above environment, you should now download the simulation metadata (2.3 GB):
wget https://smb.slac.stanford.edu/~resonet/for_tutorial.tar.gz
tar -xzvf for_tutorial.tar.gz
and then specify its location using the environment variable RESONET_SIMDATA:
export RESONET_SIMDATA=/path/to/for_tutorial/diffraction_ai_sims_data/
Now, simulations can be run using:
resonet-simulate test_shots --nshot 10 --pdbName $RESONET_SIMDATA/pdbs/3nxs
Note, if MPI is installed the above script can be invoked with mpirun (or srun if using SLURM):
mpirun -n 6 resonet-simulate test_shots --nshots 10000 --pdbName $RESONET_SIMDATA/pdbs/3nxs
The above mpirun command took 8.03 hours using an Nvidia A100 and an Intel(R) Xeon(R) Gold 6126. In the future, this runtime will decrease significantly once the background routines are ported to GPU. In parallel mode, each MPI rank writes a unique output file, and these can be combined using the merge_h5s.py script:
resonet-mergefiles test_shots test_shots/master.h5
This creates a master.h5 file which can be passed directly to the training script.
The script net.py has a lot of functionality, but is still under heavy development. Use it as follows:
resonet-train 100 test_shots/master.h5 test_opt --labelSel one_over_reso --useGeom --testRange 0 1000 --trainRange 1000 10000 --bs 64 --lr 0.01
The first argument is the number of training epochs. The second argument is the input file, and the third argument is the output folder where results and a log file will be written. Note, the first epoch is usually slower than subsequent epochs.
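As a sketch of what --testRange 0 1000 --trainRange 1000 10000 might mean (assuming half-open index ranges; consult the resonet source for the actual convention), the splits could be built like this:

```python
# Assumed interpretation of --testRange / --trainRange as half-open
# [start, stop) index ranges over the shots in master.h5.
def split_indices(test_range, train_range):
    test = list(range(test_range[0], test_range[1]))
    train = list(range(train_range[0], train_range[1]))
    if set(test) & set(train):
        raise ValueError("test and train ranges must not overlap")
    return test, train
```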
One can plot the training progress:
resonet-plotloss test_opt/train.log
Let's assume a model has been trained, and it is time to test its predictions on some images. Two simple scripts (resonet-imgeater, resonet-imgfeeder) are supplied in the repository for testing models. These scripts use Pyro4 and MPI for inter-process communication. By exploring those scripts, one can hopefully design even more robust resonet frameworks.
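The division of labor between feeder and eater can be sketched with stdlib threads and a queue (purely illustrative: the real scripts use Pyro4 for the RPC layer and MPI for parallelism). Eaters block on a work queue while a feeder pushes image paths, then sentinels to shut the eaters down:

```python
import queue
import threading

def eater(work_q, results):
    # Consume image paths until a None sentinel arrives; append a
    # placeholder "prediction" standing in for real model inference.
    while True:
        path = work_q.get()
        if path is None:
            break
        results.append((path, "resolution-estimate"))

def feed_images(paths, n_eaters=2):
    work_q = queue.Queue()
    results = []
    workers = [threading.Thread(target=eater, args=(work_q, results))
               for _ in range(n_eaters)]
    for w in workers:
        w.start()
    for p in paths:
        work_q.put(p)
    for _ in workers:          # one shutdown sentinel per eater
        work_q.put(None)
    for w in workers:
        w.join()
    return results
```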
First, one should launch resonet-imgeater. The eater can be launched as a single process, or with mpirun as multiple processes:
libtbx.python -m Pyro4.naming &
mpirun -n 8 resonet-imgeater /path/to/nety_ep100.nn res50 --gpu &
Note, both of the above jobs were launched in the background. Now, the eater process will remain active while we use resonet-imgfeeder to send it images, in this case as Python glob strings:
resonet-imgfeeder "/path/to/some/images/*cbf" 8
where the second argument simply specifies the number of processes launched with resonet-imgeater. The eater will then write the inference results to STDOUT. Note, all images in the glob will be processed!
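Since results go to STDOUT in the format shown in the earlier example output, they can be scraped with a small parser (assuming that exact line format holds):

```python
import re

# Matches eater output lines like:
#   #Rank0 T34DpactE2_2_00001.cbf Resolution estimate: 4.164 Angstrom. (1/1800)
LINE_RE = re.compile(
    r"Rank\d+\s+(?P<name>\S+)\s+Resolution estimate:\s+"
    r"(?P<res>\d+(?:\.\d+)?)\s+Angstrom"
)

def parse_eater_line(line):
    m = LINE_RE.search(line)
    if m is None:
        return None           # not a result line
    return m.group("name"), float(m.group("res"))
```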