Train it. Quantize it. Synthesize and simulate it — in hardware. All in Python.
pyrtlnet is a self-contained example of a quantized neural network that runs
end-to-end in Python. From model training, to software inference, to hardware
generation, all the way to simulating that custom inference hardware at the
logic-gate level — you can do it all right from the Python REPL. We hope you
will find pyrtlnet (rhymes with turtle-net) a complete and understandable
walkthrough that goes from TensorFlow training
to hardware simulation, with the PyRTL
hardware description language. Main features include:
- Quantized neural network training with TensorFlow. The resulting inference network is fully quantized, so all inference calculations are done with integers.
- Four different quantized inference implementations operating at different levels of abstraction. All four implementations produce the same output, in the same format, providing a useful framework to extend either from the top-down or the bottom-up.
  - A reference quantized inference implementation, using the standard LiteRT Interpreter.
  - A software implementation of quantized inference, using NumPy and fxpmath, to verify the math performed by the reference implementation.
  - A PyRTL hardware implementation of quantized inference that is simulated right at the logic gate level.
  - A deployment of the PyRTL hardware design to a Pynq Z2 FPGA.
- A new PyRTL linear algebra library, including a composable `WireMatrix2D` matrix abstraction and an output-stationary systolic array for matrix multiplication.
- An extensive suite of unit tests, and continuous integration testing.
- Understandable and documented code! pyrtlnet is designed to be, first and foremost, understandable and readable (even when that comes at the expense of performance). Reference documentation is extracted from docstrings with Sphinx.
- Install git.
- Clone this repository, and `cd` to the repository's root directory.

```
$ git clone https://github.com/UCSBarchlab/pyrtlnet.git
$ cd pyrtlnet
```

- Install uv.
- (optional) Install Verilator if you want to export the inference hardware to Verilog, and simulate the Verilog version of the hardware.
- Run:

```
$ uv run tensorflow_training.py
```

in this repository's root directory.

`tensorflow_training.py` trains a quantized neural network with TensorFlow on the MNIST data set, and produces a quantized `tflite` saved model file, named `quantized.tflite`. Sample output:
```
Training unquantized model.
Epoch 1/10
1875/1875 [==============================] - 1s 350us/step - loss: 0.6532 - accuracy: 0.8202
Epoch 2/10
1875/1875 [==============================] - 1s 346us/step - loss: 0.3304 - accuracy: 0.9039
Epoch 3/10
1875/1875 [==============================] - 1s 347us/step - loss: 0.2944 - accuracy: 0.9145
Epoch 4/10
1875/1875 [==============================] - 1s 350us/step - loss: 0.2719 - accuracy: 0.9205
Epoch 5/10
1875/1875 [==============================] - 1s 352us/step - loss: 0.2551 - accuracy: 0.9245
Epoch 6/10
1875/1875 [==============================] - 1s 348us/step - loss: 0.2403 - accuracy: 0.9288
Epoch 7/10
1875/1875 [==============================] - 1s 350us/step - loss: 0.2280 - accuracy: 0.9330
Epoch 8/10
1875/1875 [==============================] - 1s 346us/step - loss: 0.2178 - accuracy: 0.9358
Epoch 9/10
1875/1875 [==============================] - 1s 348us/step - loss: 0.2092 - accuracy: 0.9378
Epoch 10/10
1875/1875 [==============================] - 1s 350us/step - loss: 0.2023 - accuracy: 0.9403
Evaluating unquantized model.
313/313 [==============================] - 0s 235us/step - loss: 0.1994 - accuracy: 0.9414
Training quantized model and writing quantized.tflite and quantized.npz.
Epoch 1/2
1875/1875 [==============================] - 1s 410us/step - loss: 0.1963 - accuracy: 0.9426
Epoch 2/2
1875/1875 [==============================] - 1s 408us/step - loss: 0.1936 - accuracy: 0.9423
...
Evaluating quantized model.
313/313 [==============================] - 0s 286us/step - loss: 0.1996 - accuracy: 0.9413
Writing mnist_test_data.npz.
```
The script's output shows that the unquantized model achieved `0.9414` accuracy on the test data set, while the quantized model achieved `0.9413` accuracy on the test data set.

This script produces `quantized.tflite` and `quantized.npz` files, which include all the model's weights, biases, and quantization parameters. `quantized.tflite` is a standard `.tflite` saved model file that can be read by tools like the Model Explorer. `quantized.npz` stores the weights, biases, and quantization parameters as NumPy saved arrays. `quantized.npz` is read by all the provided inference implementations.
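Both output files can be poked at directly from the Python REPL. Below is a minimal, hedged sketch that lists the arrays stored in `quantized.npz`; the exact array names are whatever `tensorflow_training.py` chose to save, so treat the output as exploratory.

```python
import numpy as np

# Open the NumPy archive written by tensorflow_training.py.
archive = np.load("quantized.npz")

# List every saved array with its shape and dtype (weights, biases, and
# quantization parameters for each layer).
for name in archive.files:
    array = archive[name]
    print(f"{name}: shape={array.shape} dtype={array.dtype}")
```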
- Run:

```
$ uv run litert_inference.py
```

in this repository's root directory.

`litert_inference.py` runs one test image through the reference LiteRT inference implementation. Sample output:
The script outputs many useful pieces of information:
- A display of the input image, in this case a picture of the digit `7`. This display requires a terminal that supports 24-bit color, like gnome-terminal or iTerm2.
- The displayed image is the first image in the test data set (`image_index 0`). The image is in the first batch of inputs processed (`batch 0`), and the image is the first in its batch (`batch_index 0`).
- The input's `shape (12, 12)`, and the input's data type `dtype float32`.
- The output from the first layer of the network (`layer0`), with `shape (18,)` and `dtype int8`.
- The output from the second layer of the network (`layer1`), with `shape (10,)` and `dtype int8`.
- A bar chart displaying the network's final output, which is the data from the `layer1 output` above. The bar chart shows the un-normalized probability that the image contains each digit. In this case, the digit `7` is the most likely, with a score of `93`, followed by the digit `3` with a score of `58`. The digit `7` is labeled as `actual` because it is the actual prediction generated by the neural network. It is also labeled as `expected` because the labeled test data confirms that the image actually depicts the digit `7`.
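The decision rule behind the `actual` label is easy to check by hand: the predicted digit is just the argmax of the `layer1` output scores. Here is a tiny, hedged NumPy check, using the same `layer1` values that also appear later in the Verilator simulation output:

```python
import numpy as np

# layer1 output scores for the sample image, one int8 score per digit 0-9.
scores = np.array([33, -48, 29, 58, -50, 31, -87, 93, 9, 49], dtype=np.int8)

# The predicted ("actual") digit is the index of the highest score.
print(np.argmax(scores))  # 7, with a score of 93; the runner-up is 3, with 58
```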
The `litert_inference.py` script has many command line flags:

- `--start_image` selects other images to run from the test data set.
- `--num_images` determines how many consecutive images to run from the test data set. When `--num_images` is greater than one, the script processes several images from the test data set, and prints an overall accuracy score.
- `--batch_size` determines how many images are processed at once. When `--num_images` is not evenly divisible by `--batch_size`, the last batch will be a partial batch.
All of the provided inference scripts accept these command line flags. For example:
```
$ uv run litert_inference.py --start_image=7 --num_images=10 --batch_size=3
LiteRT Inference image_index 7 batch 0 batch_index 0
Expected: 9 | Actual: 9
LiteRT Inference image_index 8 batch 0 batch_index 1
Expected: 5 | Actual: 6
LiteRT Inference image_index 9 batch 0 batch_index 2
Expected: 9 | Actual: 9
LiteRT Inference image_index 10 batch 1 batch_index 0
Expected: 0 | Actual: 0
LiteRT Inference image_index 11 batch 1 batch_index 1
Expected: 6 | Actual: 6
LiteRT Inference image_index 12 batch 1 batch_index 2
Expected: 9 | Actual: 9
LiteRT Inference image_index 13 batch 2 batch_index 0
Expected: 0 | Actual: 0
LiteRT Inference image_index 14 batch 2 batch_index 1
Expected: 1 | Actual: 1
LiteRT Inference image_index 15 batch 2 batch_index 2
Expected: 5 | Actual: 5
LiteRT Inference image_index 16 batch 3 batch_index 0
Expected: 9 | Actual: 9

9/10 correct predictions, 90.0% accuracy
```
In this case, the model mispredicts `image_index 8`, which actually depicts a `5` but is predicted to be a `6`.
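For readers who want to poke at the reference implementation interactively, the core of a LiteRT-style inference loop looks roughly like the sketch below. This is a minimal, hedged example using the generic `tf.lite.Interpreter` API rather than pyrtlnet's own wrapper code, and it feeds a random stand-in image instead of real MNIST test data.

```python
import numpy as np
import tensorflow as tf

# Load the quantized model produced by tensorflow_training.py.
interpreter = tf.lite.Interpreter(model_path="quantized.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# A random stand-in for a real MNIST test image, shaped to whatever input
# shape the model reports (a float32 12x12 image, per the output above).
image = np.random.rand(*input_details["shape"]).astype(np.float32)

interpreter.set_tensor(input_details["index"], image)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("predicted digit:", np.argmax(scores))
```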
- Run:

```
$ uv run numpy_inference.py
```

in this repository's root directory.

`numpy_inference.py` runs one test image through the software NumPy and fxpmath inference implementation. This implements inference for the quantized neural network as a series of NumPy calls, using the fxpmath fixed-point math library. Sample output:
The script's layer outputs should be nearly identical to `litert_inference.py`'s layer outputs. Differences of ±1 may occur due to slightly different rounding behavior in `numpy_inference.py`.
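To give a feel for the kind of arithmetic `numpy_inference.py` performs, here is a hedged sketch of one fully quantized dense layer in plain NumPy. The layer sizes match the network described above (a flattened 12x12 input feeding 18 outputs), but the weights, biases, zero points, and fixed-point multiplier are made-up illustrative values rather than the parameters stored in `quantized.npz`, and pyrtlnet's exact rounding details may differ slightly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative int8 inputs and weights, int32 biases, and quantization
# parameters. The real values come from quantized.npz.
x = rng.integers(-128, 127, size=(1, 144), dtype=np.int8)   # flattened 12x12 input
w = rng.integers(-128, 127, size=(144, 18), dtype=np.int8)  # layer0 weights
bias = rng.integers(-1000, 1000, size=(18,), dtype=np.int32)
x_zero, out_zero = -128, -128   # example zero points
multiplier, shift = 1305, 19    # example fixed-point requantization scale

# Integer matrix multiply in 32-bit, with the input zero point removed.
acc = (x.astype(np.int32) - x_zero) @ w.astype(np.int32) + bias

# Requantize: multiply by a fixed-point constant, shift back down, add the
# output zero point, then saturate back to int8 (many values clip here
# because the parameters above are random).
out = (acc.astype(np.int64) * multiplier) >> shift
out = np.clip(out + out_zero, -128, 127).astype(np.int8)
print(out.shape, out.dtype)  # (1, 18) int8, like the layer0 output above
```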
- Run:

```
$ uv run pyrtl_inference.py --verilog
```

in this repository's root directory.

`pyrtl_inference.py` runs one test image through the hardware PyRTL inference implementation. This implementation converts the quantized neural network into hardware logic, and simulates the hardware with a PyRTL `Simulation`. The script's layer outputs should exactly match `numpy_inference.py`'s layer outputs.

The `--verilog` flag makes `pyrtl_inference.py` generate a Verilog version of the hardware, which is written to `pyrtl_inference.v`, and a testbench written to `pyrtl_inference_test.v`. The next step will use these generated Verilog files.
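The PyRTL flow in this step boils down to three things: describe hardware, step a `Simulation`, and emit Verilog. The toy accumulator below is a hedged illustration of that flow, not pyrtlnet's actual inference hardware; only the PyRTL calls (`Simulation`, `output_to_verilog`) mirror the kind of work `pyrtl_inference.py` does.

```python
import io
import pyrtl

pyrtl.reset_working_block()  # start from a clean design

# A toy design: an 8-bit accumulator that adds its input every cycle.
data_in = pyrtl.Input(bitwidth=8, name="data_in")
total = pyrtl.Register(bitwidth=8, name="total")
total.next <<= total + data_in
result = pyrtl.Output(bitwidth=8, name="result")
result <<= total

# Simulate the design cycle by cycle; pyrtl_inference.py uses the same
# Simulation mechanism to run the inference hardware.
sim = pyrtl.Simulation()
for value in [1, 2, 3, 4]:
    sim.step({"data_in": value})
print("accumulated:", sim.inspect("total"))  # 10

# Export the current design as Verilog, comparable in spirit to what the
# --verilog flag writes out (plus a testbench).
with io.StringIO() as verilog_file:
    pyrtl.output_to_verilog(verilog_file)
    print(verilog_file.getvalue().splitlines()[0])
```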
- If `verilator` is installed, run:

```
$ verilator --trace -j 0 --binary pyrtl_inference_test.v
...
- V e r i l a t i o n   R e p o r t: Verilator 5.032 2025-01-01 rev (Debian 5.032-1)
- Verilator: Built from 0.227 MB sources in 3 modules, into 13.052 MB in 18 C++ files needing 0.022 MB
- Verilator: Walltime 3.576 s (elab=0.014, cvt=0.598, bld=2.857); cpu 0.847 s on 32 threads; alloced 164.512 MB
```
This converts the generated Verilog files to C++ code, and compiles that C++ code. The outputs of this process can be found in the `obj_dir` directory.
- If `verilator` is installed, run `obj_dir/Vpyrtl_inference_test`:

```
$ obj_dir/Vpyrtl_inference_test
...
time 1930 layer1 output: [ 33 -48 29 58 -50 31 -87 93 9 49] argmax: 7
- pyrtl_inference_test.v:858: Verilog $finish
- S i m u l a t i o n   R e p o r t: Verilator 5.032 2025-01-01
- Verilator: $finish at 2ns; walltime 0.005 s; speed 329.491 ns/s
- Verilator: cpu 0.006 s on 1 threads; alloced 249 MB
```
The final `layer1 output` printed by the Verilator simulation should exactly match the `layer1 output` printed by `pyrtl_inference.py`.
See
fpga/README.md
for instructions on running pyrtlnet inference on a
Pynq Z2 FPGA.
The reference documentation has more information on how these scripts work and their main interfaces.
Try the `pyrtl_matrix.py` demo script, with `uv run pyrtl_matrix.py`, to see how the PyRTL systolic array multiplies matrices. Also see the documentation for `make_systolic_array`.
`pyrtl_matrix.py` also supports the `--verilog` flag, so this systolic array simulation can be repeated with Verilator.
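For intuition about what the demo is showing, here is a hedged, software-only sketch of an output-stationary systolic array schedule: each processing element (PE) holds one output cell and accumulates one multiply per cycle as skewed rows of `A` and columns of `B` stream past. The function name is illustrative only; it models the dataflow, not `pyrtl_matrix.py`'s actual hardware or the `make_systolic_array` interface.

```python
import numpy as np

def output_stationary_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply a @ b by simulating an output-stationary systolic array.

    Each (i, j) accumulator stands in for one PE. On cycle t, PE (i, j)
    sees a[i, k] and b[k, j] where k = t - i - j, mimicking the staggered
    arrival of skewed input streams.
    """
    rows, inner = a.shape
    inner2, cols = b.shape
    assert inner == inner2
    acc = np.zeros((rows, cols), dtype=np.int64)  # one accumulator per PE

    # Enough cycles for the bottom-right PE to see its final operands.
    for t in range(inner + rows + cols - 2):
        for i in range(rows):
            for j in range(cols):
                k = t - i - j
                if 0 <= k < inner:
                    acc[i, j] += int(a[i, k]) * int(b[k, j])
    return acc

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(output_stationary_matmul(a, b))  # matches a @ b
print(a @ b)
```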
Many TODOs are scattered throughout this code base. If one speaks to you, try addressing it! Some notable TODOs:

- Support input batching, so the various inference systems can process more than one image at a time.
- Improve the PyRTL hardware implementation so it can be run more than once without performing a full reset.
- Extend `WireMatrix2D` to support an arbitrary number of dimensions, not just two. Extend the systolic array to support multiplying matrices with more dimensions. This is needed to support convolutional neural networks, for example.
- Add support for block matrix multiplications, so all neural network layers can share one systolic array that processes uniformly-sized blocks of inputs at a time. Currently, each layer creates its own systolic array that's large enough to process all of its input data, which is not very realistic.
- Support arbitrary neural network architectures. The current implementation assumes a model with exactly two layers. Instead, we should discover the number of layers, and how they are connected, by analyzing the saved model.
- Add an `inference_util` to collect image input data directly from the user. It would be cool to draw a digit with a mouse or touch screen, and see the prediction generated by one of the inference implementations.
- Support more advanced neural network architectures, like convolutional neural networks or transformers.
Contributions are welcome! Please check a few things before sending a pull request:
- Before attempting a large change, please discuss your plan with maintainers. Open an issue or start a discussion and describe your proposed change.
- Ensure that all presubmit checks pass:

```
$ uv run just presubmit
```

This will run all tests with `pytest`, check code formatting with `ruff format`, check for lint errors with `ruff check`, and verify that Sphinx documentation can be generated.
uv pins all pip dependencies to specific versions for reproducible behavior. These pinned dependencies must be manually updated with `uv lock --upgrade`.
When a new minor version of Python is released, update the pinned Python version with `uv python pin $VERSION`, and the Python version in `.readthedocs.yaml`.



