Hello there. If you are reading this, we have probably already interacted in the past. I was created a long time ago, inspired by a project management tool called Program Evaluation and Review Technique (PERT), which was used to identify the critical path and slack time in projects. I still carry some of these terms with me - from 1966 all the way to today. You may know me today by different names like Synopsys PrimeTime or Cadence Tempus - does that ring a bell? No? It’s okay, I know I am not as popular as those pretentious synthesis tools that bully me around. But I play an important role in your life, and I’m here to tell you why…
The harsh truth your synthesis tool didn’t tell you
As you may have guessed, what I do is called “Static Timing Analysis”. The cool kids call it STA, so let’s stick to that. You know how you use a modern hardware description language (HDL) like Verilog to describe your digital design? Well, those languages might make your life easy, but tools like me can’t comprehend all the fancy slang you use there. Am I supposed to know what “always @ posedge” means? No thank you. All I understand are the basics:
Inputs and outputs (called “IO”)
Simple logic components, like gates (AND, OR, etc.) and MUXes (called “Cells”)
Storage elements like Flip-Flops and Latches (called “Registers”) and SRAMs (called “Memories”)
Connections between them (called “Nets”)
So before you come asking for my help, you need to convert your fancy HDL design into something I understand called a “Netlist”. (A list of nets, get it?) That’s where the synthesis tools come in - they are just fancy translators who understand your HDL design, and break it down into the simple netlist that I understand. They don’t check whether it is even possible to have the connections you have described - they just want to please you by saying yes to everything you say. No wonder you like them so much.
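If you want to picture how a netlist might look from my side, here is a minimal Python sketch. It is only an illustration: the class names `Cell` and `Netlist`, the cell types `AND2` and `OR2`, and the toy design itself are all made up for this example, and real netlist formats carry far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str          # instance name, e.g. "U0" (hypothetical)
    kind: str          # cell type, e.g. "AND2" (hypothetical)
    inputs: list[str]  # names of the nets driving this cell
    output: str        # name of the net this cell drives


@dataclass
class Netlist:
    inputs: list[str]   # the "I" in IO
    outputs: list[str]  # the "O" in IO
    cells: list[Cell] = field(default_factory=list)

    def nets(self) -> set[str]:
        """Collect every net name that appears anywhere in the design."""
        names = set(self.inputs) | set(self.outputs)
        for cell in self.cells:
            names.update(cell.inputs)
            names.add(cell.output)
        return names


# Hypothetical two-cell design: y = (a AND b) OR c
toy = Netlist(
    inputs=["a", "b", "c"],
    outputs=["y"],
    cells=[
        Cell("U0", "AND2", ["a", "b"], "n0"),
        Cell("U1", "OR2", ["n0", "c"], "y"),
    ],
)

print(sorted(toy.nets()))
```

Notice there is nothing here about clocks, loops, or “always” blocks: just cells and the nets that join them.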
But let me tell you the harsh truth - not everything you describe is realistic. You might think you are the god of chip design, but in my world, you must bow down to our god - Physics. What you are designing is not software - a chip is a physical entity that works by moving electrons. They’re much faster than anything you’ve ever seen, but they still take time to move from one place to another in the chip. Due to this fact, any logic you add in the chip will also add a delay. Your favorite synthesis tool conveniently left all this out.
Finding out exactly what this delay would be is challenging - remember, at this stage, I am doing this with no knowledge about how the chip is going to look at the very end. But even at this stage, I have something valuable that no one else has - estimated delays of different cells for the technology that the chip would be manufactured in. (In other words, a direct line to Dr. Morris Chang - ever heard of him?) I write down this information in a place called the “Cell Library” - I’m going to need to refer to this often once I get started with my work. But I still haven’t told you what my work is, have I? Before I do that, I will tell you why my job exists in the first place.
Welcome to my synchronized world
Now that you know delays are a thing in the real world, let me introduce you to the idea of synchronization in digital design. Consider this design I once worked on, which had 5 inputs - A, B, C, D, and E - used to compute the final output O. When I saw the netlist, I noticed that the design included an AND gate, two OR gates, and a MUX, connected like this:
As you can see, I have also added the delays of each cell by checking the cell library. (You’re welcome.) Assuming no input delays, the time taken to observe the correct output O can be obtained by adding up the delays of all the cells on the path between an input and the output. However, there are multiple paths from the inputs to the output, and they do not all have the same delay:
Path 1: Input A/B → U0 → U1 → U3 → Output O (Delay = 6 ns)
Path 2: Input C → U1 → U3 → Output O (Delay = 4 ns)
Path 3: Input C/D → U2 → U3 → Output O (Delay = 4 ns)
Path 4: Input E → U3 → Output O (Delay = 3 ns)
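Here is a rough Python sketch of how these totals could be computed. Treat the numbers with care: the per-cell delays below (U0 = 2 ns, U1 = 1 ns, U2 = 1 ns, U3 = 3 ns) are my assumption, back-calculated from the four path totals above, and the names `CELL_DELAY_NS`, `FANIN`, `paths`, and `arrival` are invented for this illustration.

```python
# Assumed per-cell delays, inferred from the path totals listed above.
CELL_DELAY_NS = {"U0": 2, "U1": 1, "U2": 1, "U3": 3}

# Which signals feed each cell, read off the four paths listed above.
FANIN = {
    "U0": ["A", "B"],
    "U1": ["U0", "C"],
    "U2": ["C", "D"],
    "U3": ["U1", "U2", "E"],
}


def paths(node):
    """Yield every input-to-node path as a list of names."""
    if node not in FANIN:  # a primary input: the path starts here
        yield [node]
        return
    for src in FANIN[node]:
        for partial in paths(src):
            yield partial + [node]


def path_delay(path):
    """Sum the delays of the cells on a path (inputs add no delay)."""
    return sum(CELL_DELAY_NS.get(name, 0) for name in path)


def arrival(node, pick):
    """Earliest (pick=min) or latest (pick=max) time a signal settles."""
    if node not in FANIN:
        return 0  # assuming no input delays
    return CELL_DELAY_NS[node] + pick(arrival(src, pick) for src in FANIN[node])


for p in paths("U3"):
    print(" -> ".join(p + ["O"]), f"({path_delay(p)} ns)")

print("fastest:", arrival("U3", min), "ns")
print("slowest:", arrival("U3", max), "ns")
```

Listing every path works for a design this small, but the number of paths grows quickly with design size, which is why the `arrival` function only keeps the best or worst time at each cell instead of remembering whole paths.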
The purpose of any digital design is to produce the correct output when the inputs change. Since the path delays are different, you don’t know exactly when the output is ready - it could be anywhere from 3 ns to 6 ns after the inputs are applied. Without a rule for when to read the output, the design is pretty much useless. How do we deal with this problem? In the above example, let’s say that the inputs can only change once every 10 ns, and the output is sampled at the same rate, just before the inputs change again - then, irrespective of the path between input and output, we are guaranteed to get the correct output value, because even the slowest path (6 ns) has finished by then. In other words, by restricting when inputs to a design can change, and when outputs from a design are sampled, we can ensure correct execution across many different paths - an idea called synchronization.
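To make the 10 ns argument concrete, here is a small Python sketch that checks a sampling period against the four path delays listed earlier. It is deliberately simplified: it only compares the period with the path delays and ignores everything else a real check would consider, and the names `PATH_DELAY_NS`, `is_safe`, and `spare_time` are made up for this illustration.

```python
# Path delays from the example design, in nanoseconds.
PATH_DELAY_NS = {"Path 1": 6, "Path 2": 4, "Path 3": 4, "Path 4": 3}


def is_safe(period_ns, path_delays):
    """True if even the slowest path settles within one sampling period."""
    return max(path_delays.values()) <= period_ns


def spare_time(period_ns, path_delays):
    """Time left over on each path after the output has settled."""
    return {name: period_ns - delay for name, delay in path_delays.items()}


print(is_safe(10, PATH_DELAY_NS))     # sampling every 10 ns
print(is_safe(5, PATH_DELAY_NS))      # sampling every 5 ns
print(spare_time(10, PATH_DELAY_NS))
```

With a 10 ns period every path has time to spare, while a 5 ns period would sample the output before Path 1 has finished. That leftover time is, loosely speaking, the “slack” I inherited from PERT.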
To maintain synchronization, we need to ensure that the inputs remain unchanged while the logic is being evaluated. But we have no idea where these inputs are coming from, so each input may have its own delays. This means we need to have a way to store the values of the inputs periodically, and use these stored values to calculate the output. This is done using a storage element like a latch or flip-flop - the general term for this is a “Register”. Let’s add registers R0 and R1 at the input and output of our design, respectively.
Each register also needs to know when to store the next input (every 10 ns in our example). To achieve this, we introduce a new input, and an STA tool’s best friend: the clock signal (also known by the nickname ‘clk’). The clock signal periodically changes from 0 to 1, and then back to 0 - each such round trip is called a clock cycle. The time taken for the signal to complete one cycle is called the Clock Period. In a typical register, the value at the register’s input gets stored when the clock signal goes from 0 to 1, called the positive clock edge. This allows us to store the inputs in the register R0 during a positive clock edge, then complete the evaluation of the logic, and store the final output in register R1 during the next positive clock edge.
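If it helps, here is a toy Python model of a register that stores its input only on a positive clock edge. It is a behavioral sketch rather than a description of real hardware: the `Register` class, the `run` helper, and the sample waveforms are all invented for this illustration, and time is chopped into coarse steps.

```python
class Register:
    """Toy positive-edge-triggered register (hypothetical model)."""

    def __init__(self, initial=0):
        self.q = initial      # the stored value, visible at the output
        self._prev_clk = 0    # clock level seen on the previous step

    def update(self, clk, d):
        """Advance one time step and return the stored value."""
        if self._prev_clk == 0 and clk == 1:  # positive clock edge
            self.q = d                        # capture the input
        self._prev_clk = clk
        return self.q


def run(clk_wave, d_wave):
    """Feed matching clock and data samples through a fresh register."""
    reg = Register()
    return [reg.update(clk, d) for clk, d in zip(clk_wave, d_wave)]


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d   = [1, 1, 0, 0, 1, 0, 0, 1]
print(run(clk, d))  # [0, 1, 1, 1, 1, 0, 0, 0]
```

The input `d` wiggles several times, but the stored value only moves at the two steps where `clk` goes from 0 to 1. Everything in between is ignored.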
Here’s a timing diagram showing how, in our example design, input E and the intermediate output O’ are only sampled at the positive clock edge, ensuring that the correct inputs and outputs are seen every 10 ns. (Inputs A, B, C and D are not changing in this case.)
As you can see, E’ (the output of register R0) and O (the output of register R1) do not change in the middle of a clock cycle. This matters because each of these signals can feed other logic that assumes the data won’t change during the clock cycle, which lets you make your designs larger without increasing the delays indefinitely. But this synchronization comes with strings attached, and my job is to ensure you abide by certain rules. But I’m getting ahead of myself…
I’m gonna stop here for now, but we’ve just gotten started. Please share the story so that this hardworking tool gets the credit it deserves, and stay tuned for the next part.
Implementing this tool as part of a grad course made me pivot my interest from designing chips to designing chip design software.