
System Level Power Budgeting


Gabe Moretti, Contributing Editor

I would like to start by thanking Vic Kulkarni, VP and GM at Apache Design, a wholly owned subsidiary of ANSYS; Bernard Murphy, Chief Technology Officer at Atrenta; and Steve Brown, Product Marketing Director at Cadence, for contributing to this article.

Steve began by noting that defining a system-level power budget for an SoC starts with chip package selection and the power supply or battery life parameters. This sets the power/heat constraint for the design and is selected while balancing the functionality of the device, the performance of the design, and the area of the logic and on-chip memories.

Unfortunately, as Vic points out, semiconductor design engineers must meet power specification thresholds, or power budgets, dictated by the electronic system vendors to whom they sell their products. Bernard wrote that accurate pre-implementation IP power estimation is almost always required. Since almost all design today is IP-based, accurate estimation for IPs is half the battle. Today you can get power estimates for RTL with accuracy within 15% of silicon, as long as you are modeling representative loads.

With the insatiable demand for handling multiple scenarios (i.e., large FSDB files) like GPS, search, music, extreme gaming, streaming video, high download data rates and more on mobile devices, the dynamic power consumed by SoCs continues to rise in spite of strides made in reducing static power consumption at advanced technology nodes. As shown in Figure 1, end-user demand for higher-performance mobile devices with longer battery life or a higher thermal limit is expanding the “power gap” between power budgets and estimated power consumption levels.

A typical chip power budget for a mobile application could be as follows (Ref: mobile companies): an active power budget of 700 mW at 100 Mbps for download with MIMO, 100 mW in IDLE mode, and leakage power below 5 mW with all power domains off.
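As a minimal sketch of checking estimates against such a budget, using the illustrative figures above and entirely hypothetical estimated values, a per-mode comparison might look like this:

# Toy power-budget check per operating mode; budget numbers follow the
# illustrative figures quoted above, "estimated" values are hypothetical.
BUDGET_MW = {"active_download": 700.0, "idle": 100.0, "leakage_all_off": 5.0}
estimated_mw = {"active_download": 645.0, "idle": 112.0, "leakage_all_off": 3.2}

for mode, budget in BUDGET_MW.items():
    est = estimated_mw[mode]
    margin = budget - est
    status = "OK" if margin >= 0 else "OVER BUDGET"
    print(f"{mode:18s} budget={budget:7.1f} mW  estimate={est:7.1f} mW  "
          f"margin={margin:+7.1f} mW  {status}")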

Accurate power analysis and optimization tools must be employed during all design phases, from system level and RTL to gate-level sign-off, to model and analyze power consumption levels and provide methodologies for meeting power budgets.

Figure 1. Skyrocketing performance vs. limited battery and thermal limits (Ref: Samsung, Apache Tech Forum)

The challenge is to find ways to abstract with reasonable accuracy for different types of IP and different loads. Reasonable methods to parameterize power have been found for single and multiple processor systems, but not for more general heterogeneous systems. Absent better models, most methods used today are based on quite simple lookup tables, representing average consumption. Si2 is doing work in defining standards in this area.

Vic is convinced that careful power budgeting at a high level also enables design of the power delivery network in the downstream design flow. Delivering reliable and consistent power to all components of ICs and electronic systems while meeting power budgets is known as power delivery integrity. Power delivery integrity is analogous to the way in which an electric power grid operator ensures that electricity is delivered to end users reliably, consistently, and in adequate amounts while minimizing loss in the transmission network. ICs and electronic systems designed with inadequate power delivery integrity may experience large fluctuations in supply voltage and operating power that can cause system failure. These fluctuations particularly impact ICs used in mobile handsets and high-performance computers, which are more sensitive to variations in supply voltage and power. Ensuring power delivery integrity requires accurate modeling of multiple individual components, which are designed by different engineering teams, as well as comprehensive analysis of the interactions between these components.

Methods To Model System Behavior With Power

At present engineers have a few approaches at their disposal. Vic points out that the designer must translate the power requirements into block-level power budgets to arrive at specific metrics, such as:

dynamic power estimation per operating power mode; leakage and sleep power estimation at RTL; power distribution at a glance; identification of high-power-consuming areas; power domains; frequency-scaling feasibility for each IP; retention flop design trade-offs; power-delivery network planning; required current consumption per voltage source; and so on.

Bernard thinks that spreadsheet modeling is probably the most common approach. The spreadsheet captures typical application use-cases, broken down into IP activities determined from application simulations/emulations. It also represents, for each IP in the system, a power lookup table or set of curves. Power estimation simply sums across IP values in a selected use-case. An advantage is that there is no limitation on complexity: you can model a full smart phone including battery, RF and so on. Disadvantages are the need to understand an accurate set of use-cases ahead of deployment, and the abstraction problem mentioned above. But Steve points out that these spreadsheets are difficult to create and maintain, and fall short for identifying outlier conditions that are critical for the end user's experience.
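A stripped-down version of such a spreadsheet model, with invented per-IP power tables and use-case residencies standing in for real data, could be sketched as follows:

# Spreadsheet-style power estimate: per-IP average power per operating state,
# weighted by how long the use-case keeps each IP in that state.
# All numbers are illustrative placeholders, not real IP data.
IP_POWER_MW = {                       # power lookup table per IP (mW)
    "cpu":   {"active": 250.0, "idle": 20.0, "off": 0.5},
    "gpu":   {"active": 400.0, "idle": 30.0, "off": 0.8},
    "modem": {"active": 180.0, "idle": 15.0, "off": 0.3},
}

USE_CASE = {                          # fraction of time each IP spends per state
    "cpu":   {"active": 0.30, "idle": 0.60, "off": 0.10},
    "gpu":   {"active": 0.05, "idle": 0.15, "off": 0.80},
    "modem": {"active": 0.20, "idle": 0.50, "off": 0.30},
}

def estimate_power(ip_power, use_case):
    """Sum, across IPs, the state-weighted average power for one use-case."""
    total = 0.0
    for ip, residency in use_case.items():
        total += sum(frac * ip_power[ip][state] for state, frac in residency.items())
    return total

print(f"Estimated average power: {estimate_power(IP_POWER_MW, USE_CASE):.1f} mW")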

Steve also points out that some companies are adapting virtual platforms to measure dynamic power, and improve hardware / software partitioning decisions. The main barrier to this solution remains creation of the virtual platform models, and then also adding the notion of power to the models. Reuse of IP enables reuse of existing models, but they still require effort to maintain and adapt power calculations for new process nodes.

Bernard has seen engineers run the full RTL against realistic software loads, dump activity for all (or a large number of) nodes, and compute power based on the dump. An advantage is that they can skip the modeling step and still get an estimate as good as RTL modeling provides. Disadvantages include needing the full design (making it less useful for planning) and a significant slowdown in emulation when dumping all nodes, making it less feasible to run extensive application experiments. Steve concurs. Dynamic power analysis is a particularly useful technique, available in emulation and simulation. The emulator provides MHz-class performance, enabling analysis of many cycles, often with test driver software to focus on the most interesting use cases.
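The per-node arithmetic behind such an activity-dump flow typically reduces to the standard CMOS dynamic-power relation P ≈ α·C·V²·f. A toy version over a tiny, invented node dump (not any particular vendor's tool) might look like this:

# Toy dynamic-power computation from a switching-activity dump.
# P_node ≈ alpha * C * V^2 * f  (standard CMOS dynamic power; values invented)
VDD = 0.9          # supply voltage in volts
FREQ_HZ = 1.0e9    # clock frequency

# node name -> (toggle rate alpha, switched capacitance in farads)
activity_dump = {
    "alu/out[0]": (0.25, 3.0e-15),
    "alu/out[1]": (0.18, 3.0e-15),
    "fifo/wr_en": (0.05, 1.2e-15),
}

total_w = sum(alpha * cap * VDD ** 2 * FREQ_HZ
              for alpha, cap in activity_dump.values())
print(f"Estimated dynamic power of dumped nodes: {total_w * 1e3:.6f} mW")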

Bernard is of the opinion that while C/C++/SystemC modeling seems an obvious target, it also suffers from the abstraction problem. Steve thinks that a likely architecture in this scenario has the virtual platform containing the processing and memory subsystems and executing at hundreds of MHz, while the emulator contains the rest of the SoC and a replica of the memory subsystem, executing at higher speeds and providing cycle-accurate power analysis and functional debugging.

Again, Bernard underscores that progress has been made for specialized designs, such as single- and multi-processor systems, but these approaches have little relevance for the more common heterogeneous systems. Perhaps the Si2 work in this area will help.


IP Integration: Not a Simple Operation


Gabe Moretti, Contributing Editor

Although the IP industry is about 25 years old, it still presents problems typical of immature industries. Yet the use of IP in systems design is now so widespread that one is hard pressed to find even one system design that does not use IP. My first reaction to the use of IP is “back to the future”.

For many years of my professional career I dealt with board-level design as well as chip design. Between 1970 and 1990, IP was sold as discrete components by companies such as Texas Instruments, National, and Fairchild, among many others. Their databooks described precisely how to integrate the part into a design. Although a defined standard for the contents did not exist, a de facto standard was followed by all providers. Engineers using the databook information would choose the correct part for their needs, and integration was reasonably straightforward. All signals could be analyzed in the lab, since pins and traces were accessible on the board.

Enter Complexity

As semiconductor fabrication progressed, the board became the chip, and the components are now IP modules. One would think that integration would also remain reasonably straightforward, but this is not the case. Concerns about safeguarding intellectual property rights took over, and IP developers became reluctant to provide much information about the functioning of a module, afraid that its functionality would be duplicated and that they would lose sales.

As the number of transistors on a chip increases, the complexity of porting a design from one process to the next also increases.

Figure 1. Projected number of transistors on a chip

Developers found that providing a hard macro, that is, a module already placed and routed and ready for fabrication by the chosen foundry, was the best way to protect their intellectual property rights. But such a strategy is costly, because foundries cannot simply validate every macro for free; the IP provider must be in a position to guarantee volume use by the foundry's customers. Thus many IP modules must be synthesized, which means they must be verified.

Karthik Srinivasan, Corporate Application Engineer Manager, Analog Mixed Signal at Apache Design Solutions, an ANSYS company, wrote that “SoC designs today integrate a significant number of IPs to accelerate design times and to reduce risks to design closure. But the gap in expectations of where and how sign-off happens between the IP and SoC designers creates design issues that affect the final product's performance and release. IP designers often validate their IPs in isolation, expecting near-ideal operating conditions. SoCs are verified and signed off with mostly abstracted or, in many cases, ‘black-box’ views of the IPs. But as more and more high-speed and noise-sensitive IPs get placed next to each other or next to the core digital logic, failure conditions that were not considered emerge. This worsens when these IPs share one or more power and ground supply domains. For example, when a bank of high-speed DDR IPs is placed next to a bank of memories, the switching of the DDR can generate enough noise on the shared ground network to adversely affect the operation of the memory.

As designs migrate to smaller technology nodes, especially those using FinFET-based technologies, this gap in the design closure process is going to worsen the power noise and reliability closure process.”

DDR memory blocks are becoming a greater and greater portion of a chip as the portion of functionality implemented in firmware increases. Bob Smith, Senior VP of Marketing and Business Development at Uniquify, makes the case for a system view of memories.

“DDR IP is used in a wide variety of ASIC and SoC devices found in many different applications and market segments. If the device has an embedded processor, then it is highly likely that the processor requires access to external DDR memory. This access requires a DDR subsystem (DDR controller, PHY and IO) to manage the data traffic flowing to and from the embedded processor and external DDR memory.

Whether it is procured from an external source or developed by an internal IP group, almost all chip design projects rely on DDR IP to implement the on-chip DDR subsystem. The integration techniques used to implement the DDR IP in the chip design can have far reaching effects on DDR performance, chip area, power consumption and even reliability.

Figure 2. A non-optimized DDR implementation

The figure above illustrates a typical on-chip DDR implementation. Note that while the DDR I/Os span the perimeter of the chip, the DDR PHYs are configured as blocks and are placed in such a way that they are centered with the I/Os. As shown in the diagram, this not only wastes valuable chip area but also creates other problems.”

“A much more efficient way to implement the DDR subsystem IP is to deliver a DDR PHY that is exactly matched to the DDR I/O layout. By matching the PHY exactly to the I/Os, a tremendous amount of area is saved and power is reduced. Even better, the performance of the DDR can be improved since the PHY-I/O layout minimizes skew.”

Figure 3. Optimized DDR block

As process technology progresses from 32 nm to 22 nm to 14 nm and beyond, the role of the foundry in the place and route of an entire chip increases, and the freedom of designers to determine the final topology of a chip decreases in direct proportion. Thus we are rapidly reaching the point where only hardened modules will be viable. The number of viable providers in the IP industry is shrinking rapidly, and many significant companies have been acquired in the last three or four years by EDA companies that are becoming major providers of IP products.

Synopsys started selling IP around 1990 and now has a wide variety of IP in its portfolio, mostly developed internally. Cadence, on the other hand, has built its extensive inventory of IP products mostly through acquisitions.

Michael Munsey, Director of ENOVIA Semiconductor Strategy at Dassault Systemes, points out that there are a number of issues to deal with regarding IP:

1. IP Sourcing: Companies are going to need a way to source IP. They will need access to a cataloging system that allows searching of both internally developed (or in-development) IP and externally available third-party IP.

2. IP Governance: For internally developed IP, there need to be systems and methodologies for handling the promotion of work in progress to company-certified IP. For both internal and externally acquired IP, there needs to be a process to validate that IP, and then a system to rate the IP internally based on previous use, available documentation, and other design artifacts.

3. IP Issue and Defect Tracking: Since IP will be used in multiple projects, a formal system is required to handle issue and defect tracking across multiple projects against all IP. If one group finds an issue with a piece of IP, all other project groups using that IP need to be alerted to the issue and to the plan for resolving it. Ideally this should be integrated into the design tools used to assemble IP as well. If a product has already gone out the door with the defective IP, these issues need to be managed and corrective actions need to follow from any defects found.

4. IP Security: Different levels of protection are needed for different types of IP, and robust methods must be put in place to ensure the security of IP. First, company-critical IP must be secure, and systems need to be put in place to make sure that the IP does not leave the company premises. When collaborating with partners, any acquired IP must also be handled so that it is used only in the designs being developed collaboratively. There need to be restrictions on using partner IP in design blocks that can in turn become IP in other designs. There needs to be a way to track the ‘pedigree’ of IP.

5. Variant-driven platform-based design: Ultimately, for companies to keep up with shortening market windows and application-driven platform design, they will need to adopt a system in which base platforms with pre-qualified IP can be configured on the fly and used as a starting point for new designs. These systems would automatically populate a design workspace with the required IP from a company-approved catalog as the basis of a new design moving forward.

Integrating the Pieces

Farzad Zarrinfar, Managing Director of the Novelics Business Unit at Mentor Graphics, provided a synthesis of the problems facing designers.

“For IP integration, multiple kinds of IP, such as hard IP, synthesizable soft peripheral IP, and synthesizable soft processor IP, each with a different set of deliverables, rely on EDA tools for efficient ASIC/SoC design. Selecting the optimal IP size (such as the smallest embedded memory IP) is a critical design decision. While free IP is readily available, it does not always provide the best solution when compared to fee-based IP that provides much better characteristics for specific applications.

IP integration to achieve smaller die size, lower leakage, lower dynamic power, or faster speed can provide designers with a more optimized solution that can potentially save millions of dollars over the life of the product, and better differentiate their chips in a highly competitive ASIC/SoC marketplace.”

Bill Neifert, Chief Technology Officer at Carbon Design Systems, observes: “Certainly, some designers at the bleeding edge differentiate every aspect of the subsystem and their own IP, but we’re increasingly seeing others adopt whole subsystem designs and then make configuration tweaks. Think black-box design; ARM’s big.LITTLE offerings are prime examples of this trend.

Of course, in order to make these configuration changes, designers need to know the exact impact of the changes that they’re making. We see users doing this a lot on our IP Exchange web portal. They will download a CPAK (Carbon Performance Analysis Kit), a pre-built system or subsystem complete with software at the bare metal or O/S level. This gets them up and running quickly but not with their exact configuration. They’ll then iterate various configuration options in order to meet their exact design goals. It’s not unusual for a design team to compile 20 different configurations for the same IP block on our portal and then compare the impact of each of these different models on system performance.

Naturally, all of this impacts the firmware team quite a bit. The software developers don’t need to know exactly what the underlying hardware is doing but the firmware team needs the exact IP configuration. The sooner these decisions can be made, the sooner they can start being productive. Integrating this level of software on to the hardware typically exposes a new round of IP optimizations that can be made as well. Therefore, it’s not unusual for IP configuration changes to happen in waves as additional pieces of IP and software are added to the system.”

Drew Wingard, CTO at Sonics points out that standards matter.  “Because there are many sources for IP, the industry had to create and adhere to standards for integration. From a silicon vendors’ perspective, IP sources include third-party commercial components, internally designed blocks and cores, and customer-designed components. To meet the challenge of integrating IP components from many different places, SoC designers needed communication protocol standards. Communication protocol standards efforts began with the Virtual Socket Interface Alliance (VSIA), continued with the Open Core Protocol International Partnership (OCP-IP), and today reside with Accellera. Of course, our customers need to leverage de-facto standards such as ARM’s AMBA as well.  We owe our ability to integrate IP to the fundamental communication protocol standards work that these organizations performed.”

The Challenge of Verification

Sunrise’s Prithi Ramakrishnan is concerned about system verification.  “At a very high level, the main issue with IP is that the simulated environment is different from the final design environment.  Analog and RF IP is dependent on process/node, foundry, layout, extraction, model fidelity, and placement.  So you are either tied to just dropping it in ‘as is’ and treating it like a black box (nobody knows how it works and whether it meets the required specifications) or completely changing it (with the caveat that you can no longer expect the same results).  Digital IP needs to be resynthesized followed by placement and routing, and it takes several iterations to make the IP you got work the way you want it to work. In addition, this process is extremely tool-dependent.

Finally, there are system level issues like interoperability, interface and controls (how does the IP talk to the rest of the SoC). A very important, often overlooked factor is the communication between the IP providers and the SoC implementation houses – there are documents outlining integration guidelines, but without an automated process that takes in all that information, a lot could be lost in translation.”

The issue of how well a third-party IP has been verified will always haunt designers unless the industry finds a way to make IP as trustworthy as the TI 7400 and equivalent parts of the early days. Bernard Murphy, CTO at Atrenta, observed: “One area that doesn’t get a lot of air-time is how an SoC verification team goes about debugging a problem around an IP. You have the old challenge – is this our bug or the IP developer’s bug? If the developer is down the hall, you can probably resolve the problem quickly. If they are now working for your biggest competitor, good luck with that. If this is a commercial IP, you work with an apps guy to circle around possibilities: maybe you are using it wrong, maybe you misunderstood the manual or the protocol, maybe they didn’t test that particular configuration for that particular use-case… Then they bring in their expert and go back through the cycle until you converge on an answer. Problem is, all this burns a lot of time and you’re on a schedule. Is there a way to compress this debug cycle?”

He offers the following suggestion. “One important class of things to check is the case above where the developer didn’t test that configuration for that use-case. This is where synthesized assertions come in. These are derived automatically by the IP developer in the course of verifying the IP. They don’t look like traditional assertions (long, complex sequences of dependencies). They tend to be simpler, often non-obvious, and describe relationships not just at the boundary of the IP but also internal to the IP. Most importantly, they encode not just functionality but also the bounds of the use-cases in which the IP was tested. Think of it as a ‘signature’ for the function plus the verification of that function.”

Thomas L. Anderson, VP of Marketing at Breker Verification Systems, pleads: integrate, but verify. He argues that “The truth is that most SoC teams trust integration too much and verify too little. Many SoC products hit the market only after two or three iterations through the foundry. This costs a lot of money and risks losing market windows to competitors. Most SoC teams follow a five-step verification process:

  1. Ensure that each block, whether locally designed or licensed as IP, is well verified
  2. Use formal methods to verify that each block has been integrated into the SoC properly
  3. Assemble a minimal chip-level simulation testbench and run a few sanity verification tests
  4. Hand-write some simple C tests and run on the embedded processors in simulation or emulation
  5. Run the production software on the processors in simulation, emulation, or prototyping

The problem with this process is that the tests in steps 3 and 4 are too simple since they are hand-written. They typically verify only one block at a time, ignoring interaction between blocks. They also perform only one operation at a time, so they don’t stress cache coherency or any inherent concurrency within the design. Humans aren’t good at thinking and coding in parallel. Thorough SoC pre-silicon verification can occur only with multi-threaded, multi-processor test cases that string blocks together into realistic scenarios representing end-user applications of the chip.”

An Example of Complexity

Charlie Cheng, CEO of Kilopass, gave me an example of the complexity of choosing the correct IP for a design, using sparse matrix math.

With semiconductor IP comprising 90 percent of today’s semiconductor devices and memory IP accounting for over 50 percent of these complex SoCs, it’s no wonder that IP is the fastest growing sector of the overall semiconductor industry. As a result, managing third-party IP is a growing responsibility within today’s semiconductor companies. How to make the right choice from a growing quantity of IP is the major challenge facing engineering teams, purchasing departments, and executive management. The process of these groups buying IP can be viewed as a sparse matrix mathematical exercise but without the actual math formulas and data manipulation.

Table: qualification status of Vendor A’s 1.8V OTP at 28HP across foundries (TSMC, UMC, SMIC, GLOBALFOUNDRIES).

The table shows two dimensions of a multi-dimensional matrix representing the variables confronting the purchasing company teams. In this two-dimensional matrix, imagine three additional tables for 1.8v operations with the four foundries at 28HPL, 28HPM, and 28HP.  Now, replicate this in a fourth dimension for the variable of 2.5v. Add a fifth and sixth dimension to the matrix for Vendor B’s OTP.  If this were a mathematical evaluation, a figure of merit would be assigned to each cell in each plane of the multidimensional matrix.

For example, Vendor A’s OTP at 1.8V has JEDEC three-lot qualification at 28HP at TSMC, UMC and GLOBALFOUNDRIES, and similarly for 28HPM and 28HP, but has only working silicon at 28LP. Vendor B’s OTP may not have received three-lot qualification at any of the foundries on any of the processes, but may have first silicon or one- or two-lot qualification at one or more of the foundries. In the mathematical exercise, a figure of merit would be assigned to being fully qualified, one- or two-lot qualified, first silicon, or not taped out. Using matrix algebra, the formal mathematical exercise would return a result, but an intuitive evaluation of the process would suggest that Vendor A, with more three-lot qualifications at multiple foundries, would have an edge over Vendor B, which did not.

The above exercise would be typical of the evaluation occurring in the engineering team. A similar exercise would be occurring in the purchasing department with terms and conditions presented in the licensing agreement and royalty schedule each vendor submits. Corporate and legal would perform a similar exercise.

If the mathematical exercise were actually performed and all the cells in the matrix had assigned values, then a definitive solution would be easily achieved. However, if the cells throughout the matrix are sparsely populated, then the solution ends up being only a probable outcome.
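As a rough sketch of the scoring exercise described above, with invented figures of merit and a deliberately sparse qualification matrix, the intuition can be captured in a few lines:

# Toy figure-of-merit scoring over a sparse qualification matrix.
# Qualification levels and their scores are invented for illustration.
FOM = {"3-lot": 1.0, "1-2 lot": 0.6, "first silicon": 0.3, None: 0.0}

# (vendor, voltage, process, foundry) -> qualification level; missing cells are unknown.
qual = {
    ("A", "1.8V", "28HP", "TSMC"):            "3-lot",
    ("A", "1.8V", "28HP", "UMC"):             "3-lot",
    ("A", "1.8V", "28HP", "GLOBALFOUNDRIES"): "3-lot",
    ("A", "1.8V", "28LP", "TSMC"):            "first silicon",
    ("B", "1.8V", "28HP", "TSMC"):            "1-2 lot",
}

def vendor_score(vendor):
    cells = {k: v for k, v in qual.items() if k[0] == vendor}
    return sum(FOM[v] for v in cells.values()), len(cells)

for vendor in ("A", "B"):
    score, populated = vendor_score(vendor)
    print(f"Vendor {vendor}: score={score:.1f} over {populated} populated cells")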

Low Power Is The Norm, Not The Exception


Gabe Moretti, Senior Editor

The issue of power consumption took center stage with the introduction of portable electronic devices. It became necessary for the semiconductor industry, and thus the EDA industry, to develop new methods and new tools to confront the challenges and provide solutions. Thus Low Power became a separate segment of the industry. EDA vendors developed tools specifically addressing the problem of minimizing power consumption at the architecture, synthesis, and pre-fabrication stages of IC development. Companies instituted new design methodologies focused specifically on power distribution and consumption.

Today the majority of devices are designed and fabricated with low power as a major requirement. As we progress toward a world that uses more wearable devices and more remote computational capabilities, low power consumption is a must. I am not sure that dedicating a separate segment to low power is still relevant: it would make more sense to have a sector of the industry devoted to unrestricted power use instead.

The contributions I received in preparing this article are explicit in supporting this point of view.

General Considerations

Mary Ann White, Director of Product Marketing, Galaxy Design Platform, at Synopsys concurs with my position.  She says: “Power conservation occurs everywhere, whether in mobile applications, servers or even plug-in-the-wall items.  With green initiatives and the ever-increasing cost of power, the ability to save power for any application has become very important.  In real-world applications for home consumer items (e.g. stereo equipment, set-top boxes, TVs, etc.), it used to be okay to have items go into standby mode. But, that is no longer enough when smart-plug strips that use sensors to automatically turn off any power being supplied after a period of non-usage are now populating many homes and Smart Grids are being deployed by utility companies. This trend follows what commercial companies have done for many years now, namely using motion sensors for efficient energy management throughout the day.”

Vic Kulkarni, Senior VP and GM, RTL Power Business Unit, at Apache Design, Inc., a wholly owned subsidiary of ANSYS, Inc., approached the problem from a different point of view but also pointed out wasted power.

“Dynamic power consumed by SoCs continues to rise in spite of strides made in reducing the static power consumption in advanced technology nodes.

There are many reasons for wasted dynamic power consumption – redundant data signal activity when clocks are shut off, excessive margin in the library characterization data leading to inefficient implementation, large active logic cones feeding deselected mux inputs, lack of sleep or standby modes for analog circuits, and even insufficient software-driven controls to shut down portions of the design. Another aspect is the memory sub-system organization. Once the amount of memory required is known, how should it be partitioned? What types of memories should be used? How often do they need to be accessed? All of these issues greatly affect power consumption. Therefore, designers must perform power-performance-area tradeoffs for various alternative architectures to make an informed decision.”

The ubiquity of low power design was also pointed out by Guillaume Boillet, Technical Marketing Manager at Atrenta Inc. He told me: “Motivations for reducing the power consumed by chips are multiple. They range from purely technical considerations (i.e. ensuring integrity and longevity of the product), to differentiation factors (i.e. extending battery life or reducing the cost of cooling), to simply being more socially responsible. As a result, power management techniques, which were once only deployed for wireless applications, have now become ubiquitous. The vast majority of IC designers are now making a conscious effort to configure their RTL for efficient power partitioning and to reduce power consumption, in particular the dynamic component, which is increasingly dominant at advanced technology nodes.” Of course, engineers have found through experience that minimizing power is not easy. Guillaume continued: “The task is vast and far from straightforward. First, there is a multitude of techniques available to designers: power gating, use of static and variable voltage domains, Dynamic Voltage and Frequency Scaling (DVFS), biasing, architectural tradeoffs, coarse- and fine-grain clock gating, micro-architectural optimizations, memory management, and light sleep are only some examples. When you try combining all of these, you soon realize the permutations are endless. Second, those techniques cannot be applied blindly and can have serious implications for floor planning, timing convergence, supply distribution, Clock Tree Synthesis (CTS), Clock Domain Crossing management, Design For Test (DFT), or even software development.”

Low power considerations have also been at the forefront of IP design. Dr. Roddy Urquhart, Vice President of Marketing at Cortus, a licensor of controllers, noted: “A major trend in the electronics industry now is the emergence of connected intelligent devices implemented as systems-on-chip (SoC) – the ‘third wave’ of computational devices. This wave consists of locally connected smart sensors in vehicles, the emergence of “smart homes” and “smart buildings,” and the growing Internet of Things. The majority of these devices will be manufactured in large volumes and will face stringent power constraints. While users may accept charging their smartphones on a daily basis, many sensor-based devices for industrial applications, environmental monitoring or smart metering rely on the battery to last months or even years. Achieving this requires a focus on radically reducing power and a completely different approach to SoC design.”

Architectural Considerations

Successful power management starts at the architectural level. Designers cannot decide on a tactic to conserve power once the system has already been designed, since power consumption is the result of architectural decisions aimed at meeting functional requirements. These tradeoffs are made very early in the development of an IC.

Jon McDonald, Senior Technical Marketing Engineer, at Mentor Graphics noted that: “Power analysis needs to begin at the system level in order to fix a disconnect between the measurement of power and the decisions that affect power consumption. The current status quo forces architectural decisions and software development to typically occur many months before implementation-based power measurement feedback is available. We’ve been shooting in the dark too long.  The lack of visibility into the impact of decisions while they are being made incurs significant hidden costs for most hardware and software engineers. System engineers have no practical way of measuring the impact of their design decisions on the system power consumption. Accurate power information is usually not available until RTL implementation, and the bulk of power feedback is not available until the initial system prototypes are available.”

Patrick Sheridan, Senior Staff Product Marketing Manager, Solutions Group, at Synopsys went into more detail.

“Typical questions that the architect can answer are:

1) How to partition the SoC application into fixed hardware accelerators and software executing on processors, determining the optimal number and type of each CPU, GPU, DSP and accelerator.

2) How to partition SoC components into a set of power domains to adjust voltage and frequency at runtime in order to save power when components are not needed.

3) How to confirm the expected performance/power curve for the optimal architecture.

To help expand industry adoption, the IEEE 1801 Working Group’s charter has been updated recently to include extending the current UPF low power specification for use in system level power modeling. A dedicated system level power sub-committee of the 1801 (UPF) Working Group has been formed, led by Synopsys, which includes good representation from system and power architects from the major platform providers. The intent is to extend the UPF language where necessary to support IP power modeling for use in energy aware system level design.”  But he pointed out that more is needed from the software developers.

“In addition, power efficiency continues to be a major product differentiator – and quality concern – for the software manager. Power management functions are distributed across firmware, operating system, and application software in a multi-layered framework, serving a wide variety of system components – from multicore CPUs to hard-disks, sensors, modems, and lights – each consuming power when activated. Bringing up and testing power management software is becoming a major bottleneck in the software development process.

Virtual prototypes for software development enable the early bring-up and test of power management software and enable power-aware software development, including the ability to:

- Quickly reveal fundamental problems such as a faulty regulation of clock and voltages

- Gain visibility for software developers, to make them aware of problems that will cause major changes in power consumption

- Simulate real world scenarios and systematically test corner cases for problems that would otherwise only be revealed in field operation

This enables software developers to understand the consequences of their software changes on power sooner, improving the user-experience and accelerating software development schedules.”
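As a rough, hypothetical illustration of the performance/power curve question in item 3 of Sheridan's list, an architect might tabulate candidate operating points and select the lowest-power one that still meets a performance target:

# Hypothetical DVFS operating points: (frequency in MHz, estimated power in mW).
# Pick the lowest-power point that still meets a performance target.
OPERATING_POINTS = [(300, 95.0), (600, 210.0), (900, 380.0), (1200, 640.0)]

def choose_point(required_mhz):
    feasible = [(f, p) for f, p in OPERATING_POINTS if f >= required_mhz]
    if not feasible:
        raise ValueError("no operating point meets the performance target")
    return min(feasible, key=lambda fp: fp[1])  # lowest power among feasible points

freq, power = choose_point(required_mhz=800)
print(f"Selected {freq} MHz at {power:.0f} mW")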

Drew Wingard, CTO, at Sonics also answered my question about the importance of architectural analysis of power consumption.

“All the research shows that the most effective place to do power optimization is at the architectural level where you can examine, at the time of design partitioning, what are the collections of components which need to be turned on or can afford to be turned off. Designers need to make power partitioning choices from a good understanding of both the architecture and the use cases they are trying to support on that architecture. They need tooling that combines the analysis models together in a way that allows them to make effective tradeoffs about partitioning versus design/verification cost versus power/energy use.”

Dr. Urquhart underscored the importance of architectural planning in the development of licensable IP.  “Most ‘third wave’ computational devices will involve a combination of sensors, wireless connectivity and digital control and data processing. Managing power will start at the system level identifying what parts of the device need to be always on or always listening and which parts can be switched off when not needed. Then individual subsystems need to be designed in a way that is power efficient.

A minimalist 32-bit core saves silicon area and, in smaller geometries, also helps reduce static power. In systems with more complex firmware, the power consumed by memory is greater than the power consumed in the processor core. Thus a processor core needs an efficient instruction set so that the size of the instruction memory is minimized. However, an overly complex instruction set would result in good code density but a large processor core. Thus overall system power efficiency depends on balancing power between the processor core and memory.”
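A toy energy-balance model of the core-versus-memory tradeoff Urquhart describes, with all coefficients invented for illustration, might look like this:

# Toy model: total energy per task = core energy + instruction-memory energy.
# Better code density shrinks instruction memory traffic, but a richer
# instruction set usually costs more energy per executed instruction.
# All coefficients are invented for illustration only.
def energy_per_task(instructions, core_pj_per_instr, code_density, mem_pj_per_fetch):
    fetches = instructions / code_density        # denser code -> fewer fetches
    return instructions * core_pj_per_instr + fetches * mem_pj_per_fetch

minimal_core = energy_per_task(1e6, core_pj_per_instr=8.0,  code_density=1.0, mem_pj_per_fetch=20.0)
complex_core = energy_per_task(1e6, core_pj_per_instr=14.0, code_density=1.4, mem_pj_per_fetch=20.0)
print(f"minimal core: {minimal_core/1e6:.1f} uJ, complex core: {complex_core/1e6:.1f} uJ")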

Implementation Considerations

Although there is still a need for new and more powerful architectural tools for power planning, implementation tools that help designers deal with issues of power distribution and use are reaching maturity and can be relied upon by engineers.

Guillaume Boillet observed that: “Fine-grain sequential clock gating and removal of redundant memory accesses are techniques that are now mature enough for EDA tools to decide what modifications are best suited based on specific usage scenarios (simulation data). For these techniques, it is possible to generate optimized RTL automatically, while guaranteeing its equivalence vs. the original RTL, thanks to formal techniques. EDA tools can even prevent modifications that generate new unsynchronized crossings and ensure proper coding style provided that they have a reliable CDC and lint engine.”

Vic Kulkarni provided me with an answer grounded in sound and detailed technical analysis, which led to the following: “There are over 20 techniques to reduce power consumption, which must be employed during all the design phases, from system level (Figure 1) and RTL to gate-level sign-off, to model and analyze power consumption levels and provide methodologies to meet power budgets, while at the same time balancing the trade-offs associated with each technique used throughout the design flow. Unfortunately, there is no single silver bullet to reduce power!

Fig. 1. A holistic approach for low-power IP and IP-based SoC design from system to final sign-off with associated trade-offs [Source: ANSYS-Apache Design]

To successfully reduce power, increase signal bandwidth, and manage cost, it is essential to simultaneously optimize across the system, chip, package, and the board. As chips migrate to sub-20 nanometer (nm) process nodes and use stacked-die technologies, the ability to model and accurately predict the power/ground noise and its impact on ICs is critical for the success of advanced low-power designs and associated systems.

Design engineers must meet power budgets for a wide variety of operating conditions.  For example, a chip for a smart phone must be tested to ensure that it meets power budget requirements in standby, dormant, charging, and shutdown modes.  A comprehensive power budgeting solution is required to accurately analyze power values in numerous operating modes (or scenarios) while running all potential applications of the system.”

Jon McDonald described Mentor’s approach. He highlighted the need for a feedback loop between architectural analysis and implementation. “Implementation optimizations focus on the most power-efficient implementation of a specific architecture. This level of optimization can find a localized minimum power usage, but it is limited by its inability to make system-wide architectural trade-offs and run real-world scenarios.

Software optimizations involve efforts by software designers to use the system hardware in the most power efficient manner. However, as the hardware is fixed there are significant limitations on the kinds of changes that can be made. Also, since the prototype is already available, completing the software becomes the limiting factor to completing the system. As well, software often has been developed before a prototype is available or is being reused from prior generations of a design. Going back and rewriting this software to optimize for power is generally not possible due to time constraints on completing the system integration.

Both of these areas of power optimization focus can be vastly improved by investing more in power analysis at the system level – before architectural decisions have been locked into an implementation. Modeling power as part of a transaction-level model provides quantitative feedback to design architects on the effect their decisions have on system power consumption. It also provides feedback to software developers regarding how efficiently they use the hardware platform. Finally, the data from the software execution on the platform can be used to refine the architectural choices made in the context of the actual software workloads.

Being able to optimize the system-level architecture with quantitative feedback tightly coupled to the workload (Figure 2) allows the impact of hardware and software decisions to be measured when those decisions are made. Thus, system-level power analysis exposes the effect of decisions on system wide power consumption, making them obvious and quantifiable to the hardware and software engineers.”

Figure 2. System Level Power Optimization (Courtesy of Mentor Graphics)
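A minimal sketch of attaching power to a transaction-level model, in the spirit of the feedback loop McDonald describes (state powers and transaction energies are placeholders, not Mentor's implementation), could be:

# Minimal transaction-level power accounting: components report a static power
# per state plus an energy cost per transaction; the monitor integrates energy
# over simulated time. All numbers are placeholders.
class PowerMonitor:
    def __init__(self):
        self.energy_nj = 0.0

    def account_state(self, power_mw, duration_us):
        self.energy_nj += power_mw * duration_us          # mW * us = nJ

    def account_transaction(self, energy_nj):
        self.energy_nj += energy_nj

mon = PowerMonitor()
mon.account_state(power_mw=120.0, duration_us=50.0)       # CPU active for 50 us
mon.account_transaction(energy_nj=4.5)                    # one memory burst, say
mon.account_state(power_mw=8.0, duration_us=450.0)        # CPU idle for 450 us
print(f"Energy: {mon.energy_nj/1000:.2f} uJ, "
      f"avg power: {mon.energy_nj/(50.0+450.0):.1f} mW")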

Drew Wingard of Sonics underscored the advantage of having in-depth knowledge of the dynamics of Network On Chip (NOC) use.

“Required levels of power savings, especially in battery-powered SOC devices, can be simplified by exploiting knowledge the on-chip network fabric inherently contains about the transactional state of the system and applying it to effective power management (Figure 3). Advanced on-chip networks provide the capability for hardware-controlled, safe shutdown of power domains without reliance on driver software probing the system. A hardware-controlled power management approach leveraging the on-chip network intelligence is superior to a software approach that potentially introduces race conditions and delays in power shut down.”

Figure 3. On-Chip Network Power Management (Courtesy of Sonics)

“The on-chip network has the address decoders for the system, and therefore is the first component in the system to know the target when a transaction happens. The on-chip network provides early indication to the SOC Power Manager that a transaction needs to use a resource, for example, in a domain that’s currently not being clocked or completely powered off. The Power Manager reacts very quickly and recovers domains rapidly enough that designers can afford to set up components in a normally off state (Dark Silicon) where they are powered down until a transaction tries to access them.

Today’s SOC integration is already at levels where designers cannot afford to have power to all the transistors available at the same time because of leakage. SOC designers should view the concept of Dark Silicon as a practical opportunity to achieve the highest possible power savings. Employing the intelligence of on-chip networks for active power management, SOC designers can set up whole chip regions with the power normally off and then, transparently wake up these chip domains from the hardware.”
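A highly simplified model of the hardware-controlled wake-up behavior Wingard describes, with invented address ranges and domain names, might be sketched as:

# Simplified model of a power manager that wakes a "dark" domain when the
# on-chip network decodes a transaction targeting it. Domains and addresses
# are invented for illustration.
DOMAIN_MAP = {range(0x4000_0000, 0x5000_0000): "gpu",
              range(0x5000_0000, 0x6000_0000): "video"}

powered_on = {"gpu": False, "video": False}

def route_transaction(addr):
    for addr_range, domain in DOMAIN_MAP.items():
        if addr in addr_range:
            if not powered_on[domain]:
                powered_on[domain] = True      # power manager wakes the domain
                print(f"waking domain '{domain}' for access at {hex(addr)}")
            return domain
    raise ValueError("address decodes to no domain")

route_transaction(0x4000_1000)   # first GPU access powers the domain up
route_transaction(0x4000_2000)   # subsequent accesses find it already on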

Conclusion

The Green movement should be proud of its success in underscoring the importance of energy conservation. Low power design, I am sure, was not one of its main objectives, yet the vast majority of electronic circuits today are designed with the goal of minimizing power consumption. All is possible, or nearly so, when consumers demand it and, importantly, are willing to pay for it.




