[Trilinos-Users] XYCE math formulation article

Thu Jul 19 11:56:14 MDT 2007

Hi All
Seeing all the discussions on XYCE, I was wondering whether it's
available for in-house use by commercial companies? 
Best Regards 

Yaron Kretchmer

Methodology Office, Altera Corp.
Office: 408-544-7326
Cell: 510-673-2826 

-----Original Message-----
From: trilinos-users-bounces at software.sandia.gov
[mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of Jon
Engelbert
Sent: Thursday, July 19, 2007 10:23 AM
To: Eric Keiter; Neal Parsons; Rob Hoeksta
Cc: Randall Bramley; Mike Heroux; trilinos-users at software.sandia.gov
Subject: RE: [Trilinos-Users] XYCE math formulation article

Eric (and others),
First, let me express my admiration for your XYCE team... I've been
working with Spice for 15 years now, and I know first-hand how difficult
it is.  To be successful takes many skills, including software, math,
circuits, and device models, and a hell of a lot of persistence.
OK, enough fawning...

I suspected that our strategy for solving for "dx" was OK because we
implemented that as a new option for the spice engine in B2 Spice, and I
tested many circuits with both solving for X and solving for delta X.
The simulations generally were within a very fine difference (e.g. .1%)
with the default tolerances, and some circuits even converged when
solving for delta X that didn't converge when solving for X.  So
changing the equation around was worth the effort.  If you want more
details, I could send you some of these circuits.  I assume that this is
due to the numerical effects of fixed/limited precision and the greater
precision in the solution vector
when it is "Delta X" and converges to zero than when it is "X".   Again,
this is with the default sparse matrix functionality (I think it's
sparse
1.3) in berkeley spice 3.

Also, I don't know if you have a good test suite of circuits.  If you
do, please let me know about them.  I used the NCSU suite of circuits
from 1990, CircuitSim90.  Also, we have many circuits that our customers
have sent in over the years, but I can't share those.  Have you tested
xyce with circuitsim90?  If so, then how many ran to completion and gave
the correct results?

When testing with CircuitSim90, I was surprised to find that the default
berkeley spice simulator chokes on many of them... probably 30%.  I
found some mundane errors in the mos1 & mos2 models that were easy to
fix and helped some circuits to converge, and switching so that SPARSE
solves for "DX" instead of "X" helped with a few others, but there are
still many (e.g.
25%) that fail.  I don't know how many of these failures are due to
discontinuities in the mos models, but assuming that the MOS models are
not to blame, then the 'problem' is in the numerical methods.  I don't
remember how many failed to converge on an operating point, but of those
that did, the failures are probably due to limitations of the dc solver
or the non-linear algorithm (niiter/ sparse) that calls it.  The XYCE
documents on the web seem to indicate that this is the case.  Also, I
don't recall how many of these errored out in the transient solver, but
of those that did, it could be due to limitations of the transient
(time-step/integration/truncation-error) code, or limitations of the
non-linear solver.

Now that I have your ear, I'm going to rattle off a bunch of
questions... If you don't have time or interest in answering them, I'll
understand, but I would really appreciate your feedback.

Bias point:
>From the XYCE news on the web site, I learned that you switched 
>originally
to KSparse from Spice's Sparse module, and then to KLU.
It sounds like you totally replaced KSparse with KLU, and we're going to
follow your example.
Neal is trying to implement a nonlinear solver for our spice engine
using NOX & KLU using Amesos.  He hasn't found any examples showing how
to do this, and he's been struggling for many hours with it.  If you can
give him any advice, I'm sure that he'd appreciate it.
I am also wondering what options we should use in NOX... i.e., there's
trust region based and tensor based and newton.
I think that I read in the math article that voltage limiting prevents
some of these strategies from working.  Can we make use of strategies
other than Newton if we tell NOX not to use the solutions when voltage
limiting is being used?
Forgive my ignorance if these questions make no sense... As you know,
I'm a novice with numerical methodes.  Neal has been looking into this
more than I have, but I can tell that his one undergrad class in
numerical methods didn't prepare him for this.

Lambert W-Function (and expl):
The Xyce news (from version 1.1) also mentions that XYCE uses the
Lambert W-function... I think that I read that you use it as a
replacement for the exponential in the diode (and other pn junctions?)
Is this something that I should look into?  I did some brief reading
just now on this function, but I don't see how it will help.  Regarding
exponentials, there's also the
expl(x,limit) function, which behaves like an exponential up to the
limit then becomes a linear function with the slope of exp(limit).  I
can see how this would potentially help the simulator with diodes.  Can
you tell me any more about how you use Lambert-W and why?  And did your
changes using the W function have a big impact on simulator performance?

Transient considerations:
Also, we've been taking circuits from Christophe Basso SMPS book
(version 2) and trying to get them to work in B2 Spice.  None of them
failed for the initial state, but some of them had problems with the
transient (e.g.
current mode open-loop buck).  In fact, we've had many cases over the
years of circuits that fail in the transient simulation.  We can usually
get them to work if we play around with them enough, e.g. cutting down
the transient time step to a crawl, and switching to gear method, but
for a typical user, they wouldn't know the right tricks to use and it
wastes time, and sometimes the simulation takes forever to run, and this
is obviously not desirable.
This makes me think that we should enhance elements of the transient
'engine', i.e. the time stepping algorithm, or the
integration/truncation-error code.  I've read in the XYCE news that your
group has made good progress in this area.
I've read about TR-BDF2 being superior to other integration methods
(e.g.
spice's trapezoidal 'gear'... which have several 'orders', and relate to
backward euler, &/or forward euler... I'm not an expert on these).  But
I haven't implemented TR-BDF2 yet.  Also, I've thought about storing
'good'
states every so often, and detecting when the transient simulation is in
serious trouble, and then backing up to the good states and reducing the
time-step or altering other parameters at that point, then continuing
from that point in time.
Is there code in Trilinos to help with any of this stuff?  If not, can
you steer me in fruitful directions?

By the way, even PSpice has trouble with some of Basso's power-related
circuits.  They have behavioral models within them, e.g. nonlinear
controlled sources with if-then-else statements and tables of data and
max/min functions... elements that create discontinuities.  This may be
a source of the problems when simulating the circuits.  Which brings up
the topic of discontinuities and rapidly changing voltages caused by
user-defined behavioral sources, and how to deal with those.  I'll stop
there.

If you can give any advice to Neal & myself and others to give us
direction, I'd appreciate it.  I.e. pointing us to research articles
that will help us understand the issues better... or advice on which
modules in trilinos will help and how to use them.

Also, if you or any of the other xyce developers are interested in a
copy of
B2 Spice, let me know and I'll send you download instructions and a free
full license.  And if you ever want to collaborate on code related to
schematics, the simulator (I have a lot of experience with the spice
engine even though I'm relatively ignorant on the numerical methods),
post-processing, or interactive results display such as virtual
instruments, let me know.

Thanks again, and sorry if I'm asking for too much help.  Please feel
free to be direct with me if I am... I realize that I'm asking a lot of
questions, and they might not be things that you can reply to quickly.

Jon Engelbert
primary author of B2 Spice A/D
734 332-0487
jon at beigebag.com, www.beigebag.com

-----Original Message-----
From: Eric Keiter [mailto:erkeite at sandia.gov]
Sent: Wednesday, July 18, 2007 4:32 PM
To: Neal Parsons; Rob Hoeksta
Cc: trilinos-users at software.sandia.gov; jon at beigebag.com; Mike Heroux;
Randall Bramley; Eric Keiter
Subject: Re: [Trilinos-Users] XYCE math formulation article

Hi Neal,

I think I received an email from your boss a few months ago.  Sorry I
didn't reply - I was on travel at the time.

Anyway, I'm the original author of the math document.  The thing you are
doing in SPICE to produce the J*dx = -f system should work, as long as
the J*x you are subtracting is based on the correct x.  I actually
considered doing that at one point in time, (mostly for doing exact
comparisons between Xyce and SPICE) but never got around to it.

The dx that the linear solver is solving for needs to be the total dx -
ie the sum of the dx from the Newton step and also the dx due to voltage
limiting.  SPICE applies the dx_{voltlim} in an implicit way, so SPICE's
unmodified system is in fact:

  J * x^{k+1} = -f + J * x_{voltlim}

Where x_{voltlim} is the solution after voltage limiters have been
applied to it.

If you add -J*x^{k} to both sides you get:

  J* dx_{total} = -f + J * dx_{voltlim}

  where:  dx_{voltlim} = x_{voltlim}-x^{k}
          dx_{total} = dx_{Newton} + dx_{voltlim}

SPICE applies the voltlim terms internally to each device load, by
applying the limiter functions to local variables that are derived from
what is in x_{k}.  Then, when it calculates J and f, it uses the
"limited" values, and essentially throws away x_{k}.

Anyway, this is all a long way of saying that what you are doing should
work.  And, I wouldn't call it "cheating" ;  doing it your way is a lot
faster than rewriting every device model!

Also regarding your question on numerical errors;  there will of course
be roundoff differences between what you are now doing and SPICE's
original form.  However, those roundoff differences won't necessarily be
more or less accurate.  In particular, if you want to use iterative
linear solvers (you probably don't...), iterative solvers will generally
be happier if you are solving for dx, as the values in dx will tend to
be small, especially as the solution gets near convergence.

The main reason we used the DX formulation, however, is that this is
what most nonlinear solvers do, and we planned to use  the NOX solver
library.

By the way, the math document is mostly written for folks who have
numerical backgrounds but haven't ever tried to understand circuit
simulation before; it doesn't have much in it that would be pertinent to
making a circuit simulator faster or more robust.

Thanks,
Eric Keite
erkeite at sandia.gov

On 7/18/07 11:59 AM, "Neal Parsons" <nparsons at umich.edu> wrote:

> Hi Robert (etc.),
> I have a few questions/ observations about the xyce math formulation
article.
>
> First, in the Xyce Math document, it describes reformulating the 
> equations in xyce to solve for delta X instead of X.  I'm wondering 
> "how" you do that internally in the code and if our approach is sound.
> We are solving for delta X instead of X, but we aren't doing in each 
> devices' load routine.  We are doing it by "cheating".  Specifically, 
> our strategy is to generate J and the RHS with Spice's unmodified 
> device Load routines, then we are subtracting J * X[k] from the RHS to

> get "f".  So the first question I have is whether this approach is 
> inferior to modifying each device's DevLoad routine to directly place 
> "f" (or is it "-f") in the RHS.
> I'm concerned about mathematical errors related to the magnitudes of 
> the terms being subtracted (e.g. 5.000000000000000001 - 5.0 = 0).
>
> Second, regarding voltage limiting, the math article (p. 27) suggests 
> that we need to add J * (delta X hat)[k+1] to "-f".  But with our 
> "cheat", this modification to "-f" seems to be unnecessary.  Perhaps 
> this is because when the RHS is calculated by spice in the "Spice" 
> way, it already takes into account the voltage limiting?  I'm not sure

> about that.  But if you have any insight into these issues, I'd 
> appreciate hearing it.
>
>
> --
> Neal Parsons (and Jon Engelbert, my boss) University of Michigan 
> Mechanical Engineering
> Cell: 517-896-2171
>
>
> Quoting Neal Parsons <nparsons at umich.edu>:
>
>> Thank you all for your responses. Since our SPICE package generally 
>> deals with smaller circuits, it sounds like Amesos/KLU solver would 
>> be better than the iterative AztecOO solver. I'm working for a very 
>> small software company for the summer that currently uses the Berkely

>> SPICE engine, which fails roughly 1/3 of our NCSU benchmark test 
>> suite. The motivation for integrating Trilinos libraries comes from 
>> reading about XYCE's use of Trilinos.  We are hoping to get better 
>> convergence and faster results.
>>
>> Our solvers were updated to solve J*deltaX(k+1) = -f, as Xyce does.
>> We initially used AztecOO Line Search Based solver with ILU 
>> preconditioning for our solver. This gave accurate operating point 
>> results, but as noted, was generally slow for transient computations.
>> After wrestling with AztecOO for several weeks to no avail and given 
>> the advice here, it appears that my next step would be to change-over

>> from AztecOO to Amesos/KLU. I just want to make sure that this 
>> implementation will be beneficial to our application. Also, any other

>> suggestions might be helpful as well. Again, thanks for your help.
>>
>> Neal Parsons
>>
>>
>> Quoting "Hoekstra, Robert J" <rjhoeks at sandia.gov>:
>>
>>> As Mike mentioned, KLU (by Tim Davis, also the originator of the 
>>> aforementioned UMFPack) which is the default solver in the 
>>> Trilinos/Amesos package has specific optimizations for circuit 
>>> problems and should outperform the more generic tools (UMFPack and 
>>> SuperLU) for sparse direct solves.  As mentioned by both Mike and 
>>> Randy, it is very difficult to get iterative sparse methods to 
>>> perform well on this class of problems.  For the Xyce simulator, 
>>> (Sandia's in house tool) we use GMRES with ILU(k) preconditioning 
>>> only for very large scale problems (approaching a million or more 
>>> devices).  If you are bent on demonstrating an iterative method 
>>> rather than performance our default usage is along the lines of:
>>>
>>> AztecOO GMRES
>>> EpetraExt Singleton Filtering
>>> Ifpack ILU(k) w/ RCM Reordering
>>>
>>> Just using AztecOO's ILU(t) may do the trick but I am not sure if 
>>> you will get the RCM reordering by default and it's bandwidth 
>>> minimization is critical to the performance of the preconditioner.  
>>> You can probably neglect the singleton filter unless you have a very

>>> large distributed problem but not the bandwidth minimizing reorder 
>>> (approx. minimum degree works even better than RCM and is what Tim 
>>> Davis uses in KLU but it is not available in Ifpack or AztecOO).
>>>
>>> Robert Hoekstra
>>> ____________________________
>>> Electrical & Microsystems Modeling
>>> Sandia National Laboratories
>>> P.O. Box 5800 / MS 0316
>>> Albuquerque, NM 87185
>>> phone: 505-844-7627
>>> fax:      505-284-5451
>>> e-mail: rjhoeks at sandia.gov
>>> web: http://www.cs.sandia.gov
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: trilinos-users-bounces at software.sandia.gov
>>> [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of 
>>> Randall Bramley
>>> Sent: Thursday, July 12, 2007 12:37 PM
>>> To: Heroux, Michael A
>>> Cc: trilinos-users at software.sandia.gov
>>> Subject: Re: [Trilinos-Users] Slow AztecOO solve time
>>>
>>>
>>>>> I'm a student, and a beginner with Trilinos trying to use the
AztecOO
>>>>> solver in a SPICE package.   I am getting relatively slow results,
>>>
>>> Circuit simulation problems are notoriously difficult for iterative 
>>> solvers with non-specialized preconditioners like ILUt, etc. It's 
>>> not just a matter of a bad condition number, but the overall 
>>> eigenvalue distributions tend to give worst-case properties for
Krylov solvers.
>>> That would explain having a large number of iterations, but not 
>>> necessarily the time spent in memory management - unless it is used 
>>> on every iteration, or you have an lax dropping parameter set in the

>>> preconditioner (and hence many fill-in locations). So this may not 
>>> be of help on your problem but be aware that just about any 
>>> iterative solver package (e.g., Hypre, PETsC, ...) will in general 
>>> perform slowly for SPICE.
>>>
>>> There may be some specialized application-specific preconditioners 
>>> available now for circuit simulation, but I've been out of the game 
>>> for too long to know. Still, that is something I'd recommend hunting

>>> for - it would well repay a couple of hours on Google if such a 
>>> magic preconditioner has been developed. And if you do find such a 
>>> thing, please post it back to this mail list - a lot of us would be 
>>> interested and grateful.
>>>
>>> Way back when *I* was a student (shortly after the War of 1812) we 
>>> typically had to bite the bullet and use a direct sparse solver 
>>> rather than iterative methods. Or more accurately, a complete LU 
>>> decomposition followed by a few steps of iterative refinement. Ain't

>>> no such thing as purely direct or purely iterative solvers nowadays 
>>> and all practical solvers like Trilinos are essentially hybrids.
>>>
>>> Mike's recommendation is good, but I'd go further. You can get a 
>>> quick estimate of the number of nonzeros (and sparsity pattern) of 
>>> the LU factors when using SuperLU or UMFPACK. Take that number, 
>>> multiply by 8 to get the number of bytes, and see if that fits 
>>> inside the amount of memory you have on the target machine. For 
>>> parallel solves like SuperLUdist you'd want to do the same on a per 
>>> process basis.  Using memory size can also be used to determine how 
>>> much fill-in would be optimal for a solver, modulo adding in 3-10 
>>> vectors for the iterative solver.
>>>
>>> _______________________________________________
>>> Trilinos-Users mailing list
>>> Trilinos-Users at software.sandia.gov
>>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>>
>>>
>>> _______________________________________________
>>> Trilinos-Users mailing list
>>> Trilinos-Users at software.sandia.gov
>>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>>
>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>
>>
>>
>
>
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at software.sandia.gov
> http://software.sandia.gov/mailman/listinfo/trilinos-users

No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.5.476 / Virus Database: 269.10.6/902 - Release Date:
7/15/2007
2:21 PM

No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.476 / Virus Database: 269.10.6/902 - Release Date:
7/15/2007
2:21 PM

_______________________________________________
Trilinos-Users mailing list
Trilinos-Users at software.sandia.gov
http://software.sandia.gov/mailman/listinfo/trilinos-users