A series of interesting articles...
(original available at oon-list.archiv)
>> Forum:
comp.lang.fortran >> Thread: Fortran vs. C++ - good reading >> Message 1 of 124 |
|
Subject: | Fortran vs. C++ - good reading |
Date: | 03/23/2000 |
Author: | John Molitor <molitor@stat.missouri.edu> |
Hello, |
From: Herman D. Knoble <hdk@psu.edu> Newsgroups: comp.lang.fortran Subject: Re: ? How to have a FORTRAN function emulate flops for desired digits Date: Thu, 07 Sep 2000 14:30:13 -0400 Organization: Penn State University Center for Academic Computing Lines: 138 Message-ID: <e8jfrs4q76ofoa3qdcpppt9s7i09pfelg6@4ax.com> References: <39B50161.C0D18525@yahoo.com> Reply-To: hdk@psu.edu NNTP-Posting-Host: hdknt.cac.psu.edu Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Authinfo-User: hdk@psu.edu X-Newsreader: Forte Agent 1.8/32.548 One technique to illustrate to students how significant digits can affect computations is to use examples which break down very quickly and drastically. (Worse yet are ones which break down more subtly!) Back in '93 John R. Ehrman (then at IBM Santa Teresa Labs) and I did a back to back presentation at Share 75 (See Proceedings of Share 75) on this topic. John presented an excellent "Floating-point Numbers and Arithmetic Tutorial showing the various models and implementations and how arithmetic really breaks down. My presentation, there was "Reality of REAL Arithmetic's Really Wrong Results". The following example is from that presentation and has been for us, a fairly good teaching example that you don't need to limit how many digits participate in the model to make a point. ! !=== Example 1: Numeric breakdown for IEEE Arithmetic ! (or worse yet for IBM 360 Floating-point arithmetic). ! For example try: A=1000000., X=0.0000001 ! and see P computed as: 4534650. in Single Precision ! or: 4768372. in Double Precision. ! Note that P is algebraically equal to 1.0 for all A and X<>0. ! ! Penn State University Center for Academic Computing ! H. D. Knoble, August 1979. REAL (KIND=1) P,A,X DO I=1,99999 WRITE(6,*) 'Please enter A and X (or press Enter to Quit):' WRITE(6,*) '(Try values 10000., 0.0001, then 0.0001, 10000.)' READ(5,*,END=99) A,X IF (A.EQ.0 .OR. X.EQ.0) GOTO 99 P=((A+X)**2 - A**2 - 2.*A*X)/X**2 WRITE(6,100) P,A,X 100 FORMAT(' P=',G14.7,' A=',G14.7,' X=',G14.7) ENDDO 99 WRITE(6,*) '** Example 1 Ended **' PAUSE STOP END Interesting enough, the bigger the "spread" of A and X in general the bigger the "wrong" answer. However essentially a decent result is computed even with a very large spread between A and X if the rolls of A and X are reversed (X large, A, small). The problem here of course is subtracting nearly equal quantities which are formed by converting decimal numbers that have infinite binary expansions; when subtracting nearly equal quantities, the most significant digits cancel during the subtraction, thus shifting the "noise" toward the most significant digits. Then dividing by X**2 is like multiplying the "noise" by a very large number (scaling it up). "Really wrong results" here illustrate with very little arithmetic so-called "significance loss". A second part to a student exercise studying such an example (like the one above or one using the so-called "calculator formula" for Standard Deviation which subtracts two sums that will be nearly equal for a small Standard Deviation) is to make a set of "recommendations" to help avoid numerical computational difficulties. My undergrad students, working as a team, back in 1993 came up with: 1) Use Proven (refereed or commercial) library programs that are as robust (give the right result or an error message if not) as possible. For example, IMSL, NAG mathematical statistical libraries. Avoid blindly taking numerical code from textbooks and magazines unless they are refereed or presented by a proven expert in numerical computations. 2) Study (know) the possible domain of values that each program input quantity will take on and test each algorithm thoroughly, especially on the limits of that domain. (This of course is an art in itself). 3) Use at least Double Precision Arithmetic. (In more critical cases use multiple or quadruple precision arithmetic; see http://www.nersc.gov/~dhbailey/ and http://www.ozemail.com.au/~milleraj/ respectively). 4) For a given mantissa precision (computer) do not input or output numbers in ASCII (or EBCDIC) format with more decimal digits than can be handled by the conversion mapping algorithms for that precision (computer) unless you document that that's what you are doing. 5) Use relative comparisons in place of absolute comparisons of real quantities. (Also note that you can also use "fuzzy" comparisons. See http://ftp.cac.psu.edu/pub/ger/fortran/hdk/eps.f90 and extfloor.for ) 6) Avoid computations which are algebraically undefined (or singular) or which develop resul;ts whose magnitudes approach or exceed machine limits.E.g., Blindly setting results of underflows to zero can yield "Really Wrong Results". 7) Avoid mixing Fortran INTEGER and REAL quantities without knowing how they will be compiled. E.g., X=X+I**(-J) does not increment REAL typed X except when INTEGER typed I=1 or INTEGER typed J=0 and I<0. 8) Do not program a numerical model beyond your competence to either avoid significance loss or monitor it. Be especially careful when less than a pre-specified number of mantissa digits participate in a sum or when all but a few mantissa digits cancel during a subtraction. You are responsible for disasters caused by other people using your code! I'm sure your students can come up with more and better here, especially along the ethics implied by the last sentence in (8) above. Good luck with it. Skip Knoble, Penn State On Tue, 05 Sep 2000 22:21:22 +0800, Cheng Cosine <acosine@yahoo.com> wrote: -| -|Hi ALL: -| -| In FORTRAN, there are two types of real floating number, single -| -|and double precision. Single precision has 6 significant digits -| -|while double precision has 16. ( when represented in 10 based ) -| -| From what I learned from some numerical analysis books, the -| -|floating point system is machine dependent. Say there are other -| -|system used in Cray and IBM mainframe that are other than the -| -|IEEE SP/DP system. -| -| Now how can I write a FORTRAN90 code that works in lower significant -| -|number only. Say only works for 3, 4, or 10 digits? -| -| Thanks, -| -| by Cheng Cosine -| Sep/05/2k UT -| Herman D. (Skip) Knoble, Research Associate Mailto:hdk@psu.edu Web: http://www.personal.psu.edu/hdk Center for Academic Computing Penn State University 214C Computer Building University Park, PA 16802-2101 Phone:+1 814 865-0818 Fax:+1 814 863-7049
In July 1999, David H. Munro posted this terrific document. Note that the code reproduced below has been superseded and an up-to-date version can be found in the code of yorick. However, as I updated the copy below, it may still provide some useful information.
From: munro@icf.llnl.gov (David H. Munro) Subject: SIGFPE delivery Date: 28 Jul 1999 00:00:00 GMT Message-ID: <873dy88du3.fsf@yorick.llnl.gov> Content-Type: text/plain; charset=US-ASCII X-Complaints-To: abuse@lll-winken.llnl.gov X-Trace: lll-winken.llnl.gov 933269651 18025 128.115.36.163 (29 Jul 1999 17:34:11 GMT) Organization: Lawrence Livermore National Laboratory Mime-Version: 1.0 (generated by tm-edit 7.106) Reply-To: munro@icf.llnl.gov NNTP-Posting-Date: 29 Jul 1999 17:34:11 GMT Newsgroups: comp.os.linux.development.system Hello, Like many other numerical code developers, I need to get SIGFPE delivered to my programs as promptly as the hardware can do so. This is a crucial aid in debugging all numerical software, but in the case of my yorick interpreter there are serious code correctness or performance problems without SIGFPE delivery. Unfortunately, there is a bug built into the ANSI C signal() and POSIX sigaction() mechanisms on Linux and nearly every other operating system: Setting a signal handler for SIGFPE has no effect because when the system starts every program, it sets (in crt0) the exception mask bits in the FPU control register so that SIGFPE will never be delivered to the program. For years I have wondered why the signal() and sigaction() functions do not reset the FPE mask bits to deliver SIGFPE, since I cannot imagine that any program that does not need SIGFPE delivered would call signal() or sigaction() to set up a SIGFPE signal handler! Perhaps the excuse for this misbehavior is that most FPUs require detailed bits to be set to tell precisely which FPEs will generate signals: The possibilities are zero divide, invalid operation, overflow, underflow, inexact result, and (for some FPUs) denormalized operand. However, the vast majority of numerical applications care about the first three and want the others ignored, so delivery of zero divide, invalid operation, and overflow, and non-delivery of all the rest is a very obvious default behavior for programs registering a SIGFPE handler. Here's my immediate problem: A recent GNU libc release withdrew the __setfpucw function (or, more correctly, made it a static function only callable by their own crt0 code!). This reduces anyone who wants SIGFPE delivered to duplicating the assembly code used by __setfpucw -- making Linux the only OS requiring assembly code to perform this important function. (The only exception is the alpha Linux platform, which has adopted Digital's ieee_set_fp_control() function. See fpuset.c below.) I was told by Cygnus support that the new fenv.h interface replaces the __setfpucw functionality. I have read the C9X draft standard describing fenv.h very carefully, and it provides no means for setting the FPE mask bits -- you can use the fenv.h interface only to set, query, and test the bits which determine whether an exception condition has occurred. Numerical software requires that the FPU actually generate an interrupt; it doesn't do any good to know something happened long afterward. I also spoke with a Livermore representative on the C9X committee, who told me that delivery or non-delivery of any signal is not regarded as a C language issue (by way of an excuse for the fenv.h interface not providing this crucial functionality). Unless GNU libc extends the fenv.h interface, it is useless.
Note: Most notably thanks to David Munro, glibc 2.2 and later extends C99 with feenableexcept, fedisableexcept and fegetexcept declared in <fenv.h>. Previously, _FPU_SETCW declared in <fpu_control.h> could be used to perform the same task. -- Arnaud Desitter (August 2001)
I suspect that the reason __setfpucw was withdrawn has to do with the proliferation of Linux architectures. Unfortunately, the absence of any standard to replace it makes my programming job substantially harder: I only have access to Intel Linux machines, so I have a hard time testing code for my favorite OS anywhere else. If any of you can help me by building, testing, and correcting the enclosed test program on non-Intel Linux platforms, I would be extremely grateful. Any immediate help with the test program, or hope for a more rational interface in the future will be deeply appreciated. Dave Munro ------------------------------------------------------------------------ Here are two files: fpuset.c defines a function u_fpu_setup() which turns on SIGFPE delivery on every platform where I have a guess about how to do it. fputest.c is a test driver for u_fpu_setup(). It works by the unusual tactic of #including the fpuset.c code, so you only need to compile fputest.c and run the resulting program to perform the test. The following lines should work: gcc -g -O -I. fputest.c -o fputest ./fputest The program should print SIGFPE handling works properly Any other message indicates some sort of serious problem for numerical software. For example (the default behavior under Linux/Intel): $ gcc -g -O -I. -Ulinux -DFPU_IGNORE fputest.c -o fputest $ ./fputest SIGFPE handling failed on pass 1, 1./0.= inf $ ------8><---------fputest.c------8><-------fputest.c------8><----------- /* * fputest.c -- test for SIGFPE delivery */ #ifdef linux # if defined(i386) #define FPU_GCC_I86 # elif defined(alpha) #define FPU_ALPHA_LINUX # elif defined(sparc) #define FPU_GCC_SPARC # elif defined(powerpc) #define FPU_GCC_POWERPC # elif defined(m68k) #define FPU_GCC_M68K # elif defined(arm) #define FPU_GCC_ARM # endif #endif #ifndef NO_FPUSET #include "fpuset.c" #endif #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <setjmp.h> static jmp_buf u_jmp_target; extern void u_sigfpe(int sig); extern double reciprocal(double x); extern double quotient(double x, double y); extern double powpow(double x, int n); extern int decrement(int *ip); static int always_true= 0; int main(int argc, char *argv[]) { int i= 5; double zero= (double)(argc>1000); double huge= (argc>1000)? 2.0 : 1.e20; always_true= !(argc>1000); /* signal *ought* to be enough to get SIGFPE delivered * -- but it never is -- see README.fpu */ signal(SIGFPE, &u_sigfpe); #ifndef NO_FPUSET u_fpu_setup(); #endif setjmp(u_jmp_target); /* need to make sure that loop index i actually decrements * despite interrupt */ while (decrement(&i)) { printf("SIGFPE handling failed on pass %d, 1./0.= %g\n", 5-i, reciprocal(zero)); if (i) break; } if (!i) { i= 5; setjmp(u_jmp_target); while (decrement(&i)) { printf("SIGFPE handling failed on pass %d, 0./0.= %g\n", 5-i, quotient(zero, zero)); if (i) break; } if (!i) { i= 5; setjmp(u_jmp_target); while (decrement(&i)) { printf("SIGFPE handling failed on pass %d, 10.^20480= %g\n", 5-i, powpow(huge, 10)); if (i) break; } if (!i) { if (setjmp(u_jmp_target)) { puts("SIGFPE improperly generated on underflow"); i= 11; } else { double x= powpow(1./huge, 10); if (x != 0.0) printf("SIGFPE handling works, but 10.^-20480= %g\n", x); else puts("SIGFPE handling works properly"); } } } } exit(i? 2 : 0); } void u_sigfpe(int sig) { if (sig==SIGFPE) { signal(SIGFPE, &u_sigfpe); #ifndef NO_FPUSET u_fpu_setup(); #endif longjmp(u_jmp_target, 1); } else { puts("u_sigfpe called, but with bad parameter"); } exit(1); } int decrement(int *ip) { int i= *ip; if (always_true) i--; else i-= 2; *ip= i; return i>0; } double reciprocal(double x) { return 1./x; } double quotient(double x, double y) { return x/y; } extern double powpow(double x, int n) { double y= x; while (n--) y*= y; return y; } ------8><---------fpuset.c-------8><-------fpuset.c-------8><----------- /* * fpuset.c -- set up FPU to trap floating point exceptions * - this is very non-portable, not covered by ANSI C, POSIX, or even C9X * - if you port to a new platform (eg- Ultrix) please contact the author */ extern void u_fpu_setup(void); #if defined(FPU_DIGITAL) || defined(FPU_ALPHA_LINUX) /* man pages: exception_intro, ieee */ # ifdef FPU_DIGITAL # include <machine/fpu.h> # else extern void ieee_set_fp_control(long); # define IEEE_TRAP_ENABLE_INV 0x000002 # define IEEE_TRAP_ENABLE_DZE 0x000004 # define IEEE_TRAP_ENABLE_OVF 0x000008 # endif void u_fpu_setup(void) { ieee_set_fp_control(IEEE_TRAP_ENABLE_INV | IEEE_TRAP_ENABLE_DZE | IEEE_TRAP_ENABLE_OVF); } #elif defined(FPU_GCC_I86) /* see also: fpu_control.h or i386/fpu_control.h, __setfpucw function */ void u_fpu_setup(void) { unsigned int fpucw= 0x1372; __asm__ ("fldcw %0" : : "m" (fpucw)); }
/* Note: Starting glibc 2.2, one should use: #define _GNU_SOURCE 1 #include <fenv.h> feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW); -- Arnaud Desitter (August 2001) */
#elif defined(FPU_GCC_POWERPC) void u_fpu_setup(void) { unsigned int tmp[2] __attribute__ ((__aligned__(8))); tmp[0] = 0xFFF80000; /* More-or-less arbitrary; this is a QNaN. */ tmp[1] = 0xd0; __asm__ ("lfd 0,%0; mtfsf 255,0" : : "m" (*tmp) : "fr0"); } #elif defined(FPU_GCC_SPARC) void u_fpu_setup(void) { unsigned int fpucw= 0xd400000; /* the 4 is nonstandard arithmetic bit */ __asm__ ("ld %0,%%fsr" : : "m" (fpucw)); } #elif defined(FPU_GCC_M68K) /* works on NeXT as well as m68k Linux */ void u_fpu_setup(void) { unsigned int fpucw= 0x7400; __asm__ volatile ("fmove%.l %0, %!" : : "dm" (fpucw)); /* includes bit to trap on signalling NaN (may affect libm behavior) */ } #elif defined(FPU_GCC_ARM) void u_fpu_setup(void) { unsigned int fpucw= 0x70200; __asm__ ("wfs %0" : : "r" (fpucw)); /* includes bit to trap on signalling NaN (may affect libm behavior) */ } #elif defined(FPU_AIX) /* man pages: fp_trap, fp_enable */ #include <fptrap.h> void u_fpu_setup(void) { fp_trap(FP_TRAP_FASTMODE); fp_enable(TRP_INVALID | TRP_DIV_BY_ZERO | TRP_OVERFLOW); } #elif defined(FPU_HPUX) /* man pages: fpsetmask * library: -lm */ /* HPUX turns off FP_X_* without this (_INCLUDE_HPUX_SOURCE) */ #ifndef _HPUX_SOURCE #define _HPUX_SOURCE 1 #endif #include <math.h> void u_fpu_setup(void) { fpsetmask(FP_X_INV | FP_X_DZ | FP_X_OFL); /* 0x1c */ fpsetfastmode(1); /* fast underflows */ }
/* Note: Starting HP-UX 11.0, one should use: #ifndef _HPUX_SOURCE # define _HPUX_SOURCE 1 #endif #include <fenv.h> fesettrapenable(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW); -- Arnaud Desitter (August 2001) */
#elif defined(FPU_IRIX) /* man pages: handle_sigfpes, note lethal TRAP_FPE environment variable * library: -lfpe * note: earlier versions used get_fpc_csr/set_fpc_csr?, sys/fpu.h */ #include <sigfpe.h> static int already_set= 0; void u_fpu_setup(void) { if (!already_set) { extern void u_sigfpe(int sig); /* from handler.c (or fputest.c) */ handle_sigfpes(_ON, _EN_OVERFL|_EN_DIVZERO|_EN_INVALID, (void (*)())0, _REPLACE_HANDLER_ON_ERROR, &u_sigfpe); already_set= 1; } }
/* Another way to set the fp mask (without linking with -lfpe): #include <sys/fpu.h> union fpc_csr exp; exp.fc_word = 0; exp.fc_struct.en_invalid = 1; exp.fc_struct.en_divide0 = 1; exp.fc_struct.en_overflow = 1; set_fpc_csr(exp.fc_word); -- Arnaud Desitter (August 2001) */
#elif defined(FPU_SOLARIS) /* man pages: fpsetmask * Sun's -fnonstd compiler switch switches between __fnonstd.o * and __fstd.o under Solaris, as far as I can tell. Use FPU_IGNORE * if you do this. */ #include <ieeefp.h> void u_fpu_setup(void) { fpsetmask(FP_X_INV | FP_X_DZ | FP_X_OFL); /* this doesn't set the "nonstandard arithmetic" bit, which prevents * software emulation of IEEE gradual underflow * -- apparently no way to do this in libc (see FPU_GCC_SPARC) */ }
/* Note: Sun offers an extension to C99: fex_set_handling. -- Arnaud Desitter (August 2001) */
#elif defined(FPU_SUN4) /* man pages: ieee_handler * nonstandard_arithmetic is undocumented, but rumored * to be important to get rapid underflows * library: -lsunmath (under /usr/lang hierarchy) * may also be in -lm (standard libm)? * note: libsunmath.a is provided by Sun only if you purchase their * compilers; if you are trying to compile with gcc on a SPARC * architecture, try FPU_GCC_SPARC * Sun's -fnonstd compiler switch buggers crt1.o under SunOS 4, * as far as I can tell. Use FPU_IGNORE if you do this * (not possible with gcc?). */ static int already_set= 0; void u_fpu_setup(void) { if (!already_set) { extern void u_sigfpe(int sig); /* from handler.c (or fputest.c) */ nonstandard_arithmetic(); ieee_handler("set","common", &u_sigfpe); already_set= 1; } } #elif defined(FPU_UNICOS) /* delivers SIGFPE by default, this just arranges to trap on * libm errors as well */ static int already_set= 0; void u_fpu_setup(void) { if (!already_set) { int flag= -1; libmset(&flag); already_set= 1; } } #elif defined(FPU_MSX86) #include "float.h" #include "signal.h" _control87(EM_DENORMAL | EM_UNDERFLOW | EM_INEXACT, MCW_EM);
/* With MS VC++, compiling and linking with -Zi will permit */ /* clicking to invoke the MS C++ debugger, which will show */ /* the point of error -- provided SIGFPE is SIG_DFL. */ signal(SIGFPE, SIG_DFL); /* From f2c source. -- Arnaud Desitter (August 2001) */
#elif defined(FPU_IGNORE) void u_fpu_setup(void) { } #else #error one of the FPU_* symbols must be defined #endif ------8><------------------------8><----------------------8><-----------
$Id: Articles.html,v 1.32 2005/08/26 10:01:55 adesitter Exp $
Maintained by Arnaud Desitter.