# Issue 1315: dneupd selects/calculates wrong eigenpairs if rvec = true

Reported by Valentin Zauner, Sep 27, 2013

dneupd selects/calculates wrong eigenpairs if rvec = true

Steps to reproduce the problem:

1. Have an installation of arpack-ng, lapack and blas, have fortran
and C++ compiler (C++11 capable)
installation, only include folder needed)
3. compile both arpack_test_c.cpp and arpack_test_f from
https://gist.github.com/Darkdragon84/6728023 as described below
4. play around some with arpack_test_c until the result for nev=1 is
wrong

Expected result:

after specifying which='LR', DNEUPD should calculate and return the
NEV eigenvalues with largest real part and the corresponding
eigenvectors from the arnoldi factorization computed by DNAUPD

Actual result:

if eigenvectors are also desired (rvec = true), DNEUPD sometimes
selects the wrong eigenpairs to compute, and even sometimes computes
eigenpairs which are plain wrong for nev=1. Also for nev>1 the
ordering of the eigenpairs is often scrambled, this might be related
to issue 667
http://forge.scilab.org/index.php/p/arpack-ng/issues/667/

=== FURTHER ELABORATION ===

My task is the follwing: I have a real, nonsymmetric MxM matrix A
that is guaranteed that its eigenvalue with largest real part is
real and positive (see Remark 1 below if details are of interest). I
want to calculate only this one eigenvalue and its corresponding
eigenvector. I therefore use DNAUPD with mode = 1, bmat = 'I', which
= 'LR', nev = 1, TOL=1e-15 (I also use ISHIFT = 1) and DNEUPD with
rvec = 1, howmny='A'.

I have been doing this in MATLAB (which also uses arpack in the
background) for a long time now and never got any wrong
results/wrongly selected eigenpairs. As I am now porting my code to
C++ I implemented a simple, matlab-'eigs'-like interface to ARPACK.
To represent vectors and matrices in C++ I use the Armadillo library
http://arma.sourceforge.net/

=== IMPLEMENTATION ===

I have written a C++ test program that generates random matrices of
the above kind (and saves them in binary format) and tries to
calculate nev=1 eigenvalues of largest real part and their
eigenvectors. As a control I also do this for nev=4 and compare if
nev=1 gives the right result (To be super sure I also check with
matlab).

Now on a random basis the nev=1 result is either completely off
(also not fulfilling the eigencondition A*x = lambda*x) or I just
get the wrong eigenpair (not the one with largest real part). If I
don't seed the random number generator I can reproduce the result if
I load the problematic matrix again (I generate my own random
starting vector RESID for DNAUPD, which of course uses the same seed
then).

To check if this is a C++/Fortran interface problem I have written a
separate fortran program, using the example program dndrv1.f in the
EXAMPLES directory as a template (sry this was basically my first
fortran program, so it's pretty sloppy, but does the job). This
program loads the stored matrix 'Amat.bin' and calls the same DNAUPD
+ DNEUPD combination. In many cases, where C++ fails, the fortran
program also fails, but there are also instances where only 1 of the
2 fails (so sometimes fortran fails when C++ succeeds).

It has to be noted that fails are more frequent when the matrix
dimension is higher (-> i suspected memory overlap/corruption,
see below)

The source codes can be found at:
https://gist.github.com/Darkdragon84/6728023

to build arpack_test_c.cpp (where CC is e.g. g++ or icpc): CC -g
arpack_test_c.cpp -lblas -llapack -larpack
to run arpack_test_c:
./arpack_test_c N  (where N*N is the total matrix dimension of A)
or
./arpack_test_c name.bin (to load a previously saved matrix
name.bin)

to build arpack_test_f.f (where FF is e.g. gfortran or ifort): FF -g
-I/path/to/arpack-ng/SRC/ -o arpack_test arpack_test.f -lblas
-llapack -larpack
to run (a matrix file Amat.bin generated by arpack_test_c must be
present in this directory): ./arpack_test_f

=== DELVING FURTHER ===

I started delving through the fortan code of DNAUPD and DNEUPD. As
DNAUPD seems to do its job right, I looked into DNEUPD.f. After a
few checks of debugging traces (turning on mneupd = 3 defined in
ARPACKROOT/SRC/debug.h) I found what I believe is the source of all
this evil:

In the case of failure, the ordering of the eigenvalues of the upper
Hessenberg matrix H, computed by LAPACK's dlahqr, is different than
the ordering of the Ritz values passed by DNAUPD. The logical array
'select' determines, which eigenpairs should be computed and is
computed upon the results of ARPACK's dngets (which selects the
wanted eigenvalues, in this case 'LR'). dngets however works with
the ordering of the ritz values passed by DNAUPD, whereas select is
furtheron used to select from the eigenvalues computed by dlahqr.

Ergo, if the orderings of the ritz values passed by DNAUPD and the
eigenvalues computed by dlahqr differ, wrong eigenvalues will be
selected!

I however have no idea, what triggers the ordering mismatch in my
failure cases. IMHO it would be much smarter to determine 'select'
upon the results of dlahqr, rather than the original ritz values
passed by DNAUPD, or is this not good for some reason?

This fact also sometimes causes a screwup in the ordering of
eigenpairs for nev>1. In that case mostly because we are
targeting several eigenpairs, we will also get the ones we want by
chance, but not always!

I have included some debug traces of 3 characteristic failure cases
(trace1-3.txt) here:
https://gist.github.com/Darkdragon84/6728023

If the corresponding matrices Amat1-3.bin are wanted, let me know.

I actually cannot believe I am the first person to observe this,
after all DNAUPD and DNEUPD are heavily used AFAIK. So if there is a
simple, obvious way of doing this right, please tell me what I have
missed!!

I also want to stress, that MATLAB never had this problem and is
also using arpack. From all this I am suspecting something is wrong
with the LAPACK routine dlahqr
=== BUILD ENVIRONMENT  ===

I use a machine with an intel core-i7 CPU running Linux Ubuntu 12.04
64bit. I have installed the repository BLAS (atlas 3.8.4) and LAPACK
(3.3.1) packages.
I also have an installation of the current Intel Composer 2013.sp1
(both C++ and Fortran Compiler 14.0) and MKL (which have their own
BLAS and LAPACK implementations)

I installed arpack-ng using gnu gfortran/g++ and the supplied
automake. I also produced a static libarpack-ng using the intel
Fortran Compiler following
http://forge.scilab.org/index.php/p/arpack-ng/page/Compile-arpack-ng-
without-configure/

I tried both the C++ and Fortran programs with
1.) the gnu compilers (g++ / gfortran), where I linked against the
system BLAS, LAPACK and automade ARPACK-NG
2.) the intel compilers (icpc / ifort), where I linked against MKL
and the static libarpack-ng

I also tried these combinations on other machines (laptop, work PC,
home PC, ...)

I have also performed memory checks with valgrind (to see if any
arrays overlap), which reported no errors.

I have tried several flags and switches for the compilers:
1) I use -m64 for both C++ and Fortran, both gnu and intel
2) For intel C++ and Intel Fortran to generate the static
libarpack-ng I have tried -align for both to assure that memory is
properly aligned
3) I have tried -fpic to generate the static libarpack-ng
4) I have tried to generate CPU specific code by using
-march=corei7-avx

ALL the different combinations/flags of trying the programs had the
same effect: nev=1 sometimes fails (roughly 1 out of 5-10), which is
As of now I cannot say if this is
1) an arpack bug
2) a lapack/blas bug
3) a C++/fortran interface bug (although I highly doubt it given
that the pure fortran program also fails)
4) possibly something else

=== REMARKS ===

Remark 1:
A is actually a superoperator acting on NxN matrices and is similar
to a completely positive (CP) map. The action of the superoperator
on a matrix C is defined as A(C) = herm(B)*C*B, with B a given NxN
matrix describing the superoperator A and herm() the hermitian
conjugate. By reshaping NxN matrices into vectors of length N*N one
can represent the superoperator as a MxM matrix, where M=N*N. In my
case the reshaped superoperator matrix can be constructed as A =
kron(conj(B) , B), kron is the kronecker tensor product and conj()
means complex conjugation (in my case I use real matrices for now,
so A=kron(B,B) really). Operators of this kind are guaranteed that
their dominant eigenvalue is real and positive.

Comment 1 by Valentin Zauner, Oct 3, 2013

I just saw that a similar issue 852 had been filed some time ago,
but closed due to the lack of examples:
http://forge.scilab.org/index.php/p/arpack-ng/issues/852/

Comment 2 by Jordi Gutiérrez Hermoso, Oct 3, 2013

Please be patient, I'm digesting this mammoth bug report.

Comment 3 by Edouard Canot, Oct 13, 2013

May be the following discussion helps:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=3196

Roughly, it explains that if you get ARPACK errors with dlahqr (from
LAPACK 3.x version), you may obtain correct results by using the olf
dlahqr version provided with the original ARPACK library, and which
was from the LAPACK 2.0

Comment 4 by Edouard Canot, Oct 13, 2013

In addition to the previous comment, we can observe that the last
version of matlab seems to follow the same way...

If we search for the dlahqr symbol in libmwarpack.so, we obtain:
\$ nm libmwarpack.so | grep dlahqr
00000000000050f0 T dlahqr2_

This result would tend to demonstrate that the Matlab arpack library
itself do not use the official dlahqr of Lapack, but a renamed one,
dlahqr2, which is certainly the 2.0 version provided in old Arpack.

Hope this will help.

Comment 5 by Valentin Zauner, Oct 14, 2013

Hi Edouard!

Great, thanks for this insight! I was already suspecting a possible
change of the dlahqr routine between different LAPACK versions. As
it appears there was a major change between dlahqr from LAPACK 3.0
to 3.1 (but possibly also sooner ones), which is probably
responsible for the reordering of the eigenvalues.

For now I see 2 possible fixes:

1.) Use an old version of dlahqr.f from LAPACK 2.0 or so, rename it
to dlahqr2 (or similar), change the corresponding calls within
ARPACK and include dlahqr2 into the ARPACK distro. This would be a
pity however, because dlahqr in the newer LAPACK is much more
optimized than the old version.

2.) The workaround that I used:
I changed the routine dneupd.f quite a bit by reordering the steps
taken for rvec= .true. :

a) call dlahqr right away.
b) copy the real and imag parts of the obtained eigenvalues
(workl(iheigr) and workl(iheigi)) over the values still stored from
dnaupd (workl(irr) and workl(iri))
c) call dngets as before
d) set the select array using the ordering results from dngets but
DO NOT PERFORM ritz estimate bound checking (they are still in the
dnaupd - order)

This way the info-flag -15 becomes obsolete, but it appears to me
one will only get this error if one screws with the workspace
variables between calling dnaupd and dneupd anyways, so I

On the other hand one can use the optimized and newer version of
dlahqr while only having to do 2 additional (very cheap) memcopys of
arrays of dimension NCV.

I have attached my edited dneupd routine as dneupd2.f, it is also
available on https://gist.github.com/Darkdragon84/6728023

cheers and thanks for the clarification!

Comment 6 by Edouard Canot, Oct 14, 2013

I am looking for a matrix (not too small) for which the original
dneupd routine, linked to the Lapack-3-dlahqr, gives wrong
results...

Could you give me such a matrix?

Comment 7 by Edouard Canot, Oct 14, 2013

Valentin,

I provide, for a practical reason, this new version of dneupd2.f. It
is exactly the same algorithm as the previous one, but the two
utilitaries routines at the end (printmatvec and printmat) have been
removed. Also, the blank trailing space at some lines have been
restored, so a diff between dneupd.f and dneupd2.f will reveal only

I have also added a small comment (at lines 303-304) which show the
reason of the modification and also your name.

Comment 8 by Valentin Zauner, Oct 15, 2013

Edouard,

Hey thanks for the cleanup and the modifications and yeah, oops, I
forgot to remove the 2 aux routines at the end... ^^

About a problematic matrix: In my case this current issue is
actually of stochastic nature. So a matrix that was problematic
before can be fine after logout/login or reboot or just several
hours later. I however noticed that the bug tends to show up more
often for large matrix sizes. If you want you can try the test
programs that I have supplied in the first comment...

I don't know what causes the different behavior in the new version
of dlahqr and why it sometimes behaves like the old routine and
sometimes gives different orderings...

Anyways, I actually think it's a flaw in the original dneupd
algorithm setup to trust an external routine to always show a
certain behavior that is nowhere guaranteed.

Thanks for your help and input d_(^^)_b

Comment 9 by Valentin Zauner, Oct 15, 2013

on a side note, I think the newly introduced for-loop from line
642-646 in the source-code is not of FORTRAN77 standard, but uses

do ...
statements
end do

do [label] ...
statements
[label] continue

idk if backwards compatibility to F77 is wanted. if so, this for
loop should be edited using a (not yet used) label...

Comment 10 by Edouard Canot, Oct 15, 2013

Valentin,

Keep the do-enddo loop. It is more readable and most (if not all)
Fortran compilers understand it.

Besides, I have tried your new dneupd2.f routine in my library but,
at least for one 8x8 matrix, I got the old wrong result.

The matrix is stored in the ASCII attached file: A_8x8.dat

I'm trying to compute the two smallest eigenvalues, in the two
cases:
(i) without eigenvectors (rvec=.false.)
(ii) with eigenvectors (rvec=.true.)

The reference is Matlab-8.2 which gives the expected values:
2.5347,+ 0.0000i
2.6160,- 1.2918i

Note that Octave-3.6.4 gives also, curiously enough, the correct
above values.

Scilab-5.4.1 gives a wrong result (for sure, as it use the original
dneupd routine):
2.6160,+ 1.2918i
2.6160,- 1.2918i

Unfortunately, your dneupd2 routine gives also the wrong result. I
don't why. Even with rvec=.false. !

Comment 11 by Valentin Zauner, Oct 16, 2013

Hi Edouard,

That is unfortunate :-( I have to admit however that I was only
considering and testing the case where I need either the eigenvalues
of largest magnitude ('LM') or largest real part ('LR').

When you say smallest eigenvalue do you mean 'SR' or 'SM'? When you
look at the matlab source code for eigs they actually set for 'SM':
which = 'LM'
sigma = 0
style = 'G' (generalized!!)
mode = 3

and they do sth a little bit different in the IDO = 1 case... isk if
this is the case you are considering?

Have you checked if the ritz values from dnaupd have converged as
you wanted them? If they have and dneupd2 still gives you wrong
results then i suspect that sth is wrong with the dtrsen.f routine
which select the wanted ritz values.

I will see if I can get the eigenvalues you wanted from the matrix
you supplied if I have time.

cheers,
Valentin

Comment 12 by Edouard Canot, Oct 16, 2013

In my library also, for not too bad conditionned matrix (this is the
case for A_8x8), 'SM' implies larger eigenvalues for OP=inv(A).

I verified that all ritz values converged!

Comment 13 by Valentin Zauner, Oct 16, 2013

OK, so does that mean to obtain the smallest eigenvalues you are
using 'LM' and OP=inv(A) (which is basically 'SM')? Does dtrsen give
you the correct ordering then?

Sorry I haven't implemented anything else than 'LR' and 'LM' yet, do
it's a bit hard for me to check :-P

Comment 14 by Edouard Canot, Oct 16, 2013

>> Does that mean to obtain the smallest eigenvalues you are
using 'LM' and OP=inv(A) (which is basically 'SM')? Yes

>> Does dtrsen give you the correct ordering then? No idea!

I further checked the first option, which was "using the old
dlahqr routine, renamed dlahqr2"; for my A_8x8 matrix, calling
dnaupd/dneupd for smallest eigenvalues, with rvec=.true., gives now
the good result.

BUT, curiously enough, the same calling sequence with rvec=.true.
(i.e. no eigenvectors) gives the wrong answer!

Comment 15 by Valentin Zauner, Oct 16, 2013

hmm... you can check the output of dtrsen by setting the appropriate
debug message level (set the variable mneupd to anything > 2).

And oops, sorry, I actually always meant the routine dngets (which
orders the eigenvalues the way you want them, e.g. 'SM': smallest
first) when I said dtrsen (which does a reordering of the
schur-decomposition done by dlahqr.

So:
after calling dngets, the debug output should show an array, where
the wanted eigenvectors are IN THE LAST FEW ENTRIES.

if the desired NCONV eigenvalues are not among the first NCONV
eigenvalues obtained by dlahqr, the routine dtrsen is called to
reorder the schur-decomposition. With or without that call, in the
end the desired eigenvalues should be among the first NCONV
eigenvalues of the schur decomposition. This can all be checked by
looking at the debug info (like the .txt files I posted in my first
comment)

If all this is fine but you still get wrong results, then I suspect
something later on is going wrong, that is not called when WHICH =
'LR' or 'LM'.

>> BUT, curiously enough, the same calling sequence with
>> rvec=.true. (i.e. no eigenvectors) gives the wrong
Did you mean rvec=.false. or (i.e. WITH eigenvectors), sry I'm
confused...

cheers

Comment 16 by Edouard Canot, Oct 18, 2013

Sorry, Valentin! I have to say that your 'dneupd2.f' is correct. I
used it to many tests with my library Muesli.

The problem was the storing of eigenvectors (as in Lapack), and it
is not so easy when you want to sort them according the module of
eigenvalues, either in ascending or descending order.

My tests are not yet complete, but I think that all is good. Many
thanks, I've improved my numerical library!

To the arpack-ng maintainers, I recommend to patch the dneupd()
routine using the file 'dneupd2.f' of V. Zauner.

Regards, (and sorry again for my wrong news two days ago).

Comment 17 by Valentin Zauner, Oct 18, 2013

Muesli, lol :-)

I'm glad to hear it works in your case as well! That suggests that
there are no more adaptations to be made to the rest of the routine
after the selecting of the wanted eigenpairs.

routines.

cheers,
Valentin

Comment 18 by Jordi Gutiérrez Hermoso, Oct 24, 2013

I'm evidently falling behind applying this patch. Would either of
you feel competent enough to handle git and apply the patch
yourself? I'm asking, do either of you want to become an arpack-ng
maintainer? If Sylvester agrees, I think we could use more help.

Comment 19 by Sylvestre Ledru, Oct 24, 2013

I would be very happy to see new maintainers in arpack :)

Edouard, do you want me to give you permissions on the git
repository ?

Comment 20 by Edouard Canot, Oct 25, 2013

Jordi and Valentin, I think we have to delayed the patch of dneupd.
Yesterday, I have found a matrix for which dneupd failed, alas :-(

I must check whether it comes from my library or from dneupd.

Sylvestre, concerning the fact to become a new Arpack maintainer, I
would like to accept but for the moment I cannot :-(

Comment 21 by Valentin Zauner, Oct 29, 2013

hi guys, sorry for the delay.

Thanks for the offer as a maintainer! I however also have to refuse
as I have only very rudimentary knowledge of fortran and my time
also doesn't really permit it :-(

I am however happy to help clarify further issues with dneupd

Edouard: the debugging information that arpack provides was really
very helpful for me in finding out the culprit. Maybe it helps you
also to see what goes wrong for your special case?

cheers

Comment 22 by Jordi Gutiérrez Hermoso, Oct 30, 2013

You two have already spent far more time debugging this issue than I
have ever spent in my entire time as a maintainer of ARPACK.

All this offer gives you, really, is the right to upload a fix once
you have it ready, and it comes with absolutely no obligations other
whatever you decide to impose on yourself.

Comment 23 by Sylvestre Ledru, Nov 6, 2013

Edouard, have you been able to make some progress on the patch ?
Thanks

Comment 24 by Chris cmiller730, Feb 17, 2014

Sorry to be late to the party. I'm trying to write Haskell wrappers
to arpack and beleive that I'm encountering this issue as well. I'm
just doing a sanity check with a diagonal matrix (using the
non-symmetric routine),

A_ii = (i+1)  0 <= i < 1000.

Here's a summary of my ARPACK params

bmat = 'I'
N = 1000
which = "LM"
nev = 9
tol = 0
ncv = 2*nev + 1
rvec = 1

The computed eigenvalues should be 1000, 999,...,992. I'm getting

21.240820
209.280818
491.134969
1000.000000
999.000000
998.000000
997.000000
996.000000
995.000000

The failure mode appears to depend on nev. For nev = 1..8 I get the
expected results, breaks for 9 as described,  nev = 10 returns
[999,998,..990] (missing 1000), 11 fails in a manner similar to 9,
15 returns correctly. Sorry I got carried away. By induction it's
broken ;). All of these examples return ido = 99 info = 0

As others have reported I also always get correct results when rvec
= 0. I'm attaching c code for this example.

Comment 25 by Chris cmiller730, Feb 17, 2014

I now also confirm that the above fix to dneupd.f corrects the
problem at least in this simple test case.

What would be involved in patching this? Would updates to _neupd be
enough?

Comment 26 by Sylvestre Ledru, Feb 17, 2014

Well, to be clear, I don't have the skills to fix this bug I will be
happy to apply a merge and use the various tests cases as unit tests.

Comment 27 by Chris cmiller730, Feb 17, 2014

I don't really feel like I have the skills either. It's been a long
time since I've seriously looked at Fortran code past "how do I
call it from C?"

I guess I have three concerns

1) The problem appears to be related to a change in the way lapack
handles things. Would the fix break people using older lapack? Does
anyone care?

2) I don't really have a full grasp of what exactly the problem is
so I'm hesitant to apply someone else's fix.

3) Unless I'm missing something, the test coverage in arpack-ng
seems a little sparse. I'd worry that I'd break something down the
line.

If I have time and can get a better handle on what's going wrong
I'll try and make a patch and write a few verification tests to send
to the maintainers.

I've setup a fork at https://github.com/cmiller730/arpack-ng to work
on this in case anyone's interested in my progress.

Comment 28 by Tim Mitchell, Mar 5, 2014

Issue 1315 just caught my attention as it sounds very similar to
issue 664 which I discovered and patched a couple years ago and that
issue 1315 could be related to issue 667 which I also reported.

really is a bug here because there is more than one issue here being
conflated together, thus making it difficult to determine what is
causing what.

First, the business of migrating to LAPACK 3.0 or later routines
seems a very premature step.  It seems part of the “bug" was
caused by overlooking ARPACK instructions to use the included LAPACK
2.0 routines, which I’d say is user-error.  The “bug" should
first be replicated with the LAPACK routines included with ARPACK,
not any other version.  Migrating to a newer LAPACK version may be a
good idea long-term but feature enhancement should be a completely
separate issue from determining whether this bug exists, its cause,
and its solution, unless we are certain that 2.0 routines are
somehow at fault.  So before moving to the much modified dneupd2, I
think it would be best to focus on making sure this bug truly exists
with the regular dneupd and LAPACK 2.0 routines.

With that in mind, the next very important issue is whether the demo
codes are properly giving ARPACK a fixed initial vector.  The fact
that bug seems to only occur sometimes for a given matrix likely
indicates that the ARPACK is generating a random initial vector
every call, which would explain why Valentin Zauner is reporting
that the problem doesn’t always happen every time.  I just skimmed
Valentin Zauner’s demo codes and it looks like that indeed INFO is
set to 0, thus telling ARPACK to ignore whatever is in RESID and to
instead generate its own random initial vector every time it is
called.  See lines 123-127 of the comments for dnaupd.f for RESID
and INFO:

http://forge.scilab.org/index.php/p/arpack-ng/source/tree/master/SRC/
dnaupd.f

I did this very quickly so perhaps someone could double-check my
source-code skimming?  I am pretty confident that if RESID is
initially set to a vector of ones (or any other fixed set of values)
and INFO is set to 1, then the bug will either always occur or it
always won’t.

So I think the number one priority moving forward should be to
create a demo code that generates a random vector for RESID with
INFO := 1 and then stores RESID to disk before calling ARPACK.  That
way, if the bug then occurs with that initial vector, we can load up
the copy of the initial RESID vector on disk and can replicate the
bug every time with a specific matrix and that specific initial
vector.

Krylov methods are of course sensitive to the choice of initial
vector and it could be that sometimes just a bad initial vector is
randomly generated and as a result, not all the desired Ritz values
converge, which would be normal.  With a really bad choice of
initial vector (at least non-zero though… since zero would indeed be
very very bad), ARPACK may not even have a single Ritz value
converge before hitting its max iteration count.  To confirm that we
have a bug, we need to confirm that indeed ARPACK is returning
undesired / not-converged Ritz values with eigenvectors when in
fact, ARPACK has already computed more desired converged Ritz values
that it isn’t returning for some reason.  It’s not yet clear to me
that this is really happening and we can’t do that test unless we
have a fixed initial vector and matrix that supposedly replicates
the problem.

To sum up, my advice at this point is the following:

1) Use only the LAPACK routines included with ARPACK for the timing
being and not the modified dneupd2

2) obtain a fixed example matrix A and a fixed initial vector x0
that replicates the problem every time

3) then with this A and x0, verify whether Matlab replicates the
same behavior or not.  At this point, I doubt the claim that the
problem doesn’t occur in Matlab only because the initial vector and
LAPACK routines don’t seem to have been adequately controlled for.

If after all that, we have a matrix A and vector x0 which causes a
problem with ARPACK-NG but in Matlab, then I think we’ll have
verified that something fishy is going on here and we can start
looking into whether ARPACK is not properly returning the desired
set of converged Ritz values and corresponding eigenvectors.

Unfortunately, I don’t have the time at the moment to do the
investigative legwork at the moment, but if someone else here can
follow up on getting a concrete problematic example generated that
recreates the problem every time, I’d be happy to consult on it and
time permitting, get more involved then.

Comment 29 by Valentin Zauner, Mar 5, 2014

Hi Tim,

Thank you for your detailed analysis of what you choose not to call
a bug. If indeed the ARPACK-NG documentation specifically points out
to use included LAPACK 2.0 routines then I agree with you that the
problem I described is user induced, although somewhat finicky.

Could you please point out to me where exactly this is stated (and
it has to be in the NG documentation, not the "old" ARPACK
one!).

Even if it is so, the LAPACK routine that causes the problem
(dlahqr) is actually not meant to be called directly by the user
from what I can tell from the LAPACK documentation. So we could
continue to be finicky along that direction...

Let me elaborate on your other points:

1) This has already been suggested as the simplest bugfix

2) Actually this is of course what I tried to supply in the
beginning, as of course it would greatly help to have a set of input
data to ALWAYS reproduce the error. The problem however is the
following:

Let's say you give me a fixed matrix A. I run my code many times
with a random initial vectors x0 until I see the problem occuring. I
save A and the problematic x0.

If I then load x0 again in the next run and use it as an initial
vector I can reproduce the problem many times over and over (I tried
all this!).

If however in the meantime I run some other program, close/open some
programs or reboot the machine, the same x0 which caused problems
before is now fine. So if I then give you the A and x0 that I saved
you will probably not be able to reproduce the problem.

I suspect that the routine dlahqr also uses some stochastic
algorithm and somehow this gets messed up by
opening/closing/rebooting etc. Anyways there is some undtable
behavior and that's trouble enough.

3) Every problematic x0 I double checked with MATLAB. MATLAB has
NEVER EVER failed a single time (the considered matrices were well
behaved). I don't know exactly what you want to say in 3) but as
stated above somewhere, MATLAB actually also commits what you call a
user-error by using a current release of LAPACK and just using the
LAPACK 2.0 version of dlahqr. In this sense, MATHWORKS has also
found and fixed this bug and confirms that the routine dlahqr is the
culprit.

on a side remark: I have confirmed that the requested ritz values
have converged but ARPACK just returns the wrong eigenpairs by
supplying debug traces (if you care you can skim through them at
https://gist.github.com/Darkdragon84/6728023).

To conclude:
- even if ARPACK tells you to use LAPACK 2.0 I think this is way out
of date and users should be at least SPECIFICALLY pointed to that
issue.
- I actually just wanted to check if anyone else experienced this
problem and alongside supplied my own (admittedly) dirty hack to fix
the problem. It works fine for me, I never requested it to be
included in the library. Whoever likes it can use it, if not, well
then not :-)

Hope this clarifies a few things, if not, let's exchange another few
hundred lines ;-)

cheers,
Valentin

Comment 30 by Chris cmiller730, Mar 5, 2014

If it says only use the included lapack that's fine. But I didn't

4. Unlike ARPACK, ARPACK-NG is providing autotools based build
system.
Therefor, the classical:
\$ ./configure
\$ make
\$ make install
should work as expected.

If nothing else the build script should be changed so it doesn't

I'm also 99.9999% sure the problem has nothing to do with
sensitivity to starting vectors.

1) ARPACK reports convergence when the results clearly violate the
convergence criterion. Even if the failure is related to starting
vectors this would still be a bug.

2) The results are always wrong for me in exactly the same ways.
I've also tried seeding with a vector of all ones and and info = 1 I
get the same incorrect results.

3) It's failing on matricies it has no buisness failing on
(Diagonal) no matter what the starting vector is.

I don't have matlab but I've verified that the matrix from my trial
script works with scipy's sparse.linalg.eigs.

Comment 31 by Tim Mitchell, Mar 5, 2014

Hi Valentin and Chris,

clarify my stance, I wasn’t stating that what you've reported is not
a bug or that is a bug.  All I was claiming is that it wasn’t really
clear what was being reported.  Given the additional replies by you
both, it does sound like something wrong is going on.

Concerning the initial vector issue, it also wasn’t clear to me from
the codes provided here that is was accounted for, since INFO was
set to 0 and I didn’t see any code relevant to storing or uploading
a specific initial vector in either Valentin’s or Chris’s demo
codes.  It’s good to know that they are both reporting that problems
occur even with a vector of 1’s as input.  However, even with a
fixed initial vector of 1’s, does the problem happen every time for
you Chris or is it intermittent like Valentin reports?  If it’s
intermittent, can you at least get to occur “on demand” quickly by
calling ARPACK-NG with the same matrix and initial vector in a loop
over and over again?

Concerning using the newer LAPACK routines, is it confirmed that
this bug you only happens when using the newer 3.x versions of
LAPACK and not the older 2.0 release that ARPACK included with it
and that ARPACK-NG stripped out of its source tree?   I could have
sworn I had seen something regarding only using certain version of
routines in LAPACK with ARPACK but this was long time ago and I
can’t find it now so maybe I am mistaken.  So yeah, my use of
“user-error” was probably too harsh.  Still, I think expecting
is on a plane of optimism that I neither possess nor even imagine
having.  :)  It’s been nearly 20 years (!) since ARPACK was first
released and it hasn’t been maintained by the original authors for a
/long/ time.

If it’s just the newer LAPACK routines that are inducing the problem
in ARPACK-NG, priority number one should be instructing ARPACK-NG
users and distros to only link to LAPACK 2.0 (or the most recent
non-problematic release) for the time being until a new official
update of ARPACK-NG is ready to be released with compatibility with
the latest LAPACK versions.   Certainly correctness of results are
more important than any speed gains given by newer LAPACK code.

Comment 32 by Roland Wirth, May 14, 2014

> I could have sworn I had seen something regarding only using
certain version of routines in LAPACK with ARPACK but this was long
time ago and I can’t find it now so maybe I am mistaken.

The passage you are referring to is in the README of the original
ARPACK package. There, they strongly recommend that you use the
bundled LAPACK if the system LAPACK is not 2.0. They also state that
the current LAPACK was 3.0 at the time of writing. So, it might be
that you have rediscovered a long-forgotten incompatibility.

Comment 33 by Tim Mitchell, May 14, 2014

Hi Roland,

Thank you so much for reminding me where I had seen that warning
concerning using LAPACK 2.0 instead of 3.0 with ARPACK.  I knew I
had seen it somewhere but I was rather swamped with other things
last time I checked in here to go digging around for it.

It seems the proper course of action moving forward is to put the
2.0 LAPACK code included with the official ARPACK back into the
arpack-ng tree, such that it is always linked to upon compiling,
along with new documentation warnings that specifically detail this
incompatibility with LAPACK 3.0.  I'd recommend also putting such a
comment/warning in the makefile so no one in the future falls into
the same trap.

Once that is done, I believe issue 1315 should then be changed from
"type defect" to "type enhancement" and when
someone in the future has time and motivation, the arpack-ng code
can be updated in a future release to properly support LAPACK 3.0,
perhaps by following the work Valentin has already done on that
front.

Thanks to everyone for bringing attention to this regression that
had been compared to the official ARPACK.

Comment 34 by Jordi Gutiérrez Hermoso, May 14, 2014

This isn't an "unofficial" ARPACK. We contacted the ARPACK
developers telling them that we were doing this, and they agreed to
it (well, they didn't disagree to it, they just never really
responded except with promises to maintain ARPACK, which never
happened).

We shouldn't be packaging our own LAPACK either. This isn't a real
fix. The whole point of creating arpack-ng in the first place was to
avoid shipping embedded libraries. The real fix is to figure out how
to make ARPACK work with LAPACK other than 2.0.

Comment 35 by Valentin Zauner, May 15, 2014

I agree with Jordi.

As I gather, the point of arpack-ng is exactly to provide more than
the (not well maintained) original ARPACK, i.e. also ensure
compatibility with current libraries. In our case this is just
LAPACK and BLAS. I understand that this might be a cumbersome task,
but otherwise I see no point in arpack-ng in the first place. Sorry
if this might sound a bit harsh, but I think otherwise arpack-ng
would just be the usual ARPACK with a nice make environment...

Comment 36 by Sylvestre Ledru, May 15, 2014

Clearly, we won't be shipping a copy of LAPACK in Arpack-ng.

However, I will be happy to apply any patch to support LAPACK 3.0
(but I won't be the one doing the work, sorry).

Comment 37 by Valentin Zauner, May 15, 2014

OK, this rules out the MATLAB approach, where they use a compiled
current LAPACK library, where just the troublemaker routine is
replaced with the old one from the 2.0 release.

Comment 38 by Sylvestre Ledru, May 15, 2014

Valentin, it is just about one file, we can take it as a workaround.

I don't know if it is possible with Fortran but a load on the fly
could be a workaround...

Comment 39 by Valentin Zauner, May 15, 2014

By "one file", you mean the ARPACK fortran file, i.e.
dneupd.f?

I actually strongly suspect that the same issue will also arise for
all the other drivers (i.e. single/double real/complex
symmetric/non-symmetric) that compute eigenvectors. One would have
to test all of these and apply the corresponding workaround.

I haven't done this since in my application I only need the double
real non-symmetric routine.

Comment 40 by Tim Mitchell, May 15, 2014

Valentin, yes, I am 99.999% certain that the same issue will arise
for all the other drivers.  However, I believe Sylvestre was
suggesting to only take the one LAPACK 2.0 file dlahqr and linking
to to just that specific version for arpack-ng, possibly
dynamically.  I'd be fine with that approach as a temporary
workaround (or by taking the Matlab approach which presumably
includes it as dlahqr2 as Edouard has suggested). Is that what you
meant Sylvestre?

I agree with everyone that it'd be great for arpack-ng to push
forward and support the newer LAPACK releases.  However, until those
improvements are ready and thoroughly tested, I feel it is prudent
to provide a temporary workaround that deals with this
bit of compromise to include and/or link to specific 2.0 versions as
necessary, the alternative of having arpack-ng users live with some
fairly erroneous results seems a worse trade-off, particularly since
it seems like it will be some time before compatibility with LAPACK
3.0 is fully handled in arpack-ng.

Valentin, as to the point of arpack-ng, I completely disagree that
without feature enhancements that arpack-ng would just be regular
ARPACK with a nice make environment.  The official ARPACK tree is
not being maintained /at all/ and as a result, the bug reports and
fixes were that have cropped up over the years weren't being
organized or compiled into one centralized source.  I personally am
very grateful that arpack-ng exists so that there is now one
centralized and organized repository for all ARPACK related bugs and
fixes to be available from.  Even if arpack-ng intended to only be
just such a maintenance release, that's still a huge benefit to the
community.

Comment 41 by Viral Shah, May 25, 2014

Having the erroneous answer is certainly the worse solution. It
would be really great if we can include DLAHQR2 as part of
arpack-ng, so that users get a correct working solution out of the
box.

Comment 42 by Johannes Jünemann, Jun 11, 2014

Hello everybody, I am encountering a problem that is related to your
discussion. I am trying to find the 'LR' eigenvalues of given
matrices (with real eigenvalues!) using the dnaupd/dneupd methods,
however, the lowest eigenvalues do not always appear in the right
order. Below a typical matrix example:

A = RESHAPE((/ 0d0 , 1d-4, 1d-4, -1d0,
&             1d-4, 0d0, -1d0, 1d-4,
&             1d-4, -1d0, 0d0, 1d-4,
&             -1d0, 1d-4, 1d-4, 0d0
&      /), (/4,4/) )

When calling the method repeatedly, the ordering of the two-lowest
EV (-1.0002 resp. -0.9998) goes wrong in some cases. I used the
arpack_test_f program (as provided by Valentin Zauner) to
double-check and tried both the fix for the dlahqr.f and using
dneupd2.f, but none of the two changes the problem. The ordering (at
least for this particular example) goes right when choosing
'LR','SM' or 'LM'.

I would be very glad if one of you could confirm this problem in
oder to know if the issue should have been fixed by your patches
(and I didn't follow the instructions through completely) or if this
is an unrelated error.

Comment 43 by Johannes Jünemann, Jun 11, 2014

I am sorry, the second sentence was supposed to read:

I am trying to find the 'SR' eigenvalues of given
matrices (with real eigenvalues!) using the dnaupd/dneupd methods...

Comment 44 by Valentin Zauner, Jun 11, 2014

I have never used the 'SR' mode, maybe some additional stuff happens
when you call this? there should be some shift-and-invert type of
operation involved, maybe this screws it up?

Also, your matrix is symmetric and you should use the symmetric
driver dsaupd/dseupd to begin with...

Comment 45 by Johannes Jünemann, Jun 12, 2014

Thank you for the very quick response. I am using the non-symmetric
libraries, as I apply the procedure to a set of different (not
necessarily) symmetric matrices.

The problem persists when slightly changing the matrix entries as to
make it non-symmetric.

Comment 46 by Marco Caliari, Feb 12, 2015

Dear all,

I had similar problems with rvec = .true.
It was not clear to me how the Ritz values (computed, sorted,
selected, ...) by dnaupd were related with the eigenvalues computed
by the famous dlahqr.f. Till today, when I saw in
arpack/src/dlaqrb.f

c  This is mostly a modification of the LAPACK routine dlahqr.

Now it is clear: they are related because computed in the same way
(by the same routine). So, I compared dlaqrb.f with dlahqr.f (2.0)
and made the same small changes to dlahqr.f (3.5.0) and called the
result dlaqrb35.f (attached). All my problems disappeared! If you
all confirm that it works also for you, I can continue with the job
(modify the description, the comments and the other routines).

Comment 47 by Marco Caliari, Feb 12, 2015

the file

Comment 48 by Valentin Zauner, Feb 12, 2015

Hi!

Tracking the changes back to within the auxiliary routine is
probably a better way :-) I will see if it works for me when I have
the time! Thanks for your work!

Cheers,
Valentin

Comment 49 by Sylvestre Ledru, Feb 12, 2015

This bug is fixed in github:
https://github.com/opencollab/arpack-ng/issues/3

FYI, this forge is no longer used.

Comment 50 by Sylvestre Ledru, Feb 12, 2015

Status: Fixed

Comment 51 by Valentin Zauner, Feb 12, 2015

So does that mean that the whole issue is now fully fixed in the
current 3.1.5 release? Also for other drivers such as dseupd,
zneupd, etc?

Comment 52 by Sylvestre Ledru, Feb 12, 2015

This project is only here for legacy purposes.

Created: 5 years 2 months ago

Updated: 3 years 9 months ago

Status: Fixed

Followed by: 9 persons

Labels:
Type:Defect
Priority:Medium

This issue duplicates