Error while loading shared libraries libtorque.so.2


Your library is a dynamic (shared) library, so you need to tell the operating system where it can locate it at runtime. Until you do, Torque binaries fail with errors such as:

    pbs_server: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory
    qmgr: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory

The same class of error appears for other libraries too (for example libimf.so from the Intel compiler runtime), and the fix is always the same: make the directory containing the library visible to the dynamic linker, as sketched below.
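A minimal sketch of the usual fixes, assuming Torque was installed under /opt/local so that the library lives in /opt/local/lib (a hypothetical path; point it at wherever libtorque.so.2 actually is on your system):

    # Locate the library first (the path here is an assumption).
    ls /opt/local/lib/libtorque.so.2

    # Option 1: per-shell fix -- prepend the directory to the search path.
    export LD_LIBRARY_PATH=/opt/local/lib:$LD_LIBRARY_PATH

    # Option 2: system-wide fix -- register the directory with the dynamic linker.
    echo "/opt/local/lib" | sudo tee /etc/ld.so.conf.d/torque.conf
    sudo ldconfig

    # Verify: no line should read "not found" any more.
    ldd $(which pbs_server) | grep torque

Option 2 is usually preferable for daemons started at boot, because init scripts do not read your shell profile.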



force-reload}" >&2
exit 3
;;
esac

exit 0


###/etc/init.d/pbs_sched###


#! /bin/sh
### BEGIN INIT INFO
# Provides: skeleton
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Example initscript
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO
#
# Author: Miquel van Smoorenburg <[email protected]>.
# Ian Murdock <[email protected]>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton 2.85-23 28-Jul-2004 [email protected]
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Scheduler Daemon"
NAME=pbs_sched
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
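A daemon launched from an init script does not see LD_LIBRARY_PATH settings made in a user's shell profile, so if you rely on LD_LIBRARY_PATH rather than ldconfig, the path has to be set in the script itself. A hypothetical addition near the top of /etc/init.d/pbs_sched, after PATH is set (the /opt/local/lib path is an assumption):

    # Make sure pbs_sched can resolve libtorque.so.2 when started at boot.
    LD_LIBRARY_PATH=/opt/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH

Registering the library directory with ldconfig, as shown earlier, makes this per-script change unnecessary.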

[maker-devel] MPI


Anna Nyiri

Nov 5, 2015, 8:49:00 PM11/5/15


to [email protected]

Hi,

I tried to use MAKER with MPICH2, but I got an error message:
"/molbio/bin/danna/mpich2-install/bin/hydra_pmi_proxy: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory"

The attached file contains the shell script that I used. Is this script correct?

The script is supposed to load an MPICH module, but I can't find it on my computer. Where can I find this module?

Thanks for your help,
Anna Nyiri

Carson Holt

Nov 5, 2015, 8:57:47 PM11/5/15


to Anna Nyiri, [email protected]

The problem is the actual MPICH2 installation. You may be missing prerequisites, or you may not have compiled with the necessary shared-library flags (--enable-shared). You may also be compiling on one machine that has certain libraries installed and then running on another that doesn't have access to those libraries (this can happen when running on a cluster). Try reinstalling MPICH2 or switching to OpenMPI.
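If a missing --enable-shared is the cause, a minimal sketch of rebuilding MPICH2 and checking the result might look like this (the source directory, version, and install prefix are assumptions; substitute your own):

    # Rebuild MPICH2 with shared libraries enabled (paths and version are hypothetical).
    cd mpich2-1.5
    ./configure --prefix=$HOME/mpich2-install --enable-shared
    make && make install

    # Check that the process launcher resolves all of its shared libraries;
    # any line reading "not found" identifies the missing one.
    ldd $HOME/mpich2-install/bin/hydra_pmi_proxy

If libtorque.so.2 is the library reported as missing, the node you run on also needs the Torque client libraries installed (or made visible to the dynamic linker), as described at the top of this page.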

If you decide to use OpenMPI, the following is from the INSTALL file that should be included with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).


1.  Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.

2.  Give the path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.

3.  Give the path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.

4.  Finish the installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.
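Put together, the environment setup described above might look like the following lines in ~/.bash_profile (the libmpi.so path is the placeholder from the INSTALL text; use the real location on your system):

    # Hypothetical ~/.bash_profile additions for running MAKER with OpenMPI.
    export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so
    export OMPI_MCA_mpi_warn_on_fork=0   # silence non-fatal fork warnings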

Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

        Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,

Carson

grep -e "Open MPI:" -e "C compiler absolute:"
  Open MPI: 2.0.2a1
  C compiler absolute: /opt/solstudio12.5b/bin/cc
loki spawn 121 which mpiexec
/usr/local/openmpi-2.1.0_64_cc/bin/mpiexec
loki spawn 122 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
Parent process 0 running on loki
I create 4 slave processes
[loki:21301] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-201612232156-5ce66b0/opal/mca/pmix/base/pmix_base_fns.c at line 195
[loki:21301] *** An error occurred in MPI_Comm_spawn

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23, Howard Pritchard

Hi Paul, Thanks very much for the Christmas present. The Open MPI README has been updated to include a note about issues with the Intel 16.0.3-4 compiler suites. Enjoy the holidays, Howard

2016-12-23 3:41 GMT-07:00 Paul Kapinos

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23, Paul Kapinos

Hi all, we discussed this issue with Intel compiler support and it looks like they now know what the issue is and how to protect against it. It is a known issue resulting from a backwards incompatibility in an OS/glibc update, cf. https://sourceware.org/bugzilla/show_bug.cgi?id=20019

Affected versions of the Intel compilers: 16.0.3, 16.0.4
Not affected versions: 16.0.2, 17.0

So, simply do not use the affected versions (and hope for a bugfix update in the 16.x series if you cannot immediately upgrade to 17.x, like us, although that is the option Intel favours). Have a nice Christmas time! Paul Kapinos

On 12/14/16 13:29, Paul Kapinos wrote:
Hello all, we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI 1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works fine when compiled with 16.0.2.181. It seems to be a compiler issue (more exactly: a library issue in the libs delivered with the 16.0.4.258 and 16.0.3.210 versions). Changing the version of the compiler loaded back to 16.0.2.181 (=> a change of the dynamically loaded libs) lets the previously-failing binary (compiled with the newer compilers) work properly. Compiling with -O0 does not help. As the issue is likely in the Intel libs (as said, changing these out solves/raises the issue) we will fall back to the 16.0.2.181 compiler version. We will try to open a case with Intel - let's see... Have a nice day, Paul Kapinos

On 05/06/16 14:10, Jeff Squyres (jsquyres) wrote:
Ok, good. I asked that question because typically when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a fairly simple program -- there's no reason it should segv, especially with a stack trace that you sent that implies that it's dying early in startup, potentially even before it has hit any Open MPI code (i.e., it could even be pre-main). BTW, you might be able to get a more complete stack trace from the debugger that comes with the Intel compiler (idb? I don't remember offhand).

Since you are able to run simple programs compiled by this compiler, it sounds like the compiler is working fine. Good! The next thing to check is to see if somehow the compiler and/or run-time environments are getting mixed up. E.g., the apps were compiled for one compiler/run-time but are being used with another. Also ensure that any compiler/linker flags that you are passing to Open MPI's configure script are native and correct for the platform for which you're compiling (e.g., don't pass in flags that optimize for a different platform; that may result in generating machine code instructions that are invalid for your platform). Try recompiling/re-installing Open MPI from scratch, and if it still doesn't work, then send all the information listed here: https://www.open-mpi.org/community/help/

On May 6, 2016, at 3:45 AM, Giacomo Rossi
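Following the advice above, a quick way to see which compiler and which run-time libraries a wrapper actually picks up is a check along these lines (generic commands; adapt to your environment):

    which mpif90
    mpif90 --version                      # version of the underlying Fortran compiler
    ldd $(which mpif90) | grep -i intel   # which Intel run-time libraries get loaded
    echo $LD_LIBRARY_PATH                 # confirm only one compiler's lib directories are listed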

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-14, Paul Kapinos

Hello all, we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI 1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works fine when compiled with 16.0.2.181. It seems to be a compiler issue (more exactly: a library issue in the libs delivered with the 16.0.4.258 and 16.0.3.210 versions). Changing the version of the compiler loaded back to 16.0.2.181 (=> a change of the dynamically loaded libs) lets the previously-failing binary (compiled with the newer compilers) work properly. Compiling with -O0 does not help. As the issue is likely in the Intel libs (as said, changing these out solves/raises the issue) we will fall back to the 16.0.2.181 compiler version. We will try to open a case with Intel - let's see... Have a nice day, Paul Kapinos

On 05/06/16 14:10, Jeff Squyres (jsquyres) wrote:
Ok, good. I asked that question because typically when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a fairly simple program -- there's no reason it should segv, especially with a stack trace that you sent that implies that it's dying early in startup, potentially even before it has hit any Open MPI code (i.e., it could even be pre-main). BTW, you might be able to get a more complete stack trace from the debugger that comes with the Intel compiler (idb? I don't remember offhand).

Since you are able to run simple programs compiled by this compiler, it sounds like the compiler is working fine. Good! The next thing to check is to see if somehow the compiler and/or run-time environments are getting mixed up. E.g., the apps were compiled for one compiler/run-time but are being used with another. Also ensure that any compiler/linker flags that you are passing to Open MPI's configure script are native and correct for the platform for which you're compiling (e.g., don't pass in flags that optimize for a different platform; that may result in generating machine code instructions that are invalid for your platform). Try recompiling/re-installing Open MPI from scratch, and if it still doesn't work, then send all the information listed here: https://www.open-mpi.org/community/help/

On May 6, 2016, at 3:45 AM, Giacomo Rossi

Re: [OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25, George Bosilca

At first glance I would say you are confusing the variables counting your requests, reqcount and nrequests. George.

On Fri, Nov 25, 2016 at 7:11 AM, Paolo Pezzutto

[OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25, Paolo Pezzutto

Dear all, I am struggling with an invalid memory reference when calling SUB EXC_MPI (MOD01), and precisely at MPI_StartAll (see the comment below).

@@
! ** file mod01.f90 !
MODULE MOD01
  implicit none
  include 'mpif.h'
  ! alternatively
  ! use mpi
  ! implicit none
  PRIVATE
  ! ...
  INTERFACE exc_mpi
    MODULE PROCEDURE exc_mpi
  END INTERFACE
  PUBLIC exc_mpi
CONTAINS
  subroutine exc_mpi (X)
    !! send and receive from procs PN0 <-> PN1 and PN0 <-> PN2
    real, dimension (ni:ns, m, l), intent(inout) :: X
    logical, save :: frstime=.true.
    integer, save :: mpitype_sn, mpitype_sp, mpitype_rn, mpitype_rp
    integer, save :: requests(4), reqcount
    integer :: istatus(MPI_STATUS_SIZE,4), ierr
    if (frstime) then
      call exc_init()
      frstime = .false.
    end if
    call MPI_StartAll(reqcount,requests,ierr)   !! <-- segfault here
    call MPI_WaitAll(reqcount,requests,istatus,ierr)
    return
  contains
    subroutine exc_init
      integer :: i0, ierrs(12), ktag
      nrequests = 0
      ierrs=0
      ktag = 1
      ! find i0
      if ( condition1 ) then
        ! send to PN2
        call MPI_Type_Vector(m*l, messlengthup(PN2), ns-ni+1, MPI_REAL, mpitype_sn, ierrs(1))
        call MPI_Type_Commit(mpitype_sn, ierrs(3))
        call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
        ! receive from PN2
        call MPI_Type_Vector(m*l, messlengthdo(PN0), ns-ni+1, MPI_REAL, mpitype_rn, ierrs(2))
        call MPI_Type_Commit(mpitype_rn,ierrs(4))
        call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
        nrequests = nrequests + 2
      end if
      if ( condition2 ) then
        ! send and rec PN0 <-> PN1
        nrequests = nrequests + 2
      end if
      return
    end subroutine exc_init
  end subroutine exc_mpi
! ...
END MODULE MOD01
@@

The calls are coming from this other module in a separate file:

@@
! ** file mod02.f90 !
MODULE MOD02
  use MOD01, only: exc_mpi
  IMPLICIT NONE
  include 'mpif.h'
  ! alternatively
  ! use mpi
  ! implicit none
  PRIVATE
  ! ...
  INTERFACE MYSUB
    MODULE PROCEDURE MYSUB
  END INTERFACE
  PUBLIC MYSUB
CONTAINS
  SUBROUTINE MYSUB (Y)
    IMPLICIT NONE
    REAL,INTENT(INOUT) :: Y(nl:nr, m, l)   ! ni<=nl, nr>=ns
    REAL, ALLOCATABLE, DIMENSION(:,:,:) :: Y0
    !...
    allocate ( Y0(n-1:ns, 1:m, 1:l) )
    DO i = 1, icount
      Y0(nl:nr,:,:) = F3(:,:,:)
      call exc_mpi ( Y0(ni:ns, :, :) )   ! <-- segfault here
      call mpi_barrier (mpi_comm_world, ierr)
      Y0(ni-1,:,:) = 0.
      CALL SUB01
    END DO
    deallocate (Y0)
    RETURN
  CONTAINS
    SUBROUTINE SUB01
      !...
      FRE: DO iterm = 1, m
        DIR: DO iterl = 1, l
          DO itern = nl, nr
            !Y(itern, iterm, iterl) = some_lin_combination(Y0)
          END DO
        END DO DIR
      END DO FRE
    END SUBROUTINE SUB01
  ! ...
  END SUBROUTINE MYSUB
END MODULE MOD02
@@

Segmentation fault is raised at runtime when MAIN (actually a sub in a module) calls MYSUB (in MOD02) for the second time, i.e. just MPI_StartAll without re-initialisation. The segfault is an invalid mem reference, but I swear that the bounds aren't changing. The error is not systematic, in the sense that the program works if splitting the job up to a certain number of processes, say NPMAX, which depends on the size of the decomposed array (the bigger the size, the higher NPMAX). With more procs than NPMAX, the program segfaults. The same issue arises with [gfortran+ompi] and [gfortran+mpich], while [ifort+mpich] does not always segfault, but one process might hang indefinitely. So I bet it is not strictly an ompi issue, so apologies for posting here. It is not a single-version issue either: the same happens on deb-jessie, ubuntu 14 and a personal 2.0.1 build (can share config.log if necessary). The only thing in common is glibc (2.19, distro stable).

Actually the backtrace of ifort-mpich lists libpthread.so. I have not tried with alternative c-libs, nor with the newest glibc. Intel virtual threading is enabled on all three archs (one mini HPC and two PCs). This error is not reported on "serious" archs like nec, sun (ifort+ompi) and ibm. I am trying to find a possible MPI workaround for deb-based systems, maintaining efficiency as much as possible. As can be seen, MOD02 passes a sliced array Y0, which is non-contiguous, to the exchange procedure (MOD01). But I should not worry, because MPI_Type_Vector is expected to do the remapping job for me. I could almost overcome the fault (NPMAX growing by one order of magnitude) by exchanging the dimensions back and forth, but this causes the

[OMPI users] Segmentation fault with openmpi-v2.0.1-134-g52bea1d on SuSE Linux

2016-11-02, Siegmar Gross

Hi, I have installed openmpi-v2.0.1-134-g52bea1d on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.2.0. Unfortunately, I get an error when I run one of my programs.

loki spawn 149 ompi_info grep -e "Open MPI repo revision" -e"Configure command line"
Open MPI repo revision: v2.
Greetings from task 1: message type:3 msg length: 132 characters ... (complete output of my program)
[nfs2:01336] *** Process received signal ***
[nfs2:01336] Signal: Segmentation fault (11)
[nfs2:01336] Signal code: Address not mapped (1)
[nfs2:01336] Failing at address: 0x7feea4849268
[nfs2:01336] [ 0] /lib64/libpthread.so.0(+0x10c10)[0x7feeacbbec10]
[nfs2:01336] [ 1] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(+0x7cd34)[0x7feeadd94d34]
[nfs2:01336] [ 2] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(+0x78673)[0x7feeadd90673]
[nfs2:01336] [ 3] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(+0x7ac2c)[0x7feeadd92c2c]
[nfs2:01336] [ 4] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(opal_finalize_cleanup_domain+0x3e)[0x7feeadd56507]
[nfs2:01336] [ 5] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(opal_finalize_util+0x56)[0x7feeadd56667]
[nfs2:01336] [ 6] /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0(opal_finalize+0xd3)[0x7feeadd567de]
[nfs2:01336] [ 7] /usr/local/openmpi-master_64_gcc/lib64/libopen-rte.so.0(orte_finalize+0x1ba)[0x7feeae09d7ea]
[nfs2:01336] [ 8] /usr/local/openmpi-master_64_gcc/lib64/libopen-rte.so.0(orte_daemon+0x3ddd)[0x7feeae0cf55d]
[nfs2:01336] [ 9] orted[0x40086d]
[nfs2:01336] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7feeac829725]
[nfs2:01336] [11] orted[0x400739]
[nfs2:01336] *** End of error message ***
Segmentation fault (core dumped)
loki hello_1 116

I would be grateful if somebody can fix the problem. Do you need anything else? Thank you very much for any help in advance. Kind regards Siegmar

/* An MPI-version of the "hello world" program, which delivers some * information about its machine and operating system. * * * Compiling: * mpicc -o -lm * * Running: * mpiexec -np * * * File: hello_1_mpi.c Author: S. Gross * Date: 04.08.2017 * */ #include #include #include #include #include #include "mpi.h" #define BUF_SIZE 255 /* message buffer size */ #define MAX_TASKS 12 /* max. number of tasks */ #define SENDTAG 1 /* send message command */ #define EXITTAG 2 /* termination command */ #define MSGTAG 3 /* normal message token */ #define ENTASKS -1 /* error: too many tasks */ static void master (void); static void slave (void); int main (int argc, char *argv[]) { int mytid,/* my task id */ ntasks,/* number of parallel tasks */ namelen;/* length of processor name */ char processor_name[MPI_MAX_PROCESSOR_NAME]; MPI_Init (, ); MPI_Comm_rank (MPI_COMM_WORLD, ); MPI_Comm_size (MPI_COMM_WORLD, ); MPI_Get_processor_name (processor_name, ); /* With the next statement every process executing this code will * print one line on the display. It may happen that the lines will * get mixed up because the display is a critical section. In general * only one process (mostly the process with rank 0) will print on * the display and all other processes will send their messages to * this process. Nevertheless for debugging purposes (or to * demonstrate that it is possible) it may be useful if every * process prints

[OMPI users] segmentation fault to use openMPI

2017-10-11, RUI ZHANG

Hello everyone, I am trying to debug the MPI functionality at our local clusters. I use openmpi 3.0 and the executable was compiled with PGI 10.9. The executable is a regional air quality model called "CAMx" which is widely used in our community. In our local cluster setting, I have a cluster (npsx2) with 24 CPUs and 24G memory and three clusters with 40 CPUs and 65G memory (npsx4, npsx5, npsx6). The OS on all the clusters is CentOS 6.5. I used the command "lstopo" to generate the CPU architecture, attached below.

I can run through the CAMx benchmark case, and the output is the same as the benchmark outputs, using all the available CPUs across nodes with the command: mpirun -np 72 --hostfile [mynodes.txt] [myexe]

Then I moved on to run my own specific case. The CAMx model can use MPI as well as OpenMP to speed up the computation. Previously, our group only used OpenMP, and it works smoothly. Now I try to run it with MPI. The weird thing is that if I assign 4 CPUs, it runs through and the results are correct, BUT if I assign 5 CPUs, it gets stuck at certain time steps and idles there seemingly forever; furthermore, if I assign 6 CPUs or more for the MPI run, it crashes in the first few time steps and reports a segmentation fault.

My specific case has 5 times more total grid cells than the benchmark case, so my first guess was a memory issue. However, if I try this on npsx2 with less total memory or npsx5 with more total memory, it shows the same error pattern: it works when assigning 4 CPUs, idles when assigning 5 CPUs and crashes when assigning 6 CPUs. I tried to look for hints in previous posts, but didn't find a particularly insightful one. I used the valgrind tool to try to debug the executable on cluster npsx5 as: valgrind mpirun -np 6 [myexe]

It crashed with the log file attached below and I cannot find a clue how to solve it, so please help me troubleshoot this if you have time. Thanks for your attention and I hope for your suggestions. Best regards, zhangrui

log.npsx5.segmentation_fault Description: Binary data

[OMPI users] [OMPI USERS] segmentation fault at startup

2017-09-08, Alberto Ortiz

Hi, I have a system running openmpi programs over archlinux. I had the programs compiled and running in July, when I was using version 1.10.4 or .7 of openmpi, if I remember correctly. Just recently I updated the openmpi version to 2.1.1, tried running an already-compiled program, and it ran correctly. The problem came when I tried compiling and running it again. Using mpicc doesn't seem to give any problems, but when running the program with mpirun it gives the following message: mpirun noticed that process rank 0 with PID 0 on node alarm exited on signal 11 (Segmentation fault).

I have tried putting a printf in the first line of the main function and it doesn't reach that point, so I have assumed it must be a startup problem. I have tried running simple programs that only say "hello world" and they run correctly. What bothers me is that the same code compiled and ran correctly with an earlier version of openmpi and now it doesn't.

If it helps, I am running the programs with "sudo -H mpirun --allow-run-as-root -hostfile hosts -n 8 main". I need to run it with root privileges as I am combining SW with HW accelerators and I need to access some files with root permissions in order to communicate with the HW accelerators. There is no instantiation or use of those files until after running some functions in the main program, so there should be no problem or concern with that part. Thank you in advance, Alberto
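A usual first check after an upgrade like this is to confirm that the wrapper used to compile and the launcher used to run come from the same Open MPI installation, and that the rebuilt binary really links against the new libmpi. A small sketch (the source file name main.c is hypothetical):

    which mpicc mpirun            # both should live in the same OpenMPI 2.1.1 prefix
    mpirun --version
    mpicc -o main main.c          # rebuild against the new headers/libraries
    ldd ./main | grep libmpi      # should resolve to the 2.1.1 libmpi, not a leftover 1.10 tree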

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

[email protected]

Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP)
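One way to produce the requested backtrace with line numbers is to rebuild Open MPI with debugging symbols and then inspect a core file from the failing run; a rough sketch, with prefix, job size, and configure options as assumptions:

    # Hypothetical debug rebuild of Open MPI 1.10.6 (add your usual PGI/LSF configure options).
    ./configure --prefix=$HOME/openmpi-1.10.6-debug --enable-debug
    make -j8 install

    # Reproduce the crash with core dumps enabled, then read the core file in gdb.
    ulimit -c unlimited
    mpirun -np 2 ./IMB-MPI1
    gdb ./IMB-MPI1 core        # "bt" then prints a backtrace with file/line information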

[OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21, Hammond, Simon David (-EXP)

Hi OpenMPI Users, Has anyone successfully tested OpenMPI 1.10.6 with PGI 17.1.0 on POWER8 with the LSF scheduler (--with-lsf=..)? I am getting this error when the code hits MPI_Finalize. It causes the job to abort (i.e. exit the LSF session) when I am running interactively. Are there any materials we can supply to aid debugging/problem isolation?

[white23:58788] *** Process received signal ***
[white23:58788] Signal: Segmentation fault (11)
[white23:58788] Signal code: Invalid permissions (2)
[white23:58788] Failing at address: 0x108e0810
[white23:58788] [ 0] [0x10050478]
[white23:58788] [ 1] [0x0]
[white23:58788] [ 2] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(+0x1b6b0)[0x1071b6b0]
[white23:58788] [ 3] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(orte_finalize+0x70)[0x1071b5b8]
[white23:58788] [ 4] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(ompi_mpi_finalize+0x760)[0x10121dc8]
[white23:58788] [ 5] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(PMPI_Finalize+0x6c)[0x10153154]
[white23:58788] [ 6] ./IMB-MPI1[0x100028dc]
[white23:58788] [ 7] /lib64/libc.so.6(+0x24700)[0x104b4700]
[white23:58788] [ 8] /lib64/libc.so.6(__libc_start_main+0xc4)[0x104b48f4]
[white23:58788] *** End of error message ***
[white22:73620] *** Process received signal ***
[white22:73620] Signal: Segmentation fault (11)
[white22:73620] Signal code: Invalid permissions (2)
[white22:73620] Failing at address: 0x108e0810

Thanks, S.
--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA
[Sent from Remote Connection, Please excuse typos]

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03, Siegmar Gross

Hi Howard, it still works with 4 processes and "vader" will not send the following output about missing communication peers if I start at least 2 processes.

...
[loki:14965] select: initializing btl component vader
[loki][[42444,1],0][../../../../../openmpi-2.0.2rc2/opal/mca/btl/vader/btl_vader_component.c:454:mca_btl_vader_component_init] No peers to communicate with. Disabling vader.
[loki:14965] select: init of component vader returned failure
[loki:14965] mca: base: close: component vader closed
[loki:14965] mca: base: close: unloading component vader
...

Now the output from 4 processes.

Re: [OMPI users] segmentation fault with openmpi-2.0.2rc2 on Linux

2017-01-03, Howard Pritchard

Hi Siegmar, Could you please rerun the spawn_slave program with 4 processes? Your original traceback indicates a failure in the barrier in the slave program. I'm interested in seeing if the barrier failure is observed when you run the slave program standalone with 4 processes. Thanks, Howard

2017-01-03 0:32 GMT-07:00 Siegmar Gross <[email protected]>:
> Hi Howard,
>
> thank you very much that you try to solve my problem. I haven't
> changed the programs since 2013 so that you use the correct
> version. The program works as expected with the master trunk as
> you can see at the bottom of this email from my last mail. The
> slave program works when I launch it directly.
>
> loki spawn 122 mpicc --showme
> cc -I/usr/local/openmpi-2.0.2_64_cc/include -m64 -mt -mt -Wl,-rpath
> -Wl,/usr/local/openmpi-2.0.2_64_cc/lib64 -Wl,--enable-new-dtags
> -L/usr/local/openmpi-2.0.2_64_cc/lib64 -lmpi
> loki spawn 123 ompi_info error while loading shared libraries libtorque.so.2
