Version (development version)
New Features
- Now availableCores() returns 2 also when package vignettes are built by R CMD build or R CMD check. This helps prevent package vignettes from overusing the CPU cores when building and checking R packages.
Version 1.45.1
CRAN release: 2025-07-24
Miscellaneous
- Now print() for RichSOCKcluster outputs a more concise summary, which is also grammatically correct for single-node clusters.
Deprecated and Defunct
- In the previous version, makeClusterPSOCK() started to collect session information on each parallel worker, which included capabilities(). However, for unknown reasons, capabilities() caused the cluster creation to fail on GitHub Actions running macOS. The problem could be reproduced neither locally, on the mac-builder, nor on the CRAN macOS servers. Because this feature is non-critical and was only introduced in the previous version, I decided to remove the collection of capabilities() again.
Version 1.45.0
CRAN release: 2025-06-02
New Features
- availableCores() gained argument max, which limits the maximum number of cores returned after everything else is applied, i.e. availableCores(..., max = n) is short for min(n, availableCores(...), na.rm = TRUE).
- availableWorkers() gained argument ..., which passes any additional arguments to availableCores(), if specified.
- If killNode(..., signal = tools::SIGTERM) successfully signaled the cluster node, it will now close any existing socket connection to the node. If the node is running on the local host, it will also remove its temporary directory, because the node’s R process might not have exited gracefully.
- The session information collected by makeClusterPSOCK() now contains more details on each worker, e.g. the tempdir() folder, capabilities(), and extSoftVersion().
- Cluster nodes created by makeClusterPSOCK() gained attribute calls, which records the sys.calls(). This can be useful when troubleshooting from where a cluster was created. Analogously, setting R option parallelly.makeNodePSOCK.calls to TRUE will relay the call stack in the system call that launched the cluster node.
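
The max semantics described above can be sketched in base R. Note that cap_cores() is a hypothetical helper written for illustration, not part of parallelly, and its cores argument stands in for whatever availableCores(...) would return on a given machine.

```r
# Hypothetical helper illustrating the documented identity
#   availableCores(..., max = n) == min(n, availableCores(...), na.rm = TRUE)
# 'cores' stands in for the value that availableCores(...) would return.
cap_cores <- function(cores, max = NA_integer_) {
  as.integer(min(max, cores, na.rm = TRUE))
}

capped   <- cap_cores(8L, max = 2L)  # the cap applies
uncapped <- cap_cores(8L)            # max = NA means no cap
small    <- cap_cores(1L, max = 4L)  # the cap never raises the value
print(c(capped, uncapped, small))
```

Because na.rm = TRUE drops a missing max, leaving the argument unset behaves like having no upper limit.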
Bug Fixes
- availableCores() would not respect method = "fallback" if constraints specified "connections" or "connections-N".
- availableCores() would produce an error on Error in scan(file = file, what = what, ...) on systems that have a /proc/self/mounts file with syntax errors. Such files have been reported on Windows Subsystem for Linux version 2 (WSL 2), where spaces in Windows paths have not been properly escaped for some entries. Now such invalid entries are skipped before parsing the mount table.
Version 1.44.0
CRAN release: 2025-05-07
New Features
- Add support to availableCores() and availableWorkers() to specify constraints = "connections-N", where N specifies the number of connections to leave free after launching a PSOCK cluster with this number of cores.
- Add all.equal() for connection, which can distinguish between two connections that share the same connection index, but are not the same connection, e.g. when one was created, then closed, and another one of the same kind is created.
Bug Fixes
- availableCores() would not respect method = "fallback", since v1.41.0 (2024-12-18), on systems with a value for method = "/proc/self/status".
Version 1.43.0
CRAN release: 2025-03-24
Significant Changes
- Now availableCores() memoizes the values of all its components. This means that as soon as it has been called, environment variables such as NSLOTS will no longer be queried.
- Starting with R 4.5.0, one can use parallel::makeCluster(n, type = parallelly::RPSOCK) as an alternative to parallelly::makeClusterPSOCK(n). Similarly, type = parallelly::RMPI creates a cluster using parallelly::makeClusterMPI(), and type = parallelly::SEQ creates a cluster using parallelly::makeClusterSequential(). This was first introduced in parallelly 1.38.0, but here we rename PSOCK to RPSOCK and MPI to RMPI to minimize the risk of mistaking them for the built-in types in the parallel package. The R stands for “Rich”.
Documentation
- Add more help on the R option parallelly.maxWorkers.localhost limits. Improved the warning and error messages that are produced when these settings are exceeded.
Bug Fixes
- isNodeAlive() could produce warnings on doTryCatch(return(expr), name, parentenv, handler) : NAs introduced by coercion on MS Windows. Improved the internal tasklist parsing used to test whether a process is alive.
- availableCores() could produce Error: Error in cache_controller[[field]] : subscript out of bounds in ... getCGroups1CpuQuota -> getCGroups1CpuPeriodMicroseconds.
Version 1.42.0
CRAN release: 2025-01-30
New Features
- Now availableCores() and availableWorkers() also support the case where both CGroups v1 and CGroups v2 are enabled on the machine. Previously, such configurations were completely ignored.
Bug Fixes
- Calling isNodeAlive() and killNode() on cluster nodes running on external machines would produce Error in match.arg(type, choices = known_types, several.ok = FALSE) : 'arg' must be of length 1. This bug was introduced in version 1.38.0 (2024-07-27), when adding richer support for the rscript_sh argument.
- Calling isNodeAlive() and killNode() on cluster nodes running on external machines would produce Error: ‘length(rsh_call) == 1L’ is not TRUE if option rshopts was specified during creation.
- The value of availableCores() was numeric rather than integer as documented. This harmless bug was introduced in version 1.31.0 (2022-04-07).
Version 1.41.0
CRAN release: 2024-12-18
New Features
- Now availableCores() queries also /proc/self/status for CPU affinity allotments.
- makeClusterPSOCK() will now produce an error, rather than a warning, when the local system command used to launch the parallel worker failed with a non-zero exit code.
- Now serializedSize() always returns a double. Previously, it would return an integer, if the value could be represented by an integer. However, it turned out that returning an integer increased the risk for integer overflow later on if, say, two such values were added together.
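
The overflow risk that motivated the serializedSize() change can be reproduced in base R without the package. This is only an illustration of R's integer arithmetic; the two values below are made-up stand-ins for object sizes.

```r
# Two sizes that each fit in an integer, but whose sum does not
a <- .Machine$integer.max
b <- 1L

# Integer arithmetic overflows to NA (R also signals a warning)
sum_int <- suppressWarnings(a + b)
print(is.na(sum_int))

# Double arithmetic represents the same sum exactly
sum_dbl <- as.numeric(a) + as.numeric(b)
print(sum_dbl)
```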
Bug Fixes
- makeClusterPSOCK() on MS Windows failed to launch remote workers, with warnings on "In system(local_cmd, wait = FALSE, input = input) : 'C:\WINDOWS\System32\OpenSSH\ssh.exe' not found". This bug was introduced in version 1.38.0 (2024-07-27), when adding richer support for the rscript_sh argument.
Version 1.40.0
CRAN release: 2024-12-03
New Features
- Argument user of makeClusterPSOCK() may now be a vector of usernames, one for each worker specified.
Documentation
- Add vignettes on how to set up a cluster of parallel workers on the local machine, on external machines, in the cloud, in HPC environments, and more.
Bug Fixes
- Querying of cgroups v1 ‘cpuquota’ CPU limits broke in the previous release (v1.39.0).
- availableCores() could produce error Failed to identify mount point for CGroups v1 controller 'cpuset' on some systems.
- availableWorkers() would produce invalid warning on Identified 8 workers from the ‘PE_HOSTFILE’ file (...), which is more than environment variable ‘NSLOTS’ = 8 when running via a Grid Engine job scheduler.
Version 1.39.0
CRAN release: 2024-11-07
New Features
- Environment variable R_PARALLELLY_RANDOM_PORTS now supports multiple, comma-separated port specifications, e.g. "20001:20999" and "1068:1099,20001:20999,40530".
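
To make the specification format concrete, here is a minimal base-R sketch that expands such a string into the set of candidate ports. parse_port_spec() is a hypothetical helper for illustration only; it is not how parallelly parses the variable internally.

```r
# Hypothetical parser for a comma-separated port specification,
# where each part is either a single port or a "from:to" range.
parse_port_spec <- function(spec) {
  parts <- strsplit(spec, ",", fixed = TRUE)[[1]]
  ports <- lapply(parts, function(part) {
    bounds <- as.integer(strsplit(trimws(part), ":", fixed = TRUE)[[1]])
    stopifnot(length(bounds) %in% 1:2, !anyNA(bounds))
    if (length(bounds) == 1L) bounds else seq(bounds[1], bounds[2])
  })
  unlist(ports)
}

ports <- parse_port_spec("1068:1099,20001:20999,40530")
print(length(ports))  # 32 + 999 + 1 candidate ports
print(range(ports))
```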
Documentation
- Add example to help("makeClusterPSOCK") on how to use systemd-run to limit workers’ CPU quota and memory allowances.
Bug Fixes
- Now availableCores() does a better job detecting cgroups v2 cpu.max CPU restrictions.
Version 1.38.0
CRAN release: 2024-07-27
New Features
- Now argument rshcmd of makeNodePSOCK() can be a function. It must accept at least two arguments named rshopts and worker. The rshopts argument is a character vector of length zero or more. The worker argument is a string hostname. The function must return a single string.
- Now makeNodePSOCK() accepts rscript_sh = "none", which skips quoting the Rscript call.
- Now makeNodePSOCK() accepts rscript_sh of length one or two. If length(rscript_sh) == 2, then rscript_sh[1] is for the inner and rscript_sh[2] is for the outer shell quoting of the Rscript call. More precisely, rscript_sh[1] is for Rscript arguments that need shell quoting (e.g. Rscript -e "<expr>"), and rscript_sh[2] is for the whole Rscript ... call.
- Add makeClusterSequential() available for R (>= 4.4.0).
- Starting with R 4.5.0 (currently R-devel), one can use parallel::makeCluster(n, type = parallelly::PSOCK) as an alternative to parallelly::makeClusterPSOCK(n). Similarly, type = parallelly::MPI creates a cluster using parallelly::makeClusterMPI(), and type = parallelly::SEQ creates a cluster using parallelly::makeClusterSequential().
- Add serializedSize() for calculating the size of an object by counting the number of bytes required to serialize it.
Version 1.37.0
CRAN release: 2024-02-14
New Features
- makeClusterPSOCK(nworkers) gained protection against setting up too many localhost workers relative to number of available CPU cores. If nworkers / availableCores() is greater than 1.0 (100%), then a warning is produced. If greater than 3.0 (300%), an error is produced. These limits can be configured by R option parallelly.maxWorkers.localhost. These checks are skipped if nworkers inherits from AsIs, e.g. makeClusterPSOCK(I(16)). The current 3.0 (300%) limit is likely to be decreased in a future release. A few packages fail R CMD check --as-cran with this validation enabled. For example, one package uses 8 parallel workers in its examples, while R CMD check --as-cran only allows for two. To give such packages time to be fixed, the CRAN-enforced limits are ignored for now.
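
The thresholds above can be sketched as a small decision function. check_localhost_workers() is a hypothetical helper, not parallelly's code; it returns a label instead of signaling a real warning or error, and ncores stands in for availableCores().

```r
# Hypothetical sketch of the documented thresholds; not parallelly's code.
# 'limits' mirrors the default 100% (warning) and 300% (error) ratios.
check_localhost_workers <- function(nworkers, ncores, limits = c(1.0, 3.0)) {
  # A worker count wrapped in I() ("AsIs") bypasses the check
  if (inherits(nworkers, "AsIs")) return("skipped")
  ratio <- nworkers / ncores
  if (ratio > limits[2]) return("error")
  if (ratio > limits[1]) return("warning")
  "ok"
}

print(check_localhost_workers(4, ncores = 4))      # 100% of the cores
print(check_localhost_workers(8, ncores = 4))      # 200% of the cores
print(check_localhost_workers(16, ncores = 4))     # 400% of the cores
print(check_localhost_workers(I(16), ncores = 4))  # explicitly opted out
```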
Miscellaneous
- makeClusterPSOCK() could produce a confusing error Invalid port: NA if a non-available port was requested. Now the error message is more informative, e.g. Argument 'port' specifies non-available port(s): 80.
Version 1.36.0
CRAN release: 2023-05-26
New Features
- isNodeAlive() and killNode() now support also worker processes that run on remote machines. They do this by connecting to the remote machine using the same method used to launch the worker, which is typically SSH, and do their R calls that way.
- isNodeAlive() and killNode() gained argument timeout for controlling the maximum time, in seconds, before giving up and returning NA.
- Add cloneNode(), which can be used to “restart” RichSOCKnode cluster nodes.
- Argument worker for makeNodePSOCK() now takes the optional, logical attribute localhost to manually specify that the worker is a localhost worker.
- Add print() for RichSOCKnode, which gives more details than print() for SOCKnode.
- print() for RichSOCKnode and RichSOCKcluster report on nodes with broken connections.
- Add as.cluster() for RichSOCKnode, which returns a RichSOCKcluster.
- Introduce R option parallelly.supportsMulticore.disableOn to control where multicore processing is disabled by default.
Bug Fixes
- Calling killNode() on RichSOCKnode node could theoretically kill a process on the current machine with the same process ID (PID), although the parallel worker (node) is running on another machine.
- isNodeAlive() on RichSOCKnode node could theoretically return TRUE because there was a process with the same process ID (PID) on the current machine, although the parallel worker (node) is running on another machine.
- isLocalHost() for SOCK0node was not declared an S3 method.
Version 1.35.0
CRAN release: 2023-03-22
New Features
- Now freePort() defaults to default = NA_integer_, so that NA_integer_ is returned when no free port could be found. However, in R (< 4.0.0), which does not support port querying, we use default = "random".
Documentation
- Mention in help("makeClusterPSOCK") that rscript_sh = "cmd" is needed if the remote machines run MS Windows.
Bug Fixes
- makeClusterPSOCK(..., verbose = TRUE) would not show verbose output. One still had to set option parallelly.debug to TRUE.
- availableWorkers() could produce false sanity-check warnings on mismatching ‘PE_HOSTFILE’ content and ‘NSLOTS’ for certain SGE-cluster configurations.
Version 1.34.0
CRAN release: 2023-01-13
New Features
- Add support for availableWorkers(constraints = "connections"), which limits the number of workers that can be used to the current number of free R connections according to freeConnections(). This is the maximum number of PSOCK, SOCK, and MPI parallel cluster nodes we can open without running out of available R connections.
Bug Fixes
- availableCores() would produce a warning In is.na(constraints) : is.na() applied to non-(list or vector) of type 'NULL' when running with R (< 4.0.0).
- availableWorkers() did not acknowledge the "cgroups2.cpu.max" and "Bioconductor" methods added to availableCores() in parallelly 1.33.0 (2022-12-13). It also did not acknowledge methods "cgroups.cpuset" and "cgroups.cpuquota" added in parallelly 1.31.0 (2022-04-07), and "nproc" added in parallelly 1.26.1 (2021-06-29).
- When makeClusterPSOCK() failed to connect to all parallel workers within the connectTimeout time limit, it could either produce Error in sprintf(ngettext(failed, "Cluster setup failed (connectTimeout=%.1f seconds). %d worker of %d failed to connect.", : invalid format '%d'; use format %f, %e, %g or %a for numeric objects instead of an informative error message, or an error message with incorrect information.
Version 1.33.0
CRAN release: 2022-12-14
New Features
- Add killNode() to terminate cluster nodes via process signaling. Currently, this is only supported for parallel workers on the local machine, and only those created by makeClusterPSOCK().
- makeClusterPSOCK() and the like now assert that the running R session has enough permissions on the operating system to do system calls such as system2("Rscript --version"). If not, an informative error message is produced.
- On Unix, availableCores() queries also control groups v2 (cgroups v2) field cpu.max for a possible CPU quota allocation. If a CPU quota is set, then the number of CPUs is rounded to the nearest integer, unless it is less than 0.5, in which case it is rounded up to a single CPU. An example, where cgroups CPU quotas can be set to limit the total CPU load, is with Linux containers, e.g. docker run --cpus=3.5 ....
- Add support for availableCores(methods = "connections"), which returns the current number of free R connections per freeConnections(). This is the maximum number of PSOCK, SOCK, and MPI parallel cluster nodes we can open without running out of available R connections. A convenient way to use this and all other methods is availableCores(constraints = "connections").
- Now availableCores() recognizes environment variable IS_BIOC_BUILD_MACHINE, which is set to true by the Bioconductor (>= 3.16) check servers. If true, then a maximum of four (4) cores is returned. This new environment variable replaces legacy variable BBS_HOME used in Bioconductor (<= 3.15).
- availableCores() splits up method "BiocParallel" into two; "BiocParallel" and "Bioconductor". The former queries environment variable BIOCPARALLEL_WORKER_NUMBER and the latter IS_BIOC_BUILD_MACHINE. This means availableCores(which = "all") now reports on both.
- isNodeAlive() will now produce a once-per-session informative warning when it detects that it is not possible to check whether another process is alive on the current machine.
Documentation
- Add section to help("makeClusterPSOCK", package = "parallelly") explaining why R CMD check may produce “checking for detritus in the temp directory … NOTE” and how to avoid them.
- Add section ‘For package developers’ to help("makeClusterPSOCK", package = "parallelly") reminding us that we need to stop all clusters we created in package examples, tests, and vignettes.
Bug Fixes
- isNodeAlive() failed to record which method works for testing if a process exists or not, which meant it would keep trying all methods each time. Similarly, if none works, it would still keep trying each time instead of returning NA immediately. On some systems, failing to check whether a process exists could result in one or more warnings, in which case those warnings would be produced for each call to isNodeAlive().
Version 1.32.1
CRAN release: 2022-07-21
Bug Fixes
- The host element of the SOCK0node or SOCKnode objects created by makeClusterPSOCK() lost attribute localhost for localhost workers. This made some error messages from the future package less informative.
Version 1.32.0
CRAN release: 2022-06-07
Significant Changes
- The default for argument revtunnel of makeNodePSOCK(), and therefore also of makeClusterPSOCK(), is now NA, which means it is agnostic to whether rshcmd[1] specifies an SSH client or not. If SSH is used, then it will resolve to revtunnel = TRUE, otherwise to revtunnel = FALSE. This removes the need for setting revtunnel = FALSE when non-SSH clients are used.
New Features
- availableCores() and availableWorkers() gained support for the ‘Fujitsu Technical Computing Suite’ job scheduler. Specifically, they acknowledge environment variables PJM_VNODE_CORE, PJM_PROC_BY_NODE, and PJM_O_NODEINF. See help("makeClusterPSOCK", package = "parallelly") for an example.
Bug Fixes
- makeClusterPSOCK() would fail with Error: node$session_info$process$pid == pid is not TRUE when running R in Simplified Chinese (LANGUAGE=zh_CN), Traditional Chinese (Taiwan) (LANGUAGE=zh_TW), or Korean (LANGUAGE=ko) locales.
- Some warnings and errors showed the wrong call.
Version 1.31.1
CRAN release: 2022-04-22
Bug Fixes
- Changes to option parallelly.availableCores.system would be ignored if done after the first call to availableCores().
- availableCores() with option parallelly.availableCores.system set to less than parallel::detectCores() would produce a warning, e.g. “[INTERNAL]: Will ignore the cgroups CPU set, because it contains one or more CPU indices that is out of range [0,0]: 0-7”.
Version 1.31.0
CRAN release: 2022-04-07
Significant Changes
- Changed the default for argument default of freePort() to "random", which used to be "first". The main reason for this is to make sure the default behavior is to return a random port also on R (< 4.0.0) where we cannot test whether or not a port is available.
New Features
- On Unix, availableCores() now queries also control groups (cgroups) fields cpu.cfs_quota_us and cpu.cfs_period_us, for a possible CPU quota allocation. If a CPU quota is set, then the number of CPUs is rounded to the nearest integer, unless it is less than 0.5, in which case it is rounded up to a single CPU. An example, where cgroups CPU quotas can be set to limit the total CPU load, is with Linux containers, e.g. docker run --cpus=3.5 ....
- In addition to cgroups CPU quotas, availableCores() also queries cgroups for a possible CPU affinity, which is available in field cpuset.set. This should give the same result as what the already existing ‘nproc’ method gives. However, not all systems have the nproc tool installed, in which case this new approach should work. Some high-performance compute (HPC) environments set the CPU affinity so that jobs do not overuse the CPUs. It may also be set by Linux containers, e.g. docker run --cpuset-cpus=0-2,8 ....
- The minimum value returned by availableCores() is one (1). This can be overridden by new option parallelly.availableCores.min. This can be used to test parallelization methods on single-core machines, e.g. options(parallelly.availableCores.min = 2L).
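
The quota rounding rule described above can be sketched as follows. quota_to_cores() is a hypothetical helper written for illustration, not parallelly's implementation; its arguments mirror the cgroups fields cpu.cfs_quota_us and cpu.cfs_period_us, where a negative quota means that no limit is set.

```r
# Hypothetical sketch of the documented rounding rule; not parallelly's code.
quota_to_cores <- function(quota_us, period_us) {
  # A negative quota (-1) means that no CPU quota is set
  if (quota_us < 0) return(NA_integer_)
  cpus <- quota_us / period_us
  # Fractions below 0.5 would round to zero, so use a single CPU instead
  if (cpus < 0.5) return(1L)
  as.integer(round(cpus))
}

print(quota_to_cores(270000, 100000))  # 2.7 CPUs
print(quota_to_cores(30000, 100000))   # 0.3 CPUs
print(quota_to_cores(-1, 100000))      # no quota
```

Base R's round() resolves exact halves such as 3.5 according to IEC 60559, so this sketch makes no claim about how the package treats those edge cases.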
Bug Fixes
- The ‘nproc’ result for availableCores() was ignored if nproc > 9.
- availableCores() would return the ‘fallback’ value when only ‘system’ and ‘nproc’ information was available. However, in this case, we do want it to return ‘nproc’ when ‘nproc’ != ‘system’, because that is a strong indication that the number of CPU cores is limited by control groups (cgroups) on Linux. If ‘nproc’ == ‘system’, we cannot tell whether cgroups is enabled or not, which means we will fall back to the ‘fallback’ value if there is no other evidence that another number of cores are available to the current R process.
- Technically, canPortBeUsed() could falsely return FALSE if the port check was interrupted by, say, a user interrupt.
- freePort(ports, default = "random") would always return ports[1] if the system does not allow testing if a port is available or not, or if none of the specified ports are available.
Version 1.30.0
CRAN release: 2021-12-17
New Features
- makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained argument rscript_sh, which controls how Rscript arguments are shell quoted. The default is to make a best guess on what type of shell is used where each cluster node is launched. If launched locally, then it is whatever platform the current R session is running on, i.e. either a POSIX shell ("sh") or MS Windows ("cmd"). If remotely, then the assumption is that a POSIX shell ("sh") is used.
- makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained argument default_packages, which controls the default set of R packages to be attached on each cluster node at startup. Moreover, if argument rscript specifies an ‘Rscript’ executable, then argument default_packages is used to populate Rscript command-line option --default-packages=.... If rscript specifies something else, e.g. an ‘R’ or ‘Rterm’ executable, then environment variable R_DEFAULT_PACKAGES=... is set accordingly when launching each cluster node.
- Argument rscript_args of makeClusterPSOCK() now supports "*" values. When used, the corresponding element will be replaced with the internally added Rscript command-line options. If not specified, such options are appended at the end.
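
To illustrate why the shell type matters for rscript_sh, the difference between POSIX and MS Windows quoting can be seen with base R's shQuote(), which happens to use the same "sh" and "cmd" type names. This only demonstrates the two quoting styles; it assumes nothing about how parallelly builds its command line internally.

```r
# An R expression that must survive shell quoting, e.g. for Rscript -e <expr>
expr <- 'cat("hello world")'

# POSIX shell quoting: wrap the expression in single quotes
sh_quoted <- shQuote(expr, type = "sh")

# MS Windows cmd.exe quoting: wrap in double quotes and escape inner ones
cmd_quoted <- shQuote(expr, type = "cmd")

cat(sh_quoted, "\n")
cat(cmd_quoted, "\n")
```

A string quoted for one shell is generally not valid for the other, which is why the best guess depends on where the node is launched.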
Bug Fixes
- makeClusterPSOCK() did not support backslashes (\) in rscript_libs, backslashes that may originate from, for example, Windows network drives. The result was that the worker would silently ignore any rscript_libs components with backslashes.
- The package detects when R CMD check runs and adjusts default settings via environment variables in order to play nicer with the machine where the checks are running. Some of these environment variables were in this case ignored since parallelly 1.26.0.
Version 1.29.0
CRAN release: 2021-11-21
Significant Changes
- makeClusterPSOCK() launches parallel workers with option socketOptions set to "no-delay" by default. This decreases the communication latency between workers and the main R session, significantly so on Unix. This option requires R (>= 4.1.0) and has no effect in earlier versions of R.
New Features
- Added argument socketOptions to makeClusterPSOCK(), which sets the corresponding R option on each cluster node when they are launched.
- Argument rscript_envs of makeClusterPSOCK() can also be used to unset environment variables on cluster nodes. Any named element with value NA_character_ will be unset.
- Argument rscript of makeClusterPSOCK() now supports "*" values. When used, the corresponding element will be replaced with "Rscript", or if homogeneous = TRUE, with the absolute path to the current ‘Rscript’.
Documentation
- Add makeClusterPSOCK() example on how to launch workers distributed across multiple CPU Groups on MS Windows 10.
Bug Fixes
- isForkedChild() would return TRUE in a forked child process only if it had already been called in the parent R process.
- Using argument rscript_startup would cause makeClusterPSOCK() to fail in R-devel (>= r80666).
Version 1.28.0
New Features
- Add isNodeAlive() to check whether a cluster and cluster nodes are alive or not.
- Add isForkedChild() to check whether or not the current R process is a forked child process.
Bug Fixes
- Environment variable R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE was incorrectly parsed as a logical instead of a character string. If the variable was set to, say, "quiet", this would cause an error when the package was loaded.
- makeClusterPSOCK() failed to fall back to setup_strategy = "sequential", when not supported by the current R version.
Version 1.27.0
CRAN release: 2021-07-19
New Features
- availableCores() and availableWorkers() now respect environment variable BIOCPARALLEL_WORKER_NUMBER introduced in BiocParallel (>= 1.27.2). They also respect BBS_HOME which is set on the Bioconductor check servers to limit the number of parallel workers while checking Bioconductor packages.
Workaround
- makeClusterPSOCK() and parallel::makeCluster() failed with error “Cluster setup failed. of workers failed to connect.” when using the new default setup_strategy = "parallel" and when the tcltk package is loaded when running R (>= 4.0.0 && <= 4.1.0) on macOS. Now parallelly forces setup_strategy = "sequential" when the tcltk package is loaded on these R versions.
Bug Fixes
- makeClusterPSOCK(..., setup_strategy = "parallel") would forget to close a socket connection used to set up the workers. This socket connection would be closed by the garbage collector eventually with a warning.
- parallelly::makeClusterPSOCK() would fail with “Error in freePort(port) : Unknown value on argument ‘port’: ‘auto’” if environment variable R_PARALLEL_PORT was set to a port number.
- parallelly::availableCores() would produce ‘Error in if (grepl(“^ [1-9]$”, res)) return(as.integer(res)) : argument is of length zero’ on Linux systems without nproc installed.
Version 1.26.1
CRAN release: 2021-06-30
New Features
- print() on RichSOCKcluster mentions when the cluster is registered to be automatically stopped by the garbage collector.
Workaround
- Depending on the R version used, the RStudio Console does not support the new setup_strategy = "parallel" when using makeClusterPSOCK() or parallel::makeCluster(). The symptom is that they, after a long wait, result in “Error in makeClusterPSOCK(workers, …) : Cluster setup failed. of workers failed to connect.” This is due to a bug in R, which has been fixed for R (>= 4.1.1) but also in a recent R 4.1.0 Patched. For R (>= 4.0.0 && <= 4.1.0), this release works around the problem by forcing setup_strategy = "sequential" for parallelly and parallel when running in the RStudio Console. If you wish to override this behavior, you can always set option parallelly.makeNodePSOCK.setup_strategy to "parallel", e.g. in your ~/.Rprofile file. Alternatively, you can set the environment variable R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY=parallel, e.g. in your ~/.Renviron file.
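
For example, the override mentioned above could be placed in ~/.Rprofile as a single options() call. This is a minimal sketch using only the option name given in this entry; whether overriding is advisable depends on your R and RStudio versions.

```r
# In ~/.Rprofile: opt back in to the parallel setup strategy
options(parallelly.makeNodePSOCK.setup_strategy = "parallel")

# Confirm what the current session will use
strategy <- getOption("parallelly.makeNodePSOCK.setup_strategy")
print(strategy)
```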
Bug Fixes
- On systems with nproc installed, availableCores() would be limited by environment variables OMP_NUM_THREADS and OMP_THREAD_LIMIT, if set. For example, on conservative systems that set OMP_NUM_THREADS=1 as the default, availableCores() would pick this up via nproc and return 1. This was not the intended behavior. Now those environment variables are temporarily unset before querying nproc.
Version 1.26.0
CRAN release: 2021-06-09
Significant Changes
- R_PARALLELLY_* (and R_FUTURE_*) environment variables are now only read when the parallelly package is loaded, where they set the corresponding parallelly.* option. Previously, some of these environment variables were queried by different functions as a fallback to when an option was not set. Parsing them only when the package is loaded decreases the overhead in functions, and it clarifies that options can be changed at runtime whereas environment variables should only be set at startup.
New Features
- makeClusterPSOCK() now supports setting up cluster nodes in parallel similarly to how parallel::makePSOCKcluster() does it. This significantly reduces the setup turnaround time. This is only supported in R (>= 4.0.0). To revert to the sequential setup strategy, set R option parallelly.makeNodePSOCK.setup_strategy to "sequential".
- Add freePort() to get a random TCP port that can be opened.
Bug Fixes
- R option parallelly.availableCores.fallback and environment variable R_PARALLELLY_AVAILABLECORES_FALLBACK were ignored since parallelly 1.22.0, when support for ‘nproc’ was added to availableCores().
Version 1.25.0
CRAN release: 2021-04-30
Significant Changes
- The default SSH client on MS Windows 10 is now the built-in ssh client. This means that regardless of whether you are on Linux, macOS, or Windows 10, setting up parallel workers on external machines over SSH finally works out of the box without having to install PuTTY or other SSH clients. This was possible because a workaround was found for a Windows 10 bug preventing us from using reverse tunneling over SSH. It turns out the bug reveals itself when using hostname ‘localhost’ but not ‘127.0.0.1’, so we use the latter.
New Features
- availableCores() gained argument omit to make it easier to put aside zero or more cores from being used in parallel processing. For example, on a system with four cores, availableCores(omit = 1) returns 3. Importantly, since availableCores() is guaranteed to always return a positive integer, availableCores(omit = 4) == 1, even on systems with four or fewer cores. Using availableCores() - 4 on such systems would return a non-positive value, which would give an error downstream.
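
The difference between omit and plain subtraction can be sketched in base R. cores_after_omit() is a hypothetical helper for illustration, not parallelly's code; cores stands in for the number of cores that availableCores() would otherwise report.

```r
# Hypothetical sketch of the documented behavior; not parallelly's code.
# The result is clamped so that at least one core always remains.
cores_after_omit <- function(cores, omit = 0L) {
  max(1L, as.integer(cores) - as.integer(omit))
}

print(cores_after_omit(4L, omit = 1L))  # one core set aside
print(cores_after_omit(4L, omit = 4L))  # clamped, plain subtraction gives 0
print(cores_after_omit(2L, omit = 4L))  # clamped, plain subtraction gives -2
```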
Bug Fixes
- makeClusterPSOCK(), or actually makeNodePSOCK(), did not accept all types of environment variable names when using rscript_envs, e.g. it would give an error if we tried to pass _R_CLASS_MATRIX_ARRAY_.
- makeClusterPSOCK() had a “length > 1 in coercion to logical” bug that could affect especially MS Windows 10 users.
Version 1.24.0
CRAN release: 2021-03-14
Significant Changes
- The default SSH client on MS Windows is now, in order of availability: (i) plink of the PuTTY software, (ii) ssh in the RStudio distribution, and (iii) ssh of Windows 10. Previously, the latter was considered first but that still has a bug preventing us from using reverse tunneling.
New Features
- makeClusterPSOCK(), or actually makeNodePSOCK(), gained argument quiet, which can be used to silence output produced by manual = TRUE.
- c() for cluster objects now warns about duplicated cluster nodes.
- Add isForkedNode() to test if a cluster node runs in a forked process.
- Add isLocalhostNode() to test if a cluster node runs on the current machine.
- Now availableCores() and availableWorkers() avoid recursive calls to the custom function given by options parallelly.availableCores.custom and parallelly.availableWorkers.custom, respectively.
- availableWorkers() now recognizes the Slurm environment variable SLURM_JOB_NODELIST, e.g. "dev1,n[3-4,095-120]". It will use scontrol show hostnames "$SLURM_JOB_NODELIST" to expand it, if supported on the current machine, otherwise it will attempt to parse and expand the nodelist specification using R. If either of environment variable SLURM_JOB_CPUS_PER_NODE or SLURM_TASKS_PER_NODE is set, then each node in the nodelist will be represented that number of times. If, in addition, environment variable SLURM_CPUS_PER_TASK (always a scalar) is set, then that is also respected.
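
To show what expanding a nodelist such as "dev1,n[3-4,095-120]" involves, here is a simplified base-R sketch. expand_nodelist() is a hypothetical helper, not the parser used by parallelly, and it assumes at most one [...] group per comma-separated entry.

```r
# Hypothetical, simplified expansion of a Slurm nodelist specification.
expand_nodelist <- function(nodelist) {
  # Split on commas that are not inside square brackets
  chars <- strsplit(nodelist, "", fixed = TRUE)[[1]]
  depth <- cumsum(chars == "[") - cumsum(chars == "]")
  chars[chars == "," & depth == 0L] <- "\n"
  entries <- strsplit(paste(chars, collapse = ""), "\n", fixed = TRUE)[[1]]

  expand_entry <- function(entry) {
    open  <- regexpr("[", entry, fixed = TRUE)
    close <- regexpr("]", entry, fixed = TRUE)
    if (open == -1L) return(entry)  # plain hostname, nothing to expand
    prefix <- substr(entry, 1L, open - 1L)
    inner  <- substr(entry, open + 1L, close - 1L)
    suffix <- substr(entry, close + 1L, nchar(entry))
    ranges <- strsplit(inner, ",", fixed = TRUE)[[1]]
    ids <- unlist(lapply(ranges, function(r) {
      b <- strsplit(r, "-", fixed = TRUE)[[1]]
      if (length(b) == 1L) return(b)
      # Keep zero padding, e.g. "095-120" expands to "095", ..., "120"
      formatC(seq(as.integer(b[1]), as.integer(b[2])),
              width = nchar(b[1]), flag = "0")
    }))
    paste0(prefix, ids, suffix)
  }
  unlist(lapply(entries, expand_entry))
}

nodes <- expand_nodelist("dev1,n[3-4,095-120]")
print(head(nodes, 4))
print(length(nodes))
```

On a real Slurm system, scontrol show hostnames remains the authoritative way to expand the list; a hand-written parser like this one does not cover every form the specification allows.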
Miscellaneous
- All code is now using the parallelly. prefix for options and the R_PARALLELLY_ prefix for environment variables. Settings that use the corresponding future. and R_FUTURE_ prefixes are still recognized.
Bug Fixes
- availableCores() did not respect environment variable SLURM_TASKS_PER_NODE when the job was allocated more than one node.
- The above argument quiet was introduced in future 1.19.1 but was mistakenly dropped from parallelly 1.20.0 when that was released, and therefore also from future (>= 1.20.0).
Version 1.23.0
CRAN release: 2021-01-04
New Features
- availableCores(), availableWorkers(), and freeCores() gained argument logical, which is passed down to parallel::detectCores() as-is. The default is TRUE but it can be changed by setting the R option parallelly.availableCores.logical. This option can in turn be set via environment variable R_PARALLELLY_AVAILABLECORES_LOGICAL which is applied (only) when the package is loaded.
- Now makeClusterPSOCK() asserts that there are enough free connections available before attempting to create the parallel workers. If too many workers are requested, an informative error message is produced.
- Add availableConnections() and freeConnections() to infer the maximum number of connections that the current R installation can have open at any time and how many of those are currently free to be used. This limit is typically 128 but may be different in custom R installations that are built from source.
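
The idea of counting free connections can be approximated in base R. This sketch assumes the typical limit of 128 mentioned above rather than inferring it, so it is not equivalent to availableConnections() and freeConnections(); max_connections is a made-up variable name.

```r
# Assumed limit for a stock R build; custom builds may differ
max_connections <- 128L

# showConnections(all = TRUE) lists every open connection, including the
# three standard ones (stdin, stdout, stderr)
used <- nrow(showConnections(all = TRUE))
free <- max_connections - used
print(c(used = used, free = free))
```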
Version 1.22.0
CRAN release: 2020-12-13
New Features
- Now availableCores() queries also Unix command nproc, if available. This will make it respect the number of CPU/cores limited by ‘cgroups’ and Linux containers.
- PSOCK cluster workers are now set up to communicate using little endian (useXDR = FALSE) instead of big endian (useXDR = TRUE). Since most modern systems use little endian, useXDR = FALSE speeds up the communication noticeably (10-15%) on those systems. The default value of this argument can be controlled by the R option parallelly.makeNodePSOCK.useXDR or the corresponding environment variable R_PARALLELLY_MAKENODEPSOCK_USEXDR.
Beta Features
- Add cpuLoad() for querying the “average” system load on Unix-like systems.
- Add freeCores() for estimating the average number of unused cores based on the average system load as given by cpuLoad().
Version 1.21.0
CRAN release: 2020-10-27
Significant Changes
- Removed find_rshcmd() which was never meant to be exported.
New Features
- makeClusterPSOCK() gained argument validate to control whether or not the nodes should be tested after they’ve been created. The validation is done by querying each node for its session information, which is then saved as attribute session_info on the cluster node object. This information is also used in error messages, if available. This validation has been done since version 1.5.0 but now it can be disabled. The default of argument validate can be controlled via an R option and an environment variable.
- Now makeNodePSOCK(..., rscript_envs = "UNKNOWN") produces an informative warning on non-existing environment variables that were skipped.
Bug Fixes
- makeClusterPSOCK() would produce an error on ‘one node produced an error: could not find function “getOptionOrEnvVar”’ if parallelly is not available on the node.
- makeClusterPSOCK() would attempt to load parallelly on the worker. If it’s not available on the worker, it would result in a silent warning on the worker. Now parallelly is not loaded.
- makeClusterPSOCK(..., tries = n) would retry to setup a cluster node also on errors that were unrelated to node setup or node connection errors.
- The error message on using an invalid rscript_envs argument for makeClusterPSOCK() reported on the value of rscript_libs (sic!).
- makeNodePSOCK(..., rscript_envs = "UNKNOWN") would result in an error when trying to launch the cluster node.
Deprecated and Defunct
- Removed find_rshcmd() which was never meant to be exported.
Version 1.20.0
CRAN release: 2020-10-20
Significant Changes
- Add availableCores(), availableWorkers(), supportsMulticore(), as.cluster(), autoStopCluster(), makeClusterMPI(), makeClusterPSOCK(), and makeNodePSOCK() from the future package.
New Features
- Add isConnectionValid() and connectionId() adopted from internal code of the future package.
Bug Fixes
- Renamed environment variable R_FUTURE_MAKENODEPSOCK_tries used by makeClusterPSOCK() to R_FUTURE_MAKENODEPSOCK_TRIES.
- connectionId() did not return -1L on Solaris for connections with internal ‘nil’ pointers because they were reported as ‘0’ - not ‘nil’ or ‘0x0’.
Version 1.19.0
Significant Changes
- Now availableCores() better supports Slurm. Specifically, if environment variable SLURM_CPUS_PER_TASK is not set, which requires that option --slurm-cpus-per-task=n is specified and SLURM_JOB_NUM_NODES=1, then it falls back to using SLURM_CPUS_ON_NODE, e.g. when using --ntasks=n.
- Now availableCores() and availableWorkers() support LSF/OpenLava. Specifically, they acknowledge environment variables LSB_DJOB_NUMPROC and LSB_HOSTS, respectively.
New Features
- makeClusterPSOCK() will now retry to create a cluster node up to tries (default: 3) times before giving up. If argument port specifies more than one port (e.g. port = "random") then it will also attempt to find a valid random port up to tries times before giving up. The pre-validation of the random port is only supported in R (>= 4.0.0) and skipped otherwise.
- makeClusterPSOCK() skips shell quoting of the elements in rscript if it inherits from AsIs.
- makeClusterPSOCK(), or actually makeNodePSOCK(), gained argument quiet, which can be used to silence output produced by manual = TRUE.
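The retry behavior can be pictured as a bounded loop around the node setup. The sketch below is a generic illustration in base R, not the package's code: retry() and flaky_setup() are hypothetical names, and the simulated failure stands in for something like a port clash.

```r
# Hypothetical helper: call fn(attempt) until it succeeds or 'tries' runs out
retry <- function(fn, tries = 3L) {
  stopifnot(tries >= 1L)
  for (attempt in seq_len(tries)) {
    result <- tryCatch(fn(attempt), error = identity)
    if (!inherits(result, "error")) return(result)
  }
  # All attempts failed: report the last error
  stop(sprintf("Failed after %d tries: %s", tries, conditionMessage(result)))
}

# Simulated setup that fails on the first two attempts
flaky_setup <- function(attempt) {
  if (attempt < 3L) stop("port already in use")
  attempt
}

retry(flaky_setup, tries = 3L)
```

With tries = 2L the same call gives up and signals an error that carries the last failure message.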
Performance
- Now plan(multisession), plan(cluster, workers = <number>), and makeClusterPSOCK(), which they both use internally, set up localhost workers twice as fast compared to versions since future 1.12.0, which brings it back to par with a bare-bones parallel::makeCluster(..., setup_strategy = "sequential") setup. The slowdown was introduced in future 1.12.0 (2019-03-07) when protection against leaving stray R processes behind from failed worker startup was implemented. This protection now makes use of memoization for speedup.
Version 1.18.0
New Features
- print() on RichSOCKcluster gives information not only on the name of the host but also on the version of R and the platform of each node (“worker”), e.g. “Socket cluster with 3 nodes where 2 nodes are on host ‘localhost’ (R version 4.0.0 (2020-04-24), platform x86_64-w64-mingw32), 1 node is on host ‘n3’ (R version 3.6.3 (2020-02-29), platform x86_64-pc-linux-gnu)”.
- It is now possible to set environment variables on workers before they are launched by makeClusterPSOCK() by specifying them as <name>=<value> as part of the rscript vector argument, e.g. rscript=c("ABC=123", "DEF='hello world'", "Rscript"). This works because elements in rscript that match regular expression "^ [[:alpha:]_][[:alnum:]_]*=.*" are no longer shell quoted.
- makeClusterPSOCK() now returns a cluster that, in addition to inheriting from SOCKcluster, also inherits from RichSOCKcluster.
Bug Fixes
- Made makeClusterPSOCK() and makeNodePSOCK() agile to the name change from parallel:::.slaveRSOCK() to parallel:::.workRSOCK() in R (>= 4.1.0).
- makeClusterPSOCK(..., rscript) will not try to locate rscript[1] if argument homogeneous is FALSE (or inferred to be FALSE).
- makeClusterPSOCK(..., rscript_envs) would result in a syntax error when starting the workers due to non-ASCII quotation marks if option useFancyQuotes was not set to FALSE.
Version 1.17.0
New Features
- makeClusterPSOCK() gained argument rscript_envs for setting environment variables in workers on startup, e.g. rscript_envs = c(FOO = "3.14", "BAR").
Miscellaneous
- Not all CRAN servers have _R_CHECK_LIMIT_CORES_ set. To better emulate CRAN submission checks, the future package will, when loaded, set this environment variable to TRUE if unset and if R CMD check is running. Note that future::availableCores() respects _R_CHECK_LIMIT_CORES_ and returns at most 2L (two cores) if detected.
Version 1.15.1
New Features
- The default range of ports that makeClusterPSOCK() draws a random port from (when argument port is not specified) can now be controlled by environment variable R_FUTURE_RANDOM_PORTS. The default range is still 11000:11999 as with the parallel package.
Version 1.15.0
Documentation
- Added ‘Troubleshooting’ section to ?makeClusterPSOCK with instructions on how to troubleshoot when the setup of local and remote clusters fails.
Bug Fixes
- makeClusterPSOCK() could produce warnings like “cannot open file ‘/tmp/alice/Rtmpi69yYF/future.parent=2622.a3e32bc6af7.pid’: No such file”, e.g. when launching R workers running in Docker containers.
- makeClusterMPI() did not work for MPI clusters with ‘comm’ other than ‘1’.
Version 1.13.0
New Features
- Now availableCores() also recognizes PBS environment variable NCPUS, because the PBSPro scheduler does not set PBS_NUM_PPN.
- If option future.availableCores.custom is set to a function, then availableCores() will call that function and interpret its value as number of cores. Analogously, option future.availableWorkers.custom can be used to specify the hostnames of a set of workers that availableWorkers() sees. These new options provide a mechanism for anyone to customize availableCores() and availableWorkers() in case they do not (yet) recognize, say, environment variables that are specific to the user’s compute environment or HPC scheduler.
- makeClusterPSOCK() gained support for argument rscript_startup for evaluating one or more R expressions in the background R worker prior to the worker event loop launching. This provides a more convenient approach than having to use, say, rscript_args = c("-e", sQuote(code)).
- makeClusterPSOCK() gained support for argument rscript_libs to control the R package library search path on the workers. For example, to prepend the folder ~/R-libs on the workers, use rscript_libs = c("~/R-libs", "*"), where "*" will be resolved to the current .libPaths() on the workers.
Bug Fixes
- makeClusterPSOCK() did not shell quote the Rscript executable when running its pre-tests checking whether localhost Rscript processes can be killed by their PIDs or not.
Version 1.12.0
New Features
- If makeClusterPSOCK() fails to create one of many nodes, then it will attempt to stop any nodes that were successfully created. This lowers the risk for leaving R worker processes behind.
Bug Fixes
- makeClusterPSOCK() in future (>= 1.11.1) produced warnings when argument rscript had length(rscript) > 1.
Version 1.11.1.1
Bug Fixes
- When makeClusterPSOCK() fails to connect to a worker, it produces an error with detailed information on what could have happened. In rare cases, another error could be produced when generating the information on what the worker’s PID is.
Version 1.11.1
New Features
- The defaults of several arguments of makeClusterPSOCK() and makeNodePSOCK() can now be controlled via environment variables in addition to R options, which were supported in the past. An advantage of using environment variables is that they will be inherited by child processes, also nested ones.
Software Quality
- TESTS: When the future package is loaded, it checks whether R CMD check is running or not. If it is, then a few future-specific environment variables are adjusted such that the tests play nice with the testing environment. For instance, it sets the socket connection timeout for PSOCK cluster workers to 120 seconds (instead of the default 30 days!). This will lower the risk for more and more zombie worker processes cluttering up the test machine (e.g. CRAN servers) in case a worker process is left behind even though the main R process has terminated. Note that these adjustments are applied automatically to the checks of any package that depends on, or imports, the future package.
Bug Fixes
- Whenever makeClusterPSOCK() would fail to connect to a worker, for instance due to a port clash, then it would leave the R worker process running - also after the main R process terminated. When the worker is running on the same machine, makeClusterPSOCK() will now attempt to kill such stray R processes. Note that parallel::makePSOCKcluster() still has this problem.
Version 1.11.0
New Features
- makeClusterPSOCK() produces more informative error messages whenever the setup of R workers fails. Also, its verbose messages are now prefixed with “[local output]” to help distinguish the output produced by the current R session from that produced by background workers.
- It is now possible to specify what type of SSH clients makeClusterPSOCK() automatically searches for and in what order, e.g. rshcmd = c("<rstudio-ssh>", "<putty-plink>").
- Now makeClusterPSOCK() preserves the global RNG state (.Random.seed) also when it draws a random port number.
- makeClusterPSOCK() gained argument rshlogfile.
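Preserving the global RNG state while drawing a random port can be sketched in base R as follows. This is one way to get the effect and is not taken from the package source: the function name draw_port_preserving_rng() is hypothetical, and the port range is the default 11000:11999 mentioned elsewhere in this changelog.

```r
# Hypothetical helper: sample a port without disturbing .Random.seed
draw_port_preserving_rng <- function(ports = 11000:11999) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = genv, inherits = FALSE)

  # Restore (or remove) the seed when the function exits, even on error
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(list = ".Random.seed", envir = genv)
    }
  })

  sample(ports, size = 1L)
}

set.seed(42)
seed_before <- get(".Random.seed", envir = globalenv())
port <- draw_port_preserving_rng()
seed_after <- get(".Random.seed", envir = globalenv())
identical(seed_before, seed_after)
```

Without this kind of guard, creating a cluster would silently shift the random number stream of a script that had called set.seed() earlier.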
Version 1.10.0
New Features
- Add makeClusterMPI(n) for creating MPI-based clusters of a similar kind as parallel::makeCluster(n, type = "MPI") but that also attempts to work around issues where parallel::stopCluster() causes R to stall.
- makeClusterPSOCK() and makeClusterMPI() gained argument autoStop for controlling whether the cluster should be automatically stopped when garbage collected or not.
Version 1.9.0
Bug Fixes
- makeClusterPSOCK() produced a warning when environment variable R_PARALLEL_PORT was set to random (e.g. as on CRAN).
Version 1.8.1
New Features
- makeClusterPSOCK() now produces a more informative warning if environment variable R_PARALLEL_PORT specifies a non-numeric port.
Version 1.7.0
New Features
- On Windows, makeClusterPSOCK(), and therefore plan(multisession) and plan(multiprocess), will use the SSH client distributed with RStudio as a fallback if neither ssh nor plink is available on the system PATH.
Bug Fixes
- makeClusterPSOCK(..., renice = 19) would launch each PSOCK worker via nice +19 resulting in the error “nice: ‘+19’: No such file or directory”. This bug was inherited from parallel::makePSOCKcluster(). Now using nice --adjustment=19 instead.
Version 1.5.0
New Features
- makeClusterPSOCK() now defaults to use the Windows PuTTY software’s SSH client plink -ssh, if ssh is not found.
- Argument homogeneous of makeNodePSOCK(), a helper function of makeClusterPSOCK(), will default to FALSE also if the hostname is a fully qualified domain name (FQDN), that is, it “contains periods”. For instance, c('node1', 'node2.server.org') will use homogeneous = TRUE for the first worker and homogeneous = FALSE for the second.
- makeClusterPSOCK() now asserts that each cluster node is functioning by retrieving and recording the node’s session information including the process ID of the corresponding R process.
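The “contains periods” rule for the default of homogeneous can be illustrated with a one-line test. This sketch covers only that part of the rule and ignores everything else that may influence the default; the helper name looks_like_fqdn() is hypothetical.

```r
# Hypothetical helper: TRUE for hostnames that contain at least one period
looks_like_fqdn <- function(hosts) grepl(".", hosts, fixed = TRUE)

workers <- c("node1", "node2.server.org")

# Under the rule above, a period in the hostname implies homogeneous = FALSE
homogeneous_default <- !looks_like_fqdn(workers)
homogeneous_default
```

For the two hostnames in the entry above this gives TRUE for node1 and FALSE for node2.server.org, matching the stated defaults.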
Documentation
- Help on makeClusterPSOCK() gained more detailed descriptions on arguments and what their defaults are.
Version 1.4.0
New Features
- The default values for arguments connectTimeout and timeout of makeNodePSOCK() can now be controlled via global options.
Version 1.3.0
New Features
- makeClusterPSOCK() treats workers that refer to a local machine by its local or canonical hostname as "localhost". This avoids having to launch such workers over SSH, which may not be supported on all systems / compute clusters.
- Added availableWorkers(). By default it returns localhost workers according to availableCores(). In addition, it detects common HPC allocations given in environment variables set by the HPC scheduler.
- Option future.availableCores.fallback, which defaults to environment variable R_FUTURE_AVAILABLECORES_FALLBACK, can now be used to specify the default number of cores / workers returned by availableCores() and availableWorkers() when no other settings are available. For instance, if R_FUTURE_AVAILABLECORES_FALLBACK=1 is set system wide in an HPC environment, then all R processes that use availableCores() to detect how many cores can be used will run as single-core processes. Without this fallback setting, and without other core-specifying settings, the default will be to use all cores on the machine, which does not play well on multi-user systems.
Version 1.2.0
New Features
- Added makeClusterPSOCK() - a version of parallel::makePSOCKcluster() that allows for more flexible control of how PSOCK cluster workers are set up and how they are launched and communicated with if running on external machines.
- Added generic as.cluster() for coercing objects to cluster objects to be used as in plan(cluster, workers = as.cluster(x)). Also added a c() implementation for cluster objects such that multiple cluster objects can be combined into a single one.
Version 1.1.1
Bug Fixes
- For the special case where ‘remote’ futures use workers = "localhost" they (again) use the exact same R executable as the main / calling R session (in all other cases they use whatever Rscript is found on the PATH). This was indeed already implemented in 1.0.1, but with the added support for reverse SSH tunnels in 1.1.0 this default behavior was lost.
Version 1.1.0
New Features
- REMOTE CLUSTERS: It is now very simple to use cluster() and remote() to connect to remote clusters / machines. As long as you can connect via SSH to those machines, it works also with these futures. The new code uses reverse SSH tunneling, which completely avoids the incoming-firewall and port-forwarding setup that was previously needed. There is also no need to worry about internal or external IP numbers.
Version 0.15.0
New Features
- Now availableCores() also acknowledges environment variable NSLOTS set by Sun/Oracle Grid Engine (SGE).
Version 0.12.0
Bug Fixes
- FIX: Now availableCores() returns 3L (=2L+1L) instead of 2L if _R_CHECK_LIMIT_CORES_ is set.
Version 0.10.0
New Features
- Now availableCores() also acknowledges the number of CPUs allotted by Slurm.
