Thursday, May 17, 2012

Notes RStudio Server

Connecting to Linux
Basic Linux Admin
R installations on Linux
R Studio Server Installation
Installing Apache
Configuring Apache Daemon - httpd
R Studio Management / Configuration
Configuring Subversion for Rstudio
Running R-Studio with Subversion
Services That Need To Be Restarted

server names, login names, etc --- see details.txt

Connecting to Linux
Get Open Source SSH software

To connect via SSH:
1. Open Bitvise Tunnelier
2. Host: <LinuxServer>
   Port: 22
   Username: <LinuxUser>
   Initial Method: password
   Password: <>
3. Click on Open New Terminal Console.
4. To check sudo password is working:
   sudo more /tmp/dsm.sys.bak
   ... or use any files that can be viewed by root only.

Basic Linux Admin
To check which version of GCC goes with which version of RH Linux.

check version of Linux
   cat /proc/version
Linux version 2.6.18-308.el5 ( (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Fri Jan 27 17:17:51 EST 2012
   uname -a

check 32bit vs 64 bit
   uname -m
----- output ------
file /usr/bin/file
/usr/bin/file: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
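
As a rough sketch of the 32-bit vs 64-bit check above, a small helper can turn the machine string reported by uname -m into a word size. The function name arch_bits and the list of machine strings are illustrative assumptions, not part of the original notes; only common x86 values are covered.

```shell
#!/bin/sh
# arch_bits MACHINE: print 64 or 32 for common x86 machine strings
# (as reported by `uname -m`), or "unknown" for anything else.
# The name arch_bits and the handled strings are assumptions for illustration.
arch_bits() {
    case $1 in
        x86_64|amd64)        echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *)                   echo unknown ;;
    esac
}

# Example: classify the machine this script runs on.
arch_bits "$(uname -m)"
```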

check packages
yum list installed
yum list available

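For X windows:
sudo vi /etc/ssh/sshd_config
   - modify this file (the SSH server configuration, not the client file ssh_config) so that
   X11Forwarding yes
   X11DisplayOffset 10
   X11UseLocalhost yes
   - then restart the SSH daemon:  sudo /sbin/service sshd restart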

These settings did NOT work:
    DISPLAY=myws:0; export DISPLAY
    DISPLAY=:0.0; export DISPLAY
    DISPLAY=:0; export DISPLAY
    DISPLAY=.0; export DISPLAY

In the remote box, this one works:
    DISPLAY=localhost:10.0; export DISPLAY
    xterm &

To avoid setting DISPLAY every time, edit .bash_profile by adding these lines:
    DISPLAY=localhost:10.0
    export DISPLAY

Fedora, and perhaps RHEL, rpm packages are "likely" to be installed in:

top             - to see status of processes, press 'q' to quit, press 'H' to toggle threads view.
                - press 'n' to enter the number of lines displayed.
mpstat -P ALL   - to see stats on all CPUs

/etc/services   - a file with list of port numbers and their associated services.
netstat         - to see the ports of active services

R installations on Linux

Manual Downloads:
R-2.15.0-1.el5.x86_64 [16 KiB] Changelog by Tom Callaway (2012-03-30):- Update to 2.15.0
R-core-2.15.0-1.el5.x86_64 [36.0 MiB] Changelog by Tom Callaway (2012-03-30):- Update to 2.15.0
R-devel-2.15.0-1.el5.x86_64 [90 KiB] Changelog by Tom Callaway (2012-03-30):- Update to 2.15.0
xdg-utils-1.0.2-4.el5.noarch [52 KiB] Changelog by Lubomir Kundrak (2008-01-25):- Fix for CVE-2008-0386 (#429513)

rpm -i R-core-2.15.0-1.el5.x86_64.rpm
warning: R-core-2.15.0-1.el5.x86_64.rpm: Header V4 DSA signature: NOKEY, key ID
error: Failed dependencies:
        cups is needed by R-core-2.15.0-1.el5.x86_64
        tetex-latex is needed by R-core-2.15.0-1.el5.x86_64
        xdg-utils is needed by R-core-2.15.0-1.el5.x86_64

Type these commands to install dependencies:
sudo yum install cups
sudo yum install tetex
sudo yum install tk.x86_64
sudo rpm -i xdg-utils-1.0.2-4.el5.noarch.rpm
sudo yum install tetex-latex
sudo rpm -i R-core-2.15.0-1.el5.x86_64.rpm

To check that rpm packages are installed, eg.
sudo rpm -q -i xdg-utils
sudo yum info R-core-2.15.0-1.el5.x86_64     # yum can query it even though the package was installed with rpm

To check that yum packages are installed, eg.
sudo yum info tetex-latex
sudo more /var/log/yum.log
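
The per-package checks above could be wrapped in a loop, sketched below. report_missing is a hypothetical helper name; the query command is passed in as the first argument so that the same loop works with "rpm -q" on the server.

```shell
#!/bin/sh
# report_missing QUERY PKG...: print every PKG for which the command
# "QUERY PKG" exits non-zero. On the server QUERY would be "rpm -q".
# report_missing is a hypothetical helper name.
report_missing() {
    query=$1
    shift
    for pkg in "$@"; do
        # $query is deliberately unquoted so "rpm -q" splits into two words.
        if ! $query "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# On the Linux box (not executed here):
#   report_missing "rpm -q" cups tetex tetex-latex tk xdg-utils R-core
```

An empty output means every listed package is installed.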

yum configuration details are in:

R Studio Server Installation

1. Extra Packages for Enterprise Linux  (EPEL)
Download epel-release-5-4.noarch.rpm from:

(Transfer that file to the Linux box)

In the Linux box, do (see WARNING below first) the following:
sudo rpm -Uvh epel-release-5-4.noarch.rpm

To check installation:
sudo rpm -q -i epel-release-5-4

WARNING: This step may not be necessary, and the consequences of skipping it are unknown. From actual experience, the main problem with DOING this step is that it breaks yum: it adds 2 repositories that cannot be reached. The solution is to move the files below somewhere else:
sudo mv /etc/yum.repos.d/epel*  <Some Junk Directory>

2. Install R (core or base) package.
Since R was installed in the section above, we just need to check that R is installed:
     sudo yum info R-core-2.15.0-1.el5.x86_64     # yum can query it even though the package was installed with rpm
     sudo rpm -q -i R-core

3. Installing R Studio
Download the Rstudio package from:

Transfer the rpm to the Linux box if not there already.

Install:  sudo rpm -Uvh rstudio-server-0.95.265-x86_64.rpm

Check:  sudo rstudio-server verify-installation
   ...the following output from the check still allows Rstudio to run through web in the end - so it is OK.
rserver[20481]: WARNING R include path (/usr/include/R) not found; LOGGED FROM: bool core::r_util::<unnamed>::validateREnvironment(const core::r_util::EnvironmentVars&, const core::FilePath&, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:379
Starting rstudio-server:                                   [  OK  ]

Configuring for External Libraries - WORK IN PROGRESS
- These are R libraries that are not part of the standard R base. They are developed by other users and can be downloaded from the R repository.
- For R studio, let only the svn user have control over the R libraries and they will be stored at:
- Create the configuration files as follows, edit the file:
sudo vi /etc/rstudio/rserver.conf  
- Put this content in the file:
- Test and restart server
 sudo rstudio-server test-config
 sudo rstudio-server restart

 Package Management Proposal
USE CASE 1: A normal R user wants to install packages using Rstudio-server GUI.
 - Problem 1: it appears each user needs to install their own packages under ~/R/library
 - Problem 2: the Linux server cannot access the outside world.
 - Solution:
 a. Create a local repository on the Linux box, so that Rserver points to http://localhost/src/contrib/
 b. On Windows: manually, or with a Web Robot, fetch all updated packages. Then transfer all packages to the Linux server under <Web>/src/contrib
 c. On Linux, each user specifies localhost as the R repository.
 In summary, we essentially run our own R repository mirror. Each user installs packages as needed. All users of CBA are guaranteed to use the same version of packages because they all source them from the same CBA repository.
 d. An extra step may be to link all users' directories ~/R/library to a common directory on the Linux server. This means that when one user installs a package, it will be available to all users.
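
Step (c) above could be sketched as follows. set_local_repo is a hypothetical helper name, and writing to ~/.Rprofile is an assumption about where each user keeps R startup settings. Note that R appends /src/contrib to the repository URL itself, so the option holds the root URL http://localhost. The repository directory also needs a PACKAGES index, which R's tools::write_PACKAGES() can generate.

```shell
#!/bin/sh
# set_local_repo FILE URL: append an R startup line to FILE that makes URL
# the default CRAN repository. FILE would normally be ~/.Rprofile.
# set_local_repo is a hypothetical helper name.
set_local_repo() {
    printf 'options(repos = c(CRAN = "%s"))\n' "$2" >> "$1"
}

# Example: write to a scratch file rather than a real ~/.Rprofile.
profile=$(mktemp)
set_local_repo "$profile" "http://localhost"
cat "$profile"
rm -f "$profile"
```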

USE CASE 2: An R administrator installs packages for every user.
The R administrator gets the R library SOURCE, in the form of *.tar.gz, and puts it in <LOCAL_REPO>.
To install individual packages, the R administrator will do:
    sudo R CMD INSTALL -l <R_SHARE> <LOCAL_REPO>/adk_1.0-2.tar.gz

... where the example package to install is adk, and the location to install may be <R_SHARE>="/usr/share/R/library"
Check that R_SHARE is listed as one of the directories given by R command: .libPaths()

To remove:
sudo R CMD REMOVE -l <R_SHARE> adk

To Check:
- login as a regular R user and type:

With this method, any R user can use the package and all users will be using the same version.
The problem of accessing the repository still remains and has the same solution as described above.

Some R package management commands:
getOption("repos")                 - list R repositories
getOption("defaultPackages")       - list packages loaded by default
remove.packages(c("pkg1", "pkg2"), lib = file.path("path", "to", "library"))
.libPaths()    - default libpaths
     "/home/cheec/R/library"             "/usr/lib64/R/library"            
     "/usr/share/R/library"              "/usr/lib/rstudio-server/R/library"
Sys.getenv("R_LIBS_USER")                -  "~/R/library"  *** note these are R env vars - they are not Linux env vars
Sys.getenv("R_HOME")                     -  "/usr/lib64/R"
R_HOME/etc/repositories                  - list of repositories
R CMD INSTALL -l <LIB> pk1 pk2           - installs packages pk1, pk2 to location <LIB>
install.packages(c("pk1", "pk2"))        - within R, install packages pk1, pk2
install.packages("pk1", dependencies=TRUE)  - within R, install packages pk1 and dependencies.
install.packages("~/R/library/adk_1.0-2.tar.gz", repos = NULL)
library(help="adk")                      - to check if adk has been installed correctly
update.packages()                        - checks and updates all necessary packages
R CMD check -l ~/R/library ~/R/library/adk_1.0-2.tar.gz     - checks the package adk, using ~/R/library for the test installation
sudo R CMD INSTALL -l /usr/share/R/library ~/R/library/adk_1.0-2.tar.gz
sudo R CMD REMOVE -l /usr/share/R/library adk

To set up a local CRAN mirror:

Installing Apache
apr-util 1.4.1

Transfer the *.gz files to the linux box.

Extract the Apache source:
    sudo mv <the FOUR *.gz files>  /usr/local
    cd /usr/local
    sudo tar -xzvf <each *.gz file>       - run once per file; tar extracts one archive per invocation
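
Because tar reads one archive per invocation, extracting several source tarballs needs a loop. A minimal sketch, assuming the downloads are named *.tar.gz; extract_all is a hypothetical helper name, and on the server it would be run with sudo against /usr/local.

```shell
#!/bin/sh
# extract_all DIR: unpack every *.tar.gz archive found in DIR, in place.
# extract_all is a hypothetical helper name.
extract_all() {
    for archive in "$1"/*.tar.gz; do
        [ -e "$archive" ] || continue   # no archives: the glob stayed literal
        tar -xzf "$archive" -C "$1" || return 1
    done
}

# On the Linux box (not executed here):
#   extract_all /usr/local
```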

Installing apr:
   go to the appropriate directory, eg cd /usr/local/apr
   sudo ./configure
   sudo make
   sudo make install

Installing apr-util
   go to the appropriate directory, eg cd /usr/local/apr-util....
   sudo ./configure --with-apr=/usr/local/apr
   sudo make
   sudo make install

Install pcre-devel
   sudo yum install pcre-devel.x86_64

Installing apache2
   go to the appropriate directory, eg cd /usr/local/httpd...
   sudo ./configure --with-apr=/usr/local/apr --enable-proxy --enable-proxy-http --enable-proxy-html  --enable-xml2enc
   sudo make
   sudo make install

Uninstall apache2 - only if you need to uninstall for whatever reason
   go to the appropriate directory, eg cd /usr/local/httpd...
   sudo make clean
   sudo make distclean
   Note: these commands only clean the source tree; files already installed under /usr/local/apache2 are not removed by them.

Installing elinks - a text based browser (for checking whether webserver running locally)
   sudo yum install elinks

Configuring Apache - to use Reverse Proxy to Rstudio-Server
- apache src files are at: /usr/local/httpd....  --> <SRC_DIR>
- by default, apache is installed under : /usr/local/apache2    (check this by <SRC_DIR>/configure --help)
- from here on, let notation  $PREFIX="/usr/local/apache2"
- to start   apache server:  $PREFIX/bin/apachectl -k start
- to stop    apache server:  $PREFIX/bin/apachectl -k stop
- to restart apache server:  $PREFIX/bin/apachectl restart
- to check if apache server is running: pstree   (look for httpd in text)
- to check apache installation, edit the file $PREFIX/htdocs/index.html and write some html content like "Hello World"
  open up elinks text browser on the linux box, and goto http://localhost, you should see the index.html modified above.  
  open up normal browser on your PC desktop within CBA network, and goto http://<HOST>, you should see the index.html modified above.  

Configuring Proxies

Put the following code into the $PREFIX/conf/httpd.conf file:
<VirtualHost *:80>
    <Proxy *>
        Allow from localhost
    </Proxy>
    ProxyPass        / http://localhost:8787/
    ProxyPassReverse / http://localhost:8787/
</VirtualHost>

- stop apache server (see above)
- start apache server (see above)
- Open a browser on your local PC desktop and goto http://<HOST> (no port numbers needed). You will be REDIRECTED to R-Studio.
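
If the backend port ever differs from 8787, the proxy block can be generated rather than typed by hand. This is only a sketch: proxy_vhost is a hypothetical helper name, and its output would still have to be pasted into (or appended to) the httpd.conf described above.

```shell
#!/bin/sh
# proxy_vhost PORT: print an Apache VirtualHost block that reverse-proxies
# port 80 to an RStudio Server listening on localhost:PORT.
# proxy_vhost is a hypothetical helper name; 8787 is the port used in these notes.
proxy_vhost() {
    cat <<EOF
<VirtualHost *:80>
  <Proxy *>
    Allow from localhost
  </Proxy>
  ProxyPass        / http://localhost:$1/
  ProxyPassReverse / http://localhost:$1/
</VirtualHost>
EOF
}

proxy_vhost 8787
```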

Configuring Apache Daemon - httpd
Note there are TWO versions of apache. One is the RPM package that came prebuilt with RHEL. The other is the one built manually using the method above. The self-built one is needed instead of the RPM one because Rstudio requires several specific build options.

The RPM apache is configured to be ready to start up with the following configuration.
File: /etc/rc.d/init.d/httpd                 - startup script that has the following relevant information
- points to /etc/sysconfig/httpd             - has httpd system related information
- points to /usr/sbin/apachectl              - control program
- points to /usr/sbin/httpd                  - actual daemon program
- points to /var/run/
- points to /var/lock/subsys/httpd
- points to /etc/httpd/conf/httpd.conf       - main server configuration file

The locally built apache is not set to start up automatically, but has the following files
- /usr/local/apache2/bin/apachectl           - control program
- /usr/local/apache2/bin/httpd               - actual daemon program
- /usr/local/apache2/conf/httpd.conf         - main server configuration file

*** You will be SUPPLIED / GIVEN a file called httpd_start
The idea is to modify the already existing /etc/rc.d/init.d/httpd  (from RHEL's Apache rpm) so that it has the contents of the given httpd_start file
1. Edit the file /etc/rc.d/init.d/httpd so that its contents are identical to httpd_start.
2. Alternatively, instead of copy-pasting, upload httpd_start to the directory /etc/rc.d/init.d/ to REPLACE the existing httpd file.
3. Also hide the Red Hat daemon binary:  sudo mv /usr/sbin/httpd  /usr/sbin/httpd_REDHAT_old

to check that this new httpd is working, do the following:
cd /etc/rc.d/init.d
sudo ./httpd start
--- check that the webserver is running ----
sudo ./httpd stop

Then do the following:
sudo chmod 755 /etc/rc.d/init.d/httpd        - to ensure it has permissions to start
sudo chkconfig --list                        - do this to check whether httpd is registered or not
sudo chkconfig --add httpd                   - to register for startup, if not there yet (chkconfig takes the service name, not the path)
sudo chkconfig --level 2345 httpd on         - to switch on at specific runlevels

Ref - for init.d/httpd startup script

R Studio Management / Configuration
   To manually stop, start, and restart the server you use the following commands:
sudo rstudio-server stop
sudo rstudio-server start
sudo rstudio-server restart

   To list all currently active sessions:
sudo rstudio-server active-sessions
   To suspend an individual session:
sudo rstudio-server suspend-session <pid>
   To suspend all running sessions:
sudo rstudio-server suspend-all
   The suspend commands also have a "force" variation which will send an interrupt to the session to request the termination of any running R command:
sudo rstudio-server force-suspend-session <pid>
sudo rstudio-server force-suspend-all
   The force-suspend-all command should be issued immediately prior to any reboot so as to preserve the data and state of active R sessions across the restart.

   Taking the Server Offline - If you need to perform system maintenance and want users to receive a friendly message indicating the server is offline you can issue the following command:
sudo rstudio-server offline
sudo rstudio-server online
   These two commands are independent of start and stop. Once taken offline, rstudio remains inaccessible even if the server is restarted, until it is switched back online.

Workflow for taking R offline:
Method A
- on server: sudo rstudio-server offline
- on user PC: 1. "Error: Status code 503 returned" is displayed in the R session. 2. The "Rstudio Temporarily Offline" dialog appears. 3. The user cannot do anything.
- on server, these have to be done IN ORDER, otherwise the session will not start:
   1. sudo rstudio-server restart
   2. sudo rstudio-server online
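
The ordering in Method A can be captured in a small script so that it is not mistyped during maintenance. This is a sketch only: run, take_offline and bring_online are hypothetical helper names, and DRY_RUN defaults to 1 so that the commands are printed rather than executed.

```shell
#!/bin/sh
# run CMD...: echo the command when DRY_RUN is 1 (the default here),
# otherwise execute it. All helper names in this sketch are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the offline page to users before maintenance starts.
take_offline() {
    run sudo rstudio-server offline
}

# Restart first, then switch back online; the order matters.
bring_online() {
    run sudo rstudio-server restart && run sudo rstudio-server online
}

take_offline
bring_online
```

Setting DRY_RUN=0 before calling the functions would execute the commands for real.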

Version Control using Subversion
RHEL5 has subversion 1.6.11 pre-installed for both i386 and x86_64.
To use Subversion with R, the work needs to be organized as Rstudio Projects.

1. To check if the svnserver starts automatically, whether svn is preinstalled for Red Hat, or self install later, type:
  ls -laF /etc/init.d/svnserve
to see if the file exists. This file runs a script to start the daemon.

2. Create svn user and group, type:
sudo useradd svn  - use id svn to check user details
sudo passwd svn   - make password for svn -> "svn"
sudo usermod  -a -G R_POC_TEAM svn    - this puts the user "svn" to the group called R_POC_TEAM.

Ensure that all other users of svn are in the same group called "R_POC_TEAM". To check which groups a user is in, type:
id <username>        - or look at the group file: sudo more /etc/group

3. Create the SVN repository for all members of Group Quantitative Analytics (GQA).
- this is done only ONCE, when a new VM and subversion is installed.
- all users and the 'svn' user need to be in the same group. In this example, all these users belong to a group called R_POC_TEAM. If this is not true, make it so.
- Login as svn user.
- create the repository called QuantAnalytics, type:
svnadmin create --fs-type fsfs /home/svn/QuantAnalytics

This creates the repository QuantAnalytics, which will contain all projects. Note that --fs-type fsfs is the preferred filesystem for svn rather than the Berkeley DB filesystem (see manual).

4. Check the file /etc/services. If it does not contain the following lines, then add them:
svn    3690/tcp     # Subversion
svn    3690/udp     # Subversion

5. Configuring Authentication and Authorization
- LOGIN AS USER called 'svn'
- go to the QuantAnalytics repository, eg:
    cd /home/svn/QuantAnalytics
- Create a file under ...QuantAnalytics/conf/svnserve.conf, with the following content:
[general]
password-db = passwd
realm = QA realm
anon-access = read
auth-access = write
...... passwd is actually the word "passwd"; it names the password file described next.

- Ensure the file called 'passwd' exists under ...QuantAnalytics/conf/ and that it has the following content:
[users]
harry = harryssecret
sally = sallyssecret

.... where harry and sally are valid users in the linux box.
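
The svnserve.conf from step 5 could also be written by a script, sketched below. write_svn_conf is a hypothetical helper name; svnserve expects these settings under a [general] section header. On the server it would be run as the svn user against /home/svn/QuantAnalytics/conf.

```shell
#!/bin/sh
# write_svn_conf CONFDIR REALM: write a minimal svnserve.conf into CONFDIR.
# write_svn_conf is a hypothetical helper name.
write_svn_conf() {
    cat > "$1/svnserve.conf" <<EOF
[general]
password-db = passwd
realm = $2
anon-access = read
auth-access = write
EOF
}

# Example: write into a scratch directory and show the result.
confdir=$(mktemp -d)
write_svn_conf "$confdir" "QA realm"
cat "$confdir/svnserve.conf"
rm -rf "$confdir"
```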

6. Hide the absolute path of the svn server and provide relative paths to user only.
Edit this file: /etc/rc.d/init.d/svnserve

by adding "-r /home/svn" to the following line if it exists, so that it becomes:
args="--daemon --pid-file=${pidfile} $OPTIONS  -r /home/svn"

Now when the svn server is mentioned, it is done by:
<svn server>/QuantAnalytics  instead of  <svn server>/home/svn/QuantAnalytics
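
The effect of the -r option on repository URLs can be illustrated with a small path-to-URL helper. repo_url is a hypothetical name, and the sketch only does string manipulation; it does not contact any server.

```shell
#!/bin/sh
# repo_url HOST ROOT PATH: print the svn:// URL under which the repository
# at filesystem PATH is visible when svnserve runs with "-r ROOT".
# repo_url is a hypothetical helper name.
repo_url() {
    rel=${3#"$2"}          # strip the served root from the front of the path
    echo "svn://$1$rel"
}

repo_url localhost /home/svn /home/svn/QuantAnalytics
```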

The following information is for reference:
- Access SVN. The URL pointing to the repository is:

- Some svn admin commands: svnadmin, svnlook, svndump, svndumpfilter, svnsync

- svnlook info <repos>      - prints information for the repository at <repos>.
                            The information include author, date, number of lines of log, log message

- sudo killall svnserve     - force termination of any active svnserve services. Also solves the error with message:
                            " svnserve: Can't bind server socket: Address already in use"

- sudo /etc/init.d/svnserve start       - to start svn server once off.

***** How to setup iptables - NOT Needed here

Running R-Studio with Subversion
A. NEVER NEVER NEVER have filenames with SPACES or ANY OTHER symbols except . (fullstop) and _ (underscore).
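
Rule A can be checked before files are added to version control. A minimal sketch, assuming that "no other symbols" means names made only of ASCII letters, digits, fullstops and underscores; valid_name is a hypothetical helper name.

```shell
#!/bin/sh
# valid_name NAME: succeed only if NAME is non-empty and consists solely of
# ASCII letters, digits, "." and "_". valid_name is a hypothetical helper name.
valid_name() {
    case $1 in
        ''|*[!A-Za-z0-9._]*) return 1 ;;
        *)                   return 0 ;;
    esac
}

# Example: report offending names among the arguments.
for name in analysis_v2.R "my file.R"; do
    valid_name "$name" || echo "bad filename: $name"
done
```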

1. To run Rstudio, open a browser and type this URL:

To login as another user (for administrators only), when the user name field is no longer visible, type:

The sign in password is the same password of the Unix account.

2. Go to Tools - Options, click the Version Control icon on the left navigation pane.
- Under "SVN executable:", type:  /usr/bin/svn

3. The Repository URL is:

4. To Finish R, there are two ways:
- Within Rstudio, click File - Quit R.
This will finish off the current R session but immediately asks you if you want to start a new one.
- On the top right, click Sign-Out
This will sign you out, but your session is saved on the server. When you login again, you will see the same session where you left off, and all the variables in the workspace will still be there.
- Also, while working in a project, you MUST close the project before exiting R, otherwise the same project opens automatically when Rstudio is started. To close the project, click: Projects - Close Projects.

5. (Recommended) To create a new project and at the same time, enabling version control.
This is the recommended way even if you already have R code files and an existing project. To simplify putting your code into version control, just create a new project associated with version control, then copy your files into this new project.
- Once logged in to Rstudio, click Tools - Shell
- Type: mkdir ~/<ProjName>
  where ~ means your home directory, <ProjName> is the name of your project
- Type:  svn import ~/<ProjName> svn://localhost/QuantAnalytics/<ProjName>
- Click Close, to get back to main Rstudio
- Click Project - New Project
- Choose "Version Control" from the New Project dialog.
- Click "Subversion", not "Git"
- Enter the following details:
Repository URL:                        svn://localhost/QuantAnalytics/<ProjName>
Username:                              <your Linux account username>
Project Directory Name:                <ProjName>
Create project as subdirectory of:     ~
- Click "Create Project"
- RStudio will open up in the new project directory. Experiment a bit here. Create a file, write something, save it.
- Copy files from other directories into this directory. Then add these files to version control.

6.A. When a directory for a group of R files exists, to put it in version control, first formalize that directory as an R project, then add it to version control.
- Once logged in to Rstudio, click Project - New Project
- Choose "Existing Directory" from the New Project dialog.
- In the Project working Directory, enter the existing directory path, eg. ~/ExistingDir
- Click "Create Project".
At the end of this stage, a new project has been created to encompass old files in an old directory.

6.B. To put existing R project directory into SVN:
- Login to Rstudio and open the project created in the previous step.
- Check that Rstudio is now in that project directory, type:    getwd()  
- On the Menu, click Tools - Shell
- To upload files to SVN, type:
(In General) svn import <local path> <SVN repo URL> -m "your message"
(Example)    svn import ~/myProj svn://localhost/QuantAnalytics/myProj -m "Initial Import"

svn delete  svn://localhost/QuantAnalytics/myProj/.Rproj.user
... then press C
svn delete  svn://localhost/QuantAnalytics/myProj/.Rhistory
... then press C
The last two lines are to remove your personal R configuration files - since these should not be in the server.
Alternatively, to avoid the "svn delete", we can just selectively import the R source files we intend to store.

svn checkout --force svn://localhost/QuantAnalytics/myProj   ~/myProj
svn update
svn commit

- Close the Shell from Rstudio.
- Now back in the main Rstudio interface, Close the R project
- Reopen the R project. You will now find in the top-right Box, next to the "Workspace" and "History" tabs, there is a new "SVN" tab. This "SVN" will track any changes made to files which are under version control.

7. A few other tasks with SVN and R studio.

To UN-version control your local directory. The situation: directory A is in version control and is already in your local workspace. You wish to keep the contents in your workspace but break any link between your workspace and version control.
- go to directory A
- type: rm -r -f .svn

To delete any path or branch from version control. Warning: deleting a path and files from version control may delete files your colleagues are using. DO NOT do this until you check with them. Danger: this command may allow you to accidentally delete the wrong path in version control, ie. deleting your colleagues' files. BE SURE what you are deleting.
- svn delete svn://localhost/QuantAnalytics/<your directory>

svn delete svn://localhost/QuantAnalytics

Services That Need To Be Restarted

Services are:         Apache         Rstudio             SVN
<ServiceNames>:       httpd          rstudio-server      svnserve
Runlevel default:     NA             2345                NA
Runlevel required:    2345           2345                2345

Runlevel default is the out-of-box startup configuration before any changes have been made.
This is obtained from :  chkconfig --list

Do the following so that httpd and svnserve start automatically:
sudo chkconfig --level 2345 httpd on
sudo chkconfig --level 2345 svnserve on
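
To confirm the result, the per-service line printed by chkconfig --list can be reduced to the runlevels that are switched on. A minimal sketch; runlevels_on is a hypothetical helper name, and the sample line in the usage comment only illustrates the usual "name 0:off 1:off 2:on ..." layout.

```shell
#!/bin/sh
# runlevels_on: read "chkconfig --list"-style lines on stdin and print,
# for each line, the runlevels marked "on" (e.g. 2345).
# runlevels_on is a hypothetical helper name.
runlevels_on() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")          # kv[1] = runlevel, kv[2] = on/off
            if (kv[2] == "on") out = out kv[1]
        }
        print out
    }'
}

# On the Linux box (not executed here):
#   chkconfig --list httpd | runlevels_on
```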

Redhat Enterprise Linux runlevels are (according to /etc/inittab)
# Default runlevel. The runlevels used by RHS are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)

Checking if daemon has started:
service rstudio-server status
chkconfig --list
sudo more /etc/inittab            - shows how the system is initialized
sudo ls /etc/rc.d/init.d/         - list of daemon files, used or not used in startup
ls -latrF /etc/rc.d/rc3.d/        - links to actual files in init.d, at this particular runlevel 3
