TWS/Maestro – Problems and Solutions

 

Preface

 

This document is designed to help users troubleshoot problems encountered when creating or running jobs in Maestro[1].  Problems with Maestro jobs are often caused by permission or configuration errors in the job script, or by configuration errors in the Unix or Windows user account under which the job runs.  These are problems that Maestro support cannot resolve.  The purpose of this document is to help users solve Maestro-related problems more quickly by helping them identify the cause of the problem and guiding them to the most appropriate resource for help, which in many cases is not Maestro support.

 

Background

 

Maestro is composed of a Master cpu and a number of Fault Tolerant Agents (FTAs).  At Princeton, the Master is either Maestrosrv or Maestrodev, and the FTAs are various administrative Unix or Windows hosts such as isserv205 or landr.  Once a day, at 7:30 AM, the Master sends the day’s schedule to all of the FTAs.  After that, the Master collects job status information from the FTAs and coordinates cross-platform dependencies.

 

The most important thing to understand about Maestro is that it is simply a scheduling package.  It runs a job script at the appointed time, as the designated user, but it does not manipulate the script in any way.  Once Maestro has issued the command to run your Job Stream or job, your script has control, and Maestro is completely out of the picture.  It simply waits for your Job Stream or job to complete and return a completion code.

 

When to contact Maestro support (Robert Hebditch)

 

When contacting Maestro support for the following security file changes, you need to provide the user loginid, the job loginid (e.g. lradmin, advora, etc.), the name of the FTA (e.g. isserv205) and the two initial letters of the jobs/schedules involved:

 

  • A new person needs to be able to create and/or submit jobs (see the Maestro ID Request document for details).

 

  • An existing Maestro user needs his or her privileges extended or reduced.

 

  • A new Unix or Windows host needs to be added to the list of Maestro FTAs.

 


 

For run-time problems, first apply the appropriate solution from this document.  This will usually be the fastest way to solve your problem.

 

For problems that you cannot fix by applying the solutions in this document, please collect the following information before contacting Maestro support.

 

For run-time problems, you MUST provide, at a minimum:

 

  • The job/schedule name and name of the FTA.

 

  • An accurate description of the problem, including the actions that led up to the error or that you took as a consequence of it.

 

  • The exact phrasing of any error message(s) you received including the error id number if there is one (a screen print is often the best way to provide this information).

 

  • Other relevant details such as extracts from the job log, a copy of the script you are running, the date of the schedule if not the current day, and anything else that would help Maestro support reconstruct the details of the problem.

 

Providing appropriate background information on your job will expedite finding the solution to your problem.


Trouble Shooting Check List

 

Problems with Job Scheduling Console

 

 

Note:  Job Scheduling Console is upgraded from time to time, especially when major changes are involved.  Before investigating ANY Job Scheduling Console problems, first check that you are running the latest version.  The most current version is available at  \\files\software\maestro\JSConsole-Install.  Select your operating system and follow the readme.doc instructions to install.

 

As of Feb 8 2006, the current version is JSC 1.4.

 

Remote Console or GConman is not functioning properly

Use Job Scheduling Console as your front end to Maestro.  REMOTE CONSOLE IS UNSUPPORTED; NO ONE SHOULD BE USING IT.

 

Logging in to Job Scheduling Console

Have you been able to log in to JSC using this id before?  Have you requested access from support?  Are you using the right login id?  Maestro uses your Unix id and password.  (See Section I below)

 

Viewing Schedules and jobs from earlier days using Job Scheduling Console

If you cannot select “alternate plan,” ensure that the current JSC version is installed.

(See Section I below)

 

Calendar dates are not correctly set or schedules are “ignoring the calendar”

Many problems with correctly setting a calendar result from use of earlier versions of the JSC.  (See Section I below.)

 

I use Job Scheduling Console at home, but cannot read job logs

This is a known problem and is fixed in JSC version 1.4.  Install version 1.4.

 

I cannot submit an ad hoc job

Submitting some ad hoc jobs is a known problem, and Tivoli is working on it.  The workaround is to create a job that executes your command and submit it using the dropdown menu.  (See Section II below)

 

A job I created earlier is no longer listed in the JSC display of jobs

 Are you looking in the right place?  Are you filtering out certain job names with a column filter or with a personal “plan” or “database list?”  Are you looking at the current day’s schedules and jobs?  (See Section II below)

 

I can’t create/save a job

Have you filled in all the required fields?  Are you trying to save an NT script to a Unix box?  Do you have appropriate access rights to create the job? (See Section II below)

 

I can’t save a job stream I just created

Have you filled in all the required fields?  (See Section II below)

 

 

Problems with Running Jobs

 

My script doesn’t run correctly in Maestro, but when I run it from my desktop it works fine

Your script needs to establish the same environment as you have when you run from the desktop.  (See Section III below)

 

My job didn’t run as expected, how can I tell if the problem is related to Maestro?

Browse the job log.  If a job log exists at all, the problem is almost certainly with your script.  If there is no job log and the status is “error,” this is almost certainly an access problem: check that your id has access to everything it needs, and that what it is looking for actually exists.  (See Section III below)

 

My job has run successfully in past schedules, now it won’t run

Even though you and Maestro support have changed nothing, other groups make changes that can affect the running of your job.  Always check that permissions to directories/files are still what you expect.  (See Section III below)

 

My job has been running for hours and it usually takes five minutes

Check “Status of all Workstations.”  If your server is unlinked or jobman isn’t running, there will be no reporting from it until it is re-linked, which mostly happens spontaneously when the server itself is running.  (See Section III below)

 

My job abended but Maestro indicates it was successful

Maestro only detects abends if the script it is running returns a non-zero completion code.

(See Section IV below)

 

My job abended but job log is empty or was not created

No job log indicates that Maestro cannot run the job because of access problems.  Look to see if your job “failed” rather than abended.  Be sure your job/login id has appropriate access.  (See Section IV below)

 

 

 

 

For a more detailed explanation of the above problems read the appropriate section below.

 

 

 

 

 

 

Problem Details

 

Section I – Job Scheduling Console

 

 

I can’t log into Job Scheduling Console

 

You will not be able to log into Job Scheduling Console if you have not first established your need for access by contacting Maestro support.  If this has already been set up for you AND you cannot get beyond the login window, be SURE that you are using the appropriate login id and password.  Maestro knows you by your Unix id and password.  REMOTE CONSOLE or GCONMAN IS UNSUPPORTED; NO ONE SHOULD BE USING IT.

 

I can’t see previous schedules and jobs using Job Scheduling Console

 

You may look at schedules and jobs from earlier dates up to approximately 60 days back using ‘set alternate plan’ from the main drop down menu.  If this is not available to you, install the current JSC (version 1.4) which may be downloaded and installed from:  \\files\software\Maestro\JSConsole_Install\Windows\JSC-1.4\windows\installer

 

Calendar dates are not correctly set or schedules are “ignoring the calendar”

Ensure you are running the current version of the JSC.  As of Feb 8 2006 the current version is JSC 1.4

 

I use Job Scheduling Console at home, but cannot read job logs

This is a known problem and is fixed in JSC version 1.4, which is the current version as of Feb 8 2006.

 

 

Section II – Job Creation and Submission

 

I cannot submit an ad hoc job

 

This is a known problem in TWS 8.1 and was recorded as APAR IY50637 on 11/6/03.  Nonetheless, you can work around this and still submit a newly created job simply by selecting “submit|job” from the drop down menu in JSC.

 

A job I created earlier is no longer listed in the JSC display of jobs

 

  • Be sure you have selected the right environment (Maestrodev or Maestrosrv). 
  • Be sure you are not filtering the job out of your view by having set a column filter or by way of a personal “plan” or “database list.”
  • Be sure you are looking at the current plan for the day.

 

 

I can’t create/save a job

 

(i) If you are trying to create a job and the cpu you want to run it on is not in the select list, you need to determine whether the type of job you are running (NT/Unix script or command) matches the operating system of the box you want to run it on.  Maestro knows, for example, that a Unix script won’t run on a Windows box, so it doesn’t give you the choice.

 

(ii)        If the following window comes up, it is a Maestro security violation. 

 

 

You will be correctly denied if you are trying to add, modify, delete, etc. a job or other object that you DO NOT have rights to.  For example, if you have rights to MB jobs, this does NOT entitle you to operate on LR jobs.  This is true across the board.  You cannot save a job with a job loginid that you do not have rights to.  If you believe that you should have access, contact Maestro support.

 

 

 

 

I can’t save a job stream I just created

 

Assuming that your security rights have already been set up, this is likely to be caused by not selecting all the requirements to create a Job Stream. 

  • A job stream must have at least one job in it. 
  • It must have a dependency of some sort – this is most often a time dependency - you need to select a start time (see below).
  • It may have other dependencies, for example the existence of a file or the availability of a resource. 
  • Finally you must either select a calendar OR make it “on request.”

 

 

 

Section III – Jobs do not run as expected

 

My script doesn’t run correctly in Maestro, but when I run it from my desktop it works fine

 

This is a problem with your script.  Maestro cannot anticipate all the environment variables that a particular shell script might need, so it provides a very minimal set.  When you log in from your desktop to a particular server, Unix either gives you a default environment (depending on your user name and default shell) or executes a .login, .profile or .cshrc file if you have one, and those files set up your environment.  For your script to run successfully in Maestro, you MUST ensure the same environment is set in your script.  To determine what the environment is, run “env” from the command line (for Windows scripts, run the “set” command).  Then put the same command in your script.  Compare the output of the two, find the relevant differences and ensure that your script sets up this same environment.

A further indication that Maestro is not involved is that other jobs on the same FTA are completing successfully.
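
The comparison described above can be sketched as a pair of shell functions.  This is only a minimal sketch for Unix hosts, assuming a POSIX shell; the function names dump_env and compare_env and the snapshot file names are illustrative and are not part of Maestro.

```shell
#!/bin/sh
# Hypothetical helper: save a sorted snapshot of the current environment.
# Call it once from an interactive login and once near the top of the job
# script, writing to two different files.
dump_env() {
    env | sort > "$1"
}

# Hypothetical helper: print the settings that differ between two snapshots.
# Lines starting with "<" appear only in the first file,
# lines starting with ">" appear only in the second.
compare_env() {
    # diff exits with 1 when the files differ; that is expected here,
    # so only a real diff error (exit code 2 or more) is reported as failure.
    diff "$1" "$2" || [ $? -eq 1 ]
}

# Example use (file names are placeholders):
#   dump_env /tmp/env_desktop.txt     # run from your desktop login
#   dump_env /tmp/env_maestro.txt     # run inside the Maestro job script
#   compare_env /tmp/env_desktop.txt /tmp/env_maestro.txt
```

Any variable that appears only in the desktop snapshot and that your script relies on (PATH entries and application home directories are typical candidates) needs to be set explicitly in the script.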

 

 

My job didn’t run as expected, how can I tell if the problem is related to Maestro?

 

A simple way to determine if Maestro has passed control to your script is to “browse the job log.”  If a job log exists at all, almost certainly your script got control and the problem is with the script or the resources it calls upon. 

  • One common problem is that the job loginid does not have access to a resource used in the script – for example a user does not have permissions to a directory or file.  This is not a problem that Maestro support can fix.
  • Another is that the job status is “error” and the internal status is “failed” and no job log is created. This usually indicates that Maestro could not even log the user on to the FTA.  Most often this is a call to the Unix or Windows team to ensure that the user exists on the box and that he/she has appropriate rights.
  • Remember – Maestro receives messages from your application AND from the operating system and in turn writes them to your job log. It is important to determine, in any way you can, where the message is from.

In any event, ALWAYS scan the job log if there is one.  Often there is a message indicating the problem, although not always a helpful one.  You should include any such messages in your request for Maestro support.
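
Because missing permissions are such a common cause, it can help to have the script check its own resources before doing any work, so that the job log names the file or directory at fault.  The following is a minimal sketch, assuming a POSIX shell on a Unix FTA; the function name check_access and the "mode:path" argument format are invented for this illustration.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report each resource that the job login id
# cannot use.  Each argument is a "mode:path" pair, where mode is
# r (readable), w (writable) or x (executable, or searchable for a directory).
# Returns 0 if every check passes and 1 otherwise.
check_access() {
    rc=0
    for item in "$@"; do
        mode=${item%%:*}      # text before the first colon
        target=${item#*:}     # text after the first colon
        if [ ! -e "$target" ]; then
            echo "MISSING: $target"
            rc=1
            continue
        fi
        case $mode in
            r) [ -r "$target" ] ;;
            w) [ -w "$target" ] ;;
            x) [ -x "$target" ] ;;
            *) false ;;       # unknown mode is treated as a failed check
        esac || { echo "NO ACCESS ($mode): $target"; rc=1; }
    done
    return $rc
}

# Example use at the top of a job script (paths are placeholders):
#   check_access r:/data/in/feed.dat w:/data/out x:/apps/bin/load || exit 3
```

Run under the job loginid, a check like this writes its findings to stdout and therefore to the job log, which makes it easier to tell whether to contact the Unix or Windows team rather than Maestro support.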

 

 

My job has always run successfully in past schedules, now it won’t run

 

It is a frequent observation of customers that nothing has changed and that the job has run successfully every day for a long period (sometimes months/years) but suddenly it won’t run.

The first thing to check is the job and job stream priority.  If either or both of these are zero, your job or job stream will not run.  Check with your teammates first: there may be a good reason why someone doesn’t want the job to run.  Next, bear in mind that even if neither you nor the Maestro administrator has changed anything, all kinds of external changes are possible.  For example: software of all kinds is upgraded (including applications and Unix/Solaris patches), file systems run out of space, cpus get overloaded, permission changes are requested or made (sometimes by a fellow team member), security policy changes, and so on.  Try to determine if something of this sort has happened.  Remember, Maestro changes infrequently, and it is support policy to give users ample advance notice of any Maestro changes.

 

 

 

 

My job has been running for hours and it usually takes five minutes

 

In the left hand panel on the JSC click on the environment you are working in (Maestrodev or Maestrosrv.)   Click on Default Plan Lists and then click on “Status of all Workstations.” Communication has been temporarily lost if, for the FTA your job is running on, the Jobman Running and the Link Status columns indicate anything other than “Yes” and “LINKED” respectively. 

 

 

 

 

NOTE:

  1. FTAs almost routinely go into unlinked state for short periods.  THIS IS NOT A PROBLEM.
  2. Even when your FTA is unlinked, your job has most likely run successfully, but because communication has been lost it has been unable to report its status to the Master.  FTAs run completely independently after they have received their schedules from the start of day job, which runs on the Master at 7:30 AM each day.
  3. Maestro support monitors all FTAs and is aware of any that remain unlinked for over an hour.

 

 

 

 

 

Section IV – Job Abends

 

My job abended but Maestro indicates it was successful

 

Many customers have asked why Maestro doesn’t mark the job status “abend” when a script terminates abnormally.  Remember, once a job has started, Maestro just waits for a return code.  It does not “control” anything about your job while it’s running.  Maestro depends on return codes from jobs to determine status.  In general, some sense of the problem should be evident from the job log.

 

When a script abends, Maestro will only detect this IF you have trapped the error in your script and returned a non-zero return code.  If a particular command fails, trap and test the return code, then issue the command “exit” with a numeric parameter (example: exit 5).  It would also be a good idea to write a meaningful text message to stdout just before exiting.  Note too that some system processes will write error messages to the job log and also indicate the cause of the abend, but Maestro does NOT distinguish these messages from the messages you have coded in your script.
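
One way to trap and test return codes is to route each step of the job through a small wrapper.  The following is a minimal sketch, assuming a POSIX shell; the function name run_step, the exit codes and the commands in the usage comment are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper: run one step of a job.  If the step fails, write a
# message to stdout (which ends up in the job log) and end the script with
# a non-zero completion code so that Maestro marks the job as abended.
#   $1      completion code to return if this step fails
#   $2...   the command to run, with its arguments
run_step() {
    step_code=$1
    shift
    "$@" || {
        rc=$?
        echo "ERROR: '$*' failed with return code $rc; exiting with code $step_code"
        exit "$step_code"
    }
}

# Example use inside a job script (commands and paths are placeholders):
#   run_step 5 cp /data/in/feed.dat /data/work/feed.dat
#   run_step 6 /apps/bin/load_data /data/work/feed.dat
#   echo "Job completed normally"
#   exit 0
```

Giving each step its own exit code is a design choice: the completion code that Maestro reports then identifies which step failed, even before you browse the job log.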

 

 

My job abended but job log is empty or was not created. 

 

If there is no job log, it essentially means that Maestro was unable to run the job.  There are exceptions, but often the status is “error” and the internal status is “failed.”  This is usually NOT a Maestro problem but generally means a problem with permissions, authentication or access.

 



[1] Tivoli Workload Scheduler (TWS) is the most recent IBM name for this product, but since it is still known as Maestro by almost all Princeton users, this document will use the name Maestro.