Aug 11
It’s challenging to hire a great ops person. How do you judge in an interview?
If I want an A player, I ask hard questions, set high expectations, and request their commitment to the team’s needs. Even then I make mistakes, but at least I have their commitment. I can work with that if their performance has to be brought up.
Here’s who I have in mind when I’m interviewing:
- They know they can’t win the first time, so they keep trying.
- They are not looking to change careers or directions – they should have been customer-service-oriented technical people for at least the last 7 years.
- They hate work and dedicate themselves to eliminating it.
- They love people.
- They are students of their work.
- They are disciplined and self-motivated.
- They accept work – this is an all-day every-day job. We have a “No Slashdot” policy.
- If you don’t have something else to do, you are required to ask the NOC if you can help with a problem or a same-day ticket.
- Send customer follow-up emails.
- Eliminate work for someone:
i. Write a script
ii. Write a tool that lets business users see data they’ve never seen
- “Widen the Moat” so we can make gains against downtime.
- Attempt to reduce the complexity of something to the appliance level.
- Read customer service and management books.
- Make (or review) a list of people you owe things.
i. Ask them if they are getting what they need.
ii. Ask them how you’ll know when they have exactly what they need.
iii. If there is no clear way to deliver what they want – tell them so.
iv. If you are going to drop their concern – let them know.
- Put goals, meetings, and other important work items on the calendar for the next month, quarter, and year.
- Invent drills to make sure we are where we should be in emergency response and disaster recovery.
- Write a new monitor and figure out how to make it supportable forever.
- Bad uses of time:
- Changing the degree of transparency of your xterms for the 100th time.
- Second guessing management any more than one level above you.
- Pretending you can’t affect the direction of the management one level above you.
- Internet reading – even if it’s “background” research on technical things (Digg, Slashdot, Boing Boing, etc. have nothing to do with what we do).
- Writing documentation no one but you can interpret.
written by admin
Aug 11
I started this blog because I work in a niche area of a niche area of technology – the commercial application of UNIX/Linux compute farms. I was frustrated that what is obvious to me about how to run operations is difficult to communicate to my clients, who are often in operational breakdown.
Because I’m a consultant, I deal with two audiences: the executive who hired me and the operations staff I work with.
From the executive side, I make the case that operations is the critical part of their business. My favorite way to say it is: Success is not writing software, it’s running it. They agree, but are often not positioning operations to succeed.
From the staff side, I make the case that capturing everything they do in scripts and a diary will pay off a hundredfold. They agree, but often don’t want to foul up the carefully crafted way they do things.
This blog is for both of you. I have years’ worth of tech tricks, shortcuts, and solutions for people who live on the command line. For managers and executives I have 25+ UNIX shops’ worth of things that never work.
It’s my intention to provide you information and techniques that will make a difference for your career or business.
-Tony
written by admin
Aug 07
The gsh package comes with ‘ghosts’, a command that generates a list of the machines in the class you specify – but nothing else. Using it, you can create an “scp to class” command. Just for convention’s sake I call it ‘gcp’.
Usage:
gcp CLASS SOURCE DEST
Usage example:
gcp SNPcluster /tmp/new_etc_hosts /etc/hosts
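#!/bin/bash
# gcp: scp one file to every machine in a ghosts class.
# Get the path of ghosts so we can confirm it is installed on this machine.
GHOSTS=$(type -p ghosts)
if [ -z "$GHOSTS" ]
then
    echo "$0 requires 'ghosts' to work. Please install it and try again."
    echo "exiting"
    exit 1
else
    # Make sure we have 3 arguments: class, source file, destination path.
    if [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]; then
        # Relies on word splitting: ghosts prints one machine name per word.
        for i in $(ghosts "$1")
        do
            echo "scping $2 to $i:$3"
            scp "$2" "$i:$3"
        done
    else
        echo "need 3 arguments. usage: $0 CLASS SOURCE DEST"
        exit 1
    fi
fi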
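The loop above keeps going when one copy fails, so a machine that was down is easy to miss in the scrolling output. Here is a minimal sketch of one way to collect the failures. The helper name copy_to_hosts is hypothetical and not part of gsh, and it takes the host names as arguments rather than calling ghosts itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of gsh): copy one file to each named host
# and report the hosts where scp failed.
# Usage: copy_to_hosts SOURCE DEST HOST...
copy_to_hosts() {
    local src=$1 dest=$2 failed="" host
    shift 2
    for host in "$@"; do
        # scp exits non-zero when the copy fails; remember that host.
        if ! scp "$src" "$host:$dest"; then
            failed="$failed $host"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed"
        return 1
    fi
    echo "all hosts ok"
}
```

It could be called as: copy_to_hosts /tmp/new_etc_hosts /etc/hosts $(ghosts SNPcluster)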
written by admin
Aug 07
The key to working with a large compute farm is to set it up to run as a single machine or as a few classes of machines.
- ‘gsh’ is a tool that lets you send a command to multiple machines:
http://outflux.net/unix/software/gsh/download/gsh-1.0.2.tar.gz
Here’s the gsh command line to set up the nagios clients on a class of machines at the same time (see nagios client post):
gsh cluster+cf01 "ln -s /bin/bash /bin/rbash ; mkdir -p /usr/local/nagios/libexec ; useradd -d /usr/local/nagios/libexec -s /bin/rbash nagios ; chown root:nagios /usr/local/nagios/libexec ; chmod 750 /usr/local/nagios/libexec"
To use gsh effectively you need to use ssh-agent or open ssh keys. I use ssh-agent and am comfortable with its security implications. If your site doesn’t allow any kind of auto-login process, you’ll be stuck entering each password; even so, gsh ensures that exactly the same command is executed on each machine.
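For readers who have not used ssh-agent, here is a rough sketch of a session. The helper name ensure_agent and the key path are assumptions for illustration; adjust both to your site.

```shell
#!/bin/bash
# Hypothetical helper: start ssh-agent only if this shell does not already
# have one (an agent is detected by a non-empty SSH_AUTH_SOCK).
ensure_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        # ssh-agent -s prints Bourne-shell commands that export
        # SSH_AUTH_SOCK and SSH_AGENT_PID; eval applies them to this shell.
        eval "$(ssh-agent -s)" > /dev/null
    fi
}

# Typical session (the key path is an assumption):
#   ensure_agent
#   ssh-add ~/.ssh/id_rsa        # asks for the passphrase once
#   gsh cluster+cf01 "uptime"    # no further prompts
```

The agent holds the unlocked key in memory for as long as it runs, which is the security trade-off mentioned above.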
written by admin