SCN: Responsibility (was: Re: A Preliminary Review of SSL)

Rod Clark bb615 at scn.org
Sat Mar 4 03:58:44 PST 2000


> If the users on /home0 are pinched for space, why don't they just
> delete (or compress) a few files? Why can't these users just work
> matters out themselves?

JJ, 

   The users as an amorphous group of people can't do system
administration. We actually have gobs of space. It just isn't
being allocated as usefully as it would be if we had more
system administration going on to keep things well maintained.

   It mostly comes down to not having enough sysadmin time,
which on a system this size means not enough people. You've said
that you're overbooked, and everyone knows that's true and
sympathizes. So there's a stopgap fix on this for now, because a
longer-term answer, like moving /home0 to a partition twice as
large, would have taken more time.

   This is typical of the kind of system administration that has
accumulated on SCN for years. It got us into a situation where,
not long ago, some very good people (Cere and Steve) who are used
to working on standard, well maintained systems had to spend
about 100 hours fighting hundreds of little problems that kept
them from installing some software that would have been of great
benefit to SCN. It should have been quick and easy. Instead it
was a nightmare for them, and both were burned out by the
experience to the extent that they took a six month hiatus from
working on SCN's system.

> (Mind that this one particular problem stands proxy to how to deal
> with problems in general on SCN.)

   That's a pithy and relevant observation. Because we never
have time to do it right, we have spent years building one of
the most band-aid ridden, nonsensically accreted systems that
I've ever seen.

   And this is coming home to roost. It means that we can't now
find volunteers who can step in and install complex software
that they're familiar with on SCN, because of a myriad of things
that aren't right with our system. It means that while many
different people were hurriedly patching up many, many things in
the short run, there has been a lack of the kind of overall
system administration that keeps in mind the need for
standardization and adherence to best practices over time, along
with enough people management that various people don't keep
doing things that don't accord with that.
   
   So with Randy stepping down, and a new system to design and
keep in good order, we need a new head system administrator, and
some planning. You can help a great deal, but I believe you
haven't before been responsible for a large system, where these
issues become very important over time, and because of that your
comments in this thread reflect good intentions combined with a
lesser amount of supervisory experience. At this point we need
someone who has that experience. So we need to recruit someone,
unless one of the more experienced commercial or university
sysadmins here (Troy, Bob H., Scot, Steve G.; it's 3:30 AM and I
must have forgotten some others) magically has enough time to
devote to this.

   Rhodes is working on this and many other things, and
commented at Excomm that he wished he could clone himself. So
could some of you consult with Rhodes and help with whatever
particular recruiting plan he is drawing up?

> (Also [**Rod! please note!**] I have just deleted a few files that no one
> is likely to miss, freeing up another one percent of space, so this is
> _not_ an immediate problem.)

   It will be as soon as webadm tries to untar that file again.
I still can't do that without filling all available space on
home0. This is now back to exactly where it was a few days ago
when we were having the initial problems with it.

   Filesystem            kbytes    used   avail capacity  Mounted on
   /dev/sd5g             963223  811169   55732    94%    /home0
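
   To put numbers on that, here is a rough sketch (not anything that
is installed on SCN) of a check that could be run before untarring an
archive into a nearly full partition. The function names, the reserve
parameter, and the 60000 KB archive size in the demo are made up for
illustration; only the used and avail figures come from the df line
above.

```python
import shutil
import tarfile


def payload_kb(tar_path):
    """Kilobytes the regular files in a tar archive need once extracted.

    Each file is rounded up to a whole kilobyte; directory and inode
    overhead is ignored, so treat the result as a lower bound.
    """
    with tarfile.open(tar_path) as archive:
        return sum((m.size + 1023) // 1024 for m in archive if m.isfile())


def capacity_percent(used_kb, avail_kb):
    """df-style capacity: used space as a share of used + avail."""
    return round(100 * used_kb / (used_kb + avail_kb))


def safe_to_extract(needed_kb, avail_kb, reserve_kb=0):
    """True if the extraction fits and still leaves reserve_kb free."""
    return needed_kb + reserve_kb <= avail_kb


def avail_kb_on(mount_point):
    """Kilobytes currently available to ordinary users on a filesystem."""
    return shutil.disk_usage(mount_point).free // 1024


if __name__ == "__main__":
    # Figures from the df line for /home0.
    used, avail = 811169, 55732
    print(capacity_percent(used, avail))    # 94
    # Hypothetical archive that unpacks to 60000 KB: it does not fit.
    print(safe_to_extract(60000, avail))    # False
```

   On a live system the hard-coded figures would be replaced by
payload_kb() on the real archive and avail_kb_on("/home0"). Anything
that expands to more than the 55732 KB shown as available fills the
partition, which is the situation described here.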

   Do whatever makes most sense in the short run. But as you
noted above, in a wider perspective this is not a short run
problem, and we do need Operations management with a coherent
plan to take SCN's systems from this kind of situation to a well
managed one.

Rod
