Locally Developed Programs (mostly) on ISR
One frustration in learning a new Unix system is getting
information on local commands, those add-ons not covered in the
general books. Here is a list of such commands on this system.
Use the "man" pages to learn more about the commands (e.g.,
"man abstr"), talk with Cheri or Don in 548/550, or email
soc-help.
Note that since most of these commands relate to data
processing, they are available on ISR2 and are unlikely to be
available on WJH. Also, these programs are not exclusively Unix
facilities; some newer ones are PC- or web-based.
While the categories are fuzzy, you might think of these
programs as relating roughly to data extraction, text
processing, job queuing, program interfaces, web work, and
miscellaneous tasks.
Dataset Related:
- abstr abstracts requested columns from a dataset with
fixed-length records.
- habstr abstracts and rectangularizes hierarchical datafiles,
allowing user-specified case selection. (A user guide is
available.)
- sabstr, shabstr are recent rewrites of the above two
extractors that read from the standard input, i.e., through
input redirection. This allows them to extract directly from
compressed datasets (.z and .gz files) without uncompressing
the datasets first. Check the man pages for sabstr or shabstr
for details.
- nlabstr abstracts requested columns from a newline-delimited
file, allowing user-specified case selection and output as a
fixed-format or delimited file.
- HTdoc_prep.pl, HTdoc_sel.pl provide a web interface to the
extraction programs above. See below for details.
- desunbox is specific to extracts drawn using the Census Data
Extraction Service. One of the DES output options is a "box1",
in which data and documentation are bundled together in compact
form. Desunbox separates the two and produces a plain ASCII
data file and an SPSS control file for processing it.
- chkfile checks a file for non-ASCII characters and for lines
longer than eighty columns (or a user-specified number of
columns).
- comdelim abstracts from, or simply reformats, comma-delimited
files into fixed format, handling missing data carefully.
(Useful for moving data from spreadsheets to statistical
packages, though generally outclassed by Stat/Transfer at this
point.)
- prtdata prints datasets with a ruler so that specific columns
can be identified easily. It prints 80-, 100-, or 132-column
lines, wrapping long lines under the ruler until each is
complete.
- psidsel, psidabstr, psidutil are a set of programs for working
with the Panel Study of Income Dynamics. These programs were in
active use on the old ISR and have been ported to the new
system. However, they have not been updated and do not cover
the most recent years of the PSID, largely because local use of
the PSID has declined and the PSID website is gradually
replacing their functionality. Still, if you are drawing large
extracts and need the case-selection capabilities of this
software, or if your extracts are timing out badly on the
website, contact Cheri Minton. You may want to use the
facilities here for the bulk of your extract and then match on
the final years from a web extract.
- spxabs, spxmake are programs that use online documentation in
a standardized format to create extract information and SPSSX
data descriptions for subsets of a dataset. (inf2stata uses the
same online documentation to create Stata data descriptions.)
- work history manager is not a polished effort but rather a way
of using SPSS to handle the NLSY work history files. Developed
with Lin Tao and tailored to his needs, the program could be
respecified for other projects, building on what was learned.
- cps case matching is another work in
progress, done with Joe Swingle. Again, subsequent work
could piggyback on what was learned.
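The idea behind abstr and its standard-input rewrites (pulling requested columns out of fixed-length records, optionally straight from a compressed file) can be sketched in a few lines of Python. This is only an illustration of the technique, not the local programs: the function names, the 1-based inclusive column ranges, and the blank-separated output are assumptions made for the example, and the sketch handles only .gz compression. The man pages describe the real options.

```python
import gzip
import io
import sys
from typing import Iterable, Iterator, List, TextIO, Tuple

# Hypothetical column specification: (first_col, last_col), 1-based and
# inclusive, the way column ranges are usually written in codebooks.
ColumnSpec = List[Tuple[int, int]]


def open_dataset(path: str) -> TextIO:
    """Open a plain or gzip-compressed dataset as text, or stdin for '-'."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        # Text-mode gzip.open decompresses on the fly; nothing is written to disk.
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")


def abstract_columns(lines: Iterable[str], spec: ColumnSpec) -> Iterator[List[str]]:
    """Yield the requested fields from each fixed-length record."""
    for line in lines:
        record = line.rstrip("\n")
        # Short records are blank-padded so every field keeps its full width.
        yield [record[first - 1:last].ljust(last - first + 1) for first, last in spec]


if __name__ == "__main__":
    sample = io.StringIO("0001M2319\n0002F4120\n")
    for fields in abstract_columns(sample, [(1, 4), (6, 7)]):
        print(" ".join(fields))
```

Run as a script, the demo prints "0001 23" and "0002 41", i.e. columns 1-4 and 6-7 of each record. Passing open_dataset("file.gz") instead of the in-memory sample reads a compressed file without unpacking it first, which is the same benefit sabstr and shabstr get from reading standard input.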
Text Processing Related:
- keygrep pulls paragraphs from a set of text files according to
requested keys, either present in the text lines or entered on
colon-delimited keylines added at the end of paragraphs
specifically for organizing text.
- keylist counts colon-delimited keys in the requested set of
files.
- (Keygrep and keylist have been largely superseded by
ATLAS.ti; however, they are compatible with this more powerful
program, and files prepared and coded for them flow smoothly
into ATLAS.ti for additional work.)
- Two projects in progress (see Cheri Minton if either is of
interest): with a few more users, these will be installed on
the system; meanwhile, your needs will influence their
development. Both were written to accommodate undergraduate
thesis writers.
- recode_strings: a set of two routines that help you develop a
coding scheme for a large group of somewhat open-ended
responses (e.g., occupation, goals in life, religious
affiliation). Beginning with a spreadsheet of one or more
columns of responses to the same question, you work through to
a numeric coding in which any number of the original responses
may be combined. Once your coding scheme is developed, you can
"recode" your variables according to it so that you have
numeric data for your statistical analysis.
- comments by var and by case: once you have entered a case
number and open-ended comments into a spreadsheet as a set of
variables (perhaps in a sheet accompanying a set of
quantitative variables), you can arrange these comments by
variable with the case number attached (i.e., all comments on
one topic together) and by case with the variable name
attached. Keeping the comments in a spreadsheet from the start
lets you continue to work with them as data.
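The paragraph-and-key logic behind keygrep and keylist can be sketched in Python. The real keyline syntax is documented in the man pages; this sketch simply assumes paragraphs are separated by blank lines and that a keyline is a line starting with a colon, such as ":work:career:". The function names mirror the programs for readability but are hypothetical, and matching on ordinary text lines is assumed to be whole-word and case-insensitive.

```python
import re
from collections import Counter
from typing import Iterable, List

KEYLINE_PREFIX = ":"  # assumption: a keyline starts with a colon, e.g. ":work:career:"


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs separated by one or more blank lines."""
    paragraphs: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def is_keyline(line: str) -> bool:
    return line.strip().startswith(KEYLINE_PREFIX)


def keys_of(paragraph: str) -> List[str]:
    """Collect the colon-delimited keys from a paragraph's keylines, lowercased."""
    keys: List[str] = []
    for line in paragraph.splitlines():
        if is_keyline(line):
            keys.extend(k.strip().lower() for k in line.split(":") if k.strip())
    return keys


def keygrep(text: str, wanted: Iterable[str]) -> List[str]:
    """Return paragraphs carrying a wanted key on a keyline or as a word in the text."""
    targets = {w.lower() for w in wanted}
    hits: List[str] = []
    for para in split_paragraphs(text):
        body = " ".join(ln for ln in para.splitlines() if not is_keyline(ln)).lower()
        words = set(re.findall(r"\w+", body))
        if targets & (set(keys_of(para)) | words):
            hits.append(para)
    return hits


def keylist(texts: Iterable[str]) -> Counter:
    """Count how often each keyline key appears across a set of file contents."""
    counts: Counter = Counter()
    for text in texts:
        for para in split_paragraphs(text):
            counts.update(keys_of(para))
    return counts
```

Matching whole words keeps a request for "work" from pulling in a paragraph that only mentions "homework", while a keyline lets you tag a paragraph with a topic it never names outright.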
Queue Related:
- qspss, qstata, qsas run a job of the corresponding program in
the background and let you submit multiple jobs, which will be
run sequentially. For more information on these programs, see
the "Stats on ISR2" memo.
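The queuing behavior described for qspss, qstata and qsas (submit several jobs, get your prompt back, and have the jobs run one after another) can be illustrated with a single background worker thread in Python. This is a sketch of the general technique only, not how the local scripts are written; the JobQueue class is hypothetical, and a real job would typically wrap a subprocess call that starts the statistics package.

```python
import queue
import threading
from typing import Callable


class JobQueue:
    """Run submitted jobs one at a time on a single background worker thread."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[Callable[[], None]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job: Callable[[], None]) -> None:
        """Add a job; this returns immediately while the job waits its turn."""
        self._jobs.put(job)

    def wait(self) -> None:
        """Block until every submitted job has finished."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()  # jobs come out in submission order
            try:
                job()
            except Exception as exc:  # keep the worker alive if one job fails
                print(f"job failed: {exc!r}")
            finally:
                self._jobs.task_done()
```

Because there is exactly one worker, a second job never starts until the first has finished, which is what keeps several large runs from competing for the machine at once.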
Program Interfaces:
(programs that make specific programs used in the department
easier to use)
- netprep provides a user-friendly format for data bound for
UCINET and KrackPlot.
- dnwrkdsk provides a graphical interface and command
preprocessor for Shelly Haberman's Fortran program, "dnewton".
Web Work:
Miscellaneous:
- tapemap, writesl catalog and write IBM standard-labeled tapes.
Note that not all IBM formats are available: the tapes written
are fixed, blocked EBCDIC tapes with standard labels. Similar
programs (atapemap and writeal) exist for ANSI standard-label
tapes. Given the loss of tape facilities on ISR, these programs
have not yet been ported; if and when we again have tape
drives, the local access programs will be ported to the new
system.
- usage checks a user's use of system resources, especially
printing charges.
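For readers who still receive data in the "fixed, blocked EBCDIC" layout mentioned under tapemap and writesl, the record format itself is easy to reproduce in Python. This sketch covers only the data records and blocks, not the standard tape labels, and it assumes EBCDIC code page 037 (Python's "cp037" codec); a given site or tape may use a different code page. The function names and the lrecl and records_per_block parameters are illustrative, not options of the local programs.

```python
from typing import Iterable, List

EBCDIC = "cp037"  # assumption: EBCDIC code page 037 (US/Canada)


def to_fixed_ebcdic(lines: Iterable[str], lrecl: int) -> List[bytes]:
    """Convert text lines to fixed-length EBCDIC records, blank-padded to lrecl bytes."""
    records: List[bytes] = []
    for line in lines:
        text = line.rstrip("\n")
        if len(text) > lrecl:
            raise ValueError(f"record longer than LRECL={lrecl}: {text!r}")
        # cp037 is single-byte, so each padded record encodes to exactly lrecl bytes;
        # characters outside the code page raise UnicodeEncodeError.
        records.append(text.ljust(lrecl).encode(EBCDIC))
    return records


def block_records(records: List[bytes], records_per_block: int) -> List[bytes]:
    """Concatenate fixed-length records into blocks; the last block may be short."""
    return [
        b"".join(records[i:i + records_per_block])
        for i in range(0, len(records), records_per_block)
    ]


if __name__ == "__main__":
    blocks = block_records(to_fixed_ebcdic(["A1", "B", "C"], lrecl=4), records_per_block=2)
    print([b.hex() for b in blocks])
```

Reading such data back is the reverse: cut each block into lrecl-byte slices and decode each slice with the same code page.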
Comments or questions? Write to soc-help, phone 5-4751, or drop
by 544/548.