Preparing Textual Data for the General Inquirer

The general-purpose Java version of the General Inquirer processes all the text within each of the files contained in a specific folder. An output record of tag counts is made for each file, which can then be a row in a statistical spreadsheet.

  • Files should be edited to remove any content that should not be part of the analysis.
  • Information about each file should be in the file name.
  • The file names within each folder should have the same format.
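
The per-file processing described above can be pictured with a minimal Python sketch. It is not the Inquirer's actual tagging procedure: the two-category dictionary `TOY_TAGS`, the function name `count_tags`, and the simple word matching are all assumptions made for illustration. It only shows the general shape of the output, one record of counts per file in a folder.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical two-category tag dictionary, for illustration only.
TOY_TAGS = {"positive": {"good", "great"}, "negative": {"bad", "poor"}}

def count_tags(folder, tags=TOY_TAGS):
    """Return one record of tag counts for each file in the folder."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Lower-case the text and split it into simple word tokens.
        text = path.read_text(encoding="utf-8").lower()
        counts = Counter(re.findall(r"[a-z']+", text))
        record = {"file": path.name}
        for tag, vocabulary in tags.items():
            # A tag's count is the total occurrences of its words.
            record[tag] = sum(counts[word] for word in vocabulary)
        records.append(record)
    return records

# Demonstration on a temporary folder holding a single made-up file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "bush speech defense1.txt").write_text(
        "Good, good and bad.", encoding="utf-8"
    )
    records = count_tags(folder)

print(records)
```

Each dictionary in `records` corresponds to one spreadsheet row.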

The General Inquirer creates columns for this file name identification information in the output spreadsheet according to the following procedure:

1) All the characters in a file name starting with the first period are removed.
For example, ".txt" and ".doc" will be removed.

2) Each word in a file name (separated by spaces) is given a separate ID field.

3) The ID fields are labeled ID1, ID2, etc.

It may be helpful to rename these columns with more descriptive labels later in the statistical spreadsheet.

4) For the last word in the file name (which may be the only word if there is but one):

The computer tests whether it begins with a letter.
If it does, it then looks for a digit in the word.

If a digit is found, then all characters up to the digit are made into one ID field and the characters starting with the digit are made a second ID field.

Some examples:

bush speech defense1

is made into 4 ID fields for the candidate name, the type of document, the topic, and the serial number within that group: (1) bush, (2) speech, (3) defense, (4) 1

UMIN 0225.txt

will have the ".txt" removed and be made into two fields, one for Univ. of Minnesota, the second for the newspaper date (February 25). The date field may be further recoded into groupings by the statistical software.


DH134.TXT

will have the ".TXT" removed and be separated into two fields, "DH" for a high performer and "134" for the respondent's ID number.


C87

will similarly be made into two fields, with "C" for the Conservative Party and "87" indicating the year of the party manifesto.
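
The four-step procedure above can be sketched in Python as follows. This is an illustrative reading of the rules, not the Inquirer's own code: the function name `id_fields` is hypothetical, and the sketch assumes that "begins with a character" means "begins with a letter" and that the last word is split at its first digit.

```python
import re

def id_fields(file_name):
    """Split a file name into labeled ID fields, following rules 1 to 4."""
    # Rule 1: drop everything from the first period onward.
    stem = file_name.split(".", 1)[0]
    # Rule 2: each space-separated word becomes its own field.
    words = stem.split()
    if not words:
        return {}
    fields = words[:-1]
    last = words[-1]
    # Rule 4: if the last word starts with a letter and contains a digit,
    # split it in two at the first digit.
    first_digit = re.search(r"\d", last)
    if last[0].isalpha() and first_digit:
        cut = first_digit.start()
        fields += [last[:cut], last[cut:]]
    else:
        fields.append(last)
    # Rule 3: label the fields ID1, ID2, and so on.
    return {f"ID{i}": value for i, value in enumerate(fields, start=1)}

print(id_fields("bush speech defense1"))
# {'ID1': 'bush', 'ID2': 'speech', 'ID3': 'defense', 'ID4': '1'}
print(id_fields("UMIN 0225.txt"))
# {'ID1': 'UMIN', 'ID2': '0225'}
```

Note that "0225" stays whole because it begins with a digit rather than a letter, so rule 4 does not split it.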


If more identification information is needed than can fit in a file name (because of the restrictions on file name length in your system), then the file name should contain a unique ID that can be linked to a row in a spreadsheet, for later merging with the Inquirer's output spreadsheet.
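
As a rough sketch of such a merge, the Python below joins two small tables on a shared ID column. Everything in it is assumed for illustration: both spreadsheets are taken to be exported as comma-separated text, the unique ID is taken to sit in a column named ID1, and the IDs, the tag columns TagA and TagB, and the metadata values are made-up placeholders.

```python
import csv
import io

def merge_on_id(output_rows, metadata_rows, key="ID1"):
    """Attach the metadata columns to each output row with the same ID."""
    lookup = {row[key]: row for row in metadata_rows}
    merged = []
    for row in output_rows:
        # Rows without a matching metadata entry are kept unchanged.
        extra = lookup.get(row[key], {})
        merged.append({**extra, **row})
    return merged

# Made-up stand-ins for the Inquirer output and the metadata spreadsheet.
inquirer_output = io.StringIO("ID1,TagA,TagB\nR001,12,3\nR002,7,9\n")
metadata = io.StringIO("ID1,party,year\nR001,conservative,1987\nR002,labour,1987\n")

rows = merge_on_id(
    list(csv.DictReader(inquirer_output)),
    list(csv.DictReader(metadata)),
)
for row in rows:
    print(row)
```

Most statistical packages and spreadsheet programs offer an equivalent merge or lookup operation, so a script like this is only one of several ways to do the join.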


Special versions of the Inquirer also are operational:

1) For open-ended responses contained in a column of Excel spreadsheet cells:

The Excel spreadsheet is saved as a tab-delimited file. The Inquirer processes each cell in the specified column as a unit and produces an output record of tag counts for each cell. A single cell can contain up to 32,000 characters of text, or more than 3,000 words.
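
To make the unit of analysis concrete, here is a small Python sketch that reads a tab-delimited file and pulls out the text of one column, one cell per row. It is not the Inquirer's own reader: the function name `cells_in_column`, the zero-based column index, the header-row assumption, and the sample responses are all hypothetical.

```python
import csv
import io

def cells_in_column(tab_file, column, has_header=True):
    """Yield the text of the chosen column for every data row."""
    reader = csv.reader(tab_file, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        # Short rows yield an empty string so each row still gives one unit.
        yield row[column] if column < len(row) else ""

# Made-up sample of a spreadsheet saved as tab-delimited text.
sample = io.StringIO(
    "id\tresponse\n"
    "1\tI liked the course.\n"
    "2\t\n"
    "3\tToo long, but useful.\n"
)

units = list(cells_in_column(sample, column=1))
print(units)
# ['I liked the course.', '', 'Too long, but useful.']
```

Each string in `units` would then be analyzed as its own unit, giving one output record per cell.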

2) For real-time feedback:

Example: The computer presents a "TAT" picture and asks the respondent to tell a story about the picture that has a beginning, middle and end. The computer then gives instant feedback to the respondent about the story. This feedback can be given right over the internet.

