Free Text Database specs:
Here is what I want in a database:
Format:
Complete free text input. There are just two concepts: paragraphs
and sentences. Sentences are separated by periods, and are
preferably short, stating one fact each. All of the sentences in a
single paragraph are assumed to be related. Paragraphs are separated
by an extra blank line.
For example:
His name is Steve Bush, but also known as Burnin' to his friends.
His car is a miata. Hair color used to be dark black. Cell phone #
is 123-45678. Works at Solutionware. Writes computer programs for a
living. Wants to buy a digital camera. His main address is 1583
Curtner Ave, San Jose, CA 95125. In case of emergency call John
Smith at 408-273-6097. Home phone is (408)469-4907. Favorite
actress is Jennifer Garner.

Now this person's name is John James. His address is 1234 North
Back St, Hollywood, CA. He doesn't like Jennifer Garner.
And that is all there is to it! The input file is just a set of
paragraphs, each containing a set of sentences. Sentences are
defined by periods, paragraphs by the extra blank line between them.
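Here is a minimal sketch of reading that format, written in Python only as an illustration. The function name parse_database is made up for this example, and the period rule is applied naively, so an abbreviation like "Mr." or a decimal like "3.5" would be split in the wrong place.

```python
import re

def parse_database(text):
    """Split free text into paragraphs, each one a list of sentences."""
    paragraphs = []
    # A blank line (possibly containing stray spaces) separates paragraphs.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped lines so a sentence may run across several lines.
        flat = " ".join(block.split())
        # Naive rule taken from the spec: a period ends a sentence.
        sentences = [s.strip() for s in flat.split(".") if s.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs

sample = (
    "His name is Steve\nBush. His car is a miata.\n"
    "\n"
    "Now this person's name is John James. He doesn't like Jennifer Garner."
)
for number, sentences in enumerate(parse_database(sample), start=1):
    print(number, sentences)
```

The sentences are stored without their final period, which keeps the later word matching simple.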
User interface:
We start with a single "Find box":
    Find: | words required in sentence: |   |   |
          | ______________________      | - | + |
Entering any set of words in the blank space means: find a single
sentence that contains all of those words. For example:
    Find: | words required in sentence: |   |   |
          | hair color black            | - | + |
This would ask to search for a single sentence which contains all
three of the words, hair, color, and black.
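A rough sketch of that single-sentence test, in Python. The helper names words_of and sentence_matches are invented for this illustration. It assumes matching is case-insensitive and on whole words, which the spec does not say outright (although "san jose" is expected to find "San Jose" in a later example).

```python
import re

def words_of(text):
    """Lower-cased set of the words in a piece of text; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def sentence_matches(sentence, query):
    """True if every word typed in the Find box appears in the sentence."""
    return words_of(query) <= words_of(sentence)

print(sentence_matches("Hair color used to be dark black", "hair color black"))  # True
print(sentence_matches("His car is a miata", "hair color black"))                # False
```

Word order does not matter here, and an empty Find box matches every sentence.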
Clicking on the "+" sign adds another box, which is automatically
labelled "AND":
    Find: | words required in sentence: |   |   |
          | hair color black            | - | + |
      AND | ______________________      | - | + |
Entering data in this second box adds a second requirement, which
specifies that another sentence must be found in the SAME
paragraph, with these new words. For example,
    Find: | words required in sentence: |   |   |
          | hair color black            | - | + |
      AND | address san jose            | - | + |
The above Find request specifies that a matching paragraph must
contain a sentence with the words hair, color, and black, and must
also contain a sentence with the words address, san, and jose in it.
In other words, it finds all paragraphs regarding black-haired
people who live in San Jose.
To summarize, each line specifies a group of words which must
appear together in one sentence, and the lines together specify the
sentences which must all appear in a single paragraph.
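The whole Find form could be sketched like this in Python. The names paragraph_matches and find are hypothetical, and a paragraph is assumed to already be a list of sentence strings. Every line of the form must be satisfied by some sentence of the paragraph, though not necessarily the same one.

```python
import re

def words_of(text):
    """Lower-cased set of the words in a piece of text; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def paragraph_matches(sentences, queries):
    """Each Find line needs at least one sentence containing all of its words."""
    return all(
        any(words_of(query) <= words_of(sentence) for sentence in sentences)
        for query in queries
    )

def find(paragraphs, queries):
    """Return the matching paragraphs, in their original order."""
    return [p for p in paragraphs if paragraph_matches(p, queries)]

steve = [
    "His name is Steve Bush, but also known as Burnin' to his friends",
    "Hair color used to be dark black",
    "His main address is 1583 Curtner Ave, San Jose, CA 95125",
]
john = [
    "Now this person's name is John James",
    "His address is 1234 North Back St, Hollywood, CA",
]
found = find([steve, john], ["hair color black", "address san jose"])
print(len(found))  # 1, only Steve's paragraph
```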
The first paragraph matching this form appears in a large box on
the screen below the Find boxes. If more than one paragraph matches,
the first is shown along with the number of matching paragraphs, and
there is a place to click NEXT/PREVIOUS etc. to cycle through them.
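The NEXT/PREVIOUS behavior might be sketched as a small cursor over the found paragraphs. The class name ResultCursor is made up for this example, and wrapping around from the last result back to the first is only one reading of "cycle".

```python
class ResultCursor:
    """Keeps track of which found paragraph is currently on screen."""

    def __init__(self, results):
        self.results = list(results)
        self.index = 0  # the first match is shown first

    @property
    def count(self):
        """Number of matching paragraphs, shown next to the result box."""
        return len(self.results)

    def current(self):
        return self.results[self.index] if self.results else None

    def next(self):
        if self.results:
            # ASSUMPTION: NEXT on the last result wraps to the first.
            self.index = (self.index + 1) % len(self.results)
        return self.current()

    def previous(self):
        if self.results:
            # ASSUMPTION: PREVIOUS on the first result wraps to the last.
            self.index = (self.index - 1) % len(self.results)
        return self.current()

cursor = ResultCursor(["paragraph A", "paragraph B", "paragraph C"])
print(cursor.count, cursor.current())  # 3 paragraph A
print(cursor.next())                   # paragraph B
```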
Printing reports:
In addition to NEXT/PREVIOUS is PRINT. It simply prints the results
of the find.
But we also have a second type of form called a Print form. This
form looks much the same as the Find form, and has the purpose of
shortening the printout by only printing some sentences from each
matching paragraph rather than all of the paragraph.
You do the FIND first, filling out that form and getting a result.
Then instead of hitting PRINT, you bring up the PRINT form. It
reduces the printout to a more condensed format.
For example, you find all people with black hair who live in San
Jose, as in the example above, but you just want to print their name
and phone number, not their entire paragraph of info. We use the
Find form first, as above, then bring up the Print form and do:
    Print: | words in sentence: | Shorten? | Sort by? |   |   |
           | name               |          |          | - | + |
           | phone number       |          |          | - | + |
Here, we search only the found paragraphs for a sentence with the
word "name" in it, and print only that sentence. We also look for a
sentence with the words "phone number" in it and print only that
sentence. (Or, if both name and phone number are in the same
sentence, print just that sentence).
Run on the example paragraph way above, that is:
His name is Steve Bush, but also known as Burnin' to his
friends. His car is a miata. Hair color used to be dark black.
Cell phone # is 123-45678. Works at Solutionware. Writes computer
programs for a living. Wants to buy a digital camera. His main
address is 1583 Curtner Ave, San Jose, CA 95125. In case of
emergency call John Smith at 408-273-6097. Home phone is
(408)469-4907. Favorite actress is Jennifer Garner.
This PRINT form would produce:
His name is Steve Bush, but also known
as Burnin' to his friends.
Cell phone # is 123-45678.
Home phone is (408)469-4907.
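A sketch of the Print form in Python, with made-up names (words_of, print_form). One thing to note: neither phone sentence in the example contains the word "number", so a strict all-words test would not give the printout shown. This sketch therefore ASSUMES the Print form keeps a sentence when it shares at least one word with a line of the form. That is a guess at the intent, not something the spec states.

```python
import re

def words_of(text):
    """Lower-cased set of the words in a piece of text; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def print_form(sentences, rows):
    """Pick the sentences of one found paragraph that the Print form keeps."""
    kept = []
    for sentence in sentences:
        sentence_words = words_of(sentence)
        # ASSUMPTION: one shared word with any Print line is enough.
        # Each sentence is visited once, so it is never printed twice.
        if any(words_of(row) & sentence_words for row in rows):
            kept.append(sentence)
    return kept

paragraph = [
    "His name is Steve Bush, but also known as Burnin' to his friends",
    "His car is a miata",
    "Cell phone # is 123-45678",
    "In case of emergency call John Smith at 408-273-6097",
    "Home phone is (408)469-4907",
]
for sentence in print_form(paragraph, ["name", "phone number"]):
    print(sentence + ".")
```

The emergency contact line is left out because it contains none of the words "name", "phone", or "number".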
Now, if we add an "X" in the Shorten box, like this:
    Print: | words in sentence: | Shorten? | Sort by? |   |   |
           | name               |    X     |    X     | - | + |
           | phone number       |    X     |          | - | + |
the program uses an intelligent algorithm to separate the actual
name or phone number from the rest of the words in the sentence and
so prints only:
Steve Bush. 123-45678. (408)469-4907
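What the "intelligent algorithm" should be is left open. As a crude stand-in, the Python sketch below keeps whatever follows the first " is " in the sentence, up to the first comma. The function name shorten and the rule itself are assumptions for illustration. The rule happens to work on the three example sentences, but it would cut an address off at its first comma.

```python
def shorten(sentence):
    """Crude value extraction: text after the first ' is ', before any comma."""
    body = sentence.strip().rstrip(".")
    # partition() splits at the first " is "; sep is "" when it is absent.
    _, sep, value = body.partition(" is ")
    if not sep:
        return body  # nothing to go on, so leave the sentence unchanged
    return value.split(",")[0].strip()

kept = [
    "His name is Steve Bush, but also known as Burnin' to his friends.",
    "Cell phone # is 123-45678.",
    "Home phone is (408)469-4907.",
]
print(". ".join(shorten(s) for s in kept))
# Steve Bush. 123-45678. (408)469-4907
```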
And also, since we added an "X" in the Sort box, it sorts all the
found paragraphs by that name.
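Sorting could be sketched in Python as below. The names sort_key and sort_paragraphs are hypothetical. The sketch assumes the sort key is the shortened value of the first sentence matching the "Sort by?" line, compared case-insensitively as one string (so by first name first), and that paragraphs with no such sentence come first.

```python
import re

def words_of(text):
    """Lower-cased set of the words in a piece of text; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def sort_key(sentences, row):
    """Shortened value of the first sentence sharing a word with the sort line."""
    for sentence in sentences:
        if words_of(row) & words_of(sentence):
            # Same crude shortening rule: text after " is ", before a comma.
            _, sep, value = sentence.partition(" is ")
            return (value if sep else sentence).split(",")[0].strip().lower()
    return ""  # ASSUMPTION: paragraphs without a match sort first

def sort_paragraphs(paragraphs, row):
    """Return the found paragraphs ordered by the marked Print line."""
    return sorted(paragraphs, key=lambda p: sort_key(p, row))

steve = ["His name is Steve Bush, but also known as Burnin' to his friends"]
john = ["Now this person's name is John James"]
for paragraph in sort_paragraphs([steve, john], "name"):
    print(sort_key(paragraph, "name"))
# john james
# steve bush
```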
And that is the entirety of the program that I want!!
Thanks for listening,
Steve Bush.