Morphemes
|
Words have meaningful sub-parts that aren't all words:
unfamiliarity |
un | familiar | ity |
|
|
undeniability |
un | deny | able | ity |
|
|
Morpheme
Structure
|
Right |
Wrong |
deny |
deny | able |
un | deny | able |
un | deny | able | ity |
undeniability |
|
deny |
un | deny |
un | deny | able |
un | deny | able | ity |
undeniability |
|
|
Stems
|
walked |
walk | ed |
Stem | Suffix |
Stem may define a word that has multiple forms:
Stem may be more than one morpheme:
formalized |
formal | ed |
Stem | Suffix |
form | al | ed |
|
Morphological
Analyzer
Versus
Stemmer
|
Two kinds of programs:
- Morphological Analyzer: Not only takes a word apart into its
component "atoms" (morphemes), but also produces the right
"structure".
Right: formalize = [[form + al] + ize]
Wrong: formalize = [form + [al + ize]]
- Stemmer: Finds stems
formalize => formal
formalized => formalize
formalized => formal
|
English Verb Forms
(Stemming)
|
  | Form | Stem |
Present
Participle | walking | walk |
Past
Participle | walked | walk |
Past Form | walked | walk |
Present
Tense | walks | walk |
|
  | Form | Stem |
breaking
  | break |
broke
  | break |
broken
  | break |
breaks
  | break |
|
  | Form | Stem |
bleeding
  | bleed |
bled
  | bleed |
bled
  | bleed |
bleeds
  | bleed |
|
|
Inflectional
Morphology
|
Inflectional morphology is about the different forms of a
single word:
Inflectional Verb
break |
-> | breaks | 3rd person present tense singular |
breaking | present participle |
broken | present participle |
break | 3rd person present tense plural |
broke | past tense |
Inflectional Noun
cat |
-> | cat | Singular |
cats | Singular |
Inflectional Adjective
happy |
-> | happy | Positive |
happier | Comparative |
happiest | Superlative |
|
Derivational
Morphology
|
Dervivational morphology is about making new
words.
relational | -> | relate |
conditional | -> | condition |
digitizer | -> | digitize |
predication | -> | predicate |
hopefulness | -> | hopeful |
feudalism | -> | feudal |
formalize | -> | formal |
feudalism | -> | feudal |
allowance | -> | allow |
effective | -> | effect |
|
Measure of
a Word
|
- Let vowels be:
A E I O U and Y (after non- A,E,I,O,U)
try: "y" is a vowel
bay: "y" is not a vowel
- Let consonants be anything other than a vowel.
- Let V stand for any sequence of vowels.
- Let C stand for any sequence of consonants.
- We can represent any word by the pattern:
(C)(VC)m(V)
Parentheses here mean optionality. That is words can skip
the initial consonant sequence C and start witha vowel
sequence("at"), they can skip the VC sequence and
go directly to a final vowel ("tree"), or
they skip the final vowel ("cat").
Here "m" is Kleene-operator m. We allow any number
of VC sequences ( each a "syllable").
- WE call the value of "m" for a particular
word its measure.
Some of examples of measure:
m=0 | tr, ee,tree,by |
m=1 | trouble,oats,trees,ivy |
m=2 | troubles,private,oaten,orrery,biases |
m=3 | intrusion,orreries |
|
Algorithm
Notation
|
m | the measure of the stem |
*S | the stem ends with S (and similarly for
other letters) |
*v* | The stem contains a vowel |
*d | the stem ends with a double consonant (e.g. -TT or -SS) |
*o | the stem ends in CV1C, where the second C is
not W, X, or Y (e.g. -WIL, -HOP)
|
|
Stages of
Algorithm
|
- Plural Nouns and Third Person Singular verbs
(-s ending)
cats | => | cat |
walks | => | walk |
- Verbal Past tense and progressive forms
walked | => | walk |
walking | => | walk |
- Y => I
happy => happi
- Derivational Morphology I:
Multiple suffixes (-TIONAL, -ENCY, -ALLY, -IZATION,
-ALISM.etc.)
- Derivational Morphology II:
More Multiple suffixes (-ICATE, -ATIVE, .etc.)
- Derivational Morphology III:
Single suffixes (-AL,-ANCE,-ENCE, -IC,-ABLE)
- Cleanup
|
Stage I:
Third Present Verbs,
Plural Noun
|
No conditions for these rules:
Only one rule
per set may apply to a given word.
The rule with the LONGEST MATCH should apply.
This is what explains rule (c) below.
Any word that matches (c) will not undergo (d).
 
|
Rule
|
Example
|
(a)
|
SSES->SS
|
caresses => carress
|
(b)
|
IES->I
|
ponies=> poni
ties=> ti
|
(c)
|
SS->SS
|
ponies=> poni
ties=> ti
|
(d)
|
S->eps
|
cats=> cat
|
|
Stage IIa
Past and
Progressive
Verbs
|
 
|
Condition
|
Rule
|
Example
|
(a)
|
(m > 1)
|
EED->EE
|
feed => feed
agreed => agree
|
(b)
|
(*v*)
|
ED->eps
|
plastered=> plaster
bled => bled
|
(c)
|
(*v*)
|
ING->eps
|
try=> try
spring => spring
|
|
Stage IIb
Cleanup
Consonant
Doubling,
E-insertion
|
 
|
Condition
|
Rule
|
Example
|
(a)
|
 
|
AT => ATE
|
conflat(ed) => conflate
|
(b)
|
 
|
BL->BLE
|
troubl(ing) => trouble
|
(c)
|
 
|
IZ->IZE
|
siz(ed) => size
|
(d)
|
(*d & !(*L or *S or *Z))
|
CC -> C
|
hopp(ing) => hop
tann(ing) => tan
fall(ing) => fall
hiss(ing) => hiss
fizz(ing) => fizz
|
(e)
|
(m=1 & *o)
|
eps -> e
|
fail(ing) => fail
fil(ing) => file
|
|
Stage III:
Y => I
|
(*v*)
|
Y -> I
|
happy => happi
sky => sky
|
|
Stage IV:
Suffixes
|
Condition
|
Rule
|
Examples
|
(m > 0)
|
ATIONAL => ATE
|
relational => relate
|
(m > 0)
|
ENCI => ENCE
|
valenci => valence
|
(m > 0)
|
IZER => IZE
|
digitizer => digitize
|
(m > 0)
|
ABLI => ABLE
|
comfortably => comfortable
|
(m > 0)
|
ALLI => AL
|
radicalli
=>
radical
|
(m > 0)
|
ENTLI => ENT
|
differentli
=>
different
|
(m > 0)
|
IZATION => IZE
|
vietnamization
=>
vietnamize
|
(m > 0)
|
ELI => E
|
vileli
=>
vile
|
(m > 0)
|
OUSLI => OUS
|
analogousli
=>
analogous
|
(m > 0)
|
ATION => ATE
|
predication
=>
predicate
|
(m > 0)
|
ATOR => ATE
|
operator
=>
operate
|
(m > 0)
|
ALISM => AL
|
feudalism
=>
feudal
|
(m > 0)
|
IVENESS => IVE
|
decisiveness
=>
decisive
|
(m > 0)
|
FULNESS => FUL
|
hopefulness
=>
hopeful
|
(m > 0)
|
OUSNESS => OUS
|
callousness
=>
callous
|
(m > 0)
|
ALITI => AL
|
formaliti
=>
formal
|
(m > 0)
|
IVITI => IVE
|
sensitiviti
=>
sensitive
|
(m > 0)
|
BILITI => BLE
|
sensibility
=>
sensible
|
|
Stage V:
Suffixes
|
Condition
|
Rule
|
Examples
|
(m > 0)
|
ICATE => IC
|
triplicate => triplic
|
(m > 0)
|
ATIVE => eps
|
formative => form
|
(m > 0)
|
ALIZE => AL
|
formalize => form
|
(m > 0)
|
ICITI => IC
|
electriciti => electric
|
(m > 0)
|
FUL => eps
|
hopeful => hope
|
(m > 0)
|
NESS => eps
|
goodness => good
|
|