A Sentence Reduction in Modern English
Definition of the sentence in English and analysis of its structure. Structural elements of the sentence and its reduction in Modern English. Methods of sentence reduction using syntax control and a decision tree model.
Category | Foreign languages and linguistics |
Type | course paper |
Language | English |
Date added | 02.02.2011 |
Posted at http://www.allbest.ru/
MINISTRY OF EDUCATION AND SCIENCE OF UKRAINE
IVAN FRANKO NATIONAL UNIVERSITY OF LVIV
ENGLISH DEPARTMENT
A Sentence Reduction in Modern English
COURSE PAPER
PRESENTED BY
Nadia Oleksyuk
a fourth-year student
of the English department
SUPERVISED BY
Budna Maria Vasylivna
an associate professor
of the English department
LVIV 2010
CONTENTS
I. Introduction
II. The Main Part
1. The Sentence
2. Structure of English Sentence
3. Sentence Reduction
3.1 A Sentence Reduction Using Syntax Control Abstract
3.2 A New Sentence Reduction Based on a Decision Tree Model
III. Conclusion
IV. References
Introduction
The topic of my course paper is “A Sentence Reduction in Modern English”. This qualification work can be characterized as follows:
When studying the structure of a unit, we find out its components, mostly units of the next lower level, their arrangement and their functions as parts of the unit. Many linguists think that the investigation of the components and their arrangement suffices. On this basis, I would like to set out the tasks and aims of my work:
1. The first task of my work is to define the term «sentence».
2. The second task is to describe the structure of sentences and their reduction in Modern English.
3. The last task of my work is to characterize the types of parts of the sentence.
In our opinion, the practical significance of our work can hardly be overestimated. This work reflects modern trends in linguistics, and we hope it will serve as a good manual for those who want to master Modern English. It can also be used by teachers of English for teaching English grammar.
The relevance of this work is due to several important points. Sentence reduction can be regarded as one of the main trends in the development of Modern English, especially in its colloquial layer, which in turn is strongly supported by the development of modern information technologies and the simplification of living speech. The significance of our work rests on the following reasons:
a) Sentence reduction is one of the developing branches of lexicology today.
b) Reduction reflects the general trend towards the simplification of a language.
c) Shortening is closely connected with the development of modern information technologies.
d) As a developing branch of linguistics, it requires special attention from teachers who want to keep their specialization in English up to date.
Based on the relevance of the topic, we can formulate the general goals of our qualification work:
a) To study, analyze and sum up the changes that have taken place in this branch of linguistics over the past fifty years.
b) To teach the problem of shortening to young learners of English.
c) To demonstrate the significance of the problem to those who want to brush up their English.
d) To present the major opinions of linguists on the subject studied.
As for the novelty of our work, it studies the problem from a modern standpoint and analyzes the trends that have appeared in this field over the last ten years.
As for the methods of scientific research, the method of typological analysis was used in our work.
Having established the relevance of our work, I would like to describe its composition:
My work consists of four parts: introduction, the main part, conclusion and references. The introduction gives a brief description of our course paper. The main part of the work includes several sections, which discuss such problems as the types of sentences in English, their construction and some types of sentence reduction. In the conclusion we try to draw some results from the scientific investigations made in the present course paper. The references list the sources used while compiling the present work, including linguistic books and articles dealing with the topic, as well as some internet sources.
1. The Sentence
The notion of the sentence has not so far received a satisfactory definition that would enable us, in every particular case, to find out whether a certain linguistic unit is a sentence or not.
Thus, for example, the question remains undecided whether such shop notices as Book Shop and such book titles as English Grammar are sentences or not. In favour of the view that they are sentences, the following consideration can be brought forward. The notice Book Shop and the title English Grammar mean 'This is a book shop' and 'This is an English Grammar'; the phrase is interpreted as the predicative of a sentence whose subject and link verb have been omitted, that is, it is apprehended as a unit of communication. According to the other possible view, such notices as Book Shop and such titles as English Grammar are not units of communication at all, but units of nomination, merely appended to the object they denote. Since there is as yet no definition of the sentence that would enable us to decide this question, the choice between the two alternatives remains a matter of subjective preference. We prefer the view that such notices and book titles are not sentences but nomination units.
We also mention a special case here. Some literary works have titles formulated as sentences, e.g. The Stars Look Down, by A. J. Cronin, or They Came to a City, by J. B. Priestley. These are certainly sentences, but they are used as nomination units, as in Have you read The Stars Look Down? or Do you like They Came to a City?
With the rise of modern ideas of paradigmatic syntax yet another problem concerning definition of sentence has to be considered.
In paradigmatic syntax, such units as He has arrived, He has not arrived, Has he arrived, He will arrive, He will not arrive, Will he arrive, etc., are treated as different forms of the same sentence, just as arrives, has arrived, will arrive etc., are different forms of the same verb. We may call this view of the sentence the paradigmatic view.
From the point of view of communication, however, He has arrived and He has not arrived are different sentences, since they convey different information (indeed, the meaning of the one flatly contradicts that of the other).
2. Structure of English Sentence
When studying the structure of a unit, we find out its components, mostly units of the next lower level, their arrangement and their functions as parts of the unit.
Many linguists think that the investigation of the components and their arrangement suffices. Thus Halliday writes: «Each unit is characterized by certain structures. The structure is a syntagmatic framework of interrelated elements, which are paradigmatically established in the systems of classes and stated as values in the structure…. if a unit 'word' is established there will be dimensions of word-classes the terms in which operate as values in clause structures: given a verb /noun/ adverb system of word classes, it might be that the structures ANV and NAV were admitted in the clause but NVA excluded».
Now ‘a syntagmatic framework of interrelated elements’ may describe the structure of a combination of units as well as that of a higher unit, a combination of words as well as a sentence or a clause. The important properties that unite the interrelated elements into a higher unit of which they become parts, and the function of each element as part of the whole, are not mentioned.
Similarly, Z. Harris thinks that the sentence The fear of war grew can be described as TN₁PN₂V, where T stands for article, N for noun, P for preposition and V for verb.
Such descriptions are feasible only if we proceed from the notion that the difference between the morpheme, the word and the sentence is not one of quality but rather of quantity and arrangement.
Z. Harris does not propose to describe the morpheme (as he calls it) 'is' as VC, where V stands for vowel and C for consonant. He does not do so because he regards a morpheme not as an arrangement of phonemes, but as a unit of a higher level possessing some quality (namely, meaning) not found in any phoneme or combination of phonemes outside the morpheme. Since we assume that not only the phoneme and the morpheme, but also the word and the sentence are units of different levels, we cannot agree with the view that a sentence is merely an arrangement of words.
In our opinion, The fear of war grew is a sentence not because it is TNPNV, but because it has properties not inherent in words: it is a unit of communication and as such possesses predicativity and intonation. On the other hand, TNPNV also stands for the fear of war growing and the fear of war to grow, which are not sentences.
As to the arrangement of words in the sentence above, it fully depends on their combinability. We have TN and not NT because an article has only right-hand connections with nouns. A prepositional phrase, on the contrary, has left-hand connections with nouns; that is why we have TNPN, etc.
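To make the point concrete, the toy sketch below maps words to Harris-style class symbols using a tiny hand-made lexicon. The lexicon and the function name `class_pattern` are hypothetical and cover only the words used here; the sketch simply shows that a sentence and a non-sentence can receive the same symbol string, so the string alone cannot tell them apart.

```python
# Hypothetical mini-lexicon: word -> Harris-style class symbol
# (T = article, N = noun, P = preposition, V = verb).
LEXICON = {
    "the": "T",
    "fear": "N",
    "war": "N",
    "of": "P",
    "grew": "V",
    "growing": "V",  # a non-finite form still receives the class symbol V
}


def class_pattern(phrase: str) -> str:
    """Return the sequence of class symbols for a phrase, e.g. 'TNPNV'."""
    return "".join(LEXICON[word] for word in phrase.lower().split())


print(class_pattern("The fear of war grew"))     # TNPNV (a sentence)
print(class_pattern("the fear of war growing"))  # TNPNV (not a sentence)
```

Both strings yield TNPNV, which is why predicativity and intonation, rather than the class pattern, decide sentence status.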
The development of transformational grammar (Harris, Chomsky) and tagmemic grammar (Pike) is to a great extent due to the realization of the fact that «an attempt to describe grammatical structure in terms of morpheme classes alone - even successively inclusive classes of classes - is insufficient».
As defined by Harris, the approach of transformational grammar differs from the above-described practice of characterizing «each linguistic entity… as composed out of specified ordered entities at a lower level» in presenting «each sentence as derived in accordance with a set of transformational rules, from one or more (generally simpler) sentences, i.e. from other entities of the same level. A language is then described as consisting of specified sets of kernel sentences and a set of transformations».
For English Harris lists seven principal patterns of kernel sentences:
1. NvV (v stands for a tense morpheme or an auxiliary verb, i.e. for a (word-) morpheme containing the meanings of predicativity).
2. NvVPN
3. NvVN
4. N is N
5. N is A (A stands for adjective)
6. N is PN
7. N is D (D stands for adverb)
As one can easily see, the patterns above do not merely represent arrangements of words; they are arrangements which contain predicativity, the most essential component of a sentence. Given the proper intonation and with their symbols replaced by words that conform to the rules of combinability, these patterns become actual sentences. Viewed thus, the patterns may be regarded as language models of speech sentences.
One should notice, however, that the difference between the patterns above is not, in fact, a reflection of any sentence peculiarities. It rather reflects the difference in the combinability of various subclasses of verbs.
The difference between ‘NvV’ and ‘NvVN’, for instance, reflects the different combinability of an intransitive and a transitive verb (He is sleeping; He is writing letters. Cf. to sleep, to write letters). The difference between these two patterns and ‘N is A’ reflects the difference in the combinability of notional verbs and link verbs, etc.
A similar list of patterns is recommended to language teachers under the heading “These are the basic patterns for all English sentences”:
1. Birds fly.
2. Birds eat worms.
3. Birds are happy.
4. Birds are animals.
5. Birds give me happiness.
6. They made me president.
7. They made me happy.
The heading is certainly rather pretentious. The list does not include sentences with zero predication or with partially implied predicativity; rather, it displays the combinability of various verb classes.
S. Potter reduces the number of kernel sentences to three: «All simple sentences belong to one of three types:
A. The sun warms the earth;
B. The sun is a star; and
C. The sun is bright.»
And as a kind of argument he adds: «Word order is changeless in A and B, but not in C. Even in sober prose a man may say Bright is the sun.»
The foregoing analysis of kernel sentences, from which most English sentences can be obtained, shows that «every sentence can be analysed into a centre, plus zero or more constructions… The centre is thus an elementary sentence; adjoined constructions are in general modifiers». In other words, the essential structure constituting a sentence is the predication; all other words are added to it in accordance with their combinability. This is the case in an overwhelming majority of English sentences. Here are some figures based on the investigation of modern American non-fiction.
No | Pattern | Examples | Frequency of occurrence (per cent): as sole pattern | Frequency of occurrence (per cent): in combination |
1. | Subject + verb | Babies cry. | 2.51 | 5.3 |
2. | Subject + verb + object | Girls like clothes. | 32.9 | 5.9 |
3. | Subject + verb + predicative | Dictionaries are books. Dictionaries are useful. | 20.8 | 6.4 |
4. | Structural subject + verb + notional subject | There is evidence. It is easy to learn knitting. | 4.3 | 0.9 |
5. | Minor patterns | Are you sure? Whom did you invite? Brush your teeth. What a day! | 7.9 | – |
Some analogy can be drawn between the structure of a word and the structure of a sentence. The morphemes of a word are formally united by stress. The words of a sentence are formally united by intonation.
The centre of a word is the root. The centre of a sentence is the predication.
Some words have no other morphemes but the root (ink, too, but). Some sentences have no other words but those of the predication (Birds fly. It rains. Begin.).
Words may have some morphemes besides the root (unbearable). Sentences may have some words besides the predication (Yesterday it rained heavily.).
Sometimes a word is made of a morpheme that is usually not a root (ism). Sometimes sentences are made of words that are usually not predications (Heavy rain).
Words may have two or more roots (blue-eyed, merry-go-round). Sentences may have two or more predications (He asked me if I knew where she lived.).
The roots may be co-ordinated or subordinated (Anglo-Saxon, blue-bell). The predications may be co-ordinated or subordinated (She spoke and he listened. He saw Sam did not believe.).
The roots may be connected directly (footpath) or indirectly, with the help of some morpheme (salesman). The predications may be connected directly (I think he knows) or indirectly, with the help of some word (The day passed as others had passed.).
The demarcation line between a word with more than one root and a combination of words is often very vague (cf. blackboard and black board, brother-in-law and brother in arms). The demarcation line between a sentence with more than one predication and a combination of sentences is often very vague.
Cf. She'd only to cross the pavement. But still she waited. (Mansfield).
As we know, a predication in English is usually a combination of two words (or word-morphemes) united by predicativity, in other words, a predicative combination of words. Apart from that, the words of a predication do not differ from other words in conforming to the general rules of combinability. The rules of grammatical combinability do not admit of *boys speaks or *he am. The combination *the fish barked is strange as far as lexical combinability is concerned, etc.
All the other words of a sentence are added to those of the predication in accordance with their combinability, to make the communication as complete as the speaker wishes. The predication Boys play can make a sentence by itself. But the sentence can be extended into The three noisy boys play boisterously upstairs by realizing the combinability of the noun boys and the verb play. We can develop the sentence into a still more extended one. But however extended the sentence is, it does not lose its integrity. Every word in it is not just a word: it becomes part of the sentence and must be evaluated in its relation to other parts and to the whole sentence, much in the same way as a morpheme in a word is not just a morpheme, but the root of a word, a prefix, a suffix or an inflection.
Depending on their relation to the members of the predication the words of a sentence usually fall into two groups - the group of the subject and the group of the predicate.
Sometimes there is a third group, of parenthetical words, which mostly belongs to the sentence as a whole. In the sentence below the subject group is separated from the predicate group by the parenthetical group.
That last thing of yours, dear Flora, was really remarkable.
As already mentioned, the distribution and the function of a word-combination in a sentence are usually determined by its head-word: by the noun in noun word-combinations, by the verb in verb word-combinations, etc.
The adjuncts of word-combinations in the sentence are added to their head-words in accordance with their combinability, to develop the sentence, to form its secondary parts which may be classified with regard to their head-words.
All the adjuncts of noun word-combinations in the sentence can be united under one name, attributes. All the adjuncts of verb (finite or non-finite) word-combinations may be termed complements. In the sentence below, the attributes are spaced out and the complements are in heavy type.
He often took Irene to the theatre, instinctively choosing the modern Society plays with the modern Society conjugal problems. (Galsworthy).
The adjuncts of all other word-combinations in the sentence may be called extensions. In the sentences below the extensions are spaced out.
You will never be free from dozing and dreams. (Shaw).
She was ever silent, passive, gracefully averse. (Galsworthy).
The distribution of semi-notional words in the sentence is determined by their functions: to connect notional words or to specify them. Accordingly, they will be called connectives or specifiers. Conjunctions and prepositions are typical connectives. Particles are typical specifiers.
3. Sentence Reduction
Sentence reduction is the removal of redundant words or phrases from an input sentence, producing a new sentence in which the gist of the original meaning remains unchanged.
3.1 A Sentence Reduction Using Syntax Control Abstract
Methods of sentence reduction have been used in many applications. Grefenstette (G. Grefenstette, 1998) proposed removing phrases in sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (S.H. Olivers and W.B. Dolan, 1999) proposed removing clauses in sentences before indexing documents for information retrieval. Those methods remove phrases based on their syntactic categories and do not rely on the context of the surrounding words, phrases and sentences; without that information, the accuracy of sentence reduction can suffer. Mani and Maybury also present a process of writing a reduced sentence by revising the original sentence with a set of revision rules to improve the performance of summarization (Inderjeet Mani and Mark Maybury, 1999). Jing and McKeown (H. Jing, 2000) studied a new method to remove extraneous phrases from sentences by using multiple sources of knowledge to decide which phrases in a sentence can be removed. The multiple sources include syntactic knowledge, context information and statistics computed from a corpus that consists of examples written by human professionals.
Their method avoids removing phrases that are relevant to the surrounding context and produces grammatical sentences. Recently, Knight and Marcu (K. Knight and D. Marcu, 2002) demonstrated two methods for the sentence compression problem, which is similar to sentence reduction. They devised both a noisy-channel and a decision tree approach to the problem. The noisy-channel framework has been used in many applications, including speech recognition, machine translation and information retrieval. The decision tree approach has been used in sentence parsing (D. Magerman, 1995; Ulf Hermjakob and J. Mooney, 1997) and to determine the rhetorical structure of text documents (Daniel Marcu, 1999). Most of the previous methods only produce a short sentence whose word order is the same as that of the original sentence, and in the same language, e.g., English. When non-native speakers reduce a long sentence in a foreign language, they usually try to link the meanings of the words in the original sentence to meanings in their own language. In addition, in some cases the reduced sentence and the original sentence have different word orders. Therefore, non-native speakers produce two reduced sentences: one in the foreign language and another in their own language. Following this behavior of non-native speakers, two new requirements arise for the sentence reduction problem:
1) The word order of the reduced sentence may differ from that of the original sentence.
2) Two reduced sentences in two different languages can be generated.
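Requirement 1 can be made precise: a reducer that only deletes words can output nothing but a subsequence of the original words. The sketch below tests that property; the function name `is_deletion_only` and the whitespace tokenization are assumptions made for illustration. An output that fails the test can only come from a reducer that also reorders words.

```python
def is_deletion_only(original: str, reduced: str) -> bool:
    """True if the words of `reduced` occur in `original` in the same relative
    order, i.e. `reduced` could be produced by deleting words only.
    Tokenization is a plain whitespace split (an assumption)."""
    remaining = iter(original.split())
    # `word in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(word in remaining for word in reduced.split())


original = "Like FaceLift, much of ATM's screen performance depends on the underlying application"
print(is_deletion_only(original, "much of ATM's performance depends on the underlying application"))  # True
print(is_deletion_only(original, "on the underlying application depends much of ATM's performance"))  # False
```

The second reduction uses only words from the original, but its inverted order is out of reach for a deletion-only method.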
With these two new perspectives, sentence reduction is useful for many applications, such as information retrieval, query-based text summarization and especially cross-language information retrieval. To satisfy these new requirements, we propose a new algorithm that uses semantic information to simulate the behavior of non-native speakers. The semantic information obtained from the original sentence is integrated into the syntax tree through syntax control.
Formulation
Let $E$ and $V$ be two different languages, and let $e = e_1, e_2, \ldots, e_n$ be a long sentence in the language $E$. The task of sentence reduction into the two languages $E$ and $V$ is to remove or replace some redundant words in the sentence $e$ to generate two new sentences, $e' = e'_1, e'_2, \ldots, e'_m$ in language $E$ and $v = v_1, v_2, \ldots, v_k$ in language $V$, so that their gist meaning is unchanged. In practice, we used English as the source language, and the target languages are English and Vietnamese. However, our method can be applied to any pair of languages. In the following part we present an algorithm of sentence reduction using syntax control with rich semantic information.
Sentence reduction algorithm
We present an algorithm based on semantic parsing that generates two short sentences in different languages. A reduction algorithm using syntax control has three steps. In the first step, the input sentence $e$ is parsed into a syntax tree $t$ by a syntax parser. In the second step, rich semantic information is added to the syntax tree by a semantic parser, in which each node of the syntax tree is associated with a specific syntax control. The final step generates two different sentences, in languages $E$ and $V$, from the syntax tree $t$ annotated with rich semantic information.
Syntax parsing
First, we parse a sentence into a syntax tree. Our syntax parser locates the subject, object and head word within a sentence. It also recognizes phrasal verbs, cue phrases and expressions in English sentences. This information is useful for reducing sentences. Figure 2 shows the correspondence between our grammar symbols and English grammar symbols. Figure 1 shows an example of our syntax parsing for the sentence “Like FaceLift, much of ATM's screen performance depends on the underlying application”. To reduce ambiguity, we designed the syntactic parser based on finely classified grammar symbols. The part-of-speech set was extended to cope with the ambiguity problem; for example, in Figure 2, “noun” was divided into “private noun” and “general noun”.
We built a bilingual dictionary of about 200,000 English words with their meanings in Vietnamese. Each English entry includes several Vietnamese meanings, and each meaning is associated with a symbol meaning. The set of symbol meanings in each entry is defined using the WordNet database (C. Fellbaum, 1998). The dictionary also contains a number of English phrases and expressions with their Vietnamese equivalents.
Figure 1: An example of the syntax tree of “Like FaceLift, much of ATM's screen performance depends on the underlying application”
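The entry structure described above (several Vietnamese meanings per English word, each tagged with a symbol meaning) can be sketched as a plain mapping with lookup by symbol meaning. This is a minimal illustration, not the authors' dictionary: the two entries, the uppercase symbol labels (stand-ins for the WordNet-derived symbol meanings) and the function `translate` are hypothetical, and Vietnamese is written without diacritics, as in the Vietnamese example of this section.

```python
from typing import Dict, List, NamedTuple, Optional


class Meaning(NamedTuple):
    vietnamese: str  # Vietnamese rendering (no diacritics)
    symbol: str      # hypothetical symbol meaning


# Hypothetical fragment of an English-Vietnamese dictionary.
DICTIONARY: Dict[str, List[Meaning]] = {
    "performance": [
        Meaning("hieu suat", "ATTRIBUTE"),     # performance as efficiency
        Meaning("buoi bieu dien", "EVENT"),    # performance as a show
    ],
    "application": [
        Meaning("ung dung", "SOFTWARE"),
    ],
}


def translate(word: str, symbol: str) -> Optional[str]:
    """Return the Vietnamese meaning of `word` whose symbol meaning is `symbol`."""
    for meaning in DICTIONARY.get(word.lower(), []):
        if meaning.symbol == symbol:
            return meaning.vietnamese
    return None  # no meaning with that symbol


print(translate("performance", "ATTRIBUTE"))  # hieu suat
print(translate("performance", "EVENT"))      # buoi bieu dien
```

Keying each meaning by its symbol lets the semantic parser pick a translation once it has decided which symbol meaning a node carries.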
Semantic parsing using syntax control
After producing a syntax tree with rich information, we apply semantic parsing to that syntax tree. Let $N$ be an internal node of the syntax tree $t$ with $k$ child nodes $n_1, n_2, \ldots, n_k$. The node $N$ relies on the semantic information of its $k$ children to decide which part should remain in the reduced sentence. When parsing the semantics of the syntax tree $t$, each node $N$ must use the information of its children to define its own information. We call this information the semantic-information of the node $N$ and denote it by $N.sem$. In addition, each semantic-information of a given node $N$ is mapped to a meaning in the target language. For convenience, let $SI$ be the set of semantic-information values, and let $n_j[i]$ denote the $i$-th semantic-information of the child node $n_j$. To understand what the meaning of the node $N$ should be, we have to know the meaning of each child node and how to combine them into meanings for $N$. Figure 3 shows two choices of meaning sequences for the node $N$ in a reduction process. It is easy for a human to understand exactly which meaning each $n_i$ should take and then to decode them as objects to memorize.
Figure 2: Example of symbol equivalents
Figure 3: Syntax control
Based on this idea, we design a control language to perform this task. The $k$ child nodes $n_1, n_2, \ldots, n_k$ are associated with a set of syntax controls that conduct the sentence reduction process. The node $N$ and its children are associated with a set of rules. To present the set of rules, we use a simple control language with the following syntax:
1) Syntax to specify the order of the child nodes and the nodes to be removed.
2) Syntax to constrain each meaning of a child node with the meanings of the other child nodes.
3) Syntax to combine a sequence of meanings into one symbol meaning (this process is called an inherit process from the node $N$ to its children).
A syntax control rule is encoded as one generation rule and a set of condition rules that the generation rule has to satisfy. Given a specific condition rule, we can define its generation rule directly.
Condition rule
A condition rule is formulated as follows:
$$\text{if } n_{j_1}.sem = v_1 \wedge n_{j_2}.sem = v_2 \wedge \dots \wedge n_{j_m}.sem = v_m \text{ then } N.sem = v,$$
where $v, v_1, \ldots, v_m \in SI$.
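A condition rule is a conjunction over selected children followed by an assignment to $N.sem$. The sketch below evaluates a list of such rules and returns the value of the first one that holds. Taking the first matching rule is a simplifying assumption (the source selects rules with a Viterbi-style search), and the rule encoding, the function `infer_sem` and the values in $SI$ are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One condition rule: ({child index j: required value v_j}, resulting value v for N.sem).
ConditionRule = Tuple[Dict[int, str], str]


def infer_sem(child_sems: List[str], rules: List[ConditionRule]) -> Optional[str]:
    """Return N.sem according to the first condition rule whose conjunction holds.

    An empty constraint dict behaves like the "default" condition: it always holds.
    """
    for constraints, v in rules:
        if all(child_sems[j] == v_j for j, v_j in constraints.items()):
            return v
    return None  # no rule applies


rules = [
    ({0: "HUMAN", 1: "ACTION"}, "EVENT"),  # hypothetical SI values
    ({}, "OTHER"),                          # default rule
]
print(infer_sem(["HUMAN", "ACTION"], rules))  # EVENT
print(infer_sem(["ORG", "ACTION"], rules))    # OTHER
```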
Generation rule
A generation rule is a sequence of symbols that transfers the internal node $N$ into an internal node of the reduced sentence. We use two generation rules, one for $E$ and the other for $V$. Given a symbol sequence $g = g_1 g_2 \ldots g_m$, each $g_i$ is either an integer or a string. The equation $g_i = j$ means that the child node with index $j$ is kept, at position $i$ of the target node. If $g_i$ is a string "$v_1 v_2 \ldots v_l$", that string is placed at position $i$ of the target node. Figure 1 shows the syntax tree of the input sentence “Like FaceLift, much of ATM's screen performance depends on the underlying application.” In this syntax tree, the syntax rule “S1=Bng-daucau Subj cdgt Bng-cuoicau” uses the following syntax control for reduction:
<Con> default </Con> <Gen> 1 2 </Gen>
The condition rule “default” means that the generation rule is applied under any condition. The generation rule “1 2” means that only the node Subj at index 1 and the node cdgt at index 2 of the rule “S1=Bng-daucau Subj cdgt Bng-cuoicau” remain in the reduced sentence. If the syntax control is changed to
<Con> Subj = HUMAN </Con> <Gen> 1 2 </Gen>
the generation rule “1 2” is applied in the reduction process only when the semantic information of the child node Subj is HUMAN. Using the default condition rule, the following reduced sentences are generated:
Original sentence: Like FaceLift, much of ATM's screen performance depends on the underlying application.
Reduced sentence in English: Much of ATM's performance depends on the underlying application.
Reduced sentence in Vietnamese: Nhieu hieu suat cua ATM phu thuoc vao nhung ung dung tiem an.
To generate the reduced sentence in Vietnamese, condition rules and generation rules are also designed. This process works in the same way as the transfer method of machine translation. Because the gist meaning of a reduced sentence is unchanged compared with the original sentence, the gist meaning of a node remains unchanged after the syntax control is applied. Under this assumption, the syntax control used for translating the original sentence into another language (English into Vietnamese) can be reused to translate the reduced sentence. Therefore, our sentence reduction program can produce two reduced sentences in two different languages.
Our semantic parser uses this set of rules to select the rules suitable for the current context. The problem of selecting suitable rules for the current context of the node $N$ is to find the most likely condition rule among the set of syntax control rules associated with it. Thus, semantic parsing using syntax control can be described mathematically as follows:
Given a sequence of child nodes $n_1, n_2, \ldots, n_k$ of a node $N$, each node $n_i$ has a list of meanings, and each meaning is associated with a symbol meaning. The syntax rule for the node $N$ is associated with a set of condition rules, and each condition rule is mapped to a specific generation rule. The task is to find the most likely condition rules for that node sequence. This problem can be solved using a variant of the Viterbi algorithm (A.J. Viterbi, 1967). First, each semantic-information of a child node is associated with the indices of all condition rules. Second, we find all sequences that come from the same condition rules.
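The source does not specify how its Viterbi variant scores candidates. As one plausible reading, the sketch below treats the choice of a symbol meaning for each child node $n_1, \ldots, n_k$ as a chain and decodes it with the ordinary Viterbi recurrence. The score tables `emission` (how good a meaning is for a given child) and `transition` (how compatible two adjacent meanings are, e.g. as licensed by condition rules) and all symbol values are hypothetical. This is the textbook algorithm applied to this setting, not the authors' variant.

```python
from typing import Dict, List, Tuple


def viterbi(
    candidates: List[List[str]],
    emission: Dict[Tuple[int, str], float],
    transition: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Choose one meaning per child node so that the summed score is maximal.

    candidates[i]      -- candidate symbol meanings of the (i+1)-th child node
    emission[(i, m)]   -- hypothetical score of child i taking meaning m (default 0)
    transition[(a, b)] -- hypothetical compatibility of adjacent meanings a, b (default 0)
    """
    # best[m] = (score, path) of the best partial assignment ending in meaning m
    best = {m: (emission.get((0, m), 0.0), [m]) for m in candidates[0]}
    for i in range(1, len(candidates)):
        new_best = {}
        for m in candidates[i]:
            # predecessor meaning that maximizes its score plus the transition into m
            prev = max(best, key=lambda p: best[p][0] + transition.get((p, m), 0.0))
            prev_score, prev_path = best[prev]
            score = prev_score + transition.get((prev, m), 0.0) + emission.get((i, m), 0.0)
            new_best[m] = (score, prev_path + [m])
        best = new_best
    # the best complete assignment is the best entry at the last position
    score, path = max(best.values(), key=lambda entry: entry[0])
    return path, score


candidates = [["ORG", "HUMAN"], ["ACTION", "STATE"]]
emission = {(0, "ORG"): 2.0, (0, "HUMAN"): 1.0, (1, "ACTION"): 1.0, (1, "STATE"): 2.0}
transition = {("HUMAN", "ACTION"): 3.0}
print(viterbi(candidates, emission, transition))  # (['HUMAN', 'ACTION'], 5.0)
```

Although ORG is the best meaning for the first child in isolation, the strong HUMAN–ACTION compatibility makes HUMAN, ACTION the best joint assignment. Considering neighbouring meanings jointly in this way is what a Viterbi-style search adds over choosing each child's meaning independently.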
3.2 A New Sentence Reduction Based on a Decision Tree Model
This chapter is about a novel sentence reduction algorithm base on decision tree model where semantic information is used to enhance the accuracy of sentence reduction. The proposed algorithm is able to deal with the changeable order problem in sentence reduction. Experimental show a better result when comparing with the original methods.
Many researches in automatic text summarization were focused on extraction or identifying the important clauses and sentences, paragraphs in texts. Meanwhile, humans used to produce summaries by creating new sentences that are grammatical, that cohere with one another, and capture the most salient parts of information in the original document. Sentence reduction is the problem of removing some redundant words or some phrases from the original sentence by creating a new sentence, in which the gist meaning of the original sentence was changed . Methods of sentence reduction have been applied in many applications. Grefenstette (Grefenstette,S,1998) proposed removing phrases in sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (Donlan,W.B, 1999) proposed removing clauses in sentences before indexing document for information retrieval. Those methods removed phrases based on their syntactic categories without relying on the context of words, phrases and sentences around. Therefore, those methods are unsuitable for text summarization task. Sentence reduction for text summarization is pointed out by Mani and Maybury (Mani and Maybury,1999). The authors present a process of writing reduced sentences by reversing the original sentence with a set of revised rules. Jing (Jing,H, 2000) also studied a method to remove extraneous phrases from sentences by using multiple source of knowledge to decide which phrase can be removed. The multiple sources include syntactic knowledge, context information and statistic computed from a corpus that consists of examples written by human professional. Their method prevented removing some phrases that were relative to its context around and produced a grammatical sentence, and applied to the cut and paste summarization strategy. Recently, Knight and Marcus (Knight and Marcu,D, 2002) demonstrated two methods for sentence compression problem based on corpus. 
They devised both a noisy-channel approach and a decision tree approach to the problem. The decision tree approach, which had earlier been applied to parsing sentences (Magerman, 1995) and to the rhetorical parsing of text documents (Marcu, 1999), achieved good results in sentence compression.
In almost all previous methods, the word order of the reduced sentence is the same as that of the original sentence. When summarizing a document, however, humans may change the word order to keep the summary smooth and coherent. This calls for a sentence reduction method in which the order of the reduced sentence may differ from the original. Moreover, when sentence reduction is used for text summarization, syntactic information alone is not enough: the semantic information of the original sentence should be incorporated into the reduction process to enhance its accuracy. This mirrors human behavior, since people understand the meaning of the original sentence and make sure that the important words remain in the reduced sentence. To satisfy these requirements, we propose a new sentence reduction method based on a decision tree model in which semantic information supports the reduction process. The decision tree model is also extended to cope with changes of word order between original and reduced sentences.
Decision tree model for sentence reduction
The following sections present a sentence reduction method based on a decision tree model that uses rich semantic information. Let t and s be the syntax tree of the original sentence and that of the reduced sentence, respectively. To perform the rewriting process, we use an input list, two stacks and a set of rewriting operators, defined as follows.
The input list consists of the sequence of words subsumed by the tree t, where each word is labeled with the names of all syntactic constituents in t that start with it. CSTACK is a stack that holds the subtrees built so far while rewriting towards the small tree s. RSTACK is a stack that holds all nodes removed while rewriting the large tree t into the small tree s. Five operators are used to rewrite the larger tree t into the smaller tree s:
* SHIFT-operators transfer the first word from the input list onto CSTACK. They are written simply as SHIFT.
* REDUCE-operators pop the k syntactic trees located at the top of CSTACK and combine them into a new tree labeled X. These operators are written as REDUCE(k, X), where k is an integer and X is a grammar symbol.
* DROP-operators remove from the input list the subsequences of words that correspond to syntactic constituents and move them to RSTACK. Both REDUCE-operators and DROP-operators are used to derive the structure of the syntactic tree of the short sentence. They are written as DROP X, where X is a grammar symbol.
* ASSIGN TYPE-operators change the label of the tree at the top of CSTACK. These POS tags may differ from the POS tags in the original sentence. These operators are written as ASSIGN TYPE(X), where X is a POS tag.
* RESTORE-operators take the k-th element in RSTACK and move it back into the input list. These operators are designed on the assumption that a subtree removed from the input list may still affect the current decision. They are written as RESTORE k, where k is an integer.
A DROP X operator deletes from the input list all words that are spanned by constituent X in t and stores them in RSTACK. The RESTORE operator is designed to bring words back from RSTACK in order to generate the small tree s. With these operators, the order of words within the small tree s can differ from the word order of the large tree t.
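The rewriting state and the five operators can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation, and it rests on assumptions the text does not state: each input-list entry also records how many words each constituent spans (so that DROP knows what to remove), RESTORE k counts from the top of RSTACK and puts the words back at the front of the input list, and trees are plain (label, children) tuples. The names RewriteState and leaves are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Input-list entry: a word plus the constituents of t that start at it, each
# mapped to the number of words it spans (the span length is an assumption).
Item = Tuple[str, Dict[str, int]]
# A tree is either a bare word or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


@dataclass
class RewriteState:
    input_list: List[Item]
    cstack: List[Tree] = field(default_factory=list)        # partial trees of s
    rstack: List[List[Item]] = field(default_factory=list)  # dropped spans

    def shift(self) -> None:
        """SHIFT: move the first word of the input list onto CSTACK."""
        word, _labels = self.input_list.pop(0)
        self.cstack.append(word)

    def reduce(self, k: int, x: str) -> None:
        """REDUCE(k, X): pop the top k trees of CSTACK, join them under node X."""
        if not 1 <= k <= len(self.cstack):
            raise ValueError("k must be between 1 and the size of CSTACK")
        children = self.cstack[-k:]
        del self.cstack[-k:]
        self.cstack.append((x, children))

    def drop(self, x: str) -> None:
        """DROP X: move the words spanned by constituent X, which starts at the
        first input word, from the input list onto RSTACK."""
        n = self.input_list[0][1][x]
        self.rstack.append(self.input_list[:n])
        del self.input_list[:n]

    def assign_type(self, x: str) -> None:
        """ASSIGN TYPE(X): relabel the tree on top of CSTACK with X."""
        top = self.cstack.pop()
        children = [top] if isinstance(top, str) else top[1]
        self.cstack.append((x, children))

    def restore(self, k: int) -> None:
        """RESTORE k: take the k-th RSTACK element (1 = top, an assumption)
        and put its words back at the front of the input list."""
        span = self.rstack.pop(-k)
        self.input_list[0:0] = span


def leaves(tree: Tree) -> List[str]:
    """Words of a tree read from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


# Demo: "yesterday John left" -> "John left yesterday" (word order changed).
state = RewriteState(input_list=[
    ("yesterday", {"S": 3, "ADVP": 1, "RB": 1}),
    ("John", {"NP": 1, "NNP": 1}),
    ("left", {"VP": 1, "VBD": 1}),
])
state.drop("ADVP")       # "yesterday" goes to RSTACK
state.shift()            # John
state.assign_type("NP")
state.shift()            # left
state.restore(1)         # "yesterday" returns to the input list
state.shift()            # yesterday
state.reduce(2, "VP")
state.reduce(2, "S")
print(" ".join(leaves(state.cstack[0])))  # John left yesterday
```

In the demo the adverb is dropped first and restored only after the verb has been shifted, so the result has a different word order from the input, which is the kind of reordering RESTORE is meant to allow.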
Features
The features we used in this model consist of:
* Some features come from the input list.
* Some features come from the configuration of CSTACK.
* Some features come from the configuration of RSTACK.
Two kinds of features are described below.
Operation features
These features reflect the number of trees in CSTACK, the number of elements in RSTACK and the types of the last five operators applied. We also use information from the two stacks, namely the syntactic categories of the root nodes of the partial trees built up to a given point.
Original-tree-specific features denote the syntactic constituents that start with the first unit in the input list.
Semantic features
The semantic features we use include the semantic type of the current word in the input list, drawn from general types such as HUMAN, THINGS, ANIMAL, CONCEPT, INSTRUCTOR, COMPUTER, etc.; whether the word in the input list is a head word; and a Boolean value indicating whether the word is in the subcategorization table.
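The feature vector for one configuration can be sketched as a dictionary built from the input list, the two stacks and the history of applied operators. This is an illustrative sketch only: the lexicon SEMANTIC_TYPE, the sets HEAD_WORDS and SUBCAT_TABLE, the default values "OTHER" and "WORD", and the choice of looking at the roots of the top two CSTACK trees are hypothetical stand-ins for resources the text does not specify.

```python
from typing import Dict, Sequence, Tuple, Union

Item = Tuple[str, Dict[str, int]]      # (word, constituents of t starting at it)
Tree = Union[str, Tuple[str, list]]    # bare word or (label, children)

# Hypothetical toy resources standing in for a real semantic lexicon,
# head-word annotation and subcategorization table.
SEMANTIC_TYPE = {"John": "HUMAN", "dog": "ANIMAL", "computer": "COMPUTER"}
HEAD_WORDS = {"John", "left"}
SUBCAT_TABLE = {"left", "gave"}


def root_label(tree: Tree) -> str:
    """Syntactic category of a partial tree's root ("WORD" for a bare word)."""
    return "WORD" if isinstance(tree, str) else tree[0]


def contextual_features(input_list: Sequence[Item], cstack: Sequence[Tree],
                        rstack: Sequence[list], history: Sequence[str]) -> Dict[str, object]:
    feats: Dict[str, object] = {
        # Operation features: stack sizes, last five operators, root categories
        "n_cstack": len(cstack),
        "n_rstack": len(rstack),
        "last_ops": tuple(history[-5:]),
        "cstack_roots": tuple(root_label(t) for t in cstack[-2:]),
    }
    if input_list:
        word, labels = input_list[0]
        # Original-tree-specific feature: constituents starting at the first word
        feats["starts_here"] = tuple(sorted(labels))
        # Semantic features of the current word
        feats["sem_type"] = SEMANTIC_TYPE.get(word, "OTHER")
        feats["is_head"] = word in HEAD_WORDS
        feats["in_subcat"] = word in SUBCAT_TABLE
    return feats


features = contextual_features(
    input_list=[("left", {"VP": 1, "VBD": 1})],
    cstack=[("NP", ["John"])],
    rstack=[[("yesterday", {"ADVP": 1})]],
    history=["DROP", "SHIFT", "ASSIGN TYPE"],
)
print(features)
```

Keeping every feature as a hashable value (numbers, strings, Booleans, tuples) makes the vector directly usable as training input for a decision tree learner.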
Process of sentence reduction
Decision tree learning produces a set of rules that map each configuration of the two stacks and the input list to an action. A given input sentence is parsed, and each word in the input list corresponds to a word in the sentence together with the sequence of syntactic constituents that begin at that word. We then simulate the rewriting process: in each configuration of the two stacks and the input list, an operator is executed, leading to a new state, and so on. The process repeats until the input list is empty and CSTACK contains only one subtree whose root node is one of the terminal symbols (the symbol that marks it as a root), or until RSTACK is empty. An in-order (left-to-right) traversal of the leaves of this tree produces the reduced version of the sentence given as input.
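The text does not say how the decision tree is induced, but the reference list includes Quinlan's C4.5, which chooses the feature to test at each node by the gain ratio. The sketch below shows only that split criterion on (feature vector, operator) training pairs; it is a simplification that omits C4.5's tree construction, pruning and handling of continuous values, and the toy training data are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Example = Tuple[Dict[str, Hashable], str]   # (feature vector, operator to apply)


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the distribution of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(examples: Sequence[Example], attr: str) -> float:
    """C4.5-style gain ratio: information gain of splitting on attr,
    divided by the split information of attr."""
    actions = [action for _, action in examples]
    groups: Dict[Hashable, List[str]] = {}
    for feats, action in examples:
        groups.setdefault(feats[attr], []).append(action)
    n = len(examples)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(actions) - remainder
    split_info = entropy([feats[attr] for feats, _ in examples])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(examples: Sequence[Example], attrs: Sequence[str]) -> str:
    """Feature the tree would test at this node."""
    return max(attrs, key=lambda a: gain_ratio(examples, a))


# Invented training pairs: is_head separates SHIFT from DROP, sem_type does not.
examples = [
    ({"is_head": True, "sem_type": "HUMAN"}, "SHIFT"),
    ({"is_head": True, "sem_type": "OTHER"}, "SHIFT"),
    ({"is_head": False, "sem_type": "HUMAN"}, "DROP"),
    ({"is_head": False, "sem_type": "OTHER"}, "DROP"),
]
print(best_attribute(examples, ["sem_type", "is_head"]))  # is_head
```

Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.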
Reduction Procedure
Input: an input sentence
Output: a reduced sentence
Step 1. The input sentence is parsed into a syntax tree.
Step 2. The syntax tree is enriched with semantic information.
Step 3. Create an input list and set CSTACK and RSTACK to empty.
Step 4. Call the traversal procedure to obtain a reduced syntax tree.
Step 5. Generate a reduced sentence from the reduced syntax tree.
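Steps 3 and 5 can be sketched as follows, assuming the parse tree is given as nested (label, children) tuples with words as plain strings. The function names are hypothetical, and the span length stored with each constituent label is an addition of this sketch; when two constituents with the same label start at the same word, only the outermost one is kept.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple[str, list]]    # bare word or (label, children)
Item = Tuple[str, Dict[str, int]]      # (word, {constituent label: span length})


def build_input_list(tree: Tree) -> List[Item]:
    """Step 3: one entry per word, labeled with every constituent of the tree
    that starts at that word and the number of words it spans."""
    items: List[Item] = []

    def visit(node: Tree) -> int:
        # Returns how many words node spans; words are appended in order.
        if isinstance(node, str):
            items.append((node, {}))
            return 1
        label, children = node
        start = len(items)
        width = sum(visit(child) for child in children)
        if width > 0:
            # Outer constituents are labeled after inner ones, so for a repeated
            # label (e.g. NP inside NP) the longest span is kept.
            items[start][1][label] = width
        return width

    visit(tree)
    return items


def generate_sentence(tree: Tree) -> str:
    """Step 5: read the sentence off the leaves of a tree, left to right."""
    if isinstance(tree, str):
        return tree
    return " ".join(generate_sentence(child) for child in tree[1])


parse = ("S", [("ADVP", [("RB", ["yesterday"])]),
               ("NP", [("NNP", ["John"])]),
               ("VP", [("VBD", ["left"])])])
input_list = build_input_list(parse)
cstack: List[Tree] = []                # Step 3: both stacks start empty
rstack: List[List[Item]] = []
print(input_list[0])                   # ('yesterday', {'RB': 1, 'ADVP': 1, 'S': 3})
print(generate_sentence(parse))        # yesterday John left
```

Applied to the unreduced parse, Step 5 simply reproduces the original sentence; applied to the tree left in CSTACK after the traversal, it yields the reduced sentence.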
Traversal procedure
Input: input list, CSTACK, RSTACK
Output: a reduced tree
while (not terminal_condition()) {
    feature = get_contextual_features();  // current feature vector
    action = get_action(feature);         // decision tree prediction
    parameter = get_parameter(action);
    switch (action) {
        case SHIFT: SHIFT(); break;
        case ASSIGN_TYPE: ASSIGN_TYPE(parameter); break;
        case REDUCE: Reduce(parameter); break;
        case DROP: Drop(parameter); break;
        case RESTORE: Restore(parameter); break;
    }
}
The process of sentence reduction
In the traversal procedure, we use the following functions: get_contextual_features, get_action and get_parameter. The function get_contextual_features extracts the feature vector. The functions get_action and get_parameter return the operator and its parameter, which are then used to perform SHIFT, DROP, RESTORE, ASSIGN TYPE or REDUCE.
We have presented a new algorithm that rewrites a long sentence into a reduced sentence whose word order may differ from that of the original sentence. We claim that the semantic information of the original sentence is very useful for the sentence reduction problem. Experimental results showed that the proposed algorithm improved on the original algorithm. As future work, testing on a larger corpus and integration with a summarization system are currently underway.
Conclusion
In conclusion, I would like to say a few words about the investigation carried out. The main research is presented in the main part of my course paper.
We have presented an algorithm that rewrites a long sentence into two reduced sentences in two different languages. We compared our method with other methods to show its advantages as well as its limitations. We claim that the semantic information of the original sentence, exploited through syntax control, is very useful for the sentence reduction problem. We proposed a method for sentence reduction using semantic information and syntactic parsing, the so-called syntax control approach. Our method achieved higher accuracy, and its reduced output sentences in two different languages, namely English and Vietnamese, are close to those produced by non-native speakers. Investigating machine learning to generate syntax control rules automatically from available corpora is a promising way to enhance the accuracy of sentence reduction using syntax control.
Having analyzed the problem of sentence reduction in Modern English, we can draw the following conclusions:
a) The problem of sentence reduction in Modern English is very topical nowadays.
b) There are several approaches to sentence reduction, for example one using syntax control and one based on a decision tree model.
c) A number of well-known linguists have dealt with the problem of sentence reduction in Modern English.
As for the prospects of this work, we hope that it will find worthy application in schools, lyceums and institutions of higher education, among both teachers and students of English. We also hope that this work will take its worthy place among the lexicological works dedicated to the types of shortening.
References
1. Ivanova, I.P., Burlakova, V.V. and Pocheptsov, G.G. 1981. Theoretical Grammar of Modern English (Teoreticheskaya grammatika sovremennogo angliyskogo yazyka). Moscow. 285 p.
2. Ilyish, B. The Structure of Modern English.
3. Zhigadlo, V.N., Ivanova, I.P. and Iofik, L.L. 1956. Modern English Language (Theoretical Course of Grammar). Moscow.
4. Jespersen, O. 1938. Essentials of English Grammar. New York.
5. Barber, Ch. 1964. Linguistic Change in Present-Day English. Edinburgh.
6. Mani, I. and Maybury, M. (eds.). 1999. Advances in Automatic Text Summarization. The MIT Press.
7. Grefenstette, G. 1998. Producing intelligent telegraphic text reduction to provide an audio scanning service for the blind. In Working Notes of the AAAI Spring Symposium on Intelligent Text Summarization, pp. 111-118.
8. Corston-Oliver, S.H. and Dolan, W.B. 1999. Less is more: eliminating index terms from subordinate clauses. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 349-356.
9. Jing, H. 2000. Sentence reduction for automatic text summarization. In Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000).
10. Knight, K. and Marcu, D. 2002. Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artificial Intelligence 139: 91-107.
11. Magerman, D. 1995. Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 276-283.
12. Hermjakob, U. and Mooney, R.J. 1997. Learning parse and translation decisions from examples with rich context. In Proceedings of ACL/EACL'97, pp. 482-489.
13. Marcu, D. 1999. A decision-based approach to rhetorical parsing. In Proceedings of ACL'99, pp. 365-372.
14. Charniak, E. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), pp. 132-139.
15. Fellbaum, C. (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press.
16. Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.