  • Locked thread
barfoid 4
Aug 21, 2014

by XyloJW
ng tool. LOL. YOSPOS BITCH!

power botton
Nov 2, 2011

we already have an enterprise software thread

black man 3
Oct 29, 2014

by XyloJW
'barfoid 4' pwning YOSPOS Bitches itt Lol

jesus WEP
Oct 17, 2004


run query, dump results to an excel ss
that's what the end user is gonna do anyway

maniacdevnull
Apr 18, 2007

FOUR CUBIC FRAMES
DISPROVES SOFT G GOD
YOU ARE EDUCATED STUPID

St Evan Echoes posted:

run query, dump results to an excel ss
that's what the end user is gonna do anyway

how else would you be able to sort or filter the result? jeez louise what a noob!

power botton
Nov 2, 2011

you can't run analytics on a screenshot

Fart.Bleed.Repeat.
Sep 29, 2001

i dont use sql i use crystal reports with sql

barfoid 4
Aug 21, 2014

by XyloJW

St Evan Echoes posted:

run query, dump results to an excel ss
that's what the end user is gonna do anyway

Lmao. Scrub teir analytics for a scrub teir organization. NEXT!

barfoid 4
Aug 21, 2014

by XyloJW

Ruby got Railed posted:

i dont use sql i use crystal reports with sql

Bug report: the spoil tag does not unspoil on clicking :) using an iPhone 5s here

Fart.Bleed.Repeat.
Sep 29, 2001

barfoid 4 posted:

Bug report: the spoil tag does not unspoil on clicking :) using an iPhone 5s here

oh its supposed to say "with sql" but thats the spoilered part so now you know

Widdiful
Oct 10, 2012

wow thanks for the spoiler, jerk

Fart.Bleed.Repeat.
Sep 29, 2001

CREATE TRIGGER

black man 3
Oct 29, 2014

by XyloJW
Bump, really funny poo poo here

Luigi Thirty
Apr 30, 2006

Emergency confection port.

may I recommend some kind of report program generator

Asymmetric POSTer
Aug 17, 2005

St Evan Echoes posted:

run query, dump results to an excel ss
that's what the end user is gonna do anyway

we do this

Asymmetric POSTer
Aug 17, 2005

barfoid 4 posted:

Lmao. Scrub teir analytics for a scrub teir organization. NEXT!

lol if you dont work for a scrub teir organization collecting a fat paycheck doing nothing

black man 3
Oct 29, 2014

by XyloJW

mishaq posted:

lol if you dont work for a scrub teir organization collecting a fat paycheck doing nothing

Haha yeah!

black man 3
Oct 29, 2014

by XyloJW
Literally laffing hysterically at ppl who have to actually be competent at their job to earn a living... I feel extremely like Goldman Sachs itt

A Wheezy Steampunk
Jul 16, 2006

High School Grads Eligible!
i've never written a report in my life, op

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

Ruby got Railed posted:

CREATE TRIGGER

WARNING

madeupfred
Oct 10, 2011

by FactsAreUseless
They Named the Second Barfoid Barfoid 4

rotor
Jun 11, 2001

classic case of pineapple derangement syndrome
I drop down to node.js when I want to get close to the metal

barfoid 4
Aug 21, 2014

by XyloJW

rotor posted:

I drop down to node.js when I want to get close to the metal

I too, am a fucktard.

pram
Jun 10, 2001
if your parser isnt some horrible chimera of xml and java then lol

barfoid 4
Aug 21, 2014

by XyloJW
what is a parser.

pram
Jun 10, 2001
Parsing
From Wikipedia, the free encyclopedia
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

Human languages
Traditional methods
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However the teaching of such techniques is no longer current.

Computational methods
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs[clarification needed]. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) amongst a potentially unlimited range of possibilities but only some of which are germane to the particular case. So an utterance "Man bites dog" versus "Dog bites man" is definite on one detail but in another language might appear as "Man dog bites" with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However some systems trade speed for accuracy using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking in which the parser proposes some large number of analyses, and a more complex system selects the best option.

Psycholinguistics
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages
Parser
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.
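
The fix-up mechanism described above can be sketched as follows. This is a minimal illustration, not any particular compiler's method: the toy instruction set (`LABEL`, `GOTO`, anything else passed through) and the function name `assemble` are assumptions made up for the example.

```python
def assemble(lines):
    """One-pass translation of a toy program made of LABEL/GOTO lines.

    Returns a list of instruction tuples in which every GOTO carries a
    numeric target. The instruction set is hypothetical.
    """
    code = []      # emitted instructions
    labels = {}    # label name -> instruction index, filled as labels are seen
    fixups = {}    # label name -> indices of GOTOs still waiting for it

    for line in lines:
        op, _, arg = line.partition(" ")
        if op == "LABEL":
            labels[arg] = len(code)
            # The target is now known: apply pending fix-ups backwards.
            for index in fixups.pop(arg, []):
                code[index] = ("GOTO", labels[arg])
        elif op == "GOTO":
            if arg in labels:
                # Backward GOTO: target already known, no fix-up needed.
                code.append(("GOTO", labels[arg]))
            else:
                # Forward GOTO: emit a placeholder and remember to patch it.
                fixups.setdefault(arg, []).append(len(code))
                code.append(("GOTO", None))
        else:
            code.append((line,))

    if fixups:
        raise ValueError(f"undefined labels: {sorted(fixups)}")
    return code


program = ["GOTO end", "PRINT a", "LABEL top", "PRINT b", "GOTO top", "LABEL end", "HALT"]
result = assemble(program)
```

The forward `GOTO end` is emitted with a placeholder and patched when `LABEL end` is reached, while the backward `GOTO top` is resolved immediately.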

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

For example, in Python the following is syntactically valid code:

x = 1
print(x)
The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but is syntactically invalid in terms of the context-sensitive grammar, which requires that variables be initialized before use:

x = 1
print(y)
Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics.
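
As a rough sketch of this split, Python's standard `ast` module will parse both snippets without complaint, and a separate pass over the tree can flag the undefined name. The helper `undefined_names` is hypothetical and heavily simplified: it only handles flat, straight-line scripts and ignores functions, imports, loops and scoping.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read before any top-level assignment to them."""
    tree = ast.parse(source)          # succeeds: both snippets are grammatical
    defined = set(dir(builtins))      # print, len, ... count as known names
    missing = []
    for stmt in tree.body:            # top-level statements in source order
        # First check every name the statement reads...
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in defined:
                    missing.append(node.id)
        # ...then record the names it assigns.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return missing


ok = undefined_names("x = 1\nprint(x)")
bad = undefined_names("x = 1\nprint(y)")
```

Both inputs produce a syntax tree of the same shape; only the second walk over the tree reports `y`.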

Overview of process
[Figure: Flow of data in a typical parser]
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
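
A lexer for this calculator input could look roughly like the sketch below. The token set (integers plus the characters `+ - * / ^ ( )`) and the name `tokenize` are assumptions chosen for the example.

```python
import re

# One token is either a run of digits or a single operator/parenthesis,
# optionally preceded by whitespace.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/^()])")


def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


tokens = tokenize("12*(3+4)^2")
```

Because the digit rule is greedy and operators are always single characters, `12` stays one token and fragments such as `12*` or `(3` are never produced.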

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.

Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

Top-down parsing: Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]
Bottom-up parsing: A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]
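
To make the top-down case concrete, here is a minimal recursive-descent sketch for sums and products of integers. It is an illustration only: the grammar `expr -> term ('+' term)*`, `term -> number ('*' number)*` is an assumed rewrite that replaces left recursion with loops (a plain recursive-descent parser cannot handle `E -> E + E` directly), and the function names are made up.

```python
def parse_expression(tokens):
    """Parse tokens such as ['1', '+', '2', '*', '3'] into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def number():
        token = advance()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)

    def term():
        # term -> number ('*' number)*
        node = number()
        while peek() == "*":
            advance()
            node = ("*", node, number())
        return node

    def expr():
        # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_expression(["1", "+", "2", "*", "3"])
```

Each grammar rule becomes one function, and the parser commits to a rule by looking at the next token, consuming input strictly left to right.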

Types of parsers
Top-down parsers
Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser
Bottom-up parsers
Some of the parsers that use bottom-up parsing include:

Precedence parser
  Operator-precedence parser
  Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
  Simple LR (SLR) parser
  LALR parser
  Canonical LR (LR(1)) parser
  GLR parser
CYK parser
Recursive ascent parser
Shift-reduce parser
Parser development software
Some of the well known parser development tools include the following. Also see comparison of parser generators.

ANTLR
Bison
Coco/R
GOLD
JavaCC
JParsec
Lemon
Lex
Parboiled
Parsec
ParseIT
Ragel
SHProto (FSM parser language)[7]
Spirit Parser Framework
Syntax Definition Formalism
SYNTAX
XPL
Yacc
Lookahead
Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one, can parse them, because parsers with limited lookahead are often more efficient. One important change[citation needed] to this trend came in 1990 when Terence Parr created ANTLR for his Ph.D. thesis, a parser generator for efficient LL(k) parsers, where k is any fixed value.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

It helps the parser take the correct action in case of conflicts. For example, parsing the if statement in the case of an else clause.
It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.
Example: Parsing the Expression 1 + 2 * 3

The set of expression parsing rules (called the grammar) is as follows:
Rule1: E → E + E (an expression is the sum of two expressions)
Rule2: E → E * E (an expression is the product of two expressions)
Rule3: E → number (an expression is a simple number)
Rule4: + has lower precedence than *
Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions
Initially Input = [1,+,2,*,3]

Shift "1" onto stack from input (in anticipation of rule3). Input = [+,2,*,3] Stack = [1]
Reduce "1" to expression "E" based on rule3. Stack = [E]
Shift "+" onto stack from input (in anticipation of rule1). Input = [2,*,3] Stack = [E,+]
Shift "2" onto stack from input (in anticipation of rule3). Input = [*,3] Stack = [E,+,2]
Reduce stack element "2" to expression "E" based on rule3. Stack = [E,+,E]
Reduce stack items [E,+,E] to "E" based on rule1. Stack = [E]
Shift "*" onto stack from input (in anticipation of rule2). Input = [3] Stack = [E,*]
Shift "3" onto stack from input (in anticipation of rule3). Input = [] (empty) Stack = [E,*,3]
Reduce stack element "3" to expression "E" based on rule3. Stack = [E,*,E]
Reduce stack items [E,*,E] to "E" based on rule2. Stack = [E]
The parse tree and resulting code from it is not correct according to language semantics.
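
The trace above can be reproduced with a small sketch. It is a simplification under stated assumptions: numbers are treated as already-reduced expressions the moment they are shifted (folding rule3 into the shift), and the names `parse_eagerly` and `evaluate` are invented for the example.

```python
def parse_eagerly(tokens):
    """Shift-reduce without lookahead: reduce E op E as soon as it appears."""
    stack = []
    for token in tokens:
        # Shift. A number is stored as an int, i.e. already an expression E.
        stack.append(int(token) if token.isdigit() else token)
        # Reduce immediately whenever the top of the stack is E op E.
        if (len(stack) >= 3
                and not isinstance(stack[-1], str)
                and stack[-2] in ("+", "*")
                and not isinstance(stack[-3], str)):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
    if len(stack) != 1 or isinstance(stack[0], str):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


def evaluate(tree):
    """Evaluate a nested (op, left, right) tuple tree."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r


tree = parse_eagerly(["1", "+", "2", "*", "3"])
value = evaluate(tree)
```

The resulting tree groups the input as (1 + 2) * 3, so it evaluates to 9 instead of the intended 7.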

To correctly parse without lookahead, there are three solutions:

The user has to enclose expressions within parentheses. This often is not a viable solution.
The parser needs more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.
Lookahead parser actions
Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
Shift + onto stack on input + in anticipation of rule1.
Shift 2 onto stack on input 2 in anticipation of rule3.
Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2 or reduce based on rule1. Since * has higher precedence than + (rule4), it shifts * onto the stack in anticipation of rule2.
Shift 3 onto stack on input 3 in anticipation of rule3.
Reduce stack item 3 to Expression after seeing end of input based on rule3.
Reduce stack items E * E to E based on rule2.
Reduce stack items E + E to E based on rule1.
The parse tree generated is correct and simply more efficient[citation needed] than non-lookahead parsers. This is the strategy followed in LALR parsers.
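
A one-token lookahead version of the same idea can be sketched as below. This is not how a generated LALR parser is implemented (those use state tables); it is a hand-written approximation that only illustrates the shift-or-reduce decision. Numbers are again treated as expressions when shifted, and `PRECEDENCE`, `parse_with_lookahead` and `evaluate` are names assumed for the example.

```python
PRECEDENCE = {"+": 1, "*": 2}    # rule4: + binds less tightly than *


def parse_with_lookahead(tokens):
    """Shift-reduce with one token of lookahead and operator precedence."""
    stack = []
    for index, token in enumerate(tokens):
        stack.append(int(token) if token.isdigit() else token)    # shift
        lookahead = tokens[index + 1] if index + 1 < len(tokens) else None
        # Reduce E op E only if the next operator does not bind tighter
        # than the operator on the stack; otherwise keep shifting.
        while (len(stack) >= 3
               and not isinstance(stack[-1], str)
               and isinstance(stack[-2], str) and stack[-2] in PRECEDENCE
               and not isinstance(stack[-3], str)
               and PRECEDENCE.get(lookahead, 0) <= PRECEDENCE[stack[-2]]):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
    if len(stack) != 1 or isinstance(stack[0], str):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


def evaluate(tree):
    """Evaluate a nested (op, left, right) tuple tree."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r


tree = parse_with_lookahead(["1", "+", "2", "*", "3"])
value = evaluate(tree)
```

With `*` as the lookahead the parser holds back the reduction of `1 + 2`, so the tree groups as 1 + (2 * 3) and evaluates to 7.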

See also
Backtracking
Chart parser
Compiler-compiler
Deterministic parsing
Generating strings
Grammar checker
LALR parser
Lexing
Pratt parser
Shallow parsing
Left corner parser
Parsing expression grammar
ASF+SDF Meta Environment
DMS Software Reengineering Toolkit
Program transformation
Source code generation
References
[1] "Bartleby.com homepage". Retrieved 28 November 2010.
[2] "parse". dictionary.reference.com. Retrieved 27 November 2010.
[3] "Grammar and Composition".
[4] Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
[5] Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109 - 120, June 2007, Prague.
[6] Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167 - 181, January 2008, San Francisco.
[7] shproto.org
Further reading
Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X
Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques - A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6
External links
The Lemon LALR Parser Generator
The Stanford Parser
Turin University Parser Natural language parser for the Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.
Short history of parser construction

Asymmetric POSTer
Aug 17, 2005

pram posted:

Parsing
From Wikipedia, the free encyclopedia
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

Human languages
Traditional methods
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However the teaching of such techniques is no longer current.

Computational methods
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs[clarification needed]. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) amongst a potentially unlimited range of possibilities but only some of which are germane to the particular case. So an utterance "Man bites dog" versus "Dog bites man" is definite on one detail but in another language might appear as "Man dog bites" with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties, as they can with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However, some systems trade accuracy for speed using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system selects the best option.
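As a rough illustration of the CKY-style chart parsing mentioned above, here is a minimal recognizer sketch. The toy grammar (BINARY, LEXICAL) and the function name cky_recognize are assumptions made for this example and are not taken from any particular parser. It requires a grammar in Chomsky normal form and does no pruning or probability scoring.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {                     # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {                    # word -> set of A such that A -> word
    "man": {"NP"},
    "dog": {"NP"},
    "bites": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    # table[i][j] holds the nonterminals that derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):             # width of the substring
        for i in range(n - span + 1):        # left edge
            j = i + span                     # right edge
            for k in range(i + 1, j):        # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]
```

Calling cky_recognize("man bites dog".split()) accepts the sentence under this toy grammar, while a reordering such as "bites man dog" is rejected. A statistical parser would store scores rather than plain sets in each cell.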

Psycholinguistics
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages
Parser
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.
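The regular-expression case can be sketched as follows. The "key = number" format, the pattern PAIR and the function extract_pairs are hypothetical, chosen only to show extraction of selected parts of loosely structured text without building a parse tree.

```python
import re

# Hypothetical "key = number" format, used only for illustration.
PAIR = re.compile(r"(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+)")


def extract_pairs(text):
    """Pull every key=number pair out of the text, ignoring everything else."""
    return {m.group("key"): int(m.group("value")) for m in PAIR.finditer(text)}
```

For example, extract_pairs("width = 80; height=24, junk") picks out the two pairs and silently skips the rest, which is typical of this style of parsing.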

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.
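The fix-up mechanism described above might look something like this minimal sketch. The toy instruction set ('LABEL name', 'GOTO name', anything else copied through) and the function name assemble are assumptions for illustration, not a real compiler's design.

```python
def assemble(lines):
    """One-pass translation of a toy program with labels and GOTOs."""
    code = []      # emitted instructions
    labels = {}    # label name -> address in `code`
    fixups = {}    # label name -> indices of GOTOs still awaiting that label
    for line in lines:
        op, _, arg = line.partition(" ")
        if op == "LABEL":
            labels[arg] = len(code)
            # The target is now known: apply pending fix-ups backwards.
            for idx in fixups.pop(arg, []):
                code[idx] = ("GOTO", labels[arg])
        elif op == "GOTO":
            if arg in labels:
                # Backward GOTO: the target is already known, no fix-up.
                code.append(("GOTO", labels[arg]))
            else:
                # Forward GOTO: emit a placeholder and remember where it is.
                fixups.setdefault(arg, []).append(len(code))
                code.append(("GOTO", None))
        else:
            code.append((line,))
    if fixups:
        raise ValueError(f"undefined labels: {sorted(fixups)}")
    return code
```

Given ["GOTO end", "PRINT", "LABEL top", "NOP", "GOTO top", "LABEL end"], the first GOTO is emitted with a placeholder and patched once "end" is seen, while "GOTO top" is resolved immediately.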

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

For example, in Python the following is syntactically valid code:

x = 1
print(x)
The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but is syntactically invalid in terms of the context-sensitive grammar, which requires that variables be initialized before use:

x = 1
print(y)
Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics.
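The split between parsing and a later check can be sketched with Python's standard ast module. The helper undefined_names is hypothetical and deliberately simplified: it looks only at top-level statements in order and ignores functions, imports, comprehensions and scoping. CPython itself reports this particular error at run time, as a NameError.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read before any top-level assignment to them."""
    tree = ast.parse(source)          # succeeds: the code is grammatical
    defined = set(dir(builtins))      # names such as print are predefined
    problems = []
    for stmt in tree.body:            # top-level statements, in source order
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Check the reads first, then record what this statement assigns.
        for node in names:
            if isinstance(node.ctx, ast.Load) and node.id not in defined:
                problems.append(node.id)
        for node in names:
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return problems
```

Both snippets above pass ast.parse without error; only the walk over the resulting tree tells them apart, reporting y for the second one.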

Overview of process
Flow of data in a typical parser
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
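A minimal sketch of such a lexer for the calculator example, assuming only non-negative integers and the five symbols named above. The names TOKEN_RE and tokenize are made up for this example.

```python
import re

# A number is a run of digits; each operator or parenthesis is its own token.
TOKEN_RE = re.compile(r"\d+|[*+^()]")


def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():       # whitespace separates tokens
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens
```

Because the digit rule stops at the first non-digit, "12*" can never come out as a single token.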

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.
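The three stages can be combined in a small calculator sketch. It is a hand-written recursive-descent parser, which is only one of several possible approaches. The grammar in the comments, the function names, and the choice to make ^ right-associative are all assumptions for this example. For brevity the lexing step uses re.findall, which silently skips characters it does not recognise.

```python
import re


def evaluate(text):
    """Lex, parse and evaluate an expression such as "12*(3+4)^2"."""
    tokens = re.findall(r"\d+|[*+^()]", text)   # lexical analysis
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():                 # expr  -> term ('+' term)*
        value = term()
        while peek() == "+":
            take("+")
            value += term()
        return value

    def term():                 # term  -> power ('*' power)*
        value = power()
        while peek() == "*":
            take("*")
            value *= power()
        return value

    def power():                # power -> atom ('^' power)?
        base = atom()
        if peek() == "^":
            take("^")
            return base ** power()
        return base

    def atom():                 # atom  -> NUMBER | '(' expr ')'
        tok = take()
        if tok == "(":
            value = expr()
            take(")")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Here the semantic action (arithmetic) is performed as each rule is recognised; a compiler would emit code or build a tree at the same points instead.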

Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

Top-down parsing: Top-down parsing can be viewed as an attempt to find leftmost derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]
Bottom-up parsing: A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parsing is shift-reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both leftmost and rightmost derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]

Types of parsers
Top-down parsers
Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser
Bottom-up parsers
Some of the parsers that use bottom-up parsing include:

Precedence parser
  Operator-precedence parser
  Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
  Simple LR (SLR) parser
  LALR parser
  Canonical LR (LR(1)) parser
  GLR parser
CYK parser
Recursive ascent parser
Shift-reduce parser
Parser development software
Some of the well known parser development tools include the following. Also see comparison of parser generators.

ANTLR
Bison
Coco/R
GOLD
JavaCC
JParsec
Lemon
Lex
Parboiled
Parsec
ParseIT
Ragel
SHProto (FSM parser language)[7]
Spirit Parser Framework
Syntax Definition Formalism
SYNTAX
XPL
Yacc
Lookahead
Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one, can parse them, because parsers with limited lookahead are often more efficient. One important change[citation needed] to this trend came in 1990 when Terence Parr created ANTLR for his Ph.D. thesis, a parser generator for efficient LL(k) parsers, where k is any fixed value.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

It helps the parser take the correct action in case of conflicts, for example when parsing an if statement that may be followed by an else clause.
It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.
Example: Parsing the expression 1 + 2 * 3

The set of expression parsing rules (called the grammar) is as follows:
Rule1: E → E + E    An expression is the sum of two expressions.
Rule2: E → E * E    An expression is the product of two expressions.
Rule3: E → number   An expression is a simple number.
Rule4: + has less precedence than *
Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions
Initially Input = [1,+,2,*,3]

Shift "1" onto stack from input (in anticipation of rule3). Input = [+,2,*,3] Stack = [1]
Reduce "1" to expression "E" based on rule3. Stack = [E]
Shift "+" onto stack from input (in anticipation of rule1). Input = [2,*,3] Stack = [E,+]
Shift "2" onto stack from input (in anticipation of rule3). Input = [*,3] Stack = [E,+,2]
Reduce stack element "2" to expression "E" based on rule3. Stack = [E,+,E]
Reduce stack items [E,+,E] to "E" based on rule1. Stack = [E]
Shift "*" onto stack from input (in anticipation of rule2). Input = [3] Stack = [E,*]
Shift "3" onto stack from input (in anticipation of rule3). Input = [] (empty) Stack = [E,*,3]
Reduce stack element "3" to expression "E" based on rule3. Stack = [E,*,E]
Reduce stack items [E,*,E] to "E" based on rule2. Stack = [E]
The parse tree and the code resulting from it are not correct according to the language semantics: the addition was reduced before the multiplication, so the expression was parsed as (1 + 2) * 3.

To correctly parse without lookahead, there are three solutions:

The user has to enclose expressions within parentheses. This often is not a viable solution.
The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.
Lookahead parser actions
Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
Shift + onto stack on input + in anticipation of rule1.
Shift 2 onto stack on input 2 in anticipation of rule3.
Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
The stack now holds E + E and the input is still *. The parser has two choices: shift based on rule2, or reduce based on rule1. Since * has higher precedence than + (rule4), it shifts * onto the stack in anticipation of rule2.
Shift 3 onto stack on input 3 in anticipation of rule3.
Reduce stack item 3 to Expression after seeing end of input based on rule3.
Reduce stack items E * E to E based on rule2.
Reduce stack items E + E to E based on rule1.
The parse tree generated is correct, and the parser is simply more efficient[citation needed] than non-lookahead parsers. This is the strategy followed in LALR parsers.
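The two action sequences above can be reproduced with a small sketch. It is not a table-driven LR or LALR parser: the PRECEDENCE table stands in for Rule4, numbers are treated as already reduced to E (Rule3), and the function name shift_reduce and the nested-tuple tree format are assumptions for this example.

```python
PRECEDENCE = {"+": 1, "*": 2}   # Rule4: + has less precedence than *


def shift_reduce(tokens, use_lookahead):
    """Parse a token list such as [1, "+", 2, "*", 3] into nested tuples."""
    stack, i = [], 0
    while True:
        # Reduce while the top of the stack has the shape E op E.
        while len(stack) >= 3 and not isinstance(stack[-1], str):
            lookahead = tokens[i] if i < len(tokens) else None
            if (use_lookahead and lookahead in PRECEDENCE
                    and PRECEDENCE[lookahead] > PRECEDENCE[stack[-2]]):
                break           # shift/reduce conflict: shift wins
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((left, op, right))   # Rule1 or Rule2
        if i == len(tokens):
            break
        stack.append(tokens[i])               # shift
        i += 1
    if len(stack) != 1:
        raise SyntaxError(f"could not reduce input; stack = {stack}")
    return stack[0]
```

With use_lookahead=False the input [1, "+", 2, "*", 3] comes out grouped as (1 + 2) * 3, matching the first trace; with use_lookahead=True the parser delays the reduction of E + E and produces 1 + (2 * 3).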

See also
Backtracking
Chart parser
Compiler-compiler
Deterministic parsing
Generating strings
Grammar checker
LALR parser
Lexing
Pratt parser
Shallow parsing
Left corner parser
Parsing expression grammar
ASF+SDF Meta Environment
DMS Software Reengineering Toolkit
Program transformation
Source code generation
References
[1] "Bartleby.com homepage". Retrieved 28 November 2010.
[2] "parse". dictionary.reference.com. Retrieved 27 November 2010.
[3] "Grammar and Composition".
[4] Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
[5] Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109-120, June 2007, Prague.
[6] Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
[7] shproto.org
Further reading
Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X
Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques - A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6
External links
The Lemon LALR Parser Generator
The Stanford Parser
Turin University Parser Natural language parser for the Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.
Short history of parser construction
Categories: Algorithms on strings, Compiler construction, Parsing
This page was last modified on 13 November 2014 at 19:28.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.

please dont fishmech in yospos

theadder
Dec 30, 2011


pram posted:

Parsing
From Wikipedia, the free encyclopedia
"Parse" redirects here. For other uses, see Parse (disambiguation).
"Parser" redirects here. For the computer programming language, see Parser (CGI language).
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

Contents
1 Human languages
1.1 Traditional methods
1.2 Computational methods
1.3 Psycholinguistics
2 Computer languages
2.1 Parser
2.2 Overview of process
3 Types of parsers
4 Types of parsers
4.1 Top-down parsers
4.2 Bottom-up parsers
4.3 Parser development software
5 Lookahead
6 See also
7 References
8 Further reading
9 External links
Human languages
Traditional methods
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and was widely regarded as basic to the use and understanding of written language. However, the teaching of such techniques is no longer current.

Computational methods
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs. Human sentences are not easily parsed by programs, because the structure of human language is substantially ambiguous: its purpose is to convey meaning (semantics) from a potentially unlimited range of possibilities, only some of which are germane to the particular case. In English, "Man bites dog" and "Dog bites man" are distinguished by word order, but another language might render both as "Man dog bites" and rely on the larger context to distinguish between the two possibilities, if indeed the difference is of concern. It is difficult to prepare formal rules to describe informal behaviour, even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties, as they can with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However, some systems trade accuracy for speed using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system selects the best option.

Psycholinguistics
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages
Parser
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

For example, in Python the following is syntactically valid code:

x = 1
print(x)
The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but is syntactically invalid in terms of the context-sensitive grammar, which requires that variables be initialized before use:

x = 1
print(y)
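Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics. (In Python specifically, the check is deferred until run time, when the reference to the undefined name raises a NameError.)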
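
The point can be checked with Python's standard ast module. The following is a minimal sketch; the helper names shape and run are made up for this illustration. Both snippets parse successfully and produce trees with the same node types, and the undefined name only surfaces when the second snippet is executed.

```python
import ast

valid = "x = 1\nprint(x)"
undefined = "x = 1\nprint(y)"


def shape(source):
    """List the node types of the syntax tree, ignoring names and values."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def run(source):
    """Execute the source with a silent print and report what happens."""
    try:
        exec(compile(source, "<example>", "exec"), {"print": lambda *args: None})
        return "ok"
    except NameError as error:
        return f"NameError: {error}"


# Both programs are accepted by the parser and have identically shaped trees.
same_shape = shape(valid) == shape(undefined)
print(same_shape)
print(run(valid))
print(run(undefined))
```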

Overview of process
(Figure: Flow of data in a typical parser)
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
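
A lexer for the calculator input above can be sketched with Python's re module. This is an illustrative sketch only: the pattern and the function name tokenize are assumptions made here, and a real lexer would usually also record token kinds and positions.

```python
import re

# One alternative per token class; whitespace is matched so it can be skipped,
# and any other single character is reported as an error.
TOKEN = re.compile(
    r"(?P<number>\d+)|(?P<op>[*+^()])|(?P<space>\s+)|(?P<bad>.)"
)


def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    tokens = []
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        if kind == "bad":
            raise ValueError(f"unexpected character {match.group()!r}")
        if kind != "space":
            tokens.append(match.group())
    return tokens


print(tokenize("12*(3+4)^2"))
```

Because digits are grouped greedily and each operator character is its own token, fragments such as "12*" or "(3" can never come out as a single token.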

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.

Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]
Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]
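
As an illustration of top-down parsing, here is a minimal recursive-descent sketch in Python. It assumes a small arithmetic grammar written without left recursion (expr → term ('+' term)*, term → factor ('*' factor)*, factor → number | '(' expr ')'), which is an assumption made for this example rather than a grammar taken from the text. Each nonterminal becomes one function, and the loops replace the left-recursive rules that a simple recursive-descent parser cannot handle directly.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        # expr -> term ('+' term)*   (iteration instead of left recursion)
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        # factor -> number | '(' expr ')'
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["1", "+", "2", "*", "3"]))
```

Operator precedence falls out of the grammar's layering here: multiplication is handled one level below addition, so it binds more tightly.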

Types of parsers
Top-down parsers
Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser
Bottom-up parsers
Some of the parsers that use bottom-up parsing include:

Precedence parser
  Operator-precedence parser
  Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
  Simple LR (SLR) parser
  LALR parser
  Canonical LR (LR(1)) parser
  GLR parser
CYK parser
Recursive ascent parser
Shift-Reduce Parser
Parser development software
Some of the well known parser development tools include the following. Also see comparison of parser generators.

ANTLR
Bison
Coco/R
GOLD
JavaCC
JParsec
Lemon
Lex
Parboiled
Parsec
ParseIT
Ragel
SHProto (FSM parser language)[7]
Spirit Parser Framework
Syntax Definition Formalism
SYNTAX
XPL
Yacc
Lookahead
Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one token, can parse them, because parsers with limited lookahead are often more efficient. One notable change to this trend came in 1990 when Terence Parr created ANTLR, a parser generator for efficient LL(k) parsers (where k is any fixed value), for his Ph.D. thesis.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

It helps the parser take the correct action when there is a conflict, for example when deciding how to parse an if statement that may be followed by an else clause.
It eliminates many duplicate states and eases the burden of an extra stack. For the C language, a non-lookahead parser will have around 10,000 states, whereas a lookahead parser will have around 300 states.
Example: Parsing the Expression 1 + 2 * 3

The set of expression parsing rules (called the grammar) is as follows:
Rule1: E → E + E (an expression is the sum of two expressions)
Rule2: E → E * E (an expression is the product of two expressions)
Rule3: E → number (an expression is a simple number)
Rule4: + has less precedence than *
Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions
Initially Input = [1,+,2,*,3]

Shift "1" onto stack from input (in anticipation of rule3). Input = [+,2,*,3] Stack = [1]
Reduce "1" to expression "E" based on rule3. Stack = [E]
Shift "+" onto stack from input (in anticipation of rule1). Input = [2,*,3] Stack = [E,+]
Shift "2" onto stack from input (in anticipation of rule3). Input = [*,3] Stack = [E,+,2]
Reduce stack element "2" to expression "E" based on rule3. Stack = [E,+,E]
Reduce stack items [E,+,E] to "E" based on rule1. Stack = [E]
Shift "*" onto stack from input (in anticipation of rule2). Input = [3] Stack = [E,*]
Shift "3" onto stack from input (in anticipation of rule3). Input = [] (empty) Stack = [E,*,3]
Reduce stack element "3" to expression "E" based on rule3. Stack = [E,*,E]
Reduce stack items [E,*,E] to "E" based on rule2. Stack = [E]
The resulting parse tree, and any code generated from it, is not correct according to the language semantics: the input has been grouped as (1 + 2) * 3 rather than 1 + (2 * 3).
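
The eager strategy traced in the steps above can be reproduced with a short Python sketch. The function names parse_eager, is_expr and evaluate are hypothetical, and numbers are turned into expressions as soon as they are shifted, which merges the shift and rule3 steps. Reducing whenever the stack top looks like E op E groups the input as (1 + 2) * 3.

```python
def is_expr(item):
    """Expressions are ints or tuples; operators stay as strings."""
    return not isinstance(item, str)


def parse_eager(tokens):
    """Shift-reduce without lookahead: reduce as soon as E op E is on top."""
    stack = []
    for token in tokens:
        # Shift; a number is reduced to an expression immediately (rule3).
        stack.append(int(token) if token.isdigit() else token)
        if (len(stack) >= 3 and is_expr(stack[-1])
                and stack[-2] in ("+", "*") and is_expr(stack[-3])):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))  # rule1 or rule2
    if len(stack) != 1 or not is_expr(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


def evaluate(tree):
    """Evaluate a nested-tuple tree built from + and *."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


tree = parse_eager(["1", "+", "2", "*", "3"])
print(tree, evaluate(tree))
```

The tree comes out with the multiplication at the root, so evaluating it gives 9 instead of the conventional 7.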

To correctly parse without lookahead, there are three solutions:

The user has to enclose expressions within parentheses. This often is not a viable solution.
The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.
Lookahead parser actions
Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
Shift + onto stack on input + in anticipation of rule1.
Shift 2 onto stack on input 2 in anticipation of rule3.
Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2, or reduce based on rule1. Since * has higher precedence than + according to rule4, it shifts * onto the stack in anticipation of rule2.
Shift 3 onto stack on input 3 in anticipation of rule3.
Reduce stack item 3 to Expression after seeing end of input based on rule3.
Reduce stack items E * E to E based on rule2.
Reduce stack items E + E to E based on rule1.
The parse tree generated is correct, and this approach is generally more efficient than non-lookahead parsing. This is the strategy followed in LALR parsers.
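
The lookahead steps can be sketched in Python as follows. This is a simplified illustration, not an LALR implementation: the PRECEDENCE table stands in for rule4, the function names are hypothetical, and the parser simply compares the operator on the stack with the next input token before deciding whether to shift or reduce.

```python
PRECEDENCE = {"+": 1, "*": 2}  # stands in for rule4: + binds less tightly than *


def is_expr(item):
    """Expressions are ints or tuples; operators stay as strings."""
    return not isinstance(item, str)


def parse_with_lookahead(tokens):
    """Shift-reduce with one token of lookahead and operator precedence."""
    stack = []
    position = 0
    while True:
        lookahead = tokens[position] if position < len(tokens) else None
        can_reduce = (len(stack) >= 3 and is_expr(stack[-1])
                      and stack[-2] in PRECEDENCE and is_expr(stack[-3]))
        # Reduce E op E unless the lookahead operator binds more tightly.
        if can_reduce and (lookahead is None or
                           PRECEDENCE.get(lookahead, 0) <= PRECEDENCE[stack[-2]]):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))
        elif lookahead is not None:
            # Shift; a number is reduced to an expression immediately (rule3).
            stack.append(int(lookahead) if lookahead.isdigit() else lookahead)
            position += 1
        else:
            break
    if len(stack) != 1 or not is_expr(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


print(parse_with_lookahead(["1", "+", "2", "*", "3"]))
```

With E + E on the stack and * as the lookahead, the sketch shifts instead of reducing, so the multiplication ends up nested under the addition.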

See also
Backtracking
Chart parser
Compiler-compiler
Deterministic parsing
Generating strings
Grammar checker
LALR parser
Lexing
Pratt parser
Shallow parsing
Left corner parser
Parsing expression grammar
ASF+SDF Meta Environment
DMS Software Reengineering Toolkit
Program transformation
Source code generation
References
[1] "Bartleby.com homepage". Retrieved 28 November 2010.
[2] "parse". dictionary.reference.com. Retrieved 27 November 2010.
[3] "Grammar and Composition".
[4] Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
[5] Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109-120, June 2007, Prague.
[6] Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
[7] shproto.org
Further reading
Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X
Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques - A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6
External links
The Lemon LALR Parser Generator
The Stanford Parser
Turin University Parser Natural language parser for the Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.
Short history of parser construction
This page was last modified on 13 November 2014 at 19:28.

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

pram posted:

Parsing
From Wikipedia, the free encyclopedia
"Parse" redirects here. For other uses, see Parse (disambiguation).
"Parser" redirects here. For the computer programming language, see Parser (CGI language).
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

Contents
1 Human languages
1.1 Traditional methods
1.2 Computational methods
1.3 Psycholinguistics
2 Computer languages
2.1 Parser
2.2 Overview of process
3 Types of parsers
4 Types of parsers
4.1 Top-down parsers
4.2 Bottom-up parsers
4.3 Parser development software
5 Lookahead
6 See also
7 References
8 Further reading
9 External links
Human languages
Main category: Natural language parsing
Traditional methods
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However the teaching of such techniques is no longer current.

Computational methods
Main category: Natural language parsing
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, which is used to convey meaning (or semantics) amongst a potentially unlimited range of possibilities, only some of which are germane to the particular case. So the utterance "Man bites dog" versus "Dog bites man" is definite on one detail, but in another language might appear as "Man dog bites", with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However, such systems are vulnerable to overfitting and require some kind of smoothing to be effective.

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However some systems trade speed for accuracy using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking in which the parser proposes some large number of analyses, and a more complex system selects the best option.
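
For readers unfamiliar with the CKY algorithm mentioned above, here is a minimal recognizer sketch in Python. It is the plain textbook form, without probabilities or the pruning heuristics the paragraph refers to, and it assumes a grammar already in Chomsky normal form. The toy grammar for "man bites dog" and all names (cyk_recognize, LEXICAL, BINARY) are made up for this illustration.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
LEXICAL = {"man": {"NP"}, "dog": {"NP"}, "bites": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}


def cyk_recognize(words, lexical=LEXICAL, binary=BINARY, start="S"):
    """Return True if the words can be derived from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][length - 1] holds the nonterminals deriving words[i:i + length].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(lexical.get(word, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                lefts = table[i][split - 1]
                rights = table[i + split][length - split - 1]
                for pair in product(lefts, rights):
                    table[i][length - 1] |= binary.get(pair, set())
    return start in table[0][n - 1]


print(cyk_recognize("man bites dog".split()))
print(cyk_recognize("man dog bites".split()))
```

Under this toy grammar the first sentence is accepted and the second is rejected, since no rule combines two noun phrases.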

Psycholinguistics
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages
Parser
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.
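
The regular-expression style of simple parsing can be illustrated with Python's re module. In this sketch the input format ("key = number" pairs scattered through free text) and the names ENTRY and extract_settings are assumptions made for the example. Only the parts of the text that match are extracted, and no parse tree is built.

```python
import re

# A regular expression describing "identifier = integer" (assumed format).
ENTRY = re.compile(r"(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+)")


def extract_settings(text):
    """Pull every key = number pair out of otherwise unstructured text."""
    return {m["key"]: int(m["value"]) for m in ENTRY.finditer(text)}


print(extract_settings("width = 80, height=24; note: ignore me"))
```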

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

For example, in Python the following is syntactically valid code:

x = 1
print(x)
The following code, however, is valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous one, but it violates the context-sensitive requirement that variables be initialized before use:

x = 1
print(y)
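Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics. (In Python specifically, the check is deferred until run time, when the reference to the undefined name raises a NameError.)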

Overview of process
(Figure: Flow of data in a typical parser)
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.

Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]
Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]

Types of parsers
Top-down parsers
Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser
Bottom-up parsers
Some of the parsers that use bottom-up parsing include:

Precedence parser
  Operator-precedence parser
  Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
  Simple LR (SLR) parser
  LALR parser
  Canonical LR (LR(1)) parser
  GLR parser
CYK parser
Recursive ascent parser
Shift-Reduce Parser
Parser development software
Some of the well known parser development tools include the following. Also see comparison of parser generators.

ANTLR
Bison
Coco/R
GOLD
JavaCC
JParsec
Lemon
Lex
Parboiled
Parsec
ParseIT
Ragel
SHProto (FSM parser language)[7]
Spirit Parser Framework
Syntax Definition Formalism
SYNTAX
XPL
Yacc
Lookahead
Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one token, can parse them, because parsers with limited lookahead are often more efficient. One notable change to this trend came in 1990 when Terence Parr created ANTLR, a parser generator for efficient LL(k) parsers (where k is any fixed value), for his Ph.D. thesis.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

It helps the parser take the correct action when there is a conflict, for example when deciding how to parse an if statement that may be followed by an else clause.
It eliminates many duplicate states and eases the burden of an extra stack. For the C language, a non-lookahead parser will have around 10,000 states, whereas a lookahead parser will have around 300 states.
Example: Parsing the Expression 1 + 2 * 3

The set of expression parsing rules (called the grammar) is as follows:
Rule1: E → E + E (an expression is the sum of two expressions)
Rule2: E → E * E (an expression is the product of two expressions)
Rule3: E → number (an expression is a simple number)
Rule4: + has less precedence than *
Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions
Initially Input = [1,+,2,*,3]

Shift "1" onto stack from input (in anticipation of rule3). Input = [+,2,*,3] Stack = [1]
Reduces "1" to expression "E" based on rule3. Stack = [E]
Shift "+" onto stack from input (in anticipation of rule1). Input = [2,*,3] Stack = [E,+]
Shift "2" onto stack from input (in anticipation of rule3). Input = [*,3] Stack = [E,+,2]
Reduce stack element "2" to Expression "E" based on rule3. Stack = [E,+,E]
Reduce stack items [E,+,E] to "E" based on rule1. Stack = [E]
Shift "*" onto stack from input (in anticipation of rule2). Input = [3] Stack = [E,*]
Shift "3" onto stack from input (in anticipation of rule3). Input = [] (empty) Stack = [E,*,3]
Reduce stack element "3" to expression "E" based on rule3. Stack = [E,*,E]
Reduce stack items [E,*,E] to "E" based on rule2. Stack = [E]
The resulting parse tree, and any code generated from it, is not correct according to the language semantics: the parser has grouped the input as ((1 + 2) * 3).

To correctly parse without lookahead, there are three solutions:

The user has to enclose expressions within parentheses. This often is not a viable solution.
The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.
Lookahead parser actions
Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
Shift + onto stack on input + in anticipation of rule1.
Shift 2 onto stack on input 2 in anticipation of rule3.
Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2 or reduce based on rule1. Since * has higher precedence than + according to rule4, it shifts * onto the stack in anticipation of rule2.
Shift 3 onto stack on input 3 in anticipation of rule3.
Reduce stack item 3 to Expression after seeing end of input based on rule3.
Reduce stack items E * E to E based on rule2.
Reduce stack items E + E to E based on rule1.
The parse tree generated is correct, and the parser is simply more efficient[citation needed] than non-lookahead parsers. This is the strategy followed in LALR parsers.

See also
Backtracking
Chart parser
Compiler-compiler
Deterministic parsing
Generating strings
Grammar checker
LALR parser
Lexing
Pratt parser
Shallow parsing
Left corner parser
Parsing expression grammar
ASF+SDF Meta Environment
DMS Software Reengineering Toolkit
Program transformation
Source code generation
References
1. "Bartleby.com homepage". Retrieved 28 November 2010.
2. "parse". dictionary.reference.com. Retrieved 27 November 2010.
3. "Grammar and Composition".
4. Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
5. Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109-120, June 2007, Prague.
6. Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
7. shproto.org
Further reading
Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X
Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques - A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6
External links
The Lemon LALR Parser Generator
The Stanford Parser
Turin University Parser Natural language parser for the Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.
Short history of parser construction
Categories: Algorithms on strings, Compiler construction, Parsing
This page was last modified on 13 November 2014 at 19:28.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.

i disagree

rotor
Jun 11, 2001

classic case of pineapple derangement syndrome

pram posted:

Parsing
From Wikipedia, the free encyclopedia
"Parse" redirects here. For other uses, see Parse (disambiguation).
"Parser" redirects here. For the computer programming language, see Parser (CGI language).
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).[1][2]

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

Contents
1 Human languages
1.1 Traditional methods
1.2 Computational methods
1.3 Psycholinguistics
2 Computer languages
2.1 Parser
2.2 Overview of process
3 Types of parsers
4 Types of parsers
4.1 Top-down parsers
4.2 Bottom-up parsers
4.3 Parser development software
5 Lookahead
6 See also
7 References
8 Further reading
9 External links
Human languages
Main category: Natural language parsing
Traditional methods
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However, the teaching of such techniques is no longer current.

Computational methods
Main category: Natural language parsing
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs[clarification needed]. Human sentences are not easily parsed by programs, because there is substantial ambiguity in the structure of human language, which is used to convey meaning (or semantics) among a potentially unlimited range of possibilities, only some of which are germane to the particular case. So the utterance "Man bites dog" versus "Dog bites man" is definite on one detail, but in another language it might appear as "Man dog bites", relying on the larger context to distinguish between those two possibilities, if indeed that difference is of concern. It is difficult to prepare formal rules to describe informal behaviour, even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However, some systems trade accuracy for speed using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses, and a more complex system selects the best option.

Psycholinguistics
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages
Parser
A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend. Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler.

The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.
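
As a rough illustration of the fix-up idea, here is a minimal Python sketch of a one-pass translator for a made-up toy language. The `label NAME` / `goto NAME` syntax and the `assemble` function are assumptions invented for this example, and the fix-up is applied as soon as the label is seen rather than at the end of a program segment.

```python
def assemble(lines):
    """One pass over a toy program made of 'label NAME', 'goto NAME' and other lines."""
    code = []      # emitted instructions: ("goto", target_index) or ("op", text)
    labels = {}    # label name -> index of the instruction that follows the label
    fixups = {}    # label name -> indices of forward gotos still waiting for it
    for line in lines:
        parts = line.split()
        if parts[0] == "label":
            name = parts[1]
            labels[name] = len(code)
            # Apply pending fix-ups backwards now that the target is known.
            for index in fixups.pop(name, []):
                code[index] = ("goto", labels[name])
        elif parts[0] == "goto":
            name = parts[1]
            if name in labels:
                # Backward goto: the target is already known, no fix-up needed.
                code.append(("goto", labels[name]))
            else:
                # Forward goto: emit a placeholder and remember to patch it.
                fixups.setdefault(name, []).append(len(code))
                code.append(("goto", None))
        else:
            code.append(("op", line))
    if fixups:
        raise ValueError(f"undefined labels: {sorted(fixups)}")
    return code


program = ["goto end", "print a", "label end", "print b", "goto end"]
print(assemble(program))
```

The first `goto end` is emitted before its target is known and patched later; the last one is a backward jump and is resolved immediately.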

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

For example, in Python the following is syntactically valid code:

x = 1
print(x)
The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but is syntactically invalid in terms of the context-sensitive grammar, which requires that variables be initialized before use:

x = 1
print(y)
Rather than being analyzed at the parsing stage, this is caught by checking the values in the syntax tree, hence as part of semantic analysis: context-sensitive syntax is in practice often more easily analyzed as semantics.
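
A minimal sketch of such a check, using Python's standard `ast` module. The `undefined_names` helper is hypothetical and only handles straight-line top-level code, which is far less than what a real semantic analyser does; it is only meant to show that both snippets parse and that the problem is found afterwards by walking the tree.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read before any top-level statement assigns them."""
    tree = ast.parse(source)          # both snippets parse without a syntax error
    known = set(dir(builtins))        # print, len, ... count as already defined
    problems = []
    for statement in tree.body:
        names = [n for n in ast.walk(statement) if isinstance(n, ast.Name)]
        # Check reads first, so that "x = x + 1" with no earlier x is reported.
        for node in names:
            if isinstance(node.ctx, ast.Load) and node.id not in known:
                problems.append(node.id)
        for node in names:
            if isinstance(node.ctx, ast.Store):
                known.add(node.id)
    return problems


print(undefined_names("x = 1\nprint(x)"))
print(undefined_names("x = 1\nprint(y)"))
```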

Overview of process
Flow of data in a typical parser
The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
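
The lexing stage for the calculator input can be sketched with a single regular expression. This is an illustrative assumption about how such a lexer might look, not a description of any particular tool; `TOKEN_PATTERN` and `tokenize` are names made up for the example.

```python
import re

# A token is either a run of digits or a single operator/parenthesis character.
# Leading whitespace before a token is skipped.
TOKEN_PATTERN = re.compile(r"\s*(\d+|[-+*/^()])")


def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}: {text[position]!r}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


print(tokenize("12*(3+4)^2"))
```

Because the digit pattern stops at the first non-digit character, the operator always starts a new token, so something like "12*" never comes out as one token.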

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.

Types of parsers
The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input-stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand-sides of grammar rules.[4]
Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.
LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive production rules. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left-recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]

Types of parsers
Top-down parsers
Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser
Bottom-up parsers
Some of the parsers that use bottom-up parsing include:

Precedence parser
Operator-precedence parser
Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
Simple LR (SLR) parser
LALR parser
Canonical LR (LR(1)) parser
GLR parser
CYK parser
Recursive ascent parser
Shift-Reduce Parser
Parser development software
Some of the well known parser development tools include the following. Also see comparison of parser generators.

ANTLR
Bison
Coco/R
GOLD
JavaCC
JParsec
Lemon
Lex
Parboiled
Parsec
ParseIT
Ragel
SHProto (FSM parser language)[7]
Spirit Parser Framework
Syntax Definition Formalism
SYNTAX
XPL
Yacc
Lookahead
Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined so that a parser with limited lookahead, typically one token, can parse them, because parsers with limited lookahead are often more efficient. One important change[citation needed] to this trend came in 1990 when Terence Parr created ANTLR, a parser generator for efficient LL(k) parsers (where k is any fixed value), for his Ph.D. thesis.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

It helps the parser take the correct action in case of conflicts. For example, it lets the parser decide how to handle an if statement that may be followed by an else clause.
It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.
Example: Parsing the Expression 1 + 2 * 3

The set of expression parsing rules (called a grammar) is as follows:
Rule1: E → E + E Expression is the sum of two expressions.
Rule2: E → E * E Expression is the product of two expressions.
Rule3: E → number Expression is a simple number
Rule4: + has less precedence than *
Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.
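
One common way to fold the precedence rule into the syntax is to stratify the grammar into levels, for example sum → product ('+' product)*, product → number ('*' number)*. The recursive-descent sketch below assumes that rewritten grammar; the function names and the tuple-based tree format are choices made for this example only.

```python
def parse_expression(tokens):
    """Parse a token list and return a tree such as ('+', 1, ('*', 2, 3))."""
    tree, rest = parse_sum(tokens)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def parse_sum(tokens):
    # sum -> product ('+' product)*
    left, rest = parse_product(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_product(rest[1:])
        left = ("+", left, right)
    return left, rest


def parse_product(tokens):
    # product -> number ('*' number)*
    left, rest = parse_number(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_number(rest[1:])
        left = ("*", left, right)
    return left, rest


def parse_number(tokens):
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    return int(tokens[0]), tokens[1:]


print(parse_expression(["1", "+", "2", "*", "3"]))
```

Because a sum can only be built out of complete products, the multiplication is grouped first without any separate precedence rule.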

Simple non-lookahead parser actions
Initially Input = [1,+,2,*,3]

Shift "1" onto stack from input (in anticipation of rule3). Input = [+,2,*,3] Stack = [1]
Reduce "1" to expression "E" based on rule3. Stack = [E]
Shift "+" onto stack from input (in anticipation of rule1). Input = [2,*,3] Stack = [E,+]
Shift "2" onto stack from input (in anticipation of rule3). Input = [*,3] Stack = [E,+,2]
Reduce stack element "2" to Expression "E" based on rule3. Stack = [E,+,E]
Reduce stack items [E,+,E] to "E" based on rule1. Stack = [E]
Shift "*" onto stack from input (in anticipation of rule2). Input = [3] Stack = [E,*]
Shift "3" onto stack from input (in anticipation of rule3). Input = [] (empty) Stack = [E,*,3]
Reduce stack element "3" to expression "E" based on rule3. Stack = [E,*,E]
Reduce stack items [E,*,E] to "E" based on rule2. Stack = [E]
The resulting parse tree, and any code generated from it, is not correct according to the language semantics: the parser has grouped the input as ((1 + 2) * 3).
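
The same sequence of actions can be reproduced with a small Python sketch of a shift-reduce loop that reduces as soon as any rule matches and never looks at the next token. The representation (ints and tuples for expressions, strings for unreduced tokens) is an assumption made for this example.

```python
def is_expression(item):
    # Reduced expressions are ints or (operator, left, right) tuples;
    # tokens that have not been reduced yet are plain strings.
    return isinstance(item, (int, tuple))


def parse_without_lookahead(tokens):
    stack = []
    for token in tokens:
        stack.append(token)                          # shift
        while True:                                  # reduce whenever a rule matches
            top = stack[-1]
            if isinstance(top, str) and top.isdigit():
                stack[-1] = int(top)                 # rule3: E -> number
            elif (len(stack) >= 3 and is_expression(stack[-1])
                    and stack[-2] in ("+", "*") and is_expression(stack[-3])):
                right = stack.pop()
                operator = stack.pop()
                left = stack.pop()
                stack.append((operator, left, right))  # rule1 or rule2
            else:
                break
    if len(stack) != 1 or not is_expression(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


print(parse_without_lookahead(["1", "+", "2", "*", "3"]))
```

The result groups the addition first, which matches the wrong tree described in the steps.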

To correctly parse without lookahead, there are three solutions:

The user has to enclose expressions within parentheses. This often is not a viable solution.
The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.
Lookahead parser actions
Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
Shift + onto stack on input + in anticipation of rule1.
Shift 2 onto stack on input 2 in anticipation of rule3.
Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2 or reduce based on rule1. Since * has higher precedence than + according to rule4, it shifts * onto the stack in anticipation of rule2.
Shift 3 onto stack on input 3 in anticipation of rule3.
Reduce stack item 3 to Expression after seeing end of input based on rule3.
Reduce stack items E * E to E based on rule2.
Reduce stack items E + E to E based on rule1.
The parse tree generated is correct, and the parser is simply more efficient[citation needed] than non-lookahead parsers. This is the strategy followed in LALR parsers.
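
A minimal Python sketch of the lookahead steps, assuming one token of lookahead and a hand-written precedence table standing in for rule4. This is a toy loop for illustration, not how an LALR parser generator builds its tables; `PRECEDENCE` and the tuple tree format are assumptions of the example.

```python
PRECEDENCE = {"+": 1, "*": 2}   # rule4: + has less precedence than *


def is_expression(item):
    # Reduced expressions are ints or (operator, left, right) tuples.
    return isinstance(item, (int, tuple))


def parse_with_lookahead(tokens):
    stack = []
    position = 0
    while True:
        lookahead = tokens[position] if position < len(tokens) else None
        if stack and isinstance(stack[-1], str) and stack[-1].isdigit():
            stack[-1] = int(stack[-1])               # rule3: E -> number
        elif (len(stack) >= 3 and is_expression(stack[-1])
                and isinstance(stack[-2], str) and stack[-2] in PRECEDENCE
                and is_expression(stack[-3])
                and (lookahead is None
                     or PRECEDENCE.get(lookahead, 0) <= PRECEDENCE[stack[-2]])):
            # Reduce only if the next operator does not bind tighter.
            right = stack.pop()
            operator = stack.pop()
            left = stack.pop()
            stack.append((operator, left, right))    # rule1 or rule2
        elif lookahead is not None:
            stack.append(lookahead)                  # shift
            position += 1
        else:
            break
    if len(stack) != 1 or not is_expression(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]


print(parse_with_lookahead(["1", "+", "2", "*", "3"]))
```

With E + E on the stack and * as the lookahead, the loop shifts instead of reducing, so the multiplication ends up nested inside the addition.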

See also
Backtracking
Chart parser
Compiler-compiler
Deterministic parsing
Generating strings
Grammar checker
LALR parser
Lexing
Pratt parser
Shallow parsing
Left corner parser
Parsing expression grammar
ASF+SDF Meta Environment
DMS Software Reengineering Toolkit
Program transformation
Source code generation
References
1. "Bartleby.com homepage". Retrieved 28 November 2010.
2. "parse". dictionary.reference.com. Retrieved 27 November 2010.
3. "Grammar and Composition".
4. Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
5. Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109-120, June 2007, Prague.
6. Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
7. shproto.org
Further reading
Chapman, Nigel P., LR Parsing: Theory and Practice, Cambridge University Press, 1987. ISBN 0-521-30413-X
Grune, Dick; Jacobs, Ceriel J.H., Parsing Techniques - A Practical Guide, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Originally published by Ellis Horwood, Chichester, England, 1990; ISBN 0-13-651431-6
External links
The Lemon LALR Parser Generator
The Stanford Parser
Turin University Parser Natural language parser for the Italian, open source, developed in Common Lisp by Leonardo Lesmo, University of Torino, Italy.
Short history of parser construction
Categories: Algorithms on strings, Compiler construction, Parsing

it always makes me think of parsely

~Coxy
Dec 9, 2003

R.I.P. Inter-OS Sass - b.2000AD d.2003AD
try using SSRS, op

bespoke artisinal VBA

jony ive aces
Jun 14, 2012

designer of the lomarf car


Buglord
*uses computers like some kind of idiot*

black man 3
Oct 29, 2014

by XyloJW

jony ive aces posted:

*uses computers like some kind of idiot*

*posts on yospos*

Farmer Crack-Ass
Jan 2, 2001

this is me posting irl

pram posted:

Parsing
From Wikipedia, the free computer programming language

This section does not cite any references or sources. Please help improve this section by adding citations to reliable "Man bites dog" versus "Dog bites man" sources. Unsourced material may be one pass or multiple passes challenged and removed. (Febuary 2103)
Main category: Natural Latin pars (orationis), meaning "kissing on the mouth"


"Parse" redirects here. For psycholinguistics other uses, see Parse (disambiguation).
"Parser" redirects here. For the encyclopedia, see Parser (CGI language).

Parsing was formerly central to the teaching of central banking throughout the English-speaking world, and widely regarded as basic to the teaching of such techniques is no longer current.

For example, in Python the following is syntactically valid code:

Contents [hide]
1 Human bodies
1.1 Traditional bodies
1.2 Computational bodies
1.3 Psycholinguistics
2 Computer blood
2.1 blood
2.nblod
3 Typebloods of parsers
blood4 Types of parsers
4bloodsers
blood
blood
blood
6 See also blood
blood
8 Further reading
9 External links
Human sentence diagrams[edit]
Main category: Natural elements sentence parsing


Parsing or syntactic analysis is the process of analyzing a string of computational linguistics , either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from a part (of speech).[1][2]

To parse a phrase such as 'man bites dog' involves noting that the 'man' is 'to bite', the verb 'bites' the third person singular of the 'dog' subject of the sentence, in another language might appear as "Man dog bites" is the present tense of the verb the Penn Treebank and the singular noun Question book is the object of the sentence. Techniques such as dependency corpus training are sometimes used to indicate machine relation between unlimited written texts in the (contextual analysis) step.

The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

Within the term is used to refer to the term is also used in symbols when describing language comprehension formal analysis by a computer of a sentence or other string of words into its constituents, resulting in identifying the parts of speech, syntactic relations, a parse tree showing their grammatical constituents to each syntactic relation which may also contain the syntactic analysis of the input semantic and other computer science.

The . In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of , etc."[2] This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within , the term is used in the analysis of computer languages, referring to code into its component parts in order to facilitate the writing of compilers and interpreters.


Traditional methods[edit]
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part.[3] This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate for heavily inflected languages.



singular noun
languages
Computational methods[edit]
-new.svg
language
In some translation and natural language processing systems, in human languages are parsed by computer programs[clarification needed]. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) amongst a potentially range of possibilities but only some of which are germane to the particular case. So an utterance is definite on one detail but with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in . Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a of data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However some systems trade speed for accuracy using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking in which the parser proposes some large number of analyses, and a more complex system selects the best option.
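For anyone who wants to see what the CKY bit above actually looks like: a minimal recognizer sketch. The grammar here is a made-up toy in Chomsky normal form (the names `LEXICAL`, `BINARY` and `cky_recognize` are hypothetical, not from any library), and it only answers yes/no rather than building a parse tree or pruning anything.

```python
from itertools import product

# Toy grammar in Chomsky normal form (an assumption for illustration only).
LEXICAL = {            # word -> nonterminals that can produce it
    "man": {"NP"},
    "dog": {"NP"},
    "bites": {"V"},
}
BINARY = {             # (left child, right child) -> possible parents
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that derives words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    # Fill in longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]
```

With this toy grammar, `cky_recognize("man bites dog".split())` accepts and `cky_recognize("man dog bites".split())` does not. A real statistical parser would attach probabilities to the rules and prune the table, which this sketch skips.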

Psycholinguistics[edit]
In psycholinguistics, parsing involves not just the assignment of words to categories, but the evaluation of the meaning of a sentence according to the rules of syntax drawn by inferences made from each word in the sentence. This normally occurs as words are being heard or read. Consequently, psycholinguistic models of parsing are of necessity incremental, meaning that they build up an interpretation as the sentence is being processed, which is normally expressed in terms of a partial syntactic structure. Creation of the wrong structure can lead to the phenomenon known as garden-pathing.

Computer languages[edit]
x = 1
print(x)
x = 1
print(y)
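The four lines above read as two separate snippets: `x = 1` / `print(x)` and `x = 1` / `print(y)`. A rough sketch of the difference, using Python's standard `ast` module: both get past the parser, and only the second blows up when it runs because `y` was never defined. The helper names `parses` and `runs` are made up for this example.

```python
import ast

valid = "x = 1\nprint(x)\n"
undefined = "x = 1\nprint(y)\n"

def parses(source):
    """True if the parser accepts the source (no SyntaxError)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def runs(source):
    """True if the source executes without a NameError."""
    try:
        # Fresh globals dict so the two snippets cannot see each other's names.
        exec(compile(source, "<snippet>", "exec"), {})
        return True
    except NameError:
        return False
```

Here `parses` is true for both snippets, while `runs(undefined)` is false: the undefined name is caught after parsing, not by the parser.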
Question book-new.svg
This section does not cite any Parsersor sources. Please help improve this section by adding citations to reliable sources. Unsourced material may be challenged and removed. (February 2013)
Parser[edit]
A parser is a software component that takes

̢T̨h̷e ̶rtaìn͏ ́ṕar̕ts̷ ́of͝ t̀he͏ ̨text ͘a͏r̴e͟ ̡e͝xt̀ra̶c̡t̢ȩd͟,͡ ̡r̛ather̸ ̶t͜h̕an ͞a͟ ͢p͞ar̕se͜ tr̕e̸e be̛i͏ng ço̡nst͝r͜uct̕ed. ͢P̢ar͢sers͝ ͜r̀ańge̢ ̡f́ŗom ͏ve̡r͏y s̛i͠mpl̨e f̛unction̡s̕ su͝ch as scan҉f,͝ to͞ ̷c͝omp̛lex p̀r͏ograms̴ such ͢a̡s̸ ̧th͞e f̧r̶o̕n̕teņd ͡of a̡ C+͠+̡ ̧c͜o̶mp̛ile̷r̴ ͢or ̸the̷ HT̨M̨L̷ ̸p̕ar̸se̡r͟ ̧o̸f ̷a͜ ͡web b̀ro̕ẁser. An͞ impor̨t̸a͘n̡t cla͠ss ͡o͏f si͠mp̀l̛s ͜th̨e ͏so҉u͜rc͢e code̷ of̛ a comp͠u͟ţe̷r ̛pro̕gram̢m̴in̸g͢ la̴ng̵ua̶g͏ȩ to create sòme ̀f̷or̶m ͢o̢f̶ ͟i̵nterna̕l ͢r͡epŗe͠s͜eńt͞a̧t̴io̴n͞; ̀th̕e̕ p̷ar͠se͡r ͠is̛ a ͟k͡e͢y͞ ̷s̕t̴e̸p in t͟h͠e҉ ̡c͜omp͟il͟e̡ŕ fr͜o̧nt̨e̴nd̸.̷ P͝r̨o̡g̨r̸aḿming la͜ngùa̶ges̸ tén͟d̴ t͡o͡ ̕b͡e ̨s͝pe҉cif̕ièd in ͟t̵ermś o͠f a ͘det̕erministic̶ ͞c̕o͟ntext-̀free g̛ra͏m̸ma͏r͟ b̛ecàúsé f͞a̷st ̧a͝ńd̡ eff҉ic͘ie͞n͡t ͜p͘aŕs͟e͏rs̕ ͟c̢a̷n ̨be ẃrit͝te̢n̵ ̷fo͟ŗ ͏th͘em.̢ ҉Fo͘r͜ ̸co͡m҉p͠ile̷rs̀, t́h̕e͞ ͘p͜a̛r̴sín̴g͏ ͏it͠sel̡f ćan҉e̷ p̡ar̛s̸ing̶ ҉is̢ ͝don͞e̷ u͜s̴ing͠ ̵r̸ég͝u̷ląr̴ ́e͘x̶pres͜s̴io͏n̢s, ̷whe͠re ̕a͟ reg͢ùla͟ŗ e̶x҉pre͟ss͟ion ͜d̷e͘f̵in͢e̸ş a̡ re͞gul̴ar ̸ĺa̧n͟gua͝ge̢,̢ ͞a̛n̢d the͢n t̡h̷e̶ ̀ręgu͡lar ͝expr͘e͡ssi͜on e̢ng̛i͢ne ͏autóm̕at̴i͢c̶alĺy͜ ̷gèn͜ęra̧t͝es a ͘p̨arse̡r͠ fo̢r t͏ha͝t̡ ́ĺan̷guage, ̨a̵ll҉o͠wi͠ng p̛at̷t͠ern͞ ̵ma͟t̶c̶hing͢ a͏n͢d͢ ̶ex͜t̨ŕac̨ti̴on͡ o͡f̡ t̸e̛xt̢. I̕n ̴o͟t́h͝ér ͟c͘o͠nte͡x̷t̸s͜ ͠r̢e̴g͞ular͢ e̵xpr͢ess͢i̧o̡ns͜ ar͢e inst́ead̀ use͠d ̷pr̸io̕r͏ to ҉paŗs͘i̧ng, ͡as t̢he̵ ͟lex̕i̕n̡ģ ͢s͘tep̸ ̛whos͡e͡ ͟ou͜t͘p̷ut ͘iş th̕en us̷ed̨ by͞ ̶the p͞ar̨se̢r.̧


maniacdevnull
Apr 18, 2007

FOUR CUBIC FRAMES
DISPROVES SOFT G GOD
YOU ARE EDUCATED STUPID

pram posted:

if your parser isnt some horrible chimera of xml and java then lol

yeah, haha!

haha!

ha



:(

~Coxy
Dec 9, 2003

R.I.P. Inter-OS Sass - b.2000AD d.2003AD
excuse me this thread is about reporting

i once wrote a report which had a multiselect with 3 items in it

depending on which combination of items you selected, one of 6 tablix controls would be visible and the rest invisible

gently caress the next guy who has to change the data displayed in that report lol

The Management
Jan 2, 2010

sup, bitch?
the most vexing parser

triple sulk
Sep 17, 2014



Ruby got Railed posted:

i dont use sql i use crystal reports with sql

get me outta this wack rear end crystal reports prison

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

lmfao if you have any idea what any of your business things are doing
