My grammar contains expressions characterized by an identifier that only optionally is followed by a parenthesized list of expressions.
My problem is that Fparsec will show an “unintuitive” position on syntax errors when the error occurs in a nested expression.
For an example, consider this simple parser
let x = skipChar 'x' .>> spaces >>% Node.X
let y = skipChar 'y' .>> spaces >>% Node.Y
let c = skipChar ',' .>> spaces
let left = pchar '(' .>> spaces
let right = pchar ')' .>> spaces
let expr, exprRef = createParserForwardedToRef()
let paramList, paramListRef = createParserForwardedToRef()
let paramTuple = left >>. paramList .>> right
let xOrY = choice [x ; y]
let exprWithArgs = xOrY .>>. paramTuple |>> Node.Expr
exprRef.Value <- choice [ attempt exprWithArgs ; xOrY ]
paramListRef.Value <- sepBy1 expr c
let parser = expr .>> eof
let result = run parser "x(y, y, y(x, y, y(x,y)) )"
printf "\nParsing correct:\n%O" result
let resultWithError = run parser "x(y, y, y(x, y, y(x,z)) )"
printf "\n\nParsing error:\n%O" resultWithError
The output of the program is:
Parsing correct:
Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])
Parsing error:
Failure:
Error in Ln: 1 Col: 2
x(y, y, y(x, y, y(x,z)) )
^
Expecting: end of input
The more nested expressions I have, the more difficult it becomes to find the actual erroneous position.
Is there a way to change my grammar (or to use/configure) FParsec in such a way that the error becomes better to find? I wished, I would get an intuitive error message like this:
Parsing correct:
Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])
Parsing error:
Failure:
Error in Ln: 1 Col: 21
x(y, y, y(x, y, y(x,z)) )
^
Expecting: 'x' or 'y'
Let’s use "y(x,z)"
as an example input. The problem occurs in the expr
/exprRef
parser. First, it attempts to parse exprWithArgs
, which fails because of the z
. The parser backtracks (because of attempt
), and tries to parse xOrY
instead, which succeeds on the y
. But the next char is (
instead of eof
, so parser
then fails at that position with an unhelpful message.
Since exprWithArgs
starts by parsing xOrY
, we can factor that out, leading to this solution:
exprRef.Value <-
parse {
let! xy = xOrY
match! opt paramTuple with
| Some args -> return Node.Expr (xy, args)
| None -> return xy
}
This has the advantage of not backtracking, so the parser always moves forward until it can’t proceed further, resulting in a clearer error message. Note that exprWithArgs
is no longer used.
If you don’t like the computation builder, you can do this instead:
exprRef.Value <-
pipe2
xOrY
(opt paramTuple)
(fun xy argsOpt ->
match argsOpt with
| Some args -> Node.Expr (xy, args)
| None -> xy)
Output is now:
Parsing correct:
Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])
Parsing error:
Failure:
Error in Ln: 1 Col: 21
x(y, y, y(x, y, y(x,z)) )
^
Expecting: 'x' or 'y'
There might be simpler solutions as well, but I think this does what you want.