Haskell Programming from first principles - aka haskellbook
"misses" a simple, yet hopefully insightful exercise about Alternative
(introduced in Parser Combinators chapter). I have discovered this one, while working on a real world project:
Exercise Write a function with following signature:
count' :: Alternative f
=> Int -- ^ minimum count
-> Int -- ^ maximum count
-> f a
-> f [a]
f
could be a regular expression RE
from regex-applicative
package, in that case count' n m re
should work as (re){n,m}
in a traditional regexp syntax. In other words accept n
to m
occurences of re
.
lambda> "a" =~ count' 2 3 (sym 'a')
Nothing
lambda> "aa" =~ count' 2 3 (sym 'a')
Just "aa"
lambda> "aaa" =~ count' 2 3 (sym 'a')
Just "aaa"
lambda> "aaaa" =~ count' 2 3 (sym 'a')
Nothing
lambda> "ab" =~ count' 2 3 (sym 'a')
Nothing
IF the exercise feels to hard to crack directly, there are few notes, hints and also few follow-ups.
If n < m
, e.g. count' 2 1
, you may return empty
, or accept exactly 2
values. FWIW: JavaScript throws an exception: Invalid regular expression: /^a{3,2}$/: numbers out of order in {}
, let's not do that.
Also in the following sub-exercises, treat negative numbers properly, i.e. write
foo n | n < 0 = ... -- not n == 0, or foo 0
| otherwise = ... foo (n - 1)
this will save you from the infinite recursion when negative numbers are passed in.
The definition of many
and some
in base
library are slightly complicated:
-- | One or more.
some :: f a -> f [a]
some v = some_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
Write many
as a one-liner, using <$>
, <*>
and <|>
.
Note: <$>
and <*>
bind tighter than <|>
, so you can write expressions like fun <$> with <*> parsers <|> foo <$> bar <*> quux
, which is grouped as (fun <$> with <*> parsers) <|> foo <$> bar <*> quux)
.
A small step towards the goal is to write simpler function first:
count :: Alternative f
=> Int -- ^ exact count
-> f a
-> f [a]
That should accept the exact amount of parses. count n re = count' n n re
≈ (re){n}
in traditional regex notation.
lambda> "aaa" =~ count 3 (sym 'a')
Just "aaa"
Another variant with the same type-signature (!):
upto :: Alternative f
=> Int -- ^ up to this amount
-> f a
-> f [a]
This should accept upto some amount of parses. upto n re = count' 0 n re
≈ (re){n}
in traditional regex notation.
Now, when you have defined count
and upto
, you can define count'
as simply as:
count' n m p = (++) <$> count n p <*> upto (m - n) p
GHC is smart, but not super-smart. Let's help it. Inline definition of count
, removing the use of (++) from above definition.
Virtually every parser combinator library is left-biased, in other word the left hand side of <|>
is preferred. Explain what's the behavioral difference between
pure [] <|> some p
some p <|> pure []
Analyse your count'
, count
and upto
definitions, and try examples above with your backtracking parser-combinator library of choice (e.g. trifecta
). Do they work as expected?
There are also other Alternative
s than parsers. For example []
and STM
. Explain to your collegue what count'
does for []
and STM
. If there are other interesting Alternative
s, where count'
does something neat, please tell me! (e.g. via Twitter).
Fortunately, parser-combinators
package by Mark Karpov provides count
and count'
, so you could use them in your code, and check the source for one way of solving this exercise.