Oleg's gists https://oleg.fi/gists/atom.xml Oleg Grenrus oleg.grenrus@iki.fi 2021-01-08T00:00:00Z Indexed optics dilemma https://oleg.fi/gists/posts/2021-01-08-indexed-optics-dilemma.html 2021-01-08T00:00:00Z 2021-01-08T00:00:00Z Posted on 2021-01-08 by Oleg Grenrus lens, optics

Indexed optics are occasionally very useful. They generalize mapWithKey-like operations found for various containers.

iover (imapped % _2) :: (i -> b -> c) -> Map i (a, b) -> Map i (a, c)

Indexed lenses are constructed with the ilens combinator.

ilens :: (s -> (i, a)) -> (s -> b -> t) -> IxLens i s t a b

It is implicit that the getter and the (indexed) setter part have to satisfy the usual lens laws.

However, there are problematic combinators, e.g. indices:

indices :: (Is k A_Traversal, is `HasSingleIndex` i)
        => (i -> Bool) -> Optic k is s t a a -> IxTraversal i s t a a

An example usage is

>>> toListOf (itraversed %& indices even) "foobar"
"foa"

If we combine ilens and indices we get a nasty thing:

\p -> indices p (ilens (\a -> (a, a)) (\_ b -> b))
    :: (i -> Bool) -> IxTraversal i i i i i

That is (almost) the type of unsafeFiltered, which has a warning sign:

Note: This is not a legal Traversal, unless you are very careful not to invalidate the predicate on the target.

However, neither indices nor ilens has a warning attached.

There should be additional indexed lens law(s).

My proposal is to require that indices and values are independent, which for indexed lenses can be checked with the following equation:

Whatever you put in, you cannot change the index.

fst (iview l s) ≡ fst (iview l (iover l f s))

where l :: IxLens i s t a b, s :: s, f :: i -> a -> b.

This law is generalisable to other optic kinds. For traversals, replace fst and iview with map fst and itoList. For setters it is harder to specify, but the idea is the same.
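For lists, the traversal form of the law can be checked directly. Below is a standalone sketch (no optics library assumed; imap and itoList are written out by hand rather than taken from a class):

```haskell
-- imap and itoList for lists, written out directly
imap :: (Int -> a -> b) -> [a] -> [b]
imap f = zipWith f [0..]

itoList :: [a] -> [(Int, a)]
itoList = zip [0..]

-- map fst (itoList s) == map fst (itoList (iover l f s)),
-- specialized to the list traversal
indicesIndependent :: (Int -> a -> a) -> [a] -> Bool
indicesIndependent f s =
    map fst (itoList s) == map fst (itoList (imap f s))

main :: IO ()
main = print (indicesIndependent (\i c -> if odd i then succ c else c) "foobar")
-- True
```

For lists the law holds trivially, because imap cannot change the length, hence the indices stay put.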

Similarly we can talk about indexed prisms or even isomorphisms. The independence requirement would mean that the index has to be boring (i.e. isomorphic to ()), thus there isn't any additional power.

However, sometimes violating laws might be justified (e.g. when quotient types would make the program correct, but we don't have them in Haskell).

This new law doesn't prohibit having duplicate indices in a traversal.

This observation also extends to TraversableWithIndex. As far as I can tell, all instances satisfy the above requirement (of indices being independent of values). Should we make that (IMHO natural) assumption explicit?

Benchmarks of discrimination package https://oleg.fi/gists/posts/2021-01-07-discrimination-benchmarks.html 2021-01-07T00:00:00Z 2021-01-07T00:00:00Z Posted on 2021-01-07 by Oleg Grenrus

I originally posted these as a Twitter thread. Its point is to illustrate that constant factors matter, not only whether the complexity is O(n), O(n log n) or O(n²) (though quadratic is quite bad quite soon).

I have been playing with discrimination package.

It offers linear-time grouping and sorting. The Data.List.nub vs Data.Discrimination.nub chart is fun to look at. (Look at the x-axis: the size of the input.)

Don't use Data.List.nub. There is ordNub (in many libraries, e.g. in Cabal), and indeed, it is a good alternative. Data.Discrimination.nub is still faster when n is large enough (around a hundred thousand for the Word64 values I use in these benchmarks).

hashNub :: (Eq a, Hashable a) => [a] -> [a]

performs well too. All four variants on the same graph: even for small n, nub is bad; ordNub and Discrimination.nub are about the same; hashNub is the fastest.
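For reference, ordNub is essentially nub with a Set accumulator. A minimal sketch of the idea (Cabal and other libraries ship similar definitions; this is not their exact code):

```haskell
import qualified Data.Set as Set

-- Keep the first occurrence of each element; O(n log n) instead of
-- Data.List.nub's O(n^2), at the cost of an Ord constraint.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

main :: IO ()
main = print (ordNub [3, 1, 3, 2, 1, 2])
-- [3,1,2]
```

Unlike Data.List.nub, this needs Ord rather than just Eq, but it preserves the first-occurrence order in the same way.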

(I have to find something better to play with than Word64.) UUID is a slightly more interesting type.

data UUID = UUID !Word64 !Word64
deriving (Eq, Ord, Generic, Show, NFData, Hashable, Grouping, Sorting)

Same pattern: hashNub is the fastest, and ordNub becomes slower than Discrimination.nub when there are enough elements. Because it is hard to see, we can try a log-log plot. Everything looks linear there, but we can see the crossover points better.

It turns out that Data.List.sort is quite good, at least if your lists are less than a million in length. Comparison with Data.Discrimination.sort: (I sort UUIDs, for more fun). Making a vector from a list, sorting it (using vector-algorithms) and converting back to a list seems to be a good option too (we only copy pointers, and copying a million pointers is ... cheap).

Something weird happens on GHC-9.0 though: discrimination has the same performance, yet the vector-based version degrades.

Still, @ekmett thinks there is plenty of opportunity to make discrimination faster. In the meantime, I'll add (a variant of) these benchmarks to the repository.

Coindexed optics https://oleg.fi/gists/posts/2021-01-04-coindexed-optics.html 2021-01-04T00:00:00Z 2021-01-04T00:00:00Z Posted on 2021-01-04 by Oleg Grenrus lens, optics

The term coindexed optics is sometimes brought up. But what are they? One interpretation is optics with error reporting, i.e. optics which can tell why e.g. a Prism didn't match1. Over time I have come to dislike that interpretation. It doesn't feel right.

Recently I ran into the documentation of witherable. There is Wither, which is like a lens, but not quite. I think that is closer to what coindexed optics could be. (However, there are plenty of arrows to flip, and you may flip others.)

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE EmptyCase #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}

The import list is short; Data.SOP is the only module (from sop-core) which is not in the base library.

import Control.Applicative (liftA2)
import Data.Kind           (Type)
import Data.Bifunctor      (Bifunctor (first))
import Data.Char           (toUpper)
import Data.Coerce         (coerce)
import Data.SOP            (NP (..), NS (..), I (..))

import qualified Data.Maybe as L (mapMaybe)

I will use a variant of profunctor encoding of optics. The plan is to show

• ordinary, unindexed optics;

• then indexed optics;

• and finally coindexed optics.

### Ordinary, unindexed optics

The profunctor encoding of optics is relatively simple. However, instead of ordinary profunctors, we will use a variant with an additional type-level list argument. This is similar to indexed-profunctors, though there the type list is curried. Currying works well for indexed optics, but complicates the coindexed story.

type Optic p (is :: [Type]) (js :: [Type]) s t a b = p is a b -> p js s t

To not make this post unnecessarily long, we will use only a subset of profunctor hierarchy: Profunctor and Mapping for now. Profunctor is used to encode isomorphisms (isos), and Mapping is used to encode setters.

class Profunctor (p :: [Type] -> Type -> Type -> Type) where
    dimap :: (a -> b) -> (c -> d) -> p is b c -> p is a d

class Profunctor p => Mapping p where
    roam :: ((a -> b) -> s -> t) -> p is a b -> p is s t

A go-to example of a setter is mapped. With mapped we can set, or map over, elements in a Functor.

mapped :: (Mapping p, Functor f)
       => Optic p is is (f a) (f b) a b
mapped = roam fmap

To implement the over operation we need a concrete profunctor. If we used ordinary profunctors, the function arrow would do. In this setup we need a newtype to adjust the kind:

newtype FunArrow (is :: [Type]) a b =
    FunArrow { runFunArrow :: a -> b }

Instance implementations are straight forward:

instance Profunctor FunArrow where
    dimap f g (FunArrow k) = FunArrow (g . k . f)

instance Mapping FunArrow where
    roam f (FunArrow k) = FunArrow (f k)

Using FunArrow we can implement over, which uses a setter to map over focused elements in a bigger structure. Note that we allow any index type-lists; we (or rather FunArrow) simply ignore them.

over :: Optic FunArrow is js s t a b
     -> (a -> b)
     -> (s -> t)
over o f = runFunArrow (o (FunArrow f))

Some examples, to show that what we have written so far works. over mapped is a complicated way to say fmap:

-- "FOOBAR"
example01 :: String
example01 = over mapped toUpper "foobar"

Optics compose. over (mapped . mapped) maps over two composed functors:

-- ["FOOBAR","XYZZY"]
example02 :: [String]
example02 = over (mapped . mapped) toUpper ["foobar", "xyzzy"]

This was a brief refresher of profunctor optics. It is "standard", except that we added an additional type-level list argument to the profunctors.

We will next use that type-level list to implement indexed optics.

### Indexed optics

Indexed optics let us set, map, traverse etc. using an additional index. The operation we want to generalize with optics is provided by a few classes, like FunctorWithIndex:

class Functor f => FunctorWithIndex i f | f -> i where
    imap :: (i -> a -> b) -> f a -> f b

The imap operation is sometimes called mapWithKey (for example in the containers library).

Ordinary lists are indexed with integers. Map k v is indexed with k. We will use lists in the examples, so let us define an instance:

instance FunctorWithIndex Int [] where
    imap f = zipWith f [0..]
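A quick sanity check of the list instance (a standalone sketch; imap here is the zipWith definition written as a plain top-level function rather than as the class method):

```haskell
-- the list instance's imap, as a plain function
imap :: (Int -> a -> b) -> [a] -> [b]
imap f = zipWith f [0..]

-- pair every element with its position
main :: IO ()
main = print (imap (,) "ab")
-- [(0,'a'),(1,'b')]
```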

Next, we need to make that available in the optics framework. New functionality means a new profunctor type class. Note how the indexed combinator conses the index onto the list.

class Mapping p => IMapping p where
    iroam :: ((i -> a -> b) -> s -> t) -> p (i ': is) a b -> p is s t

Using iroam and imap we can define imapped, which is an example of indexed setter.

imapped :: (FunctorWithIndex i f, IMapping p)
        => p (i ': is) a b -> p is (f a) (f b)
imapped = iroam imap

Here, we should note that FunArrow can be given an instance of IMapping. We simply ignore the index argument.

instance IMapping FunArrow where
    iroam f (FunArrow k) = FunArrow (f (\_ -> k))

That allows us to use imapped instead of mapped. (Both the optics and lens libraries have tricks to make that efficient.)

-- ["FOOBAR","XYZZY"]
example03 :: [String]
example03 = over (mapped . imapped) toUpper ["foobar", "xyzzy"]

To actually use indices, we need a new concrete profunctor. IxFunArrow takes a heterogeneous list, NP I (an n-ary product), of indices in addition to the element as the argument of an arrow.

newtype IxFunArrow is a b =
    IxFunArrow { runIxFunArrow :: (NP I is, a) -> b }

The IxFunArrow instances are similar to FunArrow ones, they involve just a bit of additional plumbing.

instance Profunctor IxFunArrow where
    dimap f g (IxFunArrow k) = IxFunArrow (g . k . fmap f)

instance Mapping IxFunArrow where
    roam f (IxFunArrow k) = IxFunArrow (\(is, s) -> f (\a -> k (is, a)) s)

The IMapping instance is the most interesting one. As the argument provides an additional index i, it is consed onto the list of existing indices.

instance IMapping IxFunArrow where
    iroam f (IxFunArrow k) = IxFunArrow $
        \(is, s) -> f (\i a -> k (I i :* is, a)) s

As I already mentioned, indexed-profunctors uses a curried variant, so the index list is implicitly encoded in uncurried form i1 -> i2 -> .... That is clever, but hides the point.

Next, the indexed over. The general variant takes an optic with any indices list.

gen_iover :: Optic IxFunArrow is '[] s t a b
          -> ((NP I is, a) -> b) -> s -> t
gen_iover o f s = runIxFunArrow (o (IxFunArrow f)) (Nil, s)

Usually we use the single-index variant, iover:

iover :: Optic IxFunArrow '[i] '[] s t a b
      -> (i -> a -> b) -> s -> t
iover o f = gen_iover o (\(I i :* Nil, a) -> f i a)

We can also define the double-index variant, iover2 (and so on).

iover2 :: Optic IxFunArrow '[i,j] '[] s t a b
       -> (i -> j -> a -> b) -> (s -> t)
iover2 o f = gen_iover o (\(I i :* I j :* Nil, a) -> f i j a)

Let's see what we can do with indexed setters. For example we can upper-case every odd character in the string:

-- "fOoBaR"
example04 :: String
example04 = iover imapped (\i a -> if odd i then toUpper a else a) "foobar"

In the nested case, we have access to all indices:

-- ["fOoBaR","XyZzY","uNoRdErEd-cOnTaInErS"]
example05 :: [String]
example05 = iover2 (imapped . imapped)
    (\i j a -> if odd (i + j) then toUpper a else a)
    ["foobar", "xyzzy", "unordered-containers"]

We don't need to index at each step, e.g. we can index only at the top level:

-- ["foobar","XYZZY","unordered-containers"]
example06 :: [String]
example06 = iover (imapped . mapped)
    (\i a -> if odd i then toUpper a else a)
    ["foobar", "xyzzy", "unordered-containers"]

Indexed optics are occasionally very useful. We can provide extra information in indices, which would otherwise not fit into optical frameworks.

## Coindexes

The indexed optics from the previous sections can be flipped to be coindexed ones. As I mentioned in the introduction, I got the idea from looking at the witherable package.
witherable provides (among many things) a useful type-class, in a simplified form:

class Functor f => Filterable f where
    mapMaybe :: (a -> Maybe b) -> f a -> f b

It is however too simple. (Hah!) The connection to indexed optics is easier to see using an Either variant:

class Functor f => FunctorWithCoindex j f | f -> j where
    jmap :: (a -> Either j b) -> f a -> f b

We'll also need a Traversable variant (Witherable in witherable):

class (Traversable f, FunctorWithCoindex j f)
    => TraversableWithCoindex j f | f -> j where
    jtraverse :: Applicative m => (a -> m (Either j b)) -> f a -> m (f b)

Instances for lists are not complicated. The coindex of a list is the unit ().

instance FunctorWithCoindex () [] where
    jmap f = L.mapMaybe (either (const Nothing) Just . f)

instance TraversableWithCoindex () [] where
    jtraverse _ []     = pure []
    jtraverse f (x:xs) = liftA2 g (f x) (jtraverse f xs) where
        g (Left ()) ys = ys
        g (Right y) ys = y : ys

With a "boring" coindex, like the unit, we can recover mapMaybe:

mapMaybe' :: FunctorWithCoindex () f => (a -> Maybe b) -> f a -> f b
mapMaybe' f = jmap (maybe (Left ()) Right . f)

With the TraversableWithCoindex class, doing the same tricks as previously with indexed optics, we get coindexed optics. Easy. I didn't manage to get JMapping (a coindexed mapping) to work, so I'll use JTraversing. We abuse the index list for coindices.

class Profunctor p => Traversing p where
    wander :: (forall f. Applicative f => (a -> f b) -> s -> f t)
           -> p js a b -> p js s t

class Traversing p => JTraversing p where
    jwander :: (forall f. Applicative f => (a -> f (Either j b)) -> s -> f t)
            -> p (j : js) a b -> p js s t

Using JTraversing we can define our first coindexed optic.

traversed :: (Traversable f, Traversing p)
          => p js a b -> p js (f a) (f b)
traversed = wander traverse

jtraversed :: (TraversableWithCoindex j f, JTraversing p)
           => p (j : js) a b -> p js (f a) (f b)
jtraversed = jwander jtraverse

To make use of it we once again need a concrete profunctor.
newtype CoixFunArrow js a b = CoixFunArrow
    { runCoixFunArrow :: a -> Either (NS I js) b }

instance Profunctor CoixFunArrow where
    dimap f g (CoixFunArrow p) = CoixFunArrow (fmap g . p . f)

instance Traversing CoixFunArrow where
    wander f (CoixFunArrow p) = CoixFunArrow $ f p

instance JTraversing CoixFunArrow where
    jwander f (CoixFunArrow p) = CoixFunArrow $ f (plumb . p) where
        plumb :: Either (NS I (j : js)) b -> Either (NS I js) (Either j b)
        plumb (Right x)        = Right (Right x)
        plumb (Left (Z (I y))) = Right (Left y)
        plumb (Left (S z))     = Left z

Interestingly, the Traversing CoixFunArrow instance looks like Mapping FunArrow, and it seems to be impossible to write Mapping CoixFunArrow.

Anyway, next we define a coindexed over, which I unimaginatively call jover. Like in the previous section, I start with a generic version first.

gen_jover :: Optic CoixFunArrow is '[] s t a b
          -> (a -> Either (NS I is) b) -> s -> t
gen_jover o f s = either nsAbsurd id $ runCoixFunArrow (o (CoixFunArrow f)) s

jover
    :: Optic CoixFunArrow '[i] '[] s t a b
    -> (a -> Either i b) -> s -> t
jover o f = gen_jover o (first (Z . I) . f)

jover2
    :: Optic CoixFunArrow '[i,j] '[] s t a b
    -> (a -> Either (Either i j) b)
    -> s -> t
jover2 o f = gen_jover o (first plumb . f) where
    plumb (Left i)  = Z (I i)
    plumb (Right j) = S (Z (I j))

And now the most fun: the coindexed optics examples. First we can recover the mapMaybe behavior:

-- ["foobar"]
example07 :: [String]
example07 = jover jtraversed
    (\s -> if length s > 5 then Right s else Left ())
    ["foobar", "xyzzy"]

And because we have separate coindexes in the type level list, we can filter on the different levels of the structure! If we find a character 'y', we skip the whole word, otherwise we skip all vowels.

-- ["fbr","nrdrd-cntnrs"]
example08 :: [String]
example08 = jover2 (jtraversed . jtraversed)
    predicate
    ["foobar", "xyzzy", "unordered-containers"]
  where
    predicate 'y'           = Left (Right ())  -- skip word
    predicate c | isVowel c = Left (Left ())   -- skip character
    predicate c             = Right c

isVowel :: Char -> Bool
isVowel c = elem c ['a','o','u','i','e']

Note, the coindex doesn’t need to mean filtering. For example, consider the following type:

newtype JList j a = JList { unJList :: [Either j a] }
    deriving (Functor, Foldable, Traversable, Show)

It’s not Filterable, but we can write a FunctorWithCoindex instance for it:

instance FunctorWithCoindex j (JList j) where
    jmap f (JList xs) = JList (map (>>= f) xs)

instance TraversableWithCoindex j (JList j) where
    jtraverse f (JList xs) = fmap JList (go xs) where
        go []             = pure []
        go (Left j : ys)  = fmap (Left j :) (go ys)
        go (Right x : ys) = liftA2 (:) (f x) (go ys)

Using JList we can do different things. In this example we return why elements didn’t match, but that information stays embedded inside the structure itself. We "filter" long strings:

jlist :: [a] -> JList j a
jlist = JList . map Right

ex_jlist_a :: JList Int String
ex_jlist_a = jlist ["foobar", "xyzzy", "unordered-containers"]

-- JList {unJList = [Left 6,Right "xyzzy",Left 20]}
example09 :: JList Int String
example09 = jover jtraversed
    (\s -> let l = length s in if l > 5 then Left l else Right s)
    ex_jlist_a

Similarly we can filter, or rather "change structure", on different levels, and these levels can have different coindices:

ex_jlist_b :: JList Int (JList Bool Char)
ex_jlist_b = fmap jlist ex_jlist_a

example88b :: JList Int (JList Bool Char)
example88b = jover2
    (jtraversed . jtraversed)
    predicate
    ex_jlist_b
  where
    predicate 'x'           = Left (Right 0)
    predicate 'y'           = Left (Right 1)
    predicate 'z'           = Left (Right 2)
    predicate c | isVowel c = Left (Left (c == 'o'))
    predicate c             = Right c

{-
[ Right [Right 'f',Left True,Left True,Right 'b',Left False,Right 'r']
, Left 0
, Right [Left False,Right 'n',Left True,Right 'r',Right 'd',Left False, ...
]
-}
example88b' :: [Either Int [Either Bool Char]]
example88b' = coerce example88b

The "xyzzy" is filtered immediately; we see Left 0 as the reason. We can also see how vowels are filtered, and the 'o's are marked specifically with Left True.

Having coindices reside inside the structure makes composition just work. That is what makes this different from "error reporting optics". And using the coindices approach we can compose filters; the Wither from witherable doesn’t seem to compose with itself.

## Both

An obvious follow-up question is whether we can have both indices and coindices. Why not; the concrete profunctor would look like:

newtype DuplexFunArrow is js a b = DuplexFunArrow
    { runDuplexFunArrow :: (NP I is, a) -> Either (NS I js) b }

Intuitively, the structure traversals would provide additional information in indices, and we’ll be able to alter it by optionally returning coindices.

Would that be useful? I have no idea.

## Utilities

nsAbsurd :: NS I '[] -> a
nsAbsurd x = case x of {}

1. https://blog.fp-tower.com/2020-01-27-introducing-error-reporting-in-optics/ looks like an example of that. In Scala.↩︎

Dependent Linear types in QTT https://oleg.fi/gists/posts/2020-12-18-dependent-linear.html 2020-12-18T00:00:00Z 2020-12-18T00:00:00Z Posted on 2020-12-18 by Oleg Grenrus linear

This post is my musings about a type system proposed by Conor McBride in I Got Plenty o' Nuttin' and refined by Robert Atkey in Syntax and Semantics of Quantitative Type Theory: Quantitative Type Theory, or QTT for short. Idris 2 is based on QTT, so at the end there is some code too.

## Non-linear

But let me start by recalling the Simply Typed Lambda Calculus with Products and Coproducts (that is a mouthful!). As the name already says, there are three binary connectives: functions (→), products (×) and coproducts (+, or sums).

But in Martin-Löf Type Theory (not quantitative) we have only two: pi and sigma types.

We can recover ordinary functions and ordinary pairs in a straightforward way: a → b ≔ Π (x : a). b and a × b ≔ Σ (x : a). b.

However, to get coproducts we need something extra. One general way is to add finite sets. We get falsehood (empty finite set), truth (singleton finite set), booleans (binary finite sets), and so on. But these three are enough.

With booleans we can define a + b ≔ Σ (t : Bool). if t then a else b, which is very reasonable. That is how we represent sum types on physical machines: a tag, and a payload whose type depends on the tag.

A natural question is what we get if we switch Σ to Π. Unfortunately, nothing new: Π (t : Bool). if t then a else b is just a × b.

We get another way to describe pairs.

To summarise: Π gives functions, Σ gives pairs, and with booleans Σ also gives sums, while Π only gives another description of pairs.

Let us next try to see how this works out with linear types.

## Linear

In the intuitionistic linear calculus (ILC; also the logic, ILL) we have four binary connectives: linear functions (⊸), times (⊗), plus (⊕) and with (&).

Often a unary bang (!) is also added, to allow writing non-linear programs as well.

Making a dependent linear type theory is hard, and Quantitative Type Theory is a promising attempt. It seems to work.

And it is not complicated. In fact, as far as we are concerned, it still has just two connectives, Π and Σ, but slightly modified to include a multiplicity (denoted by ρ).

The multiplicity ρ can be 1, i.e. linear, single use; or ω, the unrestricted use; but also 0, which is irrelevant usage. I find this quite elegant.

The rules are then set up so we don't run into problems with "types using our linear variables", as they are checked in irrelevant context.

As in the non-linear setting, we can recover two "simple" connectives immediately: the linear function a ⊸ b is the 1-multiplicity Π, and the tensor a ⊗ b is the 1-multiplicity Σ.

Other multiplicities allow us to create some "new" connectives, for example the irrelevant quantification ∀, which is the 0-multiplicity Π.

You should guess that next we will recover ⊕ and & using booleans. Booleans in linear calculus are conceptually hard when you start to think about the resource interpretation of linear calculus. However, booleans are not resources, they are information. About the introduction rules Atkey says

... no resources are required to construct the constants true and false

but the elimination is a bit more involved. The important bit is, however, that we have if-then-else which behaves reasonably.

Then

a ⊕ b ≔ Σ (1 t : Bool). if t then a else b

and

a & b ≔ Π (1 t : Bool). if t then a else b

That behaves as we want. When we match on an ⊕ value we learn the tag and the payload. As the tag has multiplicity 1, we can match on it once to learn the type of the payload. (Note: I should really write the typing rules and derivations, but I'm quite confident it works this way. Highly likely I'm wrong :)

The with-connective, &, is mind-juggling. It's a product (in fact, the product in the CT sense). We can extract either part, but we have to decide which; we cannot do both.

The value of type a & b is (in this encoding) a function, and we can only call it once, so we cannot write a conversion of type (a & b) ⊸ (a ⊗ b), nor the inverse.

So the ⊗ and the & are both product-like, but different.

We redraw the table from the previous section; there is no more inelegant duplication:

## Diag

It's often said that you cannot write

diag :: a -> (a, a)

in linear calculus.

This is true if we assume that tuples are tensors (⊗). That is a natural assumption, as ⊗ is the connective that curries with ⊸ arrows.

However, the product is &. I argue that "the correct" type of diag is

diag : a -> a & a
diag x = x :&: x

And in fact, the adjointness is the same as in the CT-of-STLC (Idris 2 agrees with me):

If we could just normalise the notation, then we'd use

But that would be sooo... confusing.

There is plenty of room for more confusion, it gets better.

## Units

Products and coproducts usually have units, and so is the case in the linear calculus too.

Spot a source of possible confusion.

We know that, because & is the product, its unit is the terminal object, ⊤. And now we have to be careful.

Definition: T is a terminal object if for every object X in a category C there exists a unique morphism X → T.

Indeed, the ⊤ in linear logic is such an object. It acts like a dumpster. If we don't like some (resource) thing, we can map it to ⊤. If we already have a ⊤ (there is no way to get rid of it), we can tensor it with something else we don't like and map the resulting "pair" to another ⊤. A bottomless trash can!

In category theory we avoid speaking about objects directly (it is point-free in the extreme). If we need to speak about an object X, we rather talk about morphisms 1 → X (constant functions). This works because, e.g. in the category of sets, morphisms 1 → X correspond to the elements of X.

There the use of 1 comes from it being the unit of ×, which is used to internalize arrows, i.e. define function objects (and binary functions, currying, etc.).

In linear logic, the "friend" of ⊸ is, however, ⊗, and its unit is not the terminal object.

So we rather have morphisms 1⊗ ⊸ X, which is again confusing, as you can confuse 1⊗ with the initial object 0, which it isn't. To help avoid that I used the subscript.

The takeaway is that ⊤ and 1⊗ in linear logic are different objects, and you have to be very careful so that ordinary lambda calculus (or e.g. Haskell) intuition doesn't confuse you.

I wish there were a category where the linear stuff is kept separate. In Set the two units coincide. Vector spaces are close, but they have their own source of confusion (there the direct sum ⊕ is both a product and a coproduct).

## Idris 2

All of the above is nicely encodable in Idris 2. If you want to play with linear logic concepts, I'd say that Idris 2 is the best playground at the moment.

module QTT

-----------------------------------------------------------------------
-- Pi and Sigma
-----------------------------------------------------------------------

Pi1 : (a : Type) -> (a -> Type) -> Type
Pi1 a b = (1 x : a) -> b x

data Sigma1 : (a : Type) -> (a -> Type) -> Type where
    Sig1 : (1 x : a) -> (1 y : b x) -> Sigma1 a b

-----------------------------------------------------------------------
-- Lollipop
-----------------------------------------------------------------------

Lollipop : Type -> Type -> Type
Lollipop a b = Pi1 a \_ => b

-- handy alias
(-@) : Type -> Type -> Type
(-@) = Lollipop
infixr 0 -@

-- for constructor, just write \x => expr

-- Lollipop elimination
lollipopElim : Lollipop a b -@ a -@ b
lollipopElim f x = f x

-----------------------------------------------------------------------
-- Times
-----------------------------------------------------------------------

Times : Type -> Type -> Type
Times a b = Sigma1 a \_ => b

-- Times introduction
times : a -@ b -@ Times a b
times x y = Sig1 x y

-- Times elimination
timesElim : Times a b -@ (a -@ b -@ c) -@ c
timesElim (Sig1 x y) k = k x y

-----------------------------------------------------------------------
-- With
-----------------------------------------------------------------------

With : Type -> Type -> Type
With a b = Pi1 Bool \t => if t then a else b

-- With elimination 1
fst : With a b -@ a
fst w = w True

-- With elimination 2
snd : With a b -@ b
snd w = w False

-- There isn't really a way to write a function for with introduction,
-- let me rather write diag.
diag : a -@ With a a
diag x True = x
diag x False = x

-- Also note that even if With were a built-in, it should
-- be non-strict (and a function is).
-- We may use the same resource in both halves differently,
-- and the resource cannot be used until the user has selected the half.

-----------------------------------------------------------------------
-- Plus
-----------------------------------------------------------------------

Plus : Type -> Type -> Type
Plus a b = Sigma1 Bool \t => if t then a else b

-- Plus introduction 1
inl : a -@ Plus a b
inl x = Sig1 True x

-- Plus introduction 2
inr : b -@ Plus a b
inr y = Sig1 False y

-- Plus elimination, either... with a With twist.
-- Give me two functions, I'll use one of them, not both.
plusElim : With (a -@ c) (b -@ c) -@ Plus a b -@ c
plusElim f (Sig1 True x) = f True x
plusElim f (Sig1 False y) = f False y

-----------------------------------------------------------------------
-- Extras
-----------------------------------------------------------------------

-- plusElim is reversible.
-- Plus -| Diag
plusElimRev : (Plus a b -@ c) -@ With (a -@ c) (b -@ c)
plusElimRev f True  = \x => f (inl x)
plusElimRev f False = \y => f (inr y)

-- Diag -| With
adjunctFwd : (c -@ With a b) -@ With (c -@ a) (c -@ b)
adjunctFwd f True  = \z => f z True
adjunctFwd f False = \z => f z False

adjunctBwd : With (c -@ a) (c -@ b) -@ (c -@ With a b)
adjunctBwd f c True  = f True c
adjunctBwd f c False = f False c

-----------------------------------------------------------------------
-- Hard exercise
-----------------------------------------------------------------------

-- What would be a good way to implement Top, i.e. the unit of With?
--
-- fwd : a -@ With Top a
-- bwd : With Top a -@ a
--
-- I have ideas, I'm not sure I like them.

true is false https://oleg.fi/gists/posts/2020-11-14-true-is-false.html 2020-11-14T00:00:00Z 2020-11-14T00:00:00Z Posted on 2020-11-14 by Oleg Grenrus agda

{-# OPTIONS --cubical --safe #-}
module TrueFalse where

open import Cubical.Foundations.Everything
open import Cubical.Data.Bool

-- The following is basic stuff
true≡true : true ≡ true
true≡true = refl

-- in Cubical Agda _≡_ isn't a primitive, but a (constant) path
true≡true₂ : Path Bool true true
true≡true₂ = refl

-- but we can also have dependent paths,
eq-Bool : Path Type₀ Bool Bool
eq-Bool = refl

-- i.e. paths whose endpoints are values of the
-- types at the endpoints of another path
-- (and "sadly", we have to eta-expand the path)
true≡true₃ : PathP (λ i → eq-Bool i) true true
true≡true₃ = refl

-- Additionally, Cubical Agda allows us to say that
-- Booleans are equal via the not-isomorphism
-- (this is notEq in the cubical library)
eq-not-Bool : Bool ≡ Bool
eq-not-Bool = isoToPath (iso not not notnot notnot)

-- And then, one can write what may look like non-sense
true≡false : PathP (λ i → eq-not-Bool i) true false
true≡false = toPathP refl

-- "Unfortunately" Agda makes that all a bit too explicit,
-- otherwise you could see just
--
-- true≡false : PathP ... true false
-- true≡false = refl
--
-- and get very confused.

-- Devil is in the details (of dependent paths).

A design for paths in Cabal https://oleg.fi/gists/posts/2020-09-13-a-design-for-paths.html 2020-09-13T00:00:00Z 2020-09-13T00:00:00Z Posted on 2020-09-13 by Oleg Grenrus engineering

While a big part of Cabal is about interpreting your-package.cabal files, an important part of it, and of cabal-install too, is file paths. After all, cabal-install is a build tool. Currently (as of Cabal-3.4) the type used for all filepath needs is the infamous

type FilePath = String

One can say that all paths in the codebase are dynamically typed. It is very hard to say whether paths are absolute or relative, and if relative, relative to what.

A solution would be to use the path or the paths library. I like paths better, because it is set up to talk about relative paths to arbitrary roots, not only absolute paths. Still, neither is good enough. Why do I say so? Because Cabal and cabal-install have to deal with three kinds of paths:

1. Abstract paths
2. Paths on the host system
3. Paths on the target system

It is that simple, but path is very concretely the second kind, and paths is somewhere in between the first and the second, but doesn't let you differentiate them.

## Abstract paths

Abstract paths are the ones written in a your-package.cabal file. For example hs-source-dirs: src/. That is not a Unix path. It is not a Windows path. It is in fact something which should be interpretable as either, and also as a path inside a tarball archive. Currently it has to be a common denominator of these, which means that backslashes \, i.e. Windows filepaths, aren't portable, though I suspect they work if you build on Windows. Just thinking about types uncovers a possible bug. If we had

-- | An abstract path.
data APath root = ...

then we could enforce the format, for example prohibiting some (i.e. all known) special characters.

Note: abstract paths are relative. There might be some abstract root, for example PackageRoot, but its interpretation still depends.
The representation of APath is not important. It, however, should be some kind of text. ## Paths on the host system These are the concrete paths on your disk. -- | A path on the host (build) system data HPath root = ... The HPath can have different roots as well, for example CWD (for the current working directory), HomeDir or Absolute. Maybe even talking about HPath PackageRoot is meaningful. My gut feeling says that we should rather provide an operation to resolve an APath PackageRoot into an HPath Absolute, given an HPath Absolute of the package root. Also directory operations, i.e. IO operations like listDirectory, are only meaningful for HPaths. These are concrete paths. HPaths have to be represented in the system's native way. It can still be FilePath in the first iteration, but e.g. absolute paths on Windows may start with \\?\ and use backslashes as directory separators (c.f. APath, which will probably look like a POSIX path everywhere). ## Paths on the target system The third kind of paths are paths on the target system. While cross-compilation support in Cabal is barely existing, having a dedicated type for paths on the target system should help it improve. One example is the YourPackage_Paths module. Currently it contains hardcoded paths to e.g. the data-files directory of the installed package, i.e. somewhere on the target system. While having hardcoded absolute paths in YourPackage_Paths is a bad idea nowadays, and the data-files discovery should be based on some relative (relocatable, using abstract APaths maybe?) scheme, having a -- | A path on the target (run) system data TPath root = ... will at least show where we use (absolute) target system paths. Ideally we won't have them anywhere, if that is possible. But identifying where we have them now will help to get rid of them. Another example is running (custom) ./Setup or tests or benchmarks.
I hope that we can engineer the code in a way that executables built for the target system won't be callable directly, but will need to go through a runner wrapper (which we have, but I don't know much about it). Even in the host = target (common) system case, where the wrapper is trivial. Note: whether a TPath is a Windows path or a POSIX path will depend on run-time information, so the conversion functions will need that bit of information. You wouldn't be able to purely convert :: APath -> TPath; we will need to pass an extra context. Here again, better types should help guide the design process. ## Conclusion These are my current thoughts about how the paths will look in some future version of Cabal. Instead of one FilePath (or Path) there will be three: APath, HPath and TPath1. As I write this down, it seems so obvious that this is how paths have to be classified. Has anyone done something like this before? Please tell me, so I can learn from your experiences. 1. Names are subject to change, maybe SymPath (for symbolic), HostPath and TargetPath.↩︎ ]]> (Approximate) integer square root https://oleg.fi/gists/posts/2020-09-04-integer-square-root.html 2020-09-04T00:00:00Z 2020-09-04T00:00:00Z Posted on 2020-09-04 by Oleg Grenrus Quoting the Wikipedia article: In number theory, the integer square root (intSqrt) of a positive integer n is the positive integer which is the greatest integer less than or equal to the square root of n. How to compute it in Haskell? The Wikipedia article mentions Newton’s method, but doesn’t discuss how to make the initial guess. In base-4.8 (GHC-7.10) we got the countLeadingZeros function, which can be used to get a good initial guess. Recall that finite machine integers look like n = 0b0......01..... ^^^^^^^^ -- @countLeadingZeros n@ bits ^^^^^^ -- @b = finiteBitSize n - countLeadingZeros n@ bits We have an efficient way to get the "significant bits" count b, which can be used to approximate the number.
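The bit-length guess just described can be sketched directly (bitLength and guessSqrt are names introduced here, not from the post):

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL, shiftR)

-- Bit length b of n: index of the highest set bit, plus one.
bitLength :: Int -> Int
bitLength n = finiteBitSize n - countLeadingZeros n

-- If n is roughly 2^b, then its square root is roughly 2^(b/2),
-- which is just a shift by half the bit length.
guessSqrt :: Int -> Int
guessSqrt n = shiftL 1 (shiftR (bitLength n) 1)
```

For n = 100 the bit length is 7, so the guess is 2^3 = 8, already close to the true root 10.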
It is also easy to approximate the square root of numbers like 2^b: it is about 2^(b/2). We can use this approximation as the initial guess, and write a simple implementation of intSqrt: module IntSqrt where import Data.Bits intSqrt :: Int -> Int intSqrt 0 = 0 intSqrt 1 = 1 intSqrt n = case compare n 0 of LT -> 0 -- whatever :) EQ -> 0 GT -> iter guess -- only single iteration where iter :: Int -> Int iter 0 = 0 iter x = shiftR (x + n `div` x) 1 -- shifting is dividing guess :: Int guess = shiftL 1 (shiftR (finiteBitSize n - countLeadingZeros n) 1) Note, I do only a single iteration1. Is it enough? My need is to calculate square roots of small numbers. We can test quite a large range exhaustively. Let's define a correctness predicate: correct :: Int -> Int -> Bool correct n x = sq x <= n && n < sq (x + 1) where sq y = y * y Out of a hundred numbers correct100 = length [ (n,x) | n <- [ 0..99 ], let x = intSqrt n, correct n x ] the computed intSqrt is correct for 88! Which are the incorrect ones? incorrect100 = [ (8,3) , (24,5) , (32,6), (33,6), (34,6), (35,6) , (48,7) , (80,9) , (96,10), (97,10), (98,10), (99,10) ] The numbers which are close to (just below) a perfect square (9, 25, 36, …) are overestimated. If we take a bigger range, say 0...99999, then with a single iteration 23860 numbers are correct, with two iterations 96659. For my use case (mangling the size of QuickCheck generators) this is good enough; small deviations are very well acceptable. Bit fiddling FTW! 1. Like the infamous Fast inverse square root algorithm, which also uses only a single iteration, because the initial guess is very good.↩︎ ]]> ANN: cabal-fmt-0.1.4 - --no-cabal-file flag and fragments https://oleg.fi/gists/posts/2020-08-30-cabal-fmt-0.1.4.html 2020-08-30T00:00:00Z 2020-08-30T00:00:00Z Posted on 2020-08-30 by Oleg Grenrus packages I spent this Sunday writing two small patches to cabal-fmt. ## --no-cabal-file flag cabal-fmt reasonably assumes that the file it is formatting is a cabal package definition file. So it parses it as such.
That is needed to correctly pretty print the fields, as some syntax, for example a leading comma, requires a somewhat recent cabal-version: 2.2 (see Package Description Format Specification History for details). However, there are other files using the same markup format, for example cabal.project files or cabal.haskell-ci configuration files used by the haskell-ci tool. Wouldn't it be nice if cabal-fmt could format these as well? In cabal-fmt-0.1.4 you can pass the -n or --no-cabal-file flag, to prevent cabal-fmt from parsing these files as a cabal package file. The downside is that the latest known cabal specification will be used. That shouldn't break cabal.haskell-ci files, but it might break cabal.project files if you are not careful. (Their parsing code is somewhat antique). An example of reformatting the cabal.project of this blog: --- a/cabal.project +++ b/cabal.project @@ -1,9 +1,7 @@ index-state: 2020-05-10T17:53:22Z with-compiler: ghc-8.6.5 - packages: - "." - pkg/gists-runnable.cabal + "." + pkg/gists-runnable.cabal -constraints: - hakyll +previewServer +constraints: hakyll +previewServer So satisfying. ## Fragments Another addition is fragments. They are best illustrated by an example. Imagine you have a multi-package project, and you use haskell-ci to generate your .travis.yml. Each .cabal package file must have the same ... tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1 library ... Then you find out that GHC 8.8.4 and GHC-8.10.2 were recently released, and you want to update your CI configuration. Editing multiple files, with the same change. Busy work. With cabal-fmt-0.1.4 you can create a fragment file, let's call it tested-with.fragment: tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.4 || ==8.10.2 And then edit your package files with a cabal-fmt pragma (the fragment is probably in the root directory of the project, but .cabal files are inside a directory per package):  ...
+-- cabal-fmt: fragment ../tested-with.fragment tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1 library ... Then, the next time you run cabal-fmt --inplace */*.cabal, you'll see the diff  ... -- cabal-fmt: fragment ../tested-with.fragment -tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1 +tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.4 || ==8.10.2 library ... for all libraries. Handy! Some design comments: • A fragment is only a single field or a single section (e.g. a common stanza). Never multiple fields. (Easier to implement, least surprising behavior: a pragma is attached to a single field or section). • The field name or section header in the .cabal file and in the fragment have to match. (To avoid mistakes). • Substitution is not recursive. (Guaranteed termination). • Other pragmas in fragments are not executed. Neither are comments in fragments preserved. (Not sure whether that would be valuable). Finally, you can use cabal-fmt --no-cabal-file to format fragment files too, even though they are reformatted when spliced. ## Conclusion cabal-fmt-0.1.4 is a small release. I made --no-cabal-file to scratch my own itch, and fragments partly to highlight that not every feature has to exist in Cabal itself; some are a fine fit for preprocessors. I do think that fragments could be very useful in bigger projects. Let me know! ]]> Fixed points of indexed functors https://oleg.fi/gists/posts/2020-08-28-indexed-fixpoint.html 2020-08-28T00:00:00Z 2020-08-28T00:00:00Z Posted on 2020-08-28 by Oleg Grenrus ## Introduction I was lately thinking about fixed points, more or less. A new version of data-fix was released recently, and also a corresponding version of recursion-schemes. Also I wrote a Fix-ing regular expressions post, about adding fixed points to regular expressions. This post is another exploration: fixed points of indexed functors. This is not a novel idea at all, but I’m positively surprised this works out quite nicely in modern GHC Haskell.
I define an IxFix type and illustrate it with three examples. Note: The HFix in the multirec package is the same as IxFix in this post. I always forget about the existence of multirec. In the following, the "modern GHC Haskell" is quite conservative, only eight extensions: {-# LANGUAGE DataKinds #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE PolyKinds #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} And this literate Haskell script is warning free, with -Wall {-# OPTIONS_GHC -Wall #-} On this trip module IxFix where we need a handful of imports -- Type should be added to Prelude import Data.Kind (Type) -- Few newtypes import Data.Functor.Identity (Identity (..)) import Data.Functor.Compose (Compose (..)) import Data.Functor.Const (Const (..)) -- dependently typed programming! import Data.Fin (Fin) import Data.Type.Nat import Data.Vec.Lazy (Vec (..)) -- magic import Data.Coerce (coerce) Before we go further, let me remind you about ordinary fixed points, as defined in the data-fix package. newtype Fix f = Fix { unFix :: f (Fix f) } foldFix :: Functor f => (f a -> a) -> Fix f -> a foldFix f = go where go = f . fmap go . unFix Using Fix we can define recursive types using non-recursive base functors, e.g. for a list we’d have data ListF a rec = NilF | ConsF a rec We then use foldFix (or cata and other recursion schemes in recursion-schemes) to decouple "how we recurse" and "what we do at each step". I won’t try to convince you why this separation of concerns might be useful. Instead I continue directly to the topic: defining indexed fixed points. Why do we need them? Because Fix is not powerful enough to allow working with Vec or polymorphically recursive types. -- hello dependently typed world. data Vec (n :: Nat) (a :: Type) where VNil :: Vec 'Z a (:::) :: a -> Vec n a -> Vec ('S n) a ## Fixed points of indexed functors Before talking about fixed points, we need to figure out what indexed functors are.
Recall a normal functor is a thing of kind Type -> Type: class Functor f where fmap :: (a -> b) -> (f a -> f b) The indexed version is the one with Type replaced by k -> Type, for some index kind k (it is still a functor, but in a different category). We want morphisms to work for all indices, and preserve them. Thus we define a commonly used type alias1 -- natural, or parametric, transformation type f ~> g = forall (j :: k). f j -> g j Using it we can define a Functor variant2: it looks almost the same. class IxFunctor (f :: (k -> Type) -> (k -> Type)) where ixmap :: (a ~> b) -> (f a ~> f b) With IxFunctor in our toolbox, we can define an IxFix; note how the definition is again almost the same as for the unindexed Fix and foldFix: newtype IxFix f i = IxFix { unIxFix :: f (IxFix f) i } foldIxFix :: IxFunctor f => (f g ~> g) -> IxFix f ~> g foldIxFix alg = alg . ixmap (foldIxFix alg) . unIxFix Does this work? I hope that the following examples will convince you that IxFix is usable (at least in theory). ## Example: length indexed lists, Vec The go-to example of recursion schemes is folding a list; the go-to example of dependent types is the length-indexed list, often called Vec. I combine these traditions by defining Vec as an indexed fixed point: data VecF (a :: Type) rec (n :: Nat) where NilF :: VecF a rec 'Z ConsF :: a -> rec n -> VecF a rec ('S n) VecF is an IxFunctor: instance IxFunctor (VecF a) where ixmap _ NilF = NilF ixmap f (ConsF x xs) = ConsF x (f xs) And we can define Vec as the fixed point of VecF, with constructors: type Vec' a n = IxFix (VecF a) n nil :: Vec' a 'Z nil = IxFix NilF cons :: a -> Vec' a n -> Vec' a ('S n) cons x xs = IxFix (ConsF x xs) Can we actually use it? Of course! Let's define concatenation3, Vec' a n -> Vec' a m -> Vec' a (Plus n m). We cannot use foldIxFix directly, as Plus n m is not the same index as n, so we need to define an auxiliary newtype to plumb the indices.
Another way to think about these kinds of newtypes is that they work around the lack of type-level anonymous functions in today's Haskell. newtype Appended m a n = Append { getAppended :: Vec' a m -> Vec' a (Plus n m) } append :: forall a n m. Vec' a n -> Vec' a m -> Vec' a (Plus n m) append xs ys = getAppended (foldIxFix alg xs) ys where alg :: VecF a (Appended m a) j -> Appended m a j alg NilF = Append id alg (ConsF x rec) = Append $ \zs -> cons x (getAppended rec zs)
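As a sanity check that foldIxFix really does the work, here is a self-contained restatement of the definitions above (with minimal stand-ins for the fin/vec imports, and a K newtype invented here) that folds a Vec' back into an ordinary list. Since [a] ignores the index, no Appended-style plumbing is needed:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
import Data.Kind (Type)

-- Stand-in for Data.Type.Nat, so this runs standalone.
data Nat = Z | S Nat

type f ~> g = forall j. f j -> g j

class IxFunctor (f :: (k -> Type) -> (k -> Type)) where
  ixmap :: (a ~> b) -> (f a ~> f b)

newtype IxFix f i = IxFix { unIxFix :: f (IxFix f) i }

foldIxFix :: IxFunctor f => (f g ~> g) -> IxFix f ~> g
foldIxFix alg = alg . ixmap (foldIxFix alg) . unIxFix

data VecF (a :: Type) (rec :: Nat -> Type) (n :: Nat) where
  NilF  :: VecF a rec 'Z
  ConsF :: a -> rec n -> VecF a rec ('S n)

instance IxFunctor (VecF a) where
  ixmap _ NilF         = NilF
  ixmap f (ConsF x xs) = ConsF x (f xs)

type Vec' a = IxFix (VecF a)

nil :: Vec' a 'Z
nil = IxFix NilF

cons :: a -> Vec' a n -> Vec' a ('S n)
cons x xs = IxFix (ConsF x xs)

-- An index-ignoring result type: the list type never mentions n.
newtype K x (n :: Nat) = K { getK :: x }

toList :: forall a n. Vec' a n -> [a]
toList = getK . foldIxFix alg
  where
    alg :: VecF a (K [a]) j -> K [a] j
    alg NilF             = K []
    alg (ConsF x (K xs)) = K (x : xs)
```

The constant-result fold is the degenerate case where the algebra's carrier is index-independent, which is exactly when Fix would have sufficed.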

We can also define a refold function, which doesn’t mention IxFix at all.

ixrefold :: IxFunctor f => (f b ~> b) -> (a ~> f a) -> a ~> b
ixrefold f g = f . ixmap (ixrefold f g) . g

And then, using ixrefold, we can define concatenation for the Vec from the vec package, which isn’t defined using IxFix. Here we need auxiliary newtypes as well.

newtype Swapped f a b =
Swap { getSwapped :: f b a }
newtype Appended2 m a n =
Append2 { getAppended2  :: Vec m a -> Vec (Plus n m) a }

append2 :: forall a n m. Vec n a -> Vec m a -> Vec (Plus n m) a
append2 xs ys = getAppended2 (ixrefold f g (Swap xs)) ys where
-- same as alg in 'append'
f :: VecF a (Appended2 m a) j -> Appended2 m a j
f NilF          = Append2 id
f (ConsF z rec) = Append2 $ \zs -> z ::: (getAppended2 rec zs) -- 'project' g :: Swapped Vec a j -> VecF a (Swapped Vec a) j g (Swap VNil) = NilF g (Swap (z ::: zs)) = ConsF z (Swap zs) You may note that one can implement append as induction over the length; that's how vec implements it. Theoretically it is not right, and the IxFix formulation highlights that: append3 :: forall a n m. SNatI n => Vec' a n -> Vec' a m -> Vec' a (Plus n m) append3 xs ys = getAppended3 (induction caseZ caseS) xs where caseZ :: Appended3 m a 'Z caseZ = Append3 (\_ -> ys) caseS :: Appended3 m a p -> Appended3 m a ('S p) caseS rec = Append3 $ \(IxFix (ConsF z zs)) ->
cons z (getAppended3 rec zs)

-- Note: this is different than Appended!
newtype Appended3 m a n =
Append3 { getAppended3 :: Vec' a n -> Vec' a (Plus n m) }

Here we pattern match on an IxFix value. If we want to treat it as a least fixed point, the only valid elimination is to use foldIxFix!

However, the induction over length is the right approach if Vec is defined as a data or type family:

type family VecFam (a :: Type) (n :: Nat) :: Type where
VecFam a 'Z     = ()
VecFam a ('S n) = (a, VecFam a n)

Whether you want to have a data family, type family, or GADT depends on the application. (Even in Agda or Coq). The family variant doesn’t intrinsically know its length, which is sometimes a blessing, sometimes a curse. For what it’s worth, the vec package provides both variants, with almost the same module interface.

## Example: Polymorphically recursive type

The IxFix can also be used to define polymorphically recursive types like

data Nested a = a :<: (Nested [a]) | Epsilon
infixr 5 :<:

nested :: Nested Int
nested = 1 :<: [2,3,4] :<: [[5,6],[8,9]] :<: Epsilon

A length function defined over this datatype will be polymorphically recursive, as the type of the argument changes from Nested a to Nested [a] in the recursive call:

-- >>> nestedLength nested
-- 3
nestedLength :: Nested a -> Int
nestedLength Epsilon    = 0
nestedLength (_ :<: xs) = 1 + nestedLength xs

We cannot represent Nested as a Fix of some functor, and we cannot use recursion-schemes either. However, we can redefine Nested as an indexed fixed point.

An important observation is that we (often or always?) use polymorphic recursion as a solution to the lack of indexed types. My favorite example is de Bruijn indices for well-scoped terms. Compare

data Expr1 a
= Var1 a
| App1 (Expr1 a) (Expr1 a)
| Abs1 (Expr1 (Maybe a))

and

data Expr2 a n
= Free2 a                -- split free and bound variables
| Bound2 (Fin n)
| App2 (Expr2 a n) (Expr2 a n)
| Abs2 (Expr2 a ('S n))  -- extend bound context by one

Which one is simpler is a really good discussion, but for another time.

In the Nested example the single type argument is also used for two purposes: the type of the base element (Int) and the container type (which starts at Identity and grows by an extra list layer at each step).

One approach is to just use a Nat index and have a type family4

type family Container (n :: Nat) :: Type -> Type where
Container 'Z     = Identity
Container ('S n) = Compose [] (Container n)

or

data NestedF a rec f
= f a :<<: rec (Compose [] f)
| EpsilonF

instance IxFunctor (NestedF a) where
ixmap _ EpsilonF    = EpsilonF
ixmap f (x :<<: xs) = x :<<: f xs

We can convert from Nested a to IxFix (NestedF a) Identity and back. We use coerce to help with newtype plumbing.

convert :: Nested a -> IxFix (NestedF a) Identity
convert = aux . coerce where
aux :: Nested (f a) -> IxFix (NestedF a) f
aux Epsilon    = IxFix EpsilonF
aux (x :<: xs) = IxFix (x :<<: aux (coerce xs))

-- back left as an exercise

And then we can write nestedLength as a fold.

-- >>> nestedLength2 (convert nested)
-- 3
nestedLength2 :: IxFix (NestedF a) f -> Int
nestedLength2 = getConst . foldIxFix alg where
alg :: NestedF a (Const Int) ~> Const Int
alg EpsilonF         = Const 0
alg (_ :<<: Const n) = Const (n + 1)
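The whole pipeline can be checked end to end with a self-contained restatement of the definitions above (the K newtype stands in for Const, so the snippet runs alone):

```haskell
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
import Data.Coerce (coerce)
import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)

type f ~> g = forall j. f j -> g j

class IxFunctor (f :: (k -> Type) -> (k -> Type)) where
  ixmap :: (a ~> b) -> (f a ~> f b)

newtype IxFix f i = IxFix (f (IxFix f) i)

foldIxFix :: IxFunctor f => (f g ~> g) -> IxFix f ~> g
foldIxFix alg (IxFix x) = alg (ixmap (foldIxFix alg) x)

data Nested a = a :<: Nested [a] | Epsilon
infixr 5 :<:

data NestedF a rec f
  = f a :<<: rec (Compose [] f)
  | EpsilonF

instance IxFunctor (NestedF a) where
  ixmap _ EpsilonF    = EpsilonF
  ixmap f (x :<<: xs) = x :<<: f xs

-- coerce handles the Identity and Compose newtype plumbing.
convert :: Nested a -> IxFix (NestedF a) Identity
convert = aux . coerce
  where
    aux :: Nested (f a) -> IxFix (NestedF a) f
    aux Epsilon    = IxFix EpsilonF
    aux (x :<: xs) = IxFix (x :<<: aux (coerce xs))

-- Index-ignoring carrier; here the index has kind Type -> Type.
newtype K x (f :: k) = K { getK :: x }

nestedLength2 :: IxFix (NestedF a) f -> Int
nestedLength2 = getK . foldIxFix alg
  where
    alg :: NestedF b (K Int) ~> K Int
    alg EpsilonF     = K 0
    alg (_ :<<: K n) = K (n + 1)
```

Note that nestedLength2 never mentions polymorphic recursion; the changing container type is carried entirely by the index.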

## Non-example: ListF

In the introduction I mentioned an ordinary list, which is a fixed point λa. μx. 1 + a × x,

where I use μ notation to represent least fixed points: μx. F x. Note that we first introduce a type parameter with λ, and then make a fixed point with μ.

We can define an ordinary list using IxFix, by taking a fixed point of a Type -> Type thing, i.e. first μ, and then λ: μf. λa. 1 + a × f a.

data ListF1 rec a = NilF1 | ConsF1 a (rec a)
type List1 = IxFix ListF1

fromList1 :: [a] -> List1 a
fromList1 []     = IxFix NilF1
fromList1 (x:xs) = IxFix (ConsF1 x (fromList1 xs))

Compare to Agda code:

-- parameter
data List (A : Set) : Set where
nil  : List A
cons : A -> List A -> List A

-- index
data List : Set -> Set where
nil :  (A : Set) -> List A
cons : (A : Set) -> A -> List A -> List A

These types are subtly different. See https://stackoverflow.com/questions/24600256/difference-between-type-parameters-and-indices.

This gives a hint why Agda people define Vec (A : Set) : Nat -> Set, i.e. with the length as the last argument: because you have to do it that way. And Haskellers (usually) define it as Vec (n :: Nat) (a :: Type), because then Vec can be given Functor etc. instances. In other words, the machinery in both languages forces an order of type arguments.

Finally, we can write a parametric version of List using IxFix too. We just use a dummy, boring index.

data ListF2 a rec (unused :: ()) = NilF2 | ConsF2 a (rec unused)
type List2 a = IxFix (ListF2 a) '()

fromList2 :: [a] -> List2 a
fromList2 []     = IxFix NilF2
fromList2 (x:xs) = IxFix (ConsF2 x (fromList2 xs))

IxFix is more general than Fix, but if you don’t need the extra power, maybe you shouldn’t use it.

Do we need something even more powerful than IxFix? I don’t think so. If we need more (dependent) indices, we can pack them all into a single index by tupling (or Σ-ing) them.

## Conclusion

We have seen IxFix, the fixed point of an indexed functor. I honestly do not think that you should start combing your code base for places to use it. I suspect it is more useful as a thinking and experimentation tool. It is an interesting gadget.

1. I’m sorry that tilde ~> and dash -> arrows look so similar.↩︎

2. Note that FFunctor in https://hackage.haskell.org/package/hkd-0.1/docs/Data-HKD.html (which is defined with different names in other packages as well) is of a different kind. IxFunctor in https://hackage.haskell.org/package/indexed-0.1.3/docs/Data-Functor-Indexed.html is again different. Sorry for the proliferation of various functors. And for the confusing terminology. Dominic Orchard et al. use the terms graded (k -> Type, this post) and parameterised (k -> k -> Type, the indexed package) in https://arxiv.org/abs/2001.10274v2. There is no monad name for the hkd-package variant, as that cannot be made into a monad-like thing.↩︎

3. You may wonder why the function name is append, while the operation is concatenation. This is similar to having plus for addition.↩︎

4. Here one starts to wish that GHC had unsaturated type families, so we wouldn’t need to use newtypes...↩︎

]]>
ANN: git-badc0de - a tool to improve git commits you receive https://oleg.fi/gists/posts/2020-08-04-git-badc0de.html 2020-08-04T00:00:00Z 2020-08-04T00:00:00Z Posted on 2020-08-04 by Oleg Grenrus

There are various practices of authoring patches or commits in version control systems. If you are, like me, annoyed by fix typo fix up commits in pull or merge requests you get at work or in an open source project, or if you simply get too many contributions (which is a good place to be; tell me how to get there), then git-badc0de is the tool for you.

The problem is clearly that it is too easy to make new commits. The solution is to make creating commits harder. Git developers make the git interface saner with each release, and there are various tools (like GitHub web editing) which make writing fix typo commits child's-play easy.

git-badc0de (GitHub: phadej/git-badc0de) takes an outside-the-box approach. Instead of trying to encourage (or force) humans to put more effort into each commit, it makes their machines do the work.

git-badc0de takes the HEAD commit, and creates an altered copy, such that the commit hash starts with some (by default badc0de) prefix. Obviously, I use git-badc0de in the development process of git-badc0de itself. Check the tail of git log:

badc0dea Add CONTRIBUTING.md
badc0de6 Comment out some debug output
badc0de5 Initial commit

It's up to the project owners to decide how long a prefix they want to have. Seven base16 characters (i.e. 28 bits out of 160) is doable on modern multi-core hardware in a minute, with good luck in less1. These seconds are important. It is an opportunity to reflect, maybe even notice a typo in the commit message. Modern machines are so fast, and even some compilers too2, that we don't pause and think about what we have just done.

"Git is a content-addressable filesystem" is how a chapter in the Pro Git book on Git objects starts. Very nice model, very easy to tinker with. You can run git cat-file -p HEAD in your current Git project to see the HEAD commit object data. In git-badc0de, a commit looks like:

tree 91aaad77e68aa7bf94219a5b9cea97f26e2cce2b
author Oleg Grenrus <oleg.grenrus@iki.fi> 1596481157 +0300
committer Oleg Grenrus <oleg.grenrus@iki.fi> 1596481157 +0300

PoW: HRYsAAAAAAF

Git commit hash is a hash of a header and these contents. A header for commit object looks like

commit <content length as ASCII number><NUL>

git-badc0de takes the most recent commit data and, by adding some PoW: DECAFC0FFEE salts to the end, tries to find one which gives the commit hash the correct prefix. It takes 11 characters to encode 64 bits in base64. Why base64? No particular reason. Based on this StackOverflow answer we could put salts into commit headers, to hide them from git log. Something to do in the future.
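The search itself can be sketched as a brute-force loop over salts. This is a toy model only: gitObject mirrors the header format described above, but toyHash is a small stand-in hash so the sketch stays dependency-free (the real tool uses SHA1 via cryptohash-sha1), and all three names are invented here:

```haskell
import Data.Char (ord)
import Data.List (find, isPrefixOf)

-- A git object is "<type> <length>\NUL<body>"; the hash covers all of it,
-- so appending a salt also changes the length in the header.
gitObject :: String -> String
gitObject body = "commit " ++ show (length body) ++ "\0" ++ body

-- Toy stand-in for SHA1 (djb2-style), rendered in decimal.
toyHash :: String -> String
toyHash =
  show . foldl (\h c -> (h * 33 + toInteger (ord c)) `mod` 1000003) 5381

-- Append "PoW: <salt>" lines until the hash has the wanted prefix.
mine :: (String -> String) -> String -> String -> Maybe String
mine hashFn prefix body =
  find (\cand -> prefix `isPrefixOf` hashFn (gitObject cand))
       [ body ++ "\nPoW: " ++ show salt | salt <- [0 :: Integer ..] ]
```

With a real 160-bit hash the expected number of attempts for a k-hex-digit prefix is 16^k, which is why seven characters take about a minute and eight take hours.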

When a valid salt is found, git-badc0de writes the new commit object to the Git object store. At this point nothing is changed; there is only a new dangling object inside the .git directory. You can reset your branch to point to the new commit with git reset, and git-badc0de invites you to do so.

## Are you serious

Yes, I'm dead serious (No). But I had fun implementing git-badc0de. I was surprised that getting seven characters "right" in a hash is such an easy job. That causes nice artifacts in the GitHub web interface.

The top commit shown on the project main page is always badc0de... ... and in fact all commits seem to have the same hash (prefix)... Note how the command line git log is smart enough to show enough characters to make prefixes unambiguous. It is deliberate: check some of your smaller projects, there git log --oneline probably prints seven-character abbreviations. In GHC (Haskell compiler) git log --oneline prints ten characters for me (GitHub still shows just seven, so I assume it is hardcoded).

We can also use git-badc0de to produce commits with ordered hashes! The downside is that you have to decide the maximum commit count at the start. Yet that should be enough for just about any project. See the ordered branch, isn't that cool!? ## How git-badc0de is implemented I have to confess: I started with a Python prototype. Python comes with all the pieces needed, though I essentially only needed hashlib.

The Haskell implementation has eleven dependencies at the moment of writing. Five of them are bundled with the compiler, the remaining six are not. Even for some basic tasks you have to go package shopping:

My first Haskell implementation was noticeably faster than the Python version. I suspect that is because Haskell is simply better at gluing bytes together.

• I just use Haskell for everything. (Except for prototyping silly ideas). This is the most important reason.
• Haskell is good for writing parallel programs. This is a bonus.

To my surprise, my first Haskell parallelization attempt didn't work at all. The idea is to spawn multiple workers, which try different salts, and then race them until one worker finds a valid salt. Adding more workers should not slow down the overall program, minus maybe some small managerial overhead.

The overhead turned out to be quite large. Parallelism in Haskell works well when you deal with Haskell "native" data. The git-badc0de use case is, however, gluing bytes (ByteStrings) together and calling out to a C implementation of the SHA1 algorithm.

The nasty detail of, I guess, any higher-level language is that the foreign function interface has pitfalls. I ran into a foreign import unsafe issue. You may read about foreign import unsafe in the excellent GHC User Guide.

GHC guarantees that garbage collection will never occur during an unsafe call, ...

With many threads generating some amount of garbage, but also calling unsafe foreign functions in a tight loop, problems arose. Surprisingly, both cryptohash-sha1 and bytestring use plenty of unsafe calls (cryptonite does too).

My solution was to redo the loop, with less garbage generation and fewer foreign calls.

cryptohash-sha1 (and cryptonite) import _init, _update and _finalize C functions. The hashing context is allocated and the plumbing done in Haskell. However, we can set things up in such a way that we pass a single contiguous block of memory to be hashed. This functionality is missing from the library, so I copied cbits/ from cryptohash-sha1 and added a small C function, to do the C plumbing in C:

void
hs_cryptohash_sha1(const uint8_t *data, size_t len, uint8_t *out)
{
struct sha1_ctx ctx;
hs_cryptohash_sha1_init(&ctx);
hs_cryptohash_sha1_update(&ctx, data, len);
hs_cryptohash_sha1_finalize(&ctx, out);
}

This way we can have one safe foreign call to calculate the hash. We have to make sure that the pointers point at pinned memory, i.e. memory which the garbage collector won't move.

Next, the same problem exists in the bytestring library. I was naive to think that, as the byte data I work with is so small, concatenating it (and thus memcpying) wouldn't be noticeable; hashing should dominate. Usually it isn't a problem, but as the copying was done on each loop iteration and memcpy is a foreign import unsafe in the bytestring library, it also contributed to the slowdown. That was my hypothesis.

Figuring out how to do it better with bytestring seemed difficult, so I opted for a different solution: write some C-in-Haskell. Now each worker creates its own mutable template, which is updated with a new salt on each loop iteration. The salt length is fixed, so we don't need to change the commit object header. As a bonus, the loop becomes essentially non-allocating (I didn't check, though).

After that change, git-badc0de started to use all the cores, and not just spin in GC locks. The runtime system statistics are nice to look at

                                Tot time (elapsed)  Avg pause  Max pause
Gen  0      0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
Gen  1      1 colls,     0 par    0.000s   0.000s     0.0004s    0.0004s

...

Productivity 100.0% of total user, 99.9% of total elapsed

No time is spent in garbage collection. Productivity is the share of time spent doing actual work rather than collecting garbage. Disclaimer: it seems that waiting for GC locks is not counted towards GC time, but as there was only a single collection, that doesn't matter.

I could optimize further: as the salt is at the end of the content, it is silly to rehash the whole commit object every time. Yet git-badc0de is a silly project to begin with, and I am satisfied with its current state.

The lesson here is that the foreign function interface (FFI) is not easy; you have to think and test.

"Luckily" I had learned about the unsafe issue recently in postgresql-libpq, so I was able to suspect it as the cause of my problems. In this case, unsafe doesn't mean "I know what I'm doing" (as it does with e.g. unsafePerformIO), but rather the opposite.

Also, I don't think that we (= the Haskell ecosystem) have good tooling to benchmark how code behaves in highly parallel environments. I hope that Data.ByteString.Builder, for example, doesn't use any unsafe foreign calls. The ecosystem relies on that module for constructing JSON (in aeson) and HTML (in both blaze-markup and lucid). Something for someone to test, maybe fix and document.

## Learnings

Does this mean that Haskell is crap, that the promise of easy parallel and concurrent programming is a lie, and that we should all use Rust instead?

Well, no. In this isolated example, Rust would probably shine. There are, however, other parts than the hashing loop even in this simple program, and they have to be written as well. There Haskell feels a lot like Python, in the sense that I can just write concise code that works.

Python was quite nice in the very early prototyping stage, as it happened to have all the needed functionality available in the repl. The "early prototyping stage" lasted for maybe 10 or 15 minutes; that was enough to verify that the basic idea might work. With Haskell, you would need to restart the repl to add a new library, losing all the context, which would have killed the flow. For some other "problem" I might start to prototype directly in Haskell. I have no experience with how nice a repl experience Rust has.

If the git-badc0de project were to develop further, I would rewrite the hashing loop in C, instead of writing C-in-Haskell. Maybe. Or in Rust, if that were easier to set up. (GHC knows how to invoke a C compiler.)

Haskell is a great glue language, among many other great properties it has. Don't believe anyone who tells you otherwise.

## Prior art

There is prior art, this is not a novel idea. Just search the internet for git commit prefix miner, e.g.

or git commit vanity hash

None of them is meta-used, i.e. none uses itself on its own commits, so I cannot be sure that they work :)

1. Eight base16 characters (i.e. 4 bytes) took one and a half hours of CPU time, or "just" 5.5 minutes of wall clock time. I ran that experiment only a few times. Take a look at the deadc0de branch.↩︎

2. If your programming language of choice is a compiled one.↩︎

]]>