While a big part of `Cabal` is about interpreting the `your-package.cabal` file, an important part of both it and `cabal-install` is **filepaths**. After all, `cabal-install` is a build tool.

Currently (as of `Cabal-3.4`) the type used for all filepath needs is the infamous `type FilePath = String`. One could say that all paths in the codebase are *dynamically typed*. It is very hard to tell whether paths are absolute or relative, and if relative, relative to what.

A solution would be to use the `path` or `paths` library. I like `paths` better, because it is set up to talk about paths relative to arbitrary roots, not only *absolute paths*.

Still, neither is good enough. Why do I say so? Because `Cabal` and `cabal-install` have to deal with *three kinds of paths*:

- Abstract paths
- Paths on the host system
- Paths on the target system

It is that simple, but `path` is very concretely the second kind, and `paths` is somewhere between the first and second, but doesn't let you differentiate them.

Abstract paths are the ones written in the `your-package.cabal` file, for example `hs-source-dirs: src/`. It is not a Unix path. It is not a Windows path. It is in fact something which should be interpretable as either, and also as a path inside a tarball archive. Currently it has to be the common denominator of these, which means that backslashes `\`, i.e. Windows filepaths, aren't portable; but I suspect they work *if you build on Windows*.

Just *thinking* about types uncovers a possible bug.

If we had a

```
-- | An abstract path.
data APath root = ...
```

Then we could enforce a format, for example prohibiting some (i.e. all known) special characters.
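As a sketch of what such enforcement could look like (all names here are hypothetical; nothing like this exists in `Cabal` yet), a smart constructor might reject separators and special characters up front:

```haskell
import Data.List (isInfixOf)

-- | An abstract, portable path: a sequence of components.
-- Hypothetical representation and names.
newtype APath root = APath [String]
  deriving (Eq, Show)

-- | Smart constructor: accept only relative, forward-slash
-- separated paths without known special characters.
mkAPath :: String -> Maybe (APath root)
mkAPath s
    | null s                 = Nothing
    | "\\" `isInfixOf` s     = Nothing  -- no Windows separators
    | take 1 s == "/"        = Nothing  -- abstract paths are relative
    | any badComponent parts = Nothing
    | otherwise              = Just (APath parts)
  where
    parts = splitOn '/' s
    badComponent c = c `elem` ["", ".", ".."] || any (`elem` ":*?<>|\"") c

-- naive splitter, to keep the sketch self-contained
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
    (a, [])     -> [a]
    (a, _:rest) -> a : splitOn c rest
```

An `APath` that survives a check like this can then be rendered with either separator, or stored in a tarball, without further care.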

*Note:* abstract paths are relative. There might be some *abstract* root, for example `PackageRoot`, but its interpretation still depends on context.

The representation of `APath` is not important. It should, however, be some kind of *text*.

These are the concrete paths on your disk.

```
-- | A path on host (build) system
data HPath root = ...
```

The `HPath` can have different roots as well, for example `CWD` (for current working directory), `HomeDir` or `Absolute`. Maybe even talking about `HPath PackageRoot` is meaningful. My gut feeling says that we should rather provide an operation to resolve an `APath PackageRoot` into an `HPath Absolute`, given an `HPath Absolute` of the package root.
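As a sketch (types and names invented for illustration; this is not `Cabal` API), such a resolving operation could look like:

```haskell
import System.FilePath ((</>))

-- hypothetical root markers, used as phantom types
data PackageRoot
data Absolute

newtype APath root = APath [String]           -- portable components
newtype HPath root = HPath FilePath deriving (Eq, Show)

-- | Interpret an abstract package-relative path against a concrete
-- absolute package root, yielding a concrete host path.
resolve :: HPath Absolute     -- package root on the host system
        -> APath PackageRoot  -- abstract path from the .cabal file
        -> HPath Absolute
resolve (HPath root) (APath parts) = HPath (foldl (</>) root parts)
```

The phantom `root` parameters are what make this typed: you simply cannot pass a home-relative path (or, later, a target-system path) where the package root is expected.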

Also `directory` operations, i.e. IO operations like `listDirectory`, are only meaningful for `HPath`s. These are concrete paths.

`HPath`s have to be represented in the system's native way. It can still be `FilePath` in the first iteration, but e.g. absolute paths on Windows may start with `\\?\` and use backslashes as directory separators (c.f. `APath`, which will probably look like a POSIX path everywhere).

The third kind of paths are paths on the *target system*. While cross-compilation support in `Cabal` barely exists, having an own type for target system paths should help improve that.

One example is the `YourPackage_Paths` module. Currently it contains hardcoded paths to e.g. the `data-files` directory of the *installed* package, i.e. somewhere on the *target* system.

While having hardcoded absolute paths in `YourPackage_Paths` is a bad idea nowadays, and the `data-files` discovery should be based on some relative (relocatable, using abstract `APath`s maybe?) scheme, having a

```
-- | A path on the target (run) system
data TPath root = ...
```

will at least show where we use (absolute) target system paths. Ideally we won't have them anywhere, if that is possible. But identifying where we have them *now* will help to get rid of them.

Another example is running a (custom) `./Setup`, or tests, or benchmarks. I hope that we can engineer the code in a way that executables built for the target system won't be directly callable, but will need to use a runner wrapper (which we have, but I don't know much about it). Even in the *host = target* (common) case, where the wrapper is trivial.

*Note:* whether a `TPath` is a Windows path or a POSIX path will depend on run-time information, so the conversion functions will need that bit of information. You won't be able to have a pure `convert :: APath -> TPath`; we will need to pass extra context.
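A sketch of what that context-passing could look like (again with hypothetical names; a real version would also handle drive letters, roots, encodings, and so on):

```haskell
import Data.List (intercalate)

-- which kind of paths the *target* system uses; known only at run time
data TargetOS = TargetPosix | TargetWindows

newtype APath root = APath [String]           -- portable components
newtype TPath root = TPath FilePath deriving (Eq, Show)

-- | Not a pure @APath ~> TPath@: the separator depends on the target.
convertForTarget :: TargetOS -> APath root -> TPath root
convertForTarget os (APath parts) = TPath (intercalate sep parts)
  where
    sep = case os of
        TargetPosix   -> "/"
        TargetWindows -> "\\"
```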

Here again, better types should help guide the design process.

These are my current thoughts about how paths will look in some future version of `Cabal`. Instead of one `FilePath` (or `Path`) there will be three: `APath`, `HPath` and `TPath`^{1}.

As I write this down, it seems so obvious that this is how paths have to be classified. Has anyone done something like this before? Please tell me, so I can learn from your experiences.

Names are subject to change, maybe `SymPath` (for symbolic), `HostPath` and `TargetPath`.↩︎

Quoting the Wikipedia article: in number theory, the *integer square root* (`intSqrt`) of a positive integer *n* is the greatest positive integer which is less than or equal to the square root of *n*.

How to compute it in Haskell? The Wikipedia article mentions Newton’s method, but doesn’t discuss how to make the initial guess.

In `base-4.8` (GHC-7.10) we got the `countLeadingZeros` function, which can be used to get a good initial guess.

Recall that finite machine integers look like

```
n = 0b0......01.....
      ^^^^^^^^          -- @countLeadingZeros n@ bits
              ^^^^^^^   -- @b = finiteBitSize n - countLeadingZeros n@ bits
```

We have an efficient way to get the *"significant bits"* count `b = finiteBitSize n - countLeadingZeros n`, which can be used to approximate the number: `2^(b-1) <= n < 2^b`. It is also easy to approximate the square root of numbers like `2^b`: it is about `2^(div b 2)`.

We can use this approximation as the initial guess, and write a simple implementation of `intSqrt`:

```
module IntSqrt where

import Data.Bits

intSqrt :: Int -> Int
intSqrt 0 = 0
intSqrt 1 = 1
intSqrt n = case compare n 0 of
    LT -> 0           -- whatever :)
    EQ -> 0
    GT -> iter guess  -- only single iteration
  where
    iter :: Int -> Int
    iter 0 = 0
    iter x = shiftR (x + n `div` x) 1  -- shifting is dividing

    guess :: Int
    guess = shiftL 1 (shiftR (finiteBitSize n - countLeadingZeros n) 1)
```

Note, I do only a single iteration^{1}. Is it enough? My need is to calculate square roots of small numbers. We can test quite a large range exhaustively. Let's define a correctness predicate:

```
correct :: Int -> Int -> Bool
correct n x = sq x <= n && n < sq (x + 1) where sq y = y * y
```

Out of the first hundred numbers

```
correct100 = length
    [ (n,x) | n <- [ 0..99 ], let x = intSqrt n, correct n x ]
```

the computed `intSqrt` is correct for 89! Which are the incorrect ones?

```
incorrect100 =
    [ (8,3)
    , (24,5)
    , (32,6), (33,6), (34,6), (35,6)
    , (48,7)
    , (80,9)
    , (96,10), (97,10), (98,10), (99,10)
    ]
```

The numbers which are just below a perfect square (9, 25, 36, …) are overestimated.

If we take a bigger range, say 0...99999, then with a single iteration 23860 numbers are correct; with two iterations, 96659.
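For comparison, here is a sketch of a two-iteration variant: the same guess, with one more Newton step. (This is my reconstruction, not necessarily the exact code behind the counts above.)

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL, shiftR)

intSqrt2 :: Int -> Int
intSqrt2 0 = 0
intSqrt2 1 = 1
intSqrt2 n = case compare n 0 of
    LT -> 0
    EQ -> 0
    GT -> iter (iter guess)  -- two iterations this time
  where
    iter :: Int -> Int
    iter 0 = 0
    iter x = shiftR (x + n `div` x) 1  -- one Newton step

    guess :: Int
    guess = shiftL 1 (shiftR (finiteBitSize n - countLeadingZeros n) 1)
```

For example, `intSqrt2 99` yields 9 where the single-iteration version gave 10, fixing the near-perfect-square cases from the table above (while regressing a few others, e.g. some very small `n`).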

For my use case (mangling the `size` of `QuickCheck` generators) this is good enough; small deviations are perfectly acceptable. Bit fiddling FTW!

Like the infamous fast inverse square root algorithm, which also uses only a single iteration, because the initial guess is very good.↩︎

I spent this Sunday writing two small patches to `cabal-fmt`.

`cabal-fmt` reasonably assumes that the file it is formatting is a cabal package definition file, so it parses it as such. That is needed to correctly pretty-print the fields, as some syntax, for example the leading comma, requires a somewhat recent `cabal-version: 2.2` (see the Package Description Format Specification History for details).

However, there are other files using the same markup format, for example `cabal.project` files or the `cabal.haskell-ci` configuration files used by the haskell-ci tool. Wouldn't it be nice if `cabal-fmt` could format these as well? In `cabal-fmt-0.1.4` you can pass the `-n` or `--no-cabal-file` flag to prevent `cabal-fmt` from parsing these files as a cabal package file.

The downside is that the latest known cabal specification will be used. That shouldn't break `cabal.haskell-ci` files, but it might break `cabal.project` files if you are not careful. (Their parsing code is somewhat antique.)

An example of reformatting the `cabal.project` of this blog:

```
--- a/cabal.project
+++ b/cabal.project
@@ -1,9 +1,7 @@
 index-state: 2020-05-10T17:53:22Z
 with-compiler: ghc-8.6.5
-
 packages:
-  "."
-  pkg/gists-runnable.cabal
+  "."
+  pkg/gists-runnable.cabal
-constraints:
-  hakyll +previewServer
+constraints: hakyll +previewServer
```

So satisfying.

Another addition is *fragments*. They are best illustrated by an example. Imagine you have a multi-package project, and you use `haskell-ci` to generate your `.travis.yml`. Each `.cabal` package file must have the same

```
...
tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1
library
...
```

Then you find out that GHC 8.8.4 and GHC 8.10.2 were recently released, and you want to update your CI configuration. Editing multiple files with the same change: busy work.

With `cabal-fmt-0.1.4` you can create a fragment file, let's call it `tested-with.fragment`:

`tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.4 || ==8.10.2`

And then edit your package files to add a *cabal-fmt pragma* (the fragment is probably in the root directory of the project, but `.cabal` files are inside a directory per package):

```
...
+-- cabal-fmt: fragment ../tested-with.fragment
tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1
library
...
```

Then the next time you run `cabal-fmt --inplace */*.cabal` you'll see the diff

```
...
 -- cabal-fmt: fragment ../tested-with.fragment
-tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.3 || ==8.10.1
+tested-with: GHC ==8.4.4 || ==8.6.5 || ==8.8.4 || ==8.10.2

library
...
```

for all libraries. *Handy*!

Some design comments:

- A fragment is only a single field or a single section (e.g. a common stanza). Never multiple single fields. (Easier to implement; least surprising behavior: a pragma is attached to a single field or section.)
- The field name or section header in the `.cabal` file and in the fragment have to match. (To avoid mistakes.)
- Substitution is not recursive. (Guaranteed termination.)
- Other pragmas in fragments are not executed, nor are comments in fragments preserved. (Not sure whether that would be valuable.)

Finally, you can use `cabal-fmt --no-cabal-file` to format fragment files too, even though they are also reformatted when spliced.

`cabal-fmt-0.1.4` is a small release. I made `--no-cabal-file` to scratch my own itch, and fragments partly to highlight that not every feature has to exist in `Cabal`; some are very fine for preprocessors. I do think that fragments could be very useful in bigger projects. Let me know!

I was lately thinking about fixed points, more or less.

A new version of `data-fix` was released recently, and also a corresponding version of `recursion-schemes`.

Also I wrote a Fix-ing regular expressions post, about adding fixed points to regular expressions.

This post is another exploration: *fixed points of indexed functors*. This is not a novel idea at all, but I'm positively surprised how nicely this works out in modern GHC Haskell. I define an `IxFix` type and illustrate it with three examples.

*Note:* the `HFix` in the `multirec` package is the same as `IxFix` in this post. I always forget about the existence of `multirec`.

In the following, the "modern GHC Haskell" is quite conservative, needing only a handful of extensions:

```
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
```

And this literate Haskell script is warning-free with `-Wall`:

`{-# OPTIONS_GHC -Wall #-}`

On this trip

`module IxFix where`

we need a handful of imports

```
-- Type should be added to Prelude
import Data.Kind (Type)
-- Few newtypes
import Data.Functor.Identity (Identity (..))
import Data.Functor.Compose (Compose (..))
import Data.Functor.Const (Const (..))
-- dependently typed programming!
import Data.Fin (Fin)
import Data.Type.Nat
import Data.Vec.Lazy (Vec (..))
-- magic
import Data.Coerce (coerce)
```

Before we go further, let me remind you about ordinary fixed points, as defined in `data-fix`

package.

```
newtype Fix f = Fix { unFix :: f (Fix f) }

foldFix :: Functor f => (f a -> a) -> Fix f -> a
foldFix f = go where go = f . fmap go . unFix
```

Using `Fix` we can define recursive types using non-recursive *base functors*, e.g. for a list we'd have

`data ListF a rec = NilF | ConsF a rec`

We then use `foldFix` (or `cata` and other recursion schemes in `recursion-schemes`) to decouple "how we recurse" from "what we do at each step". I won't try to convince you why this separation of concerns might be useful.
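As a tiny self-contained illustration of that decoupling (primed names so they don't clash with definitions elsewhere in this post), here is list length as a `foldFix`:

```haskell
newtype Fix f = Fix { unFix :: f (Fix f) }

foldFix :: Functor f => (f a -> a) -> Fix f -> a
foldFix f = go where go = f . fmap go . unFix

data ListF' a rec = NilF' | ConsF' a rec

instance Functor (ListF' a) where
    fmap _ NilF'         = NilF'
    fmap g (ConsF' x xs) = ConsF' x (g xs)

fromList' :: [a] -> Fix (ListF' a)
fromList' = foldr (\x xs -> Fix (ConsF' x xs)) (Fix NilF')

-- the "what we do at each step" part: count one per cons
listLength :: Fix (ListF' a) -> Int
listLength = foldFix alg where
    alg NilF'        = 0
    alg (ConsF' _ n) = 1 + n
```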

Instead I continue directly to the topic: defining indexed fixed points. Why do we need them? Because `Fix` is not powerful enough to allow working with `Vec` or polymorphically recursive types.

```
-- hello dependently typed world.
data Vec (n :: Nat) (a :: Type) where
VNil :: Vec 'Z a
(:::) :: a -> Vec n a -> Vec ('S n) a
```

Before talking about fixed points, we need to figure out what indexed functors are. Recall that a normal functor is a thing of kind `Type -> Type`:

```
class Functor f where
fmap :: (a -> b) -> (f a -> f b)
```

The indexed version is the one with `Type` replaced by `k -> Type`, for some index `k` (it is still a functor, but in a different category). We want morphisms to work for all indices, and to preserve them. Thus we define a commonly used type alias^{1}:

```
-- natural, or parametric, transformation
type f ~> g = forall (j :: k). f j -> g j
```

Using it we can define a `Functor` variant^{2}; it looks almost the same.

```
class IxFunctor (f :: (k -> Type) -> (k -> Type)) where
ixmap :: (a ~> b) -> (f a ~> f b)
```

With `IxFunctor` in our toolbox, we can define `IxFix`; note how the definition is again almost the same as for the unindexed `Fix` and `foldFix`:

```
newtype IxFix f i = IxFix { unIxFix :: f (IxFix f) i }

foldIxFix :: IxFunctor f => (f g ~> g) -> IxFix f ~> g
foldIxFix alg = alg . ixmap (foldIxFix alg) . unIxFix
```

Does this work? I hope that the following examples will convince you that `IxFix` is usable (at least in theory).

The go-to example of recursion schemes is folding a list; the go-to example of dependent types is the length-indexed list, often called `Vec`. I combine these traditions by defining `Vec` as an indexed fixed point:

```
data VecF (a :: Type) rec (n :: Nat) where
NilF :: VecF a rec 'Z
ConsF :: a -> rec n -> VecF a rec ('S n)
```

`VecF` is an `IxFunctor`:

```
instance IxFunctor (VecF a) where
    ixmap _ NilF         = NilF
    ixmap f (ConsF x xs) = ConsF x (f xs)
```

And we can define `Vec` as a fixed point of `VecF`, with constructors:

```
type Vec' a n = IxFix (VecF a) n

nil :: Vec' a 'Z
nil = IxFix NilF

cons :: a -> Vec' a n -> Vec' a ('S n)
cons x xs = IxFix (ConsF x xs)
```

Can we actually use it? Of course! Let's define concatenation^{3} `Vec' a n -> Vec' a m -> Vec' a (Plus n m)`. We cannot use `foldIxFix` directly, as `Plus n m` is not the same index as `n`, so we need to define an auxiliary `newtype` to plumb the indices. Another way to think about these kinds of `newtype`s is that they work around the lack of type-level anonymous functions in today's Haskell.

```
newtype Appended m a n =
    Append { getAppended :: Vec' a m -> Vec' a (Plus n m) }
```

```
append :: forall a n m. Vec' a n -> Vec' a m -> Vec' a (Plus n m)
append xs ys = getAppended (foldIxFix alg xs) ys where
    alg :: VecF a (Appended m a) j -> Appended m a j
    alg NilF          = Append id
    alg (ConsF x rec) = Append $ \zs -> cons x (getAppended rec zs)
```

We can also define a refold function, which doesn't mention `IxFix` at all.

```
ixrefold :: IxFunctor f => (f b ~> b) -> (a ~> f a) -> a ~> b
ixrefold f g = f . ixmap (ixrefold f g) . g
```

And then, using `ixrefold`, we can define concatenation for the `Vec` from the `vec` package, which isn't defined using `IxFix`. Here we need auxiliary `newtype`s as well.

```
newtype Swapped f a b =
    Swap { getSwapped :: f b a }

newtype Appended2 m a n =
    Append2 { getAppended2 :: Vec m a -> Vec (Plus n m) a }

append2 :: forall a n m. Vec n a -> Vec m a -> Vec (Plus n m) a
append2 xs ys = getAppended2 (ixrefold f g (Swap xs)) ys where
    -- same as alg in 'append'
    f :: VecF a (Appended2 m a) j -> Appended2 m a j
    f NilF          = Append2 id
    f (ConsF z rec) = Append2 $ \zs -> z ::: getAppended2 rec zs

    -- 'project'
    g :: Swapped Vec a j -> VecF a (Swapped Vec a) j
    g (Swap VNil)       = NilF
    g (Swap (z ::: zs)) = ConsF z (Swap zs)
```

You may note that one can implement `append` as induction over the length; that's how `vec` implements it. Theoretically that is not right, and the `IxFix` formulation highlights it:

```
append3 :: forall a n m. SNatI n
        => Vec' a n -> Vec' a m -> Vec' a (Plus n m)
append3 xs ys = getAppended3 (induction caseZ caseS) xs where
    caseZ :: Appended3 m a 'Z
    caseZ = Append3 (\_ -> ys)

    caseS :: Appended3 m a p -> Appended3 m a ('S p)
    caseS rec = Append3 $ \(IxFix (ConsF z zs)) ->
        cons z (getAppended3 rec zs)

-- Note: this is different than Appended!
newtype Appended3 m a n =
    Append3 { getAppended3 :: Vec' a n -> Vec' a (Plus n m) }
```

Here we *pattern match* on an `IxFix` value. If we want to treat it as a least fixed point, the only valid elimination is to use `foldIxFix`!

However, induction over the length is the right approach if `Vec` is defined as a data or type family:

```
type family VecFam (a :: Type) (n :: Nat) :: Type where
    VecFam a 'Z     = ()
    VecFam a ('S n) = (a, VecFam a n)
```
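To see why induction is natural here: a `VecFam` value is literally nested pairs, so there is no constructor to match on, only the index to recurse over. A self-contained snippet (with a local `Nat` standing in for the one from `Data.Type.Nat`):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

data Nat = Z | S Nat  -- local stand-in for Data.Type.Nat

type family VecFam (a :: Type) (n :: Nat) :: Type where
    VecFam a 'Z     = ()
    VecFam a ('S n) = (a, VecFam a n)

-- a vector of length two is just nested pairs
v2 :: VecFam Int ('S ('S 'Z))
v2 = (1, (2, ()))

-- head is an ordinary projection; the family equation does the work
headF :: VecFam a ('S n) -> a
headF = fst
```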

Whether you want to have a data type, a type family, or a GADT depends on the application. (Even in Agda or Coq.) The family variant doesn't intrinsically know its length, which is sometimes a blessing, sometimes a curse. For what it's worth, the `vec` package provides both variants, with almost the same module interface.

`IxFix` can also be used to define polymorphically recursive types like

```
data Nested a = a :<: (Nested [a]) | Epsilon
infixr 5 :<:

nested :: Nested Int
nested = 1 :<: [2,3,4] :<: [[5,6],[7],[8,9]] :<: Epsilon
```

A length function defined over this datatype will be polymorphically recursive, as the type of the argument changes from `Nested a` to `Nested [a]` in the recursive call:

```
-- >>> nestedLength nested
-- 3
nestedLength :: Nested a -> Int
nestedLength Epsilon    = 0
nestedLength (_ :<: xs) = 1 + nestedLength xs
```

We cannot represent `Nested` as a `Fix` of some functor, and we cannot use `recursion-schemes` either. However, we can redefine `Nested` as an indexed fixed point.

An important observation is that we (often or always?) use polymorphic recursion as a solution to the lack of indexed types. My favorite example is de Bruijn indices for well-scoped terms. Compare

```
data Expr1 a
= Var1 a
| App1 (Expr1 a) (Expr1 a)
| Abs1 (Expr1 (Maybe a))
```

and

```
data Expr2 a n
= Free2 a -- split free and bound variables
| Bound2 (Fin n)
| App2 (Expr2 a n) (Expr2 a n)
| Abs2 (Expr2 a ('S n)) -- extend bound context by one
```

Which one is *simpler* is a really good discussion, but for another time.

In the `Nested` example the single type argument is also used for two purposes: the type of a base element (`Int`) and the container type (which starts as `Identity` and gains an extra list layer at each step).

One approach is to just use a `Nat` index and have a type family^{4}

```
type family Container (n :: Nat) :: Type -> Type where
    Container 'Z     = Identity
    Container ('S n) = Compose [] (Container n)
```

or

```
data NestedF a rec f
    = f a :<<: rec (Compose [] f)
    | EpsilonF

instance IxFunctor (NestedF a) where
    ixmap _ EpsilonF    = EpsilonF
    ixmap f (x :<<: xs) = x :<<: f xs
```

We can convert from `Nested a` to `IxFix (NestedF a) Identity` and back. We use `coerce` to help with the `newtype` plumbing.

```
convert :: Nested a -> IxFix (NestedF a) Identity
convert = aux . coerce where
    aux :: Nested (f a) -> IxFix (NestedF a) f
    aux Epsilon    = IxFix EpsilonF
    aux (x :<: xs) = IxFix (x :<<: aux (coerce xs))

-- back left as an exercise
```

And then we can write `nestedLength` as a fold.

```
-- >>> nestedLength2 (convert nested)
-- 3
nestedLength2 :: IxFix (NestedF a) f -> Int
nestedLength2 = getConst . foldIxFix alg where
    alg :: NestedF a (Const Int) ~> Const Int
    alg EpsilonF         = Const 0
    alg (_ :<<: Const n) = Const (n + 1)
```

In the introduction I mentioned the ordinary list, which is a fixed point `List = λ(a : Type). μ(r : Type). 1 + a × r`, where I use `μ` to denote least fixed points. Note that we first introduce the type parameter with `λ`, and then make a fixed point with `μ`.

We can define an ordinary list using `IxFix` by taking a fixed point of a `Type -> Type` thing, i.e. first `μ`, and then `λ`: `List = μ(f : Type -> Type). λ(a : Type). 1 + a × f a`.

```
data ListF1 rec a = NilF1 | ConsF1 a (rec a)

type List1 = IxFix ListF1

fromList1 :: [a] -> List1 a
fromList1 []     = IxFix NilF1
fromList1 (x:xs) = IxFix (ConsF1 x (fromList1 xs))
```

Compare to Agda code:

```
-- parameter
data List (A : Set) : Set where
  nil  : List A
  cons : A -> List A -> List A

-- index
data List : Set -> Set where
  nil  : (A : Set) -> List A
  cons : (A : Set) -> A -> List A -> List A
```

These types are subtly different. See https://stackoverflow.com/questions/24600256/difference-between-type-parameters-and-indices.

This gives a hint why `Agda` people define `Vec (A : Set) : Nat -> Set`, i.e. with the length as the last argument: because you have to do it that way. And Haskellers (usually) define `Vec (n :: Nat) (a :: Type)`, because then `Vec n` can be given `Functor` etc. instances. In other words, the machinery in both languages forces an order of type arguments.

Finally, we can write a parametric version of `List` using `IxFix` too. We just use a dummy, boring index.

```
data ListF2 a rec (unused :: ()) = NilF2 | ConsF2 a (rec unused)

type List2 a = IxFix (ListF2 a) '()

fromList2 :: [a] -> List2 a
fromList2 []     = IxFix NilF2
fromList2 (x:xs) = IxFix (ConsF2 x (fromList2 xs))
```

`IxFix` is more general than `Fix`, but if you don't need the extra power, maybe you shouldn't use it.

Do we need something even more powerful than `IxFix`? I don't think so. If we need more (dependent) indices, we can pack them all into a single index by tupling (or Σ-ing) them.

We have seen `IxFix`, the fixed point of an indexed functor. I honestly do not think that you should start looking in your code base for places to use it. I suspect it is more useful as a thinking and experimentation tool. It is an interesting gadget.

1. I'm sorry that the tilde `~>` and dash `->` arrows look so similar.↩︎

2. Note that `FFunctor` in https://hackage.haskell.org/package/hkd-0.1/docs/Data-HKD.html (which is defined under different names in other packages as well) is of a different kind. `IxFunctor` in https://hackage.haskell.org/package/indexed-0.1.3/docs/Data-Functor-Indexed.html is again different. Sorry for the proliferation of various functors, and for the confusing terminology. Dominic Orchard et al. use the terms graded (`k -> Type`, this post) and parameterised (`k -> k -> Type`, the `indexed` package) in https://arxiv.org/abs/2001.10274v2. There is no monad name for the `hkd`-package variant, as that cannot be made into a monad-like thing.↩︎

3. You may wonder why the function name is `append`, but the operation is concatenation. This is similar to having `plus` for addition.↩︎

4. Here one starts to wish that GHC had unsaturated type families, so we wouldn't need to use newtypes...↩︎

There are various practices of authoring patches or commits in version control systems. If you are, like me, annoyed by *fix typo* fix-up commits in pull or merge requests you get at work or in an open source project, or if you simply get too many contributions (which is a good place to be; tell me how to get there), then `git-badc0de` is the tool for you.

The problem is clearly that it is too easy to make new commits. The solution is to make creating commits harder. Git developers make the `git` interface saner with each release, and there are various tools (like GitHub web editing) which make writing *fix typo* commits child's play.

`git-badc0de` (GitHub: phadej/git-badc0de) takes an out-of-the-box approach. Instead of trying to encourage (or force) humans to put more effort into each commit, it makes their machines do the work.

`git-badc0de` takes the `HEAD` commit and creates an altered copy, such that the commit hash starts with a given prefix (by default `badc0de`). Obviously, I use `git-badc0de` in the development process of `git-badc0de` itself. Check the tail of `git log`:

```
badc0dea Add CONTRIBUTING.md
badc0ded Make prefix configurable.
badc0de4 Add message to do git reset
badc0de6 Comment out some debug output
badc0de5 Initial commit
```

It's up to the project owners to decide how long a prefix they want to have. Seven base16 characters (i.e. 28 bits out of 160) is doable on modern multi-core hardware in a minute, with good luck in less^{1}. These seconds are important. They are an opportunity to reflect, maybe even notice a typo in the commit message. Modern machines are so fast, and even some compilers too^{2}, that we don't pause and think about what we have just done.
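As a back-of-the-envelope check of that claim (my arithmetic, not from the tool): each hex character of prefix multiplies the expected work by 16, so seven characters mean about 2^28 attempts; at around ten million SHA-1 hashes per second in total, that lands in the tens of seconds.

```haskell
-- expected number of hash attempts for a k-hex-character prefix
expectedAttempts :: Int -> Integer
expectedAttempts k = 16 ^ k

-- rough duration in seconds at a given total hash rate
roughSeconds :: Integer -> Int -> Integer
roughSeconds rate k = expectedAttempts k `div` rate
```

`expectedAttempts 7` is 268435456 (= 2^28), and at 10^7 hashes per second `roughSeconds` gives about 26 seconds.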

*Git is a content-addressable filesystem* is how a chapter on Git objects in the Pro Git book starts. Very nice model, very easy to tinker with. You can try `git cat-file -p HEAD` in your current Git project to see the `HEAD` commit object data. In `git-badc0de` one commit looks like:

```
tree 91aaad77e68aa7bf94219a5b9cea97f26e2cce2b
parent badc0dea0106987c4edfb1d169b5a43d95845569
author Oleg Grenrus <oleg.grenrus@iki.fi> 1596481157 +0300
committer Oleg Grenrus <oleg.grenrus@iki.fi> 1596481157 +0300

Rewrite README.md

PoW: HRYsAAAAAAF
```

The Git commit hash is the SHA-1 hash of a header followed by these contents. The header for a commit object looks like

`commit <content length as ASCII number><NUL>`
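To make that concrete, here is a small sketch (using only the GHC-bundled `bytestring`; the function name is mine) of the exact bytes Git feeds to SHA-1:

```haskell
import qualified Data.ByteString.Char8 as BS8

-- | The bytes Git hashes for an object: type, space,
-- content length as ASCII, NUL, then the content itself.
objectBytes :: BS8.ByteString -> BS8.ByteString -> BS8.ByteString
objectBytes typ content = BS8.concat
    [ typ
    , BS8.pack (' ' : show (BS8.length content))
    , BS8.singleton '\0'
    , content
    ]
```

The commit hash is then the SHA-1 of `objectBytes` applied to `"commit"` and the commit body; appending a salt to the body changes both the length field and the digest.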

`git-badc0de`

takes the most recent commit data, and by adding some `PoW: DECAFC0FFEE`

*salts* to the end, tries to find one which makes commit hash with the correct prefix. It takes 11 characters to encode 64 bits in base64. Why base64, no particular reason. Based on this StackOverflow answer we could put salts into commit headers, to hide them from `git log`

. Something to do in the future.

When a valid salt is found, `git-badc0de`

writes the new commit object to the Git object store. At this point nothing is changed, only a new dangling object inside `.git`

directory. You can reset your branch to point to the new commit with `git reset`

, and `git-badc0de`

invites you to do so.

Yes, I'm dead serious (*No*). But I had fun implementing `git-badc0de`

. I was surprised that getting seven characters "right" in a hash is an easy job. That causes nice artifacts in GitHub web interface.

The top commit shown on the project main page is always `badc0de`

...

... and in fact all commits seem to have the same hash (prefix)...

Note how command line `git log`

is smart to show enough characters to make prefixes unambiguous. It is deliberate, check on some of your smaller projects, there `git log --oneline`

probably prints seven character abbreviations. In GHC (Haskell compiler) `git log --oneline`

prints ten characters for me (GitHub still shows just seven, so I assume it is hardcoded).

We can also use `git-badc0de` to produce commits with ordered hashes! The downside is that you have to decide the maximum commit count at the start. Yet that should be enough for about any project. See the ordered branch, isn't that cool!?

How is `git-badc0de` implemented? I have to confess: I started with a Python prototype. Python comes with all the pieces needed, though I essentially only needed `hashlib`.

The Haskell implementation has eleven dependencies at the moment of writing. Five of them are bundled with the compiler; the other six are not. Even for some basic tasks you have to go package shopping:

- `async` to parallelize computations
- `base16-bytestring`
- `cryptohash-sha1`
- `primitive` to write some low-level code
- `utf8-string` to convert between the UTF-8 `ByteString` representation and `String`
- `zlib` to compress git objects, as they are stored compressed

My first Haskell implementation was noticeably faster than the Python 3 version. I suspect that is because Haskell is simply better at gluing bytes together.

The motivation to use Haskell had two parts:

- I just use Haskell for everything. (Except for prototyping silly ideas). This is the most important reason.
- Haskell is good for writing parallel programs. This is a bonus.

To my surprise, my first Haskell parallelization attempt didn't work at all. The idea is to spawn multiple workers, each trying different salts, and then race them until one worker finds a valid salt. Adding more workers should not slow down the overall program, minus maybe some small managerial overhead.

The overhead turned out to be quite large. Parallelism in Haskell works well when you deal with "native" Haskell data. The `git-badc0de` use case is, however, gluing bytes (`ByteString`s) together and calling out to a C implementation of the SHA-1 algorithm.

The nasty detail of, I guess, any higher-level language is that the foreign function interface has pitfalls. I ran into a `foreign import unsafe` issue. You may read about `foreign import unsafe` in the excellent GHC User's Guide:

> GHC guarantees that garbage collection will never occur during an `unsafe` call, ...

Many threads generating some amount of garbage, but also calling `unsafe` foreign functions in a tight loop, caused problems. Surprisingly, both `cryptohash-sha1` and `bytestring` use plenty of `unsafe` calls (`cryptonite` does too).

My solution was to redo the loop: less garbage generation and fewer foreign calls.

`cryptohash-sha1` (and `cryptonite`) import `_init`, `_update` and `_finalize` C functions. The hashing context is allocated, and the plumbing done, in Haskell. However, we can set things up so that we pass a single contiguous block of memory to be hashed. This functionality is missing from the library, so I copied `cbits/` from `cryptohash-sha1` and added a small C function, to do the C plumbing in C:

```
void
hs_cryptohash_sha1(const uint8_t *data, size_t len, uint8_t *out)
{
    struct sha1_ctx ctx;
    hs_cryptohash_sha1_init(&ctx);
    hs_cryptohash_sha1_update(&ctx, data, len);
    hs_cryptohash_sha1_finalize(&ctx, out);
}
```

This way we can make a single `safe` foreign call to calculate the hash. We have to make sure that the pointers point at pinned memory, i.e. memory which the garbage collector won't move.

Next, the same problem exists in the `bytestring` library. I was naive to think that, as the byte data I work with is so small, concatenating it (and thus `memcpy`ing) wouldn't be noticeable, and hashing should dominate. Usually it isn't a problem, but as the copying was done on each loop iteration and `memcpy` is a `foreign import unsafe` in the `bytestring` library, it also contributed to the slowdown. That was my hypothesis, anyway.

Figuring out how to do better with `bytestring` seemed difficult, so I opted for a different solution: write some C-in-Haskell. Now each worker creates its own *mutable* template, which is updated with a new salt on each loop iteration. The salt length is fixed, so we don't need to change the commit object header. As a bonus, the loop became essentially non-allocating (I didn't check, though).

After that change, `git-badc0de` started to use all the cores, and not just spin in GC locks. The runtime system statistics are nice to look at:

```
                               Tot time (elapsed)  Avg pause  Max pause
  Gen  0      0 colls,  0 par    0.000s    0.000s    0.0000s    0.0000s
  Gen  1      1 colls,  0 par    0.000s    0.000s    0.0004s    0.0004s
  ...
  Productivity 100.0% of total user, 99.9% of total elapsed
```

No time is spent in garbage collection. Productivity is the share of time spent doing actual work rather than collecting garbage. Disclaimer: it seems that waiting for GC locks is not counted towards GC time, but as there was only a single collection, that doesn't matter.

I could optimize further: as the salt is at the end of the content, it is silly to rehash the whole commit object every time. Yet `git-badc0de` is a silly project to begin with, and I am satisfied with the current state.

The lesson here is that the foreign function interface (FFI) is not easy: you have to think and test.

"Luckily" I learned about the `unsafe` issue recently in `postgresql-libpq`, so I was able to consider it as the cause of my problems. In this case, `unsafe` doesn't mean "I know what I'm doing" (as e.g. with `unsafePerformIO`), but rather the opposite.

Also, I don't think that we (= the Haskell ecosystem) have good tooling to benchmark how code behaves in highly parallel environments. I *hope* that `Data.ByteString.Builder`, for example, doesn't use any `unsafe` foreign calls. The ecosystem relies on that module for constructing JSON (in `aeson`) and HTML (in both `blaze-markup` and `lucid`). Something for someone to test, maybe fix and document.

Does this mean that Haskell is crap, that the promise of easy parallel and concurrent programming is a lie, and that we should all use Rust instead?

Well, no. In this isolated example, Rust would probably shine. There are, however, other parts than the hashing loop even in this simple program, and they have to be written as well. There Haskell feels a lot like Python, in the sense that I can just write concise code which works.

Python was quite nice in the very early prototyping stage, as it happened to have all the needed functionality available in the repl. The "early prototyping stage" lasted for maybe 10 or 15 minutes; that was enough to verify that the basic idea might work. With Haskell, you would need to restart the repl to add a new library, losing all the context, which would have killed the flow. For some other "problem" I might start to prototype directly in Haskell. I have no experience with how nice a repl experience Rust has.

If the `git-badc0de` project were to develop further, I would rewrite the hashing loop in C, instead of writing C-in-Haskell. Maybe. Or in Rust, if that were easier to set up. (GHC knows how to invoke a C compiler.)

Haskell is a great glue language, among many other great properties it has. Don't believe anyone who tells you otherwise.

There is prior art; this is not a novel idea. Just search the internet for *git commit prefix miner*, e.g.

- Rust version: https://github.com/gunnihinn/git-commit-mine
- Bash: https://github.com/franckverrot/git-mine-commit

or *git commit vanity hash*

- Python and OpenCL: https://github.com/tochev/git-vanity
- Ruby: https://github.com/mattbaker/git-vanity-sha

None of them is meta-used, i.e. used on its own commits, so I cannot be sure that they work :)

Eight base16 characters (i.e. 4 bytes) took one and a half hours of CPU time, or "just" 5.5 minutes of wall clock time. I ran that experiment only a few times. Take a look at the deadc0de branch.↩︎

If your programming language of choice is a compiled one.↩︎

Michael PJ recently wrote a post about Lenses for Tree Traversals. In an r/haskell discussion there is a comment which got my attention.

And here is the problem. With mutually recursive datatypes, even with generics, we can't write a generic type-safe traversal. We have to make do with boilerplate.

*Challenge accepted.*

As pointed out on Twitter, the approach below is similar to what the `multiplate` library by Russell O'Connor gives combinators for. It's usable with `lens` too.

```
{-# LANGUAGE RankNTypes #-}
-- For GPlated
{-# LANGUAGE DeriveGeneric, TypeOperators #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
{-# OPTIONS_GHC -Wall #-}
module MutualTraversals where
import Control.Lens (transformOf)
import GHC.Generics
```

I will use the same example as Michael: a simply implemented, simply typed lambda calculus.

```
type Name = String
data Type = IntegerType | FunType Type Type deriving (Eq, Show)
```

but instead of the direct variant

```
data Term
    = Var Name
    | Lam Name Type Term
    | App Term Term
    | Plus Term Term
    | Constant Integer
```

let us have a bidirectional version. (I just wrote a post about Bidirectional Pure Type Systems, so it was an obvious choice).

Bidirectionality forces us to think about `Plus`. A solution is to add an additional constructor.

```
data Syn
= Var Name
| App Syn Chk
| Ann Chk Type
deriving (Show, Generic)
data Chk
= Lam Name Chk -- note, no type annotation
| Constant Integer
| UnaryPlus Integer Syn -- additional constructor
| Plus Syn Syn
| Conv Syn
deriving (Show, Generic)
```

Note how `UnaryPlus` and `Plus` are "stuck" on `Syn` terms.

The goal is to fold constants. The interesting stuff happens in the checkable terms, `Chk`.

```
cfChk :: Chk -> Chk
cfChk t = case t of
    UnaryPlus n (Ann (UnaryPlus m p) _) -> UnaryPlus (n + m) p
    UnaryPlus n (Ann (Constant m) _)    -> Constant (n + m)
    Plus (Ann (UnaryPlus n m) ty) p     -> UnaryPlus n (Ann (Plus m p) ty)
    Plus n (Ann (UnaryPlus m p) ty)     -> UnaryPlus m (Ann (Plus n p) ty)
    Plus (Ann (Constant n) _) m         -> UnaryPlus n m
    Plus n (Ann (Constant m) _)         -> UnaryPlus m n
    _                                   -> t
```

We have more rules because we added `UnaryPlus`, but we can fold more constants, exploiting the commutativity and associativity of addition.

But how do we write `constantFold`? We have two mutually recursive types. The answer is obvious after you hear it: if you have two mutually recursive types, then there are two mutually recursive traversals.

They look like a monomorphic `Bitraversable`.

Let me define helper type-aliases:

```
type Star f a b = a -> f b
type LensLike' f s a = Star f a a -> Star f s s
type BilensLike' f s a b = Star f a a -> Star f b b -> Star f s s
type Traversal' s a = forall f. Applicative f => LensLike' f s a
```

Then we can define bitraversals:

```
chkSubterms' :: Applicative f => BilensLike' f Chk Syn Chk
chkSubterms' _syn chk  (Lam n x)       = Lam n <$> chk x
chkSubterms' _syn _chk t@Constant {}   = pure t
chkSubterms' syn  _chk (UnaryPlus n x) = UnaryPlus n <$> syn x
chkSubterms' syn  _chk (Plus x y)      = Plus <$> syn x <*> syn y
chkSubterms' syn  _chk (Conv x)        = Conv <$> syn x

synSubterms' :: Applicative f => BilensLike' f Syn Syn Chk
synSubterms' syn  chk  (App f x)  = App <$> syn f <*> chk x
synSubterms' _syn chk  (Ann x t)  = Ann <$> chk x <*> pure t
synSubterms' _syn _chk t@Var {}   = pure t
```

And using these we can define

```
chkSubterms :: Traversal' Chk Chk
chkSubterms f = chkSubterms' aux f where aux = synSubterms' aux f
```

The above definition is slightly complicated. We have to make the recursive `aux` drill through `Syn` terms until it finds `Chk` terms.

But after all the setup, we can define `constantFold`.

```
constantFold :: Chk -> Chk
constantFold = transformOf chkSubterms cfChk
```

Let us also try it out. We are going to write a redundant program; there are plenty of type annotations highlighting that.

```
expr1 :: Chk
expr1 = Plus (annZ (Constant 2))
      $ annZ $ Plus (Var "x")
      $ annZ (Constant 3)
  where
    annZ n = Ann n IntegerType
```

After two iterations of `constantFold` we get a completely simplified result:

```
*MutualTraversals> constantFold expr1
UnaryPlus 3 (Ann (Plus (Ann (Constant 2) IntegerType) (Var "x")) IntegerType)
*MutualTraversals> constantFold $ constantFold expr1
UnaryPlus 5 (Var "x")
```

*It works*.

To complete the challenge, we need to write `chkSubterms'` and `synSubterms'` generically. If we are allowed to use Template Haskell, that would be as straightforward as writing Template Haskell is. Nor do I see any immediate problems generalizing the `GPlated` definitions to generate bitraversals.

EDIT: Later today I added `GPlated2` in the appendix. It is a straightforward generalization of the `GPlated` implementation in `lens`.

```
chkSubterms2' :: Applicative f => BilensLike' f Chk Syn Chk
chkSubterms2' f g = gplate2 g f

synSubterms2' :: Applicative f => BilensLike' f Syn Syn Chk
synSubterms2' f g = gplate2 f g

chkSubterms2 :: Traversal' Chk Chk
chkSubterms2 f = chkSubterms2' aux f where aux = synSubterms2' aux f

constantFold2 :: Chk -> Chk
constantFold2 = transformOf chkSubterms2 cfChk
```

*It works.*

```
*MutualTraversals> constantFold2 expr1
UnaryPlus 3 (Ann (Plus (Ann (Constant 2) IntegerType) (Var "x")) IntegerType)
*MutualTraversals> constantFold2 $ constantFold2 expr1
UnaryPlus 5 (Var "x")
```

So you don't even need to define boilerplate by hand.

We can define a `Plate` type like the `multiplate` library advises.

```
data Plate f = Plate
    { chkPlate :: Star f Chk Chk
    , synPlate :: Star f Syn Syn
    }
```

and define a value

```
synChkPlate :: Applicative f => Plate f -> Plate f
synChkPlate p = Plate
    { chkPlate = chkSubterms' (synPlate p) (chkPlate p)
    , synPlate = synSubterms' (synPlate p) (chkPlate p)
    }
```

Now it's easy to see how you would add `Type` traversals to the mix.
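As a sketch of that extension (the third field and the names `typePlate` and `idPlate` are mine, not from `multiplate` or the post; `Chk` and `Syn` are stubbed out to keep it self-contained):

```haskell
import Data.Functor.Identity (Identity (..))

data Type = IntegerType | FunType Type Type deriving (Eq, Show)

-- Stubs standing in for the post's real Chk and Syn datatypes.
data Chk = ChkStub deriving (Eq, Show)
data Syn = SynStub deriving (Eq, Show)

data Plate f = Plate
    { chkPlate  :: Chk  -> f Chk
    , synPlate  :: Syn  -> f Syn
    , typePlate :: Type -> f Type  -- hypothetical third field
    }

-- The do-nothing plate; a real 'synChkTypePlate' would thread all three
-- fields through tritraversals of Chk, Syn and Type.
idPlate :: Applicative f => Plate f
idPlate = Plate pure pure pure
```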

I could also refute Michael's comment

recursion-schemes does badly with mutually recursive types. If this is a problem for you, you’ll realize pretty quickly.

The `recursion-schemes` library itself cannot deal with mutually recursive types, but the approach can be generalized. In this post we used stuff beyond `lens` as well.

I'll leave that for a future post. (Or you can look into https://hackage.haskell.org/package/multirec and the paper which explains it).

Note that Michael slightly *cheats* in counting nodes with `Fold`s:

```
-- ... plus the number of nodes in all the subterms
<> foldMapOf termSubterms countTermNodes t
-- ... plus the number of nodes in all the subtypes
<> foldMapOf termSubtypes countTypeNodes t
```

here he examines the same `t` twice: first looking for subterms, and then for subtypes.

With his definition of the (unidirectional) `Term`, he could use a bitraversal to look for types and terms simultaneously!
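To sketch that idea (with a tiny toy pair of types and names of my own, not Michael's actual `Term` and `Type`): a bitraversal feeds term-children and type-children to different counters in a single pass over `t`.

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Sum (..))

-- A tiny pair of types where one embeds the other.
data Ty = TInt | TFun Ty Ty
data Tm = TmVar String | TmLam String Ty Tm | TmApp Tm Tm

-- Bitraversal of a term's immediate children: one callback for subterms,
-- one for subtypes, both visited in the same pass.
tmSubparts :: Applicative f => (Tm -> f Tm) -> (Ty -> f Ty) -> Tm -> f Tm
tmSubparts _  _  t@(TmVar _)   = pure t
tmSubparts tm ty (TmLam n a b) = TmLam n <$> ty a <*> tm b
tmSubparts tm _  (TmApp a b)   = TmApp <$> tm a <*> tm b

tySubparts :: Applicative f => (Ty -> f Ty) -> Ty -> f Ty
tySubparts _  TInt       = pure TInt
tySubparts ty (TFun a b) = TFun <$> ty a <*> ty b

-- Counting via 'Const (Sum Int)': each node contributes 1 plus its children.
countTm :: Tm -> Int
countTm t = 1 + getSum (getConst
    (tmSubparts (Const . Sum . countTm) (Const . Sum . countTy) t))

countTy :: Ty -> Int
countTy t = 1 + getSum (getConst (tySubparts (Const . Sum . countTy) t))
```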

Appendix: GPlated2

```
-- | Implement 'plate' operation for a type using its 'Generic' instance.
gplate2 :: (Generic a, GPlated2 a b (Rep a), Applicative f)
        => BilensLike' f a a b
gplate2 f g x = GHC.Generics.to <$> gplate2' f g (GHC.Generics.from x)

class GPlated2 a b g where
    gplate2' :: Applicative f => BilensLike' f (g p) a b

instance GPlated2 a b f => GPlated2 a b (M1 i c f) where
    gplate2' f g (M1 x) = M1 <$> gplate2' f g x

instance (GPlated2 a b f, GPlated2 a b g) => GPlated2 a b (f :+: g) where
    gplate2' f g (L1 x) = L1 <$> gplate2' f g x
    gplate2' f g (R1 x) = R1 <$> gplate2' f g x

instance (GPlated2 a b f, GPlated2 a b g) => GPlated2 a b (f :*: g) where
    gplate2' f g (x :*: y) = (:*:) <$> gplate2' f g x <*> gplate2' f g y

instance {-# OVERLAPPING #-} GPlated2 a b (K1 i a) where
    gplate2' f _ (K1 x) = K1 <$> f x

instance {-# OVERLAPPING #-} GPlated2 a b (K1 i b) where
    gplate2' _ g (K1 x) = K1 <$> g x

instance GPlated2 a b (K1 i c) where
    gplate2' _ _ = pure

instance GPlated2 a b U1 where
    gplate2' _ _ = pure

instance GPlated2 a b V1 where
    gplate2' _ _ v = v `seq` error "GPlated2/V1"
```

These are my notes, where I write things down to try to clarify my own thoughts. All the mistakes are my own.

I try to show the rules for Pure Type Systems in a bidirectional type-checking style.

This post was motivated by me wondering why Conor McBride has sorts and function types as checkable types in his systems. For example, look at the *I Got Plenty o’ Nuttin’* paper (definition 4 for syntax, and definition 17 for typing judgements in the linked version).

I present two variants of bidirectional pure type systems. In the first variant, type formers are inferrable terms. It's slightly different than the one used in *Lambda Pi: A tutorial implementation of a dependently typed lambda calculus*, and generalized to an arbitrary (single-sorted) PTS.

Type formers in the second system are checkable. This is a requirement for cumulative universes, where types of types are not unique.

This is a review of Barendregt (*Lambda calculi with types*, 1992).

A Pure Type System (PTS) is a type system with the following syntax
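Following Barendregt's standard presentation (the grammar here is my reconstruction, not copied from the post), the term syntax is:

```latex
T \;::=\; x \mid c \mid T\,T \mid \lambda x{:}T.\,T \mid \Pi x{:}T.\,T
```

with variables x and constants c.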

The type judgement of a PTS is parameterized by a *specification* (S, A, R), where

- S is a subset of the constants, called *sorts*
- A is a set of *axioms* of the form c : s, with a constant c and a sort s
- R is a set of *rules* of the form (s₁, s₂, s₃) with s₁, s₂, s₃ ∈ S.

Typical examples are *simply typed lambda calculus* (without products or sums),

or a predicative system with an infinite hierarchy of sorts (like in Agda), which some call :

Barendregt shows a lot of properties of pure type systems. One important result is that if the PTS is *single sorted*, in other words if the axioms and rules are partial functions, then terms have unique types.
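Spelled out (the standard conditions, after Barendregt; my reconstruction, not copied from the post), single-sortedness means:

```latex
\begin{aligned}
(c : s_1) \in \mathcal{A} \wedge (c : s_2) \in \mathcal{A} &\implies s_1 = s_2 \\
(s_1, s_2, s_3) \in \mathcal{R} \wedge (s_1, s_2, s_3') \in \mathcal{R} &\implies s_3 = s_3'
\end{aligned}
```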

A corollary of that property is that beta-reduction preserves types.

Both example systems above are single sorted.

Barendregt gives *declarative* typing judgements for the PTS. Here I omit writing the context in every rule, and only show how contexts are extended. The reverse is used to look up in the context.

Bidirectional systems have two syntactical groups, which McBride calls *terms* and *eliminations*. I choose to call them *check terms* and *synth terms*, as I find the name elimination somewhat misleading.

There are three groups of terms: there are *introduction* and *elimination* forms, but also *type formations*, because types appear in terms.

One valid approach is to divide introduction, elimination and type formers arbitrarily into check and synth terms, trying to minimize the need for the type-annotations.

**Example: Booleans** We know the types of the boolean constants, thus the introduction forms can be synth terms. In the boolean elimination (`if` expression) the scrutinee has to be a Boolean, so we can check that. As we don't know anything about the type of the branches, we can decide to avoid a type annotation for the result type and ask for it to be provided.

Here I use Conor's notation to make the type judgments "executable" in clockwise order. One judgement is read as "check that the term has the given type", and the other as "infer (or synthesize) the type of the term, which will be the result".

These are the rules you will find in David Christiansen's tutorial.

As said, this is a valid design strategy.

Another approach is to take normal forms, which are divided into neutral and normal terms, and add a double agent, the cut, as a type-annotation construct to normal forms.

Frank Pfenning in his lectures talks about natural deduction and *verifications* (and sequent calculus and cut elimination). I haven't been in his class, but I have watched his and Brigitte Pientka's OPLSS lectures on Proof Theory. The atomic propositions are allowed to meet

which is a restricted version of the conversion rule we will have. (Restricting to atomic propositions means that you need to η-expand everything in normal forms). The opposite rule

is analogous to the *cut* rule in sequent calculus, i.e. it shows where the derivation can be simplified. This rule is admissible.

We can design bidirectional systems so the cut rule has a second purpose as the type-annotation rule. This way, we know precisely where we need to perform reductions to normalize terms. As far as I understand, this is the design principle Conor McBride advocates for.

As a consequence, all introduction forms have to be checkable terms. This is natural for function abstraction (which is the introduction form for function types), but feels less natural for "data". With these design principles, the rules for Booleans are:

Now we **cannot** write

as it is a check term, but the scrutinee of `if` has to be a synth term. If we want to write such a redundant program, it has to be written as

Notice that, as there is a type annotation, we *know* that the expression can be reduced:
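This discipline can be sketched as a toy checker in Haskell (the datatypes and names are mine, a simplification, not the post's actual system): a checkable scrutinee such as a boolean constant has to be wrapped in an annotation, and that annotation is exactly the cut marking a reducible expression.

```haskell
type Ty = String           -- toy stand-in for the type language

data ChkB                  -- check terms
    = BTrue
    | BFalse
    | BIf SynB ChkB ChkB   -- the scrutinee must be a synth term
    | BConv SynB           -- synth term used where a check term is expected
    deriving Show

data SynB                  -- synth terms
    = BVarS String Ty      -- variable, type read off a (simplified) context
    | BAnn ChkB Ty         -- the type annotation, i.e. the cut
    deriving Show

check :: ChkB -> Ty -> Bool
check BTrue       t = t == "Bool"
check BFalse      t = t == "Bool"
check (BIf s a b) t = infer s == Just "Bool" && check a t && check b t
check (BConv s)   t = infer s == Just t

infer :: SynB -> Maybe Ty
infer (BVarS _ t) = Just t
infer (BAnn e t)  = if check e t then Just t else Nothing
```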

**Exercises**

Does the `if` expression itself have to be a checkable term; can't we have it as inferrable?

What problems will occur? Hint: nest things.

This is an example of why I don't think that calling inferrable terms eliminations is a good idea. Some elimination forms have to be checkable.

The sum types, even Booleans, are however quite challenging.

In this section I will describe a bidirectional pure type system. I have proved nothing about this system, but I have implemented it. It relies on the PTS specification being single sorted.

It is fun to instantiate the implementation with different specifications and see what is possible or not. A specific highlight is to see Hurkens' Paradox fail to type-check in one system, and fail to terminate in another.

**Check and synth terms.** We have the same five syntactical forms as previously, and two new ones to allow conversions between check and synth terms. Unlike McBride, type formation forms are synth terms. Also, the type in a type annotation has to be a synth term. Notice that lambda terms don't have an argument type annotation.

The typing rules are syntax directed; we have one rule for each syntactic form.