Skip to content

Conversation

@bhagany
Copy link
Contributor

@bhagany bhagany commented Nov 18, 2016

Executive summary:

  • separated markdown rendering from custom function rendering
  • split rendering into phases
    • generating data, a map of {"page" {:render-data ... :entry ...}}
    • rendering to file
    • metadata modification
  • removed :groupby option from collection task
  • added new assortment task, which replaces :groupby and adds more flexibility

For those not following along on Slack, this PR was motivated by wanting some rendering functionality that wasn't available out of the box. The use case is: suppose you have a blog with multiple tags per article, and you want to generate a collection for each tag without knowing what the tags are ahead of time. The :groupby functionality in collection isn't sufficient, because it can only group each article into a single collection. You could write your own task, but this is rather involved, because perun doesn't expose its rendering machinery to client code. So, an enhancement to perun itself seemed appropriate.

This is by no means a final product, and it's quite likely that I've made changes here that have knock-on effects that I haven't taken into account, so please feel free to be critical.

The first thing I wanted to do was make render and collection more like each other, so I abstracted their common operations into the render-pre-wrap function. As a part of this, I also made a distinction between rendering markdown, and rendering via a custom function a la render or collection. The former still results in a :content metadata key, while the latter now produces a :body key. This was necessary in order to have a common abstraction between render and collection, but it also "feels" better to me. Now, :content always contains an intermediate form, while :body always contains the final product, instead of both kinds of data being mixed into :content.

Once that was accomplished, I added a new level to perun's content taxonomy, which was previously two things: articles and collections. I called the new level "assortments". Here's how I think about it:

  • articles originate from markdown files
  • collections are groups of articles
  • assortments are groups of collections

Instead of passing a group-by predicate, as you would to collection, assortment takes a user-defined :grouper function that can implement any sort of grouping you might want. To achieve the familiar :groupby behavior, you can pass #(group-by your-predicate %) as the grouper argument.

I removed the :groupby option from collection so that there would be a clear separation between collections and assortments, but it could easily be added to my implementation if backwards compatibility is a concern. Aside from this and the addition of assortment, the API should be the same.

Please let me know what you think.

- separated markdown rendering from custom function rendering
- split rendering into phases
  - generating data, a map of `{"page" {:render-data ... :entry ...}}`
  - rendering to file
  - metadata modification
- remove `:groupby` option from `collection` task
- add new `assortment` task, which replaces `:groupby` and adds more flexibility
Copy link
Contributor

@martinklepsch martinklepsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few quick thoughts, may take a more thorough look later.

src/io/perun.clj Outdated
"Generate Atom feed"
[f filename FILENAME str "generated Atom feed filename"
_ filterer FILTER code "predicate to use for selecting entries (default: `:content`)"
_ filterer FILTER code "predicate to use for selecting entries (default: `:body`)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change to :body as default filterer wouldn't all files show up in feeds/sitemap etc? Like we probably don't want index.html to show up in an atom feed for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, but I don't think this is a change in behavior. Previously by default, all files touched by render and collection would have a :content key. For render, it's because :content was the default filterer, and for collection, :content was explicitly set in the task (https://github.com/hashobject/perun/blob/master/src/io/perun.clj#L498). You didn't mention how your example index.html was generated, but I assume it was either via render or collection? I think in order to exclude index.html from an atom feed, a custom filterer would have needed to be supplied.

Copy link
Contributor

@martinklepsch martinklepsch Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All files the render and collection tasks rendered had to have a :content key.
  • That key was set by the markdown task usually but could have been set by other tasks in theory.
  • I think setting :content to the rendered HTML inside the collection task (as in the line you linked) does not make much sense but I might be missing something.
  • Now with :body being the default filterer tasks after the render and collection tasks would see all the files rendered by those tasks. Previously they only respected files that had the :content key — in this case indicating they are original units of content. (With the exception of files rendered by the collection task. Not sure why :content is being set there.)

It seems that the :content key was added to the files generated by the collection task here: 69a7149. I think overloading the meaning of the :content key like this can end up being very confusing so I'm thinking that maybe we should reconsider this one @podviaznikov (perhaps use :body or something similar as @bhagany does in this PR)

In this regard it may also make sense to start using namespaced keywords like io.perun/markdown or whatever to indicate what can be expected to be the value of these keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, replying by email didn't put my comments in the right place. My apologies, they should have gone here.

(-> fileset
(perun/merge-meta new-metadata)
(commit tmp))))))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to add docstrings to those fns if we end up merging them :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, will do

src/io/perun.clj Outdated
"Generate Atom feed"
[f filename FILENAME str "generated Atom feed filename"
_ filterer FILTER code "predicate to use for selecting entries (default: `:content`)"
_ filterer FILTER code "predicate to use for selecting entries (default: `:body`)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, my interpretation was that :content was added to the metadata in collection so that collection pages would be present in sitemaps by default. I agree that something more fine-grained would be desirable. Seems like that should be a separate pull request though?

On Nov 18, 2016, 8:06 AM -0600, Martin Klepsch [email protected], wrote:

In src/io/perun.clj (#109 (comment)):

:out-dir "public"}) > > (deftask atom-feed > "Generate Atom feed" > [f filename FILENAME str "generated Atom feed filename" > - _ filterer FILTER code "predicate to use for selecting entries (default: :content)" > + _ filterer FILTER code "predicate to use for selecting entries (default: :body)"
All files the render and collection tasks rendered had to have a :content key.
That key was set by the markdown task usually but could have been set by other tasks in theory.
I think setting :content to the rendered HTML inside the collection task (as in the line you linked (https://github.com/hashobject/perun/blob/master/src/io/perun.clj#L498)) does not make much sense but I might be missing something.
Now with :body being the default filterer tasks after the render and collection tasks would see all the files rendered by those tasks. Previously they only respected files that had the :content key — in this case indicating they are original units of content. (With the exception of files rendered by the collection task. Not sure why :content is being set there.)

It seems that the :content key was added to the files generated by the collection task here: 69a7149 (69a7149). I think overloading the meaning of the :content key like this can end up being very confusing so I'm thinking that maybe we should reconsider this one @podviaznikov (https://github.com/podviaznikov) (perhaps use :body or something similar as @bhagany (https://github.com/bhagany) does in this PR)

In this regard it may also make sense to start using namespaced keywords like io.perun/markdown or whatever to indicate what can be expected to be the value of these keys.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (https://github.com/hashobject/perun/pull/109/files/fc71654711af0c716d5374eb045a3bdedaa7fef3#r88661956), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAbIcpUif3NwAKNjYfWDo21DckWthHktks5q_bDvgaJpZM4K2Mwt).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just realized I could resolve the immediate tension by using :content as the default filterer for tasks like rss-feed and atom-feed, and :body for sitemap. I'll make that change when I'm back at my computer.

On Nov 18, 2016, 8:06 AM -0600, Martin Klepsch [email protected], wrote:

In src/io/perun.clj (#109 (comment)):

:out-dir "public"}) > > (deftask atom-feed > "Generate Atom feed" > [f filename FILENAME str "generated Atom feed filename" > - _ filterer FILTER code "predicate to use for selecting entries (default: :content)" > + _ filterer FILTER code "predicate to use for selecting entries (default: :body)"
All files the render and collection tasks rendered had to have a :content key.
That key was set by the markdown task usually but could have been set by other tasks in theory.
I think setting :content to the rendered HTML inside the collection task (as in the line you linked (https://github.com/hashobject/perun/blob/master/src/io/perun.clj#L498)) does not make much sense but I might be missing something.
Now with :body being the default filterer tasks after the render and collection tasks would see all the files rendered by those tasks. Previously they only respected files that had the :content key — in this case indicating they are original units of content. (With the exception of files rendered by the collection task. Not sure why :content is being set there.)

It seems that the :content key was added to the files generated by the collection task here: 69a7149 (69a7149). I think overloading the meaning of the :content key like this can end up being very confusing so I'm thinking that maybe we should reconsider this one @podviaznikov (https://github.com/podviaznikov) (perhaps use :body or something similar as @bhagany (https://github.com/bhagany) does in this PR)

In this regard it may also make sense to start using namespaced keywords like io.perun/markdown or whatever to indicate what can be expected to be the value of these keys.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (https://github.com/hashobject/perun/pull/109/files/fc71654711af0c716d5374eb045a3bdedaa7fef3#r88661956), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAbIcpUif3NwAKNjYfWDo21DckWthHktks5q_bDvgaJpZM4K2Mwt).

@bhagany
Copy link
Contributor Author

bhagany commented Nov 18, 2016

I forgot to mention above - render-pre-wrap is partially intended to make writing perun-compatible render tasks in client code easier.

I've also added docstrings and changed the default filterer for rss and atom-feed back to :content

…ctions

Since collections are entries in their own right, their metadata should be accessible to render functions. This commit also allows custom metadata to be set for assortments.
@bhagany
Copy link
Contributor Author

bhagany commented Nov 19, 2016

While playing around with the assortment functionality, I had a desire for collection-specific metadata to be available in my render functions. It turned out to be a small addition, so I decided to submit it for consideration here. I also added the ability to specify collection-level metadata in custom :grouper functions for the assortment task.

@podviaznikov
Copy link
Member

Overall I liked this a lot.
I have some trouble understanding having :body property. Something is confusing to me here. Maybe it's just name? Can be difficult to now what is the difference between :content and :body.
Also I do like assortment task and cleanup of collection. However not sure about name. Is there is other variant we can come up with?

@bhagany
Copy link
Contributor Author

bhagany commented Nov 24, 2016

Great, I'm glad you liked it! I'm definitely open to ideas on the names.

My reason for differentiating between :content and :body is that, in master, :content can contain two kinds of data. markdown populates :content with a page fragment, whereas collection populates :content with the whole rendered page. In this PR, :content is always your parsed input, and :body is always the rendered output, and to me, this is clearer. Also, as @martinklepsch later pointed out, two keys are necessary to distinguish between your input content, and data that is derived from your input. For instance, the output of render should be put into RSS feeds and sitemaps, while the output of collection should only be in sitemaps.

As for naming, I'll explain how I came up with the names, to get the conversation started.

  • I decided on :body because that's what the non-header part of an HTTP message is called, and that's what I intended the key to mean. There is some potential for confusion with things like an HTML <body>, though.
  • I came by assortment through an analogy. If you are at a dinner party and see an assortment of cheese, I picture a plate with several groups of different kinds of cheese grouped together. In the analogy, each piece of cheese is an entry, each group of the same kind of cheese is a collection, and the whole plate is an assortment. I also considered supercollection, aggregation, and cluster. The main difficulty here for me is, words that could mean "a collection of collections" can also mean just "collection".

Happy to make changes if anyone has suggestions, please let me know.

@podviaznikov
Copy link
Member

  • I understand now the need for :body and :content. That makes sense to me. Name also seems reasonable if there are no other ideas. I tried to think something like raw-content and rendered-content but that is too difficult. The only problem with :body and :content is that it might be hard to tell from name which one is original input and which one is rendered
  • render-pre-wrap refactor looks great.
  • Still thinking on assortment. What would be downsides of such new task? Would it be possible to make collection more flexible more? In the :assortment case there is clear separation between two tasks but there is price in terms of understanding that you need another task to have generated collection page.

@MartyGentillon
Copy link
Contributor

I would argue that collection and assortment shouldn't populate :content with a completely rendered page. Instead, it should only do the rendering that is unique to the collection. After all, a collection of posts should still have the same headers and sidebars as the rest of the site.

@MartyGentillon
Copy link
Contributor

Also, if we want to filter the rss feeds, then it is better done by adding a new key to the metadata. Something like :include-rss or :skip-rss. This way the code communicates it's intent better.

@bhagany
Copy link
Contributor Author

bhagany commented Nov 28, 2016

Still thinking on assortment. What would be downsides of such new task? Would it be possible to make collection more flexible more? In the :assortment case there is clear separation between two tasks but there is price in terms of understanding that you need another task to have generated collection page.

The only downside that comes to mind is the one you mentioned - the overhead of having a new task that does a closely-related thing. To me, it is clearer to draw a distinction between a task that creates one page, and a task that creates many. However, we could roll the functionality of both tasks into one, and still call the combined task collection. The only cost there (besides losing the one/many split) would be that reimplementing the :groupby option becomes a little awkward. If you're fine with :groupby staying gone, I'm open to this route.

I would argue that collection and assortment shouldn't populate :content with a completely rendered page. Instead, it should only do the rendering that is unique to the collection. After all, a collection of posts should still have the same headers and sidebars as the rest of the site.

I'm a little confused here - collection and assortment don't populate :content at all, they populate :body with a completely rendered page, just like render does. However, if I'm reading your intent correctly, are you proposing that collection and assortment should produce a page fragment instead of a whole page? And then, the final product would be produced by render just as it is for first-order content. If so, I think that idea deserves some exploration. Some initial thoughts on that:

  • In this case, it would be appropriate for collection and assortment to populate :content with a page fragment, like markdown does
  • the :renderer argument should be renamed, to differentiate it from render's :renderer
  • replicating current functionality, where render and collection use separate renderers would be greatly helped by the {:perun/trace [:content :render]} feature you mentioned in Pluggable parsing #113
  • changing the kind of output collection (and assortment, if it survives) generates feels like a separate PR to me

Also, if we want to filter the rss feeds, then it is better done by adding a new key to the metadata. Something like :include-rss or :skip-rss. This way the code communicates it's intent better.

For this, we'd need a more convenient way to add metadata keys to collections/assortments. But I think you're right.

@MartyGentillon
Copy link
Contributor

First, I am not sure I like the separate keys for markdown and render. If nothing else, because it prevents multiple independent render steps from being pipelined. In my mind, the :content key contains the current representation of the resource at what ever stage of processing you currently have it. Before the front-matter is processed, it doesn't really even contain markdown. After you have finished your final render, it's probably complete HTML web pages. Between, it could be whatever intermediate form makes sense to feed to the next step for that file / set of files.

are you proposing that collection and assortment should produce a page fragment instead of a whole page? And then, the final product would be produced by render just as it is for first-order content. If so, I think that idea deserves some exploration. Some initial thoughts on that:

Essentially. From my perspective, there isn't a lot of difference between the output of a collection task and a markdown task. When I build a website, there is usually a lot of front matter that I want shared between all my pages. Render is what puts that on. This way any shared content can be in the top level renderer, and I don't have to call it from within my collection task or renderer. (I think that, currently, if you put the collection task before the render task, that is what would happen.)

As for how to add metadata keys to collections/assortments, might I suggest an option that adds a metadata key to every file output from the task. That way you could say if something should be in the rss feed by saying:

(collection :renderer 'my.pkg/collection-renderer :page "index.html" :meta {:skip-feed true})

Also, I certainly think that assortment has it's uses. For instance, right now, I have a situation where I have a bunch of collections of images that need index pages generated. One for each set of pictures. It sounds like an assortment task might prevent me from having to write that from scratch.

@bhagany
Copy link
Contributor Author

bhagany commented Nov 29, 2016

First, I am not sure I like the separate keys for markdown and render. If nothing else, because it prevents multiple independent render steps from being pipelined. In my mind, the :content key contains the current representation of the resource at what ever stage of processing you currently have it.

Okay, I've let this one percolate for a while, and I think I can get on board with it. It definitely makes arbitrary pipelining easier, and I like the run-everything-through-render part. It also resolves any lingering doubts about the :body key.

I think I still like the content task for the initial step in the pipeline, but the case for it is weaker if we go with @MartyGentillon's approach. I'd appreciate some input on whether people would like to see content stick around, or go back to separate tasks for each markup language.

As for how to add metadata keys to collections/assortments, might I suggest an option that adds a metadata key to every file output from the task.

I like this too.

I'm not sure when I'll get back around to this, but hopefully soon. In the meantime, please weigh in - I always like having buy-in before the coding :)

@bhagany
Copy link
Contributor Author

bhagany commented Nov 29, 2016

I guess the part about the content task belongs in #113, but both of these pull requests are starting to flow into each other. That's probably not all bad, but I apologize for making the conversation more confusing. Feel free to respond about content on the other PR, if you'd like.

@MartyGentillon
Copy link
Contributor

As for content, I sort of like it, especially when it is setup to make adding new parsers / selecting parsers easy, though I might call it parse-content.

@MartyGentillon
Copy link
Contributor

really, when it comes to parse-content, I could go 50-50. I had always considered that I wanted to break frontmatter processing out of the markdown step, in which case I would probably want to make the markdown step be the combination of the frontmatter and parse-markdown steps. content could be much the same way. (I am not suggesting that you should break the frontmatter out, just that if the step that reads the file, handles front matter, and does the parsing is named content then parse-content would be a logical name for the thing that only does parsing.)

@bhagany
Copy link
Contributor Author

bhagany commented Nov 30, 2016

Brought this branch up to date with @Deraen's render pod improvements.

@bhagany
Copy link
Contributor Author

bhagany commented Dec 1, 2016

I've had some time to think more about this, and I've come to the conclusion that this pull request will grow to gargantuan proportions if I let it. So I'm going to take a step back and concentrate on some smaller, standalone pieces as separate PR's, while still aiming at the overall goal of making an arbitrary rendering pipeline possible. I'm also going to let #113 sit for a while, because decisions here will affect decisions there, and possibly make #113 obsolete altogether. Stay tuned.

bhagany added a commit to bhagany/perun that referenced this pull request Dec 5, 2016
This is a subset of the changes in hashobject#109 that seemed to get the most
positive response. It's almost purely a refactor intended to make
rendering more pluggable. The only new feature is the addition of
tracing to (aka `:io.perun/trace` metadata) `render` and `collection`.
@bhagany
Copy link
Contributor Author

bhagany commented Dec 5, 2016

I've split the ideas here into several different PR's and future PR's, and I'm pretty happy with that direction, so I'm closing this PR in favor of the others.

@bhagany bhagany closed this Dec 5, 2016
@bhagany bhagany mentioned this pull request Jan 8, 2017
This was referenced Jan 22, 2017
@bhagany bhagany deleted the render-ideas branch January 30, 2017 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants