Another approach to embedding org-source in html

| categories: orgmode, data | tags:

In this post I examined a way to embed the org-source in a comment in the html of the post, and developed a reasonably convenient way to extract the source in emacs. One downside of the approach was the need to escape at least the dashes, and then unescape them on extraction. I came across another idea, which is to put the org-source in base64 encoded form in a data uri .

First let us see what the encoding means:

(base64-encode-string "<!-- test-->")
PCEtLSB0ZXN0LS0+

And decoding:

(base64-decode-string "PCEtLSB0ZXN0LS0+")
<!-- test-->

The encoding looks random, but it is reversible. More importantly, it probably will not have any html like characters in it that need escaped. The idea of a data uri is that the data it serves is embedded in the URL href attribute. This is basically how to make a data uri. We give the url here a class so we can find it later.

<a class="some-org-source" href="data:text/plain;charset=US-ASCII;base64,PCEtLSB0ZXN0LS0+">source</a>

Here is the actual html for the browser. If you click on it, your browser automatically decodes it for you!

source

So, during the blog publish step, we just need to add this little step to the html generation, and it will be included as a data uri. Here is the function that generates the data uri for us, and example of using it. The encoded source is not at all attractive to look at it, but you almost never need to look at it, it is invisible in the browser. Interestingly, if you click on the link, you will see the org source right in your browser!

(defun source-data-uri (source)
  "Encode the string in SOURCE to a data uri."
  (format
   "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,%s\">source</a>"
   (base64-encode-string source)))

(source-data-uri (buffer-string))
source

Now, we integrate it into the blogofile function:

(defun bf-get-post-html ()
  "Return a string containing the YAML header, the post html, my
copyright line, and a link to the org-source code."
  (interactive)
  (let ((org-source (buffer-string))
        (url-to-org (bf-get-url-to-org-source))
        (yaml (bf-get-YAML-heading))
        (body (bf-get-HTML)))

    (with-temp-buffer
      (insert yaml)
      (insert body)
      (insert
       (format "<p>Copyright (C) %s by John Kitchin. See the <a href=\"/copying.html\">License</a> for information about copying.<p>"
               (format-time-string "%Y")))
      (insert (format "<p><a href=\"%s\">org-mode source</a><p>"
                      url-to-org))
      (insert (format "<p>Org-mode version = %s</p>" (org-version)))
      ;; this is the only new code we need to add.
      (insert (source-data-uri org-source))
      ;; return value
      (buffer-string))))

Now we need a new adaptation of the grab-org-source function. We still need a regexp search to get the source, and we still need to decode it.

(defun grab-org-source (url)
  "Extract org-source from URL to a buffer named *grab-org-source*."
  (interactive "sURL: ")
  (switch-to-buffer (get-buffer-create "*grab-org-source*"))
  (erase-buffer)
  (org-mode)
  (insert
   (with-current-buffer
       (url-retrieve-synchronously url)
     (let (start)
       (re-search-forward
        "<a class=\"org-source\" href=\"data:text/plain;charset=US-ASCII;base64,\\([^\"]*\\)\\\">" nil t)
       (base64-decode-string  (match-string 1))))))

What else could we do with this? One idea would be to generate data uris for each code block that you could open in your browser. For example, here we generate a list of data uris for each code block in the buffer. We don't take care to label them or make it easy to see what they are, but if you click on one, you should see a plain text version of the block. If this is done a lot, it might even make sense to change the mime type to download the code in some native app.

(org-element-map (org-element-parse-buffer) 'src-block
  (lambda (src-block)
    (source-data-uri (org-element-property :value src-block))))
(source source source source source source)

I am not sure if this is better or worse than the other approach. I have not tested it very thoroughly, but it seems like it should work pretty generally. I imagine you could also embed other kinds of files in the html, if for some reason you did not want to put the files on your server. Overall this seems to lack some elegance in searching for data, e.g. like RDF or RDFa is supposed to enable, but it might be a step in that direction, using org-mode and Emacs as the editor.

Copyright (C) 2015 by John Kitchin. See the License for information about copying.

org-mode source

Org-mode version = 8.2.10

source
Discuss on Twitter