Step 2: Building References & API docs¶
Concepts¶
Referencing¶
Another important Sphinx feature is cross-referencing between documents, which is a powerful way to tie your documentation together.
The simplest way to do this is to define an explicit reference object:
.. _reference-name:
Cool section
------------
This can then be referenced with the :ref: role:

:ref:`reference-name`

The link will be rendered with the title of the section, Cool section.
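Putting the two pieces together, a minimal sketch (the label and section title are just placeholders) looks like this in the document that defines the target:

.. _reference-name:

Cool section
------------

Some content.

and, from any other document in the project:

See :ref:`reference-name` for more details.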
Sphinx also supports :doc:`docname` for linking to a document.
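For example, assuming a sibling document named cookbook.rst, a link to it could be written as:

:doc:`cookbook`

The link text defaults to the target document's title; an explicit title can be given with :doc:`Our Cookbook <cookbook>`.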
Semantic Descriptions and References¶
Sphinx also has much more powerful semantic referencing capabilities, which know all about software development concepts.
Say you’re creating a CLI application. You can define an option for that program quite easily:
.. option:: -i <regex>, --ignore <regex>
Ignore pages that match a specific pattern.
That can also be referenced quite simply:
:option:`-i`
Sphinx includes a large number of these semantic types:
- Module
- Class
- Method
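As a rough sketch of how the Python domain versions of these look (the module and class names below are placeholders, not part of the tutorial project):

.. py:module:: mypackage

.. py:class:: Widget

   A made-up class, used here only to show the syntax.

It can then be referenced with :py:class:`Widget` (or simply :class:`Widget`, since Python is the default domain).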
External References¶
Sphinx also includes a number of pre-defined references for external concepts, such as PEPs and RFCs:
You can learn more about this at :pep:`8` or :rfc:`1984`.
You can read more about this in the Sphinx Inline markup docs.
Automatically generating this markup¶
Of course, Sphinx wants to make your life easy.
It includes ways to automatically create these object definitions for your own code.
This is called autodoc, which allows you to write syntax like this:
.. automodule:: crawler
and have it document the full Python module importable as crawler.
You can also do a full range of auto functions:
.. autoclass::
.. autofunction::
.. autoexception::
Warning
The module must be importable by Sphinx when running. We’ll cover how to do this in the Tasks below.
You can read more about this in the Sphinx autodoc docs.
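For example, a common pattern is to combine .. autoclass:: with the standard :members: option, which also pulls in documented methods and attributes (mypackage.MyClass is only a placeholder here):

.. autoclass:: mypackage.MyClass
   :members: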
Tasks¶
Referencing Code¶
Let’s go ahead and add a cookbook to our documentation. Users will often come to your project to solve the same problems; a Cookbook or Examples section is a great home for this content.
In your cookbook.rst, add the following:
Cookbook
========

Crawl a web page
----------------

The most simple way to use our program is with no arguments.
Simply run::

    python main.py -u <url>

to crawl a webpage.

Crawl a page slowly
-------------------

To add a delay to your crawler,
use -d::

    python main.py -d 10 -u <url>

This will wait 10 seconds between page fetches.

Crawl only your blog
--------------------

You will want to use the -i flag,
which will ignore URLs matching the passed regex::

    python main.py -i "^blog" -u <url>

This will only crawl pages that contain your blog URL.
Note
Live Preview: Cookbook
Remember, you will need to use :option: blocks here. This is because they reference command line options for our program.
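As a sketch, the delay example could read like this once the role is added (the same applies to the -i example):

To add a delay to your crawler,
use :option:`-d`::

    python main.py -d 10 -u <url>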
Adding Reference Targets¶
Now that we have pointed at our CLI options, we need to actually define them. In your cli.rst file, add the following:
Command Line Options
====================

These flags allow you to change the behavior of Crawler.
Check out how to use them in the Cookbook.

.. option:: -d <sec>, --delay <sec>

    Use a delay in between page fetches so we don't overwhelm the remote server.
    Value in seconds.

    Default: 1 second

.. option:: -i <regex>, --ignore <regex>

    Ignore pages that match a specific pattern.

    Default: None
Note
Live Preview: Command Line Options
Here you are documenting the actual options your code takes.
Try it out¶
Let’s go ahead and build the docs and see what happens. Do a:
make html
Here you will see that the :option: blocks magically become links to the definition.
This is your first taste of Semantic Markup.
With Sphinx, we are able to simply say that something is an option, and it handles everything for us: the linking between the definition and the usage.
Importing Code¶
Being able to define options and link to them is pretty neat. Wouldn’t it be great if we could do that with actual code too? Sphinx makes this easy; let’s take a look.
We’ll go ahead and create an api.rst file that will hold our API reference:
Crawler Python API
==================

Getting started with Crawler is easy.
The main class you need to care about is crawler.main.Crawler

crawler.main
------------

.. automodule:: crawler.main
Note
Live Preview: Crawler Python API
Remember, you’ll need to use the .. autoclass:: directive to pull in your source code.
This will render the docstrings of your Python code nicely.
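One way that could look, as a sketch (the :members: option is optional, but also pulls in the class's documented methods):

.. autoclass:: crawler.main.Crawler
   :members: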
Tell Sphinx about your code¶
When Sphinx runs autodoc, it imports your Python code to pull out the docstrings. This means that Sphinx has to be able to see your code. We’ll need to add our PYTHONPATH to our conf.py so it can import the code.
If you open up your conf.py file, you should see something close to this on line 18:
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
As it notes, you need to let it know the path to your Python source. In our example it will be ../src/, so go ahead and put that in this setting.
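Concretely, the uncommented line could end up looking something like this (a sketch assuming the docs directory sits next to src/, as in this tutorial's layout; add the sys and os imports if your conf.py doesn't already have them):

import os
import sys
sys.path.insert(0, os.path.abspath('../src/'))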
Note
You should always use relative paths here. Part of the value of Sphinx is having your docs build on other people’s computers, and if you hard code local paths that won’t work!
Try it out¶
Now go ahead and regenerate your docs and look at the magic that happened:
make html
Your Python docstrings have been magically imported into the project.
Tie it all together¶
Now let’s link directly to that for users who come to the project. Update your index.rst to look like:
Crawler Step 2 Documentation
============================

User Guide
----------

.. toctree::

   install
   support
   cookbook

Programmer Reference
--------------------

.. toctree::

   cli
   api
Note
Live Preview: Crawler Step 2 Documentation
One last time, let’s rebuild those docs:
make html
Warning
You now have awesome documentation! :)
Now you have a beautiful documentation reference that is coming directly from your code. This means that every time you change your code, it will automatically be reflected in your documentation.
The beauty of this approach is that it allows you to keep your prose and reference documentation in the same place. It even lets you semantically reference the code from inside the docs. This is amazingly powerful and a great way to write documentation.
Extra Credit¶
Have some extra time left? Let’s look through the code to better understand what’s happening here.
Look through intersphinx¶
Intersphinx allows you to bring the power of Sphinx references to multiple projects. It lets you pull in references, and semantically link them across projects. For example, in this guide we reference the Sphinx docs a lot, so we have this intersphinx setting:
intersphinx_mapping = {
'sphinx': ('http://sphinx-doc.org/', None),
}
This allows us to add a prefix to references and have them resolve:
:ref:`sphinx:inline-markup`
We can also omit the prefix, and Sphinx will fall back to intersphinx references if none exist in the current project:
:ref:`inline-markup`
You can read more about this in the intersphinx docs.
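For these references to resolve, the intersphinx extension also has to be enabled in conf.py next to the mapping; a minimal sketch (your extensions list will likely already contain other entries, such as sphinx.ext.autodoc):

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.intersphinx',
]

intersphinx_mapping = {
    'sphinx': ('http://sphinx-doc.org/', None),
}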
Understand the code¶
A lot of the magic that is happening in Importing Code above is actually in the source code.
Check out the code for crawler/main.py:
import time
from optparse import OptionParser

# Python 3 compat
try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

from utils import log, should_ignore


class Crawler(object):

    """
    Main Crawler object.

    Example::

        c = Crawler('http://example.com')
        c.crawl()

    :param delay: Number of seconds to wait between searches
    :param ignore: Paths to ignore
    """

    def __init__(self, url, delay, ignore):
        self.url = url
        self.delay = delay
        if ignore:
            self.ignore = ignore.split(',')
        else:
            self.ignore = []

    def get(self, url):
        """
        Get a specific URL, log its response, and return its content.

        :param url: The fully qualified URL to retrieve
        """
        response = requests.get(url)
        log(url, response.status_code)
        return response.content

    def crawl(self):
        """
        Crawl the URL set up in the crawler.

        This is the main entry point, and will block while it runs.
        """
        html = self.get(self.url)
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.findAll('a', href=True):
            link = tag['href']
            parsed = urlparse(link)
            if parsed.scheme:
                to_get = link
            else:
                to_get = self.url + link
            if should_ignore(self.ignore, to_get):
                print('Ignoring URL: {url}'.format(url=to_get))
                continue
            self.get(to_get)
            time.sleep(self.delay)


def run_main():
    """
    A small wrapper that is used for running as a CLI Script.
    """
    parser = OptionParser()
    parser.add_option("-u", "--url", dest="url", default="http://docs.readthedocs.org/en/latest/",
                      help="URL to fetch")
    parser.add_option("-d", "--delay", dest="delay", type="int", default=1,
                      help="Delay between fetching")
    parser.add_option("-i", "--ignore", dest="ignore", default='',
                      help="Ignore a subset of URL's")
    (options, args) = parser.parse_args()
    c = Crawler(url=options.url, delay=options.delay, ignore=options.ignore)
    c.crawl()


if __name__ == '__main__':
    run_main()
As you can see, we’re heavily using RST in our docstrings. This gives us the same power as we have in Sphinx, but allows it to live within the code base.
This approach of having the docs live inside the code is great for some things. However, the power of Sphinx is that it lets you mix docstrings and prose documentation together. This lets you keep the amount of automatically generated reference documentation in balance with the hand-written prose that explains how to use it.
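For example, once autodoc has imported crawler.main, prose anywhere in the project can point back into the code with the Python cross-reference roles (a sketch; the exact targets depend on what autodoc picked up):

See :class:`crawler.main.Crawler` and its :meth:`~crawler.main.Crawler.crawl` method for the full API reference.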
Moving on¶
Could it get better? In fact, it can and it will. Let’s go on to Step 3: Keeping Documentation Up to Date.