API
consumo
Content Consumption Analyzer.
ConsumoError
MissingMetadataError
Bases: ConsumoError
Raised when a backend can't get the duration of a file from its metadata.
NoCacheError
Bases: ConsumoError
Raised when an argument doesn't have a cache key assigned to it.
UnsupportedMIMETypeError
Bases: ConsumoError
Raised when a file doesn't have the expected MIME type.
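Because every exception above derives from ConsumoError, a caller can handle one specific failure first and fall back to the base class for the rest. A minimal sketch of that pattern follows. The classes are redefined locally as stand-ins so the snippet runs on its own (deriving ConsumoError from Exception is an assumption), and `describe_failure` is a hypothetical helper, not part of the library.

```python
class ConsumoError(Exception):
    """Local stand-in for the library's base exception (assumed to extend Exception)."""


class MissingMetadataError(ConsumoError):
    """Local stand-in: duration missing from file metadata."""


class NoCacheError(ConsumoError):
    """Local stand-in: argument has no cache key."""


class UnsupportedMIMETypeError(ConsumoError):
    """Local stand-in: unexpected MIME type."""


def describe_failure(action):
    """Run `action` and report which kind of error, if any, it raised.

    Hypothetical helper used only to illustrate the except ordering:
    the specific subclass is listed before the shared base class.
    """
    try:
        action()
    except UnsupportedMIMETypeError:
        return "unsupported type"
    except ConsumoError as error:
        return f"other consumo error: {type(error).__name__}"
    return "ok"
```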
calculate_html_consumption_time(html, words_per_minute=265, multimedia_duration_resolver=None)
Calculate the consumption time of an HTML file in seconds.
Uses concurrency to get the duration of any multimedia in the file to avoid any possible throttling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file whose consumption time will be calculated. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
| multimedia_duration_resolver | | Function used to get the duration of a multimedia file. | None |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content of the HTML file. |
Source code in src/consumo/lib/file/html.py
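The concurrent lookup described above could be sketched with a thread pool, as below. This is an illustration only: the library may use a different concurrency mechanism, and the `total_multimedia_duration` name, the `max_workers` parameter, and the `str -> int` resolver signature are all assumptions (the reference does not document the type of multimedia_duration_resolver).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def total_multimedia_duration(
    sources: Iterable[str],
    resolver: Callable[[str], int],
    max_workers: int = 8,
) -> int:
    """Resolve each source's duration concurrently and return the sum in seconds.

    `resolver` is a hypothetical callable mapping one source to its duration.
    """
    sources = list(sources)
    if not sources:
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order and re-raises any resolver exception.
        return sum(pool.map(resolver, sources))
```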
calculate_mass_media_consumption_time(container, words_per_minute=265)
Calculate the consumption time of a text container file in seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to a file primarily meant for text. Supported types are EPUB, MOBI and PDF. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content of the file. |
Source code in src/consumo/lib/file/mass_media.py
calculate_reading_time(word_count, cjk_character_count, words_per_minute=265)
Calculate the reading time in seconds from the word and CJK character counts.
Supports Chinese, Japanese, and Korean (CJK) text by treating its reading speed as 1.8867924528 (500 / 265) times the word-based speed. This follows the Medium formula, in which the average reading speed is 265 words per minute for alphabetic languages and 500 characters per minute for non-alphabetic ones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| word_count | int | The number of words in the text. | required |
| cjk_character_count | int | The number of CJK characters in the text. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
Returns:
| Type | Description |
|---|---|
| int | How long in seconds it would take to read the text. |
Source code in src/consumo/lib/file/text.py
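The 500 / 265 scaling can be made concrete with a short sketch. It is not the library's implementation: the `estimate_reading_seconds` name, rounding up with `math.ceil`, and rejecting a speed of zero are assumptions made for illustration.

```python
import math


def estimate_reading_seconds(
    word_count: int,
    cjk_character_count: int,
    words_per_minute: int = 265,
) -> int:
    """Estimate reading time in seconds for mixed word and CJK text."""
    if words_per_minute <= 0:
        # Assumption: a zero speed is rejected here to avoid dividing by zero.
        raise ValueError("words_per_minute must be positive")
    # CJK speed scales with the word speed by the 500 / 265 ratio.
    cjk_characters_per_minute = words_per_minute * 500 / 265
    minutes = (
        word_count / words_per_minute
        + cjk_character_count / cjk_characters_per_minute
    )
    # Assumption: partial seconds are rounded up.
    return math.ceil(minutes * 60)
```

At the default speed, 265 words and 500 CJK characters each contribute one minute.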
calculate_text_consumption_time(container, words_per_minute=265)
Calculate the consumption time of a plain text file in seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to the plain text file whose consumption time will be calculated. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content in the plain text file. |
Source code in src/consumo/lib/file/text.py
calculate_url_consumption_time(url, words_per_minute=265)
Calculate the consumption time of a URL in seconds.
Downloads the HTML of the URL to a temporary file and passes it to the HTML backend, calculate_html_consumption_time, which avoids code duplication.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | URL pointing to the content whose consumption time will be analyzed. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content the URL points to. |
Raises:
| Type | Description |
|---|---|
| ConnectionError | When the HTML content of the URL couldn't be downloaded. |
Source code in src/consumo/lib/url.py
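The temporary-file hand-off described above might look roughly like the sketch below. The download step is left out so the snippet runs offline. `analyze_html_text` and its `analyze_file` callback are hypothetical names standing in for the library's internals and for a file-based backend such as calculate_html_consumption_time.

```python
import tempfile
from pathlib import Path
from typing import Callable


def analyze_html_text(html_text: str, analyze_file: Callable[[Path], int]) -> int:
    """Write already-downloaded HTML to a temporary file and analyze that file.

    The temporary directory and the file inside it are removed on exit.
    """
    with tempfile.TemporaryDirectory() as directory:
        html_path = Path(directory) / "page.html"
        html_path.write_text(html_text, encoding="utf-8")
        return analyze_file(html_path)
```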
calculate_viewing_time(image_count)
Calculate the time for viewing images based on count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_count | NonNegativeInt | The number of images. | required |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to view all the images. |
Source code in src/consumo/lib/file/image.py
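This reference does not state how the viewing time is derived from the count. One well-known convention is the image rule from Medium's read-time formula: 12 seconds for the first image, one second less for each following image, and a floor of 3 seconds per image from the tenth onward. The sketch below implements that rule under the assumption that something similar applies here. `estimate_viewing_seconds` is a hypothetical name and the library may use different numbers.

```python
def estimate_viewing_seconds(image_count: int) -> int:
    """Estimate image viewing time using the Medium-style decreasing rule.

    Assumption: 12 s for the first image, minus 1 s per later image,
    never less than 3 s per image.
    """
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return sum(max(12 - index, 3) for index in range(image_count))
```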
extract_mass_media_text(container)
Extract text from a text container file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to a file primarily meant for text. Supported types are EPUB, MOBI and PDF. | required |
Returns:
| Type | Description |
|---|---|
| str | All the text content in the container. |
Source code in src/consumo/lib/file/mass_media.py
extract_multimedias(soup)
Get all the multimedia sources from an HTML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| soup | BeautifulSoup | The HTML file as parsed by BeautifulSoup. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of all the multimedia sources. |
Source code in src/consumo/lib/file/html.py
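To illustrate what collecting multimedia sources can involve, here is a sketch that swaps BeautifulSoup for the standard library's `html.parser` so it runs without third-party packages. Which tags count as multimedia is an assumption (`audio`, `video`, and `source` here), and `collect_multimedia_sources` is a hypothetical name. The library's own selection logic may differ.

```python
from html.parser import HTMLParser

# Assumption: these tags are the ones treated as multimedia.
MULTIMEDIA_TAGS = {"audio", "video", "source"}


class MultimediaSourceCollector(HTMLParser):
    """Collect the src attribute of every multimedia tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in MULTIMEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_multimedia_sources(markup: str) -> list[str]:
    """Return the src values of multimedia tags in document order."""
    collector = MultimediaSourceCollector()
    collector.feed(markup)
    collector.close()
    return collector.sources
```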
format_time(total_seconds)
Format the duration/consumption time given in seconds in a *h *m *s format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| total_seconds | int | The duration/consumption time in seconds of the content. | required |
Returns:
| Type | Description |
|---|---|
| str | The duration/consumption time in a *h *m *s format. |
Source code in src/consumo/lib/formatting.py
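A conversion into the *h *m *s shape can be sketched with two `divmod` calls. Whether the library omits zero-valued components is not stated here, so this sketch assumes all three are always shown. `format_seconds` is a hypothetical name.

```python
def format_seconds(total_seconds: int) -> str:
    """Format a non-negative number of seconds as '<h>h <m>m <s>s'."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    # Assumption: zero components are kept rather than dropped.
    return f"{hours}h {minutes}m {seconds}s"
```

For example, `format_seconds(3725)` returns `"1h 2m 5s"`.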
get_custom_player_duration(html)
Parse the JSON data in an HTML file provided for SEO to get video duration.
Designed with videos using custom players like the BBC's smp-toucan-player in mind.
The supported format for duration is ISO 8601.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file whose content will be parsed for JSON data containing duration information. | required |
Returns:
| Type | Description |
|---|---|
| int | The duration reported by the JSON data as an integer representing seconds. |
Source code in src/consumo/lib/file/html.py
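For readers unfamiliar with ISO 8601 durations such as `PT4M13S`, the sketch below converts one to seconds. It assumes the duration string has already been extracted from the page's JSON data, handles only day, hour, minute, and second components, and uses a hypothetical name, `iso_8601_duration_to_seconds`. The library may rely on a dedicated parser instead.

```python
import re

# Matches durations like P1DT2H3M4S; years, months and weeks are not handled.
ISO_8601_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def iso_8601_duration_to_seconds(value: str) -> int:
    """Convert a simple ISO 8601 duration to whole seconds."""
    match = ISO_8601_DURATION.match(value)
    if match is None or value in ("P", "PT"):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {
        name: float(number)
        for name, number in match.groupdict(default="0").items()
    }
    total = (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
    # Fractional seconds are truncated, not rounded.
    return int(total)
```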
get_file_duration(file, words_per_minute=265)
Get the duration or calculate the consumption time of a file in seconds.
Support is based on MIME type.
Supported types are:
- "audio": get_multimedia_duration.
- "image": calculate_viewing_time.
- "video": get_multimedia_duration.
Supported types/subtypes are:
- "application/epub+zip": calculate_mass_media_consumption_time.
- "application/pdf": calculate_mass_media_consumption_time.
- "application/x-mobipocket-ebook": calculate_mass_media_consumption_time.
- "text/html": calculate_html_consumption_time.
- "text/plain": calculate_text_consumption_time.
Directories are unsupported.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file | FilePath | The path to the file whose duration or consumption time will be analyzed. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content in the file. |
Raises:
| Type | Description |
|---|---|
| UnsupportedMIMETypeError | When the MIME type is unsupported. |
Source code in src/consumo/lib/handlers/file.py
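The MIME-based routing listed above can be pictured as a two-level lookup: exact type/subtype first, then the top-level type. The sketch below only returns the name of the handler that would be chosen. It guesses the MIME type from the file extension with the standard `mimetypes` module, which is a simplification (the library may inspect file contents instead). `pick_handler_name` is hypothetical, and the exception class is a local stand-in.

```python
import mimetypes
from pathlib import Path


class UnsupportedMIMETypeError(Exception):
    """Local stand-in for the library's exception of the same name."""


# Exact type/subtype matches, taken from the list in this reference.
SUBTYPE_HANDLERS = {
    "application/epub+zip": "calculate_mass_media_consumption_time",
    "application/pdf": "calculate_mass_media_consumption_time",
    "application/x-mobipocket-ebook": "calculate_mass_media_consumption_time",
    "text/html": "calculate_html_consumption_time",
    "text/plain": "calculate_text_consumption_time",
}

# Top-level type matches, used when no exact match exists.
TYPE_HANDLERS = {
    "audio": "get_multimedia_duration",
    "image": "calculate_viewing_time",
    "video": "get_multimedia_duration",
}


def pick_handler_name(file: Path) -> str:
    """Return the name of the handler that would process `file`."""
    # Assumption: the extension is enough to identify the MIME type.
    mime_type, _encoding = mimetypes.guess_type(file.name)
    if mime_type is None:
        raise UnsupportedMIMETypeError(f"unknown MIME type for {file.name}")
    if mime_type in SUBTYPE_HANDLERS:
        return SUBTYPE_HANDLERS[mime_type]
    top_level_type = mime_type.split("/", 1)[0]
    if top_level_type in TYPE_HANDLERS:
        return TYPE_HANDLERS[top_level_type]
    raise UnsupportedMIMETypeError(f"unsupported MIME type: {mime_type}")
```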
get_html_multimedia_duration(html, src)
Get the duration of a multimedia file in an HTML file.
First treats the multimedia file as if it were hosted online; if that fails, tries to resolve it as a path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file where the multimedia file was found. | required |
| src | str | Path used for the file's "src" attribute. | required |
Returns:
| Type | Description |
|---|---|
| int | The duration of the content in seconds. |
Source code in src/consumo/lib/file/html.py
get_multimedia_duration(container)
Get the duration from a multimedia container or URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath \| HttpUrl | Either a path to a multimedia container or a URL. | required |
Returns:
| Type | Description |
|---|---|
| int | The duration in seconds of the content. |
Raises:
| Type | Description |
|---|---|
| MissingMetadataError | If the duration can't be found from the metadata. |
Source code in src/consumo/lib/file/multimedia.py
get_url_duration(url, words_per_minute=265, depth=0, cache=True, cache_dir=Path.cwd(), get_cached_resolver=dummy_get_cached_resolver, cache_resolver=dummy_cache_resolver)
Get the duration or calculate the consumption time of a URL in seconds.
Gets the duration of media from hosting platforms or direct file links, and calculates the consumption time otherwise.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | The URL of the content whose duration or consumption time will be analyzed. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
| depth | NonNegativeInt | How many levels to recursively follow URLs on the page. | 0 |
| cache | bool | Whether to cache results in a database for later reuse. Values are invalidated based on time. | True |
| cache_dir | Path | The directory where the cache will be stored. | Path.cwd() |
| get_cached_resolver | Callable[[Path, str], int] | Function for getting a value from a cache system whose signature consists of cache directory, key, and time (date) for cache invalidation. | dummy_get_cached_resolver |
| cache_resolver | Callable[[Path, str, int, int], None] | Function for storing a value in a cache system whose signature consists of cache directory, key, value, and time (date) for cache invalidation. | dummy_cache_resolver |
Warning
get_cached_resolver and cache_resolver have dummy default values. You have to implement your own cache functions if you want caching.
Returns:
| Type | Description |
|---|---|
| int | The time in seconds to consume the content the URL points to. |
Source code in src/consumo/lib/handlers/url.py
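Since the default resolvers are dummies, a caller has to supply both functions. The sketch below is one possible pair backed by a JSON file, written against the documented type annotations: `(Path, str) -> int` for reading and `(Path, str, int, int) -> None` for storing. Several points are assumptions: the final `int` is treated as a Unix timestamp after which the entry expires, a miss or expired entry raises `KeyError`, and the file name and function names are made up for illustration.

```python
import json
import time
from pathlib import Path

CACHE_FILE_NAME = "durations.json"  # Hypothetical file name.


def _load_entries(cache_dir: Path) -> dict:
    """Read all cached entries, or return an empty dict if none exist yet."""
    cache_file = cache_dir / CACHE_FILE_NAME
    if not cache_file.exists():
        return {}
    return json.loads(cache_file.read_text(encoding="utf-8"))


def json_cache_resolver(cache_dir: Path, key: str, value: int, expires_at: int) -> None:
    """Store `value` under `key` with an expiry time.

    Assumption: `expires_at` is a Unix timestamp in seconds.
    """
    entries = _load_entries(cache_dir)
    entries[key] = {"value": value, "expires_at": expires_at}
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / CACHE_FILE_NAME).write_text(json.dumps(entries), encoding="utf-8")


def json_get_cached_resolver(cache_dir: Path, key: str) -> int:
    """Return the cached value for `key`.

    Assumption: a missing or expired entry is signalled with KeyError.
    """
    entry = _load_entries(cache_dir).get(key)
    if entry is None or entry["expires_at"] <= time.time():
        raise KeyError(key)
    return entry["value"]
```

Under these assumptions, the two functions would be passed as the `get_cached_resolver` and `cache_resolver` arguments of get_url_duration. How the library reacts to a cache miss is not documented here, so the `KeyError` convention would need to be checked against the source.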
get_url_multimedia_duration(url)
Get the duration of a multimedia container hosted online.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | URL pointing to where the multimedia container is hosted. | required |
Returns:
| Type | Description |
|---|---|
| int | The duration of the content in seconds. |
Source code in src/consumo/lib/file/multimedia.py
get_word_count(text)
Get the number of words from text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text where the number of words will be counted from. | required |
Returns:
| Type | Description |
|---|---|
| tuple[int, int] | The number of words in the text. |
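The return type is a pair of integers. Judging by the parameters of calculate_reading_time, a plausible reading is (word count, CJK character count), but that is an inference and not something this reference states. The sketch below counts both under that assumption. The Unicode ranges and the `count_words_and_cjk` name are illustrative choices, not the library's.

```python
import re

# Assumption: Hiragana, Katakana, CJK Unified Ideographs (plus Extension A)
# and Hangul syllables are the characters counted as CJK.
CJK_CHARACTER = re.compile(
    r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]"
)


def count_words_and_cjk(text: str) -> tuple[int, int]:
    """Return (word_count, cjk_character_count) for `text`.

    Words are whitespace-separated tokens left after CJK characters
    are removed, so standalone punctuation also counts as a token.
    """
    cjk_character_count = len(CJK_CHARACTER.findall(text))
    without_cjk = CJK_CHARACTER.sub(" ", text)
    word_count = len(without_cjk.split())
    return word_count, cjk_character_count
```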