API
consumo
Content Consumption Analyzer.
ConsumoError
MissingMetadataError
Bases: ConsumoError
Raised when a backend can't get the duration of a file from its metadata.
calculate_html_consumption_time(html, words_per_minute=265, multimedia_duration_resolver=None)
Calculate the consumption time of an HTML file in seconds.
Uses concurrency to get the duration of any multimedia in the file to avoid any possible throttling.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file whose consumption time will be calculated. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |
| multimedia_duration_resolver | | Function used to get the duration of a multimedia file. | None |

Returns:

| Type | Description |
|---|---|
| int | The time in seconds to consume the content of the HTML file. |

Source code in src/consumo/lib/file/html.py
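The concurrent lookup of multimedia durations can be pictured with a thread pool. This is a minimal sketch, not consumo's implementation: `sum_durations_concurrently` is a hypothetical helper, and it assumes the resolver takes a single source string and returns whole seconds (the real multimedia_duration_resolver signature is not documented here). Threads suit this step because each lookup mostly waits on network or disk I/O.

```python
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def sum_durations_concurrently(
    sources: Iterable[str],
    resolver: Callable[[str], int],
    max_workers: int = 8,
) -> int:
    """Resolve each multimedia duration in a worker thread and add them up.

    `resolver` stands in for the function that looks up a single duration
    (hypothetical signature: source string in, whole seconds out).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() submits every source up front and yields results in input order
        return sum(executor.map(resolver, sources))


# Usage with a fake resolver backed by a dict instead of real network calls
fake_durations = {"intro.mp4": 90, "podcast.mp3": 1800}
total = sum_durations_concurrently(fake_durations, fake_durations.__getitem__)
print(total)  # 1890
```

If the resolver raises, `executor.map` re-raises that exception when the corresponding result is reached, so in this sketch a single failed lookup aborts the whole sum.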
calculate_mass_media_consumption_time(container, words_per_minute=265)
Calculate the consumption time of a text container file in seconds.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to a file primarily meant for text. Supported types are EPUB, MOBI and PDF. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |

Returns:

| Type | Description |
|---|---|
| int | The time in seconds to consume the content of the file. |

Source code in src/consumo/lib/file/mass_media.py
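A sketch of the overall flow, assuming the function validates the container type, extracts the text, and converts the word count to seconds. `mass_media_consumption_time` and the `extract_text` callable are hypothetical stand-ins; the real function may count CJK characters separately and round differently. The sketch also rejects a speed of 0, which NonNegativeInt would otherwise allow and which would cause a division by zero.

```python
from collections.abc import Callable
from pathlib import Path

# Assumption: the supported container types listed above, keyed by file suffix
SUPPORTED_SUFFIXES = {".epub", ".mobi", ".pdf"}


def mass_media_consumption_time(
    container: Path,
    extract_text: Callable[[Path], str],
    words_per_minute: int = 265,
) -> int:
    """Hypothetical pipeline: validate the type, extract text, convert words to seconds."""
    if container.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported container type: {container.suffix!r}")
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(extract_text(container).split())
    # words / (words per minute) gives minutes; multiply by 60 for seconds
    return round(word_count / words_per_minute * 60)


# Usage with a stand-in extractor that returns 530 words
seconds = mass_media_consumption_time(Path("book.epub"), lambda p: "word " * 530)
print(seconds)  # 120
```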
calculate_reading_time(word_count, cjk_character_count, words_per_minute=265)
Calculate the reading time in seconds based on the word count and the CJK character count.
Supports Chinese, Japanese, and Korean (CJK) text by setting the CJK character reading speed to 1.8867924528 (500 / 265) times the word reading speed. This follows Medium's formula, in which the average reading speed is 265 words per minute for alphabetical languages and 500 characters per minute for non-alphabetical ones.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| word_count | int | The number of words in the text. | required |
| cjk_character_count | int | The number of CJK characters in the text. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |

Returns:

| Type | Description |
|---|---|
| int | How long in seconds it would take to read the text. |

Source code in src/consumo/lib/file/text.py
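Worked through in code, the formula converts each count to minutes at its own speed and scales the CJK speed with words_per_minute. This sketch rests on two assumptions: the 500 / 265 ratio is applied to whatever words_per_minute is passed, and the result is rounded to the nearest second. `reading_time` is a hypothetical name.

```python
# Medium's baseline speeds, as described above
BASE_WPM = 265
BASE_CJK_CPM = 500


def reading_time(word_count: int, cjk_character_count: int, words_per_minute: int = BASE_WPM) -> int:
    """Sketch of the documented formula; rounding to the nearest second is an assumption."""
    # Scale the CJK speed with the chosen word speed (500 / 265 ≈ 1.8868 times faster)
    cjk_characters_per_minute = words_per_minute * BASE_CJK_CPM / BASE_WPM
    minutes = word_count / words_per_minute + cjk_character_count / cjk_characters_per_minute
    return round(minutes * 60)


print(reading_time(265, 0))    # 60
print(reading_time(0, 500))    # 60
print(reading_time(530, 500))  # 180
```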
calculate_text_consumption_time(container, words_per_minute=265)
Calculate the consumption time of a plain text file in seconds.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to the plain text file whose consumption time will be calculated. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |

Returns:

| Type | Description |
|---|---|
| int | The time in seconds to consume the content in the plain text file. |

Source code in src/consumo/lib/file/text.py
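For a plain text file, the calculation reduces to counting whitespace-separated words. A minimal sketch, assuming UTF-8 input and ignoring CJK text (which calculate_reading_time handles separately); `text_file_consumption_time` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def text_file_consumption_time(path: Path, words_per_minute: int = 265) -> int:
    """Count whitespace-separated words and convert them to seconds."""
    text = path.read_text(encoding="utf-8")  # assumption: the file is UTF-8
    word_count = len(text.split())
    # words / (words per minute) = minutes; multiply by 60 for seconds
    return round(word_count / words_per_minute * 60)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "article.txt"
    sample.write_text("lorem " * 795, encoding="utf-8")
    seconds = text_file_consumption_time(sample)

print(seconds)  # 180
```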
calculate_url_consumption_time(url, words_per_minute=265)
Calculate the consumption time of a URL in seconds.
To avoid code duplication, downloads the HTML of the URL to a temporary file and passes it to the HTML backend, calculate_html_consumption_time.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | URL pointing to the content whose consumption time will be analyzed. | required |
| words_per_minute | NonNegativeInt | Reading speed in words per minute. | 265 |

Returns:

| Type | Description |
|---|---|
| int | The time in seconds to consume the content the URL points to. |

Raises:

| Type | Description |
|---|---|
| ConnectionError | If the HTML content of the URL wasn't downloaded. |

Source code in src/consumo/lib/url.py
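The download-then-delegate approach can be sketched with the standard library. `fetch_html`, `url_consumption_time` and the injected `html_backend` are hypothetical; the documented function hands the file to calculate_html_consumption_time and may use a different HTTP client. The `fetch` parameter exists only so the example runs without network access.

```python
import tempfile
import urllib.request
from collections.abc import Callable
from pathlib import Path


def fetch_html(url: str) -> str:
    """Download a page, turning any transport failure into ConnectionError."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        raise ConnectionError(f"Could not download the HTML of {url}") from exc


def url_consumption_time(
    url: str,
    html_backend: Callable[[Path, int], int],
    words_per_minute: int = 265,
    fetch: Callable[[str], str] = fetch_html,
) -> int:
    """Save the page to a temporary file, then hand that file to the HTML backend."""
    html = fetch(url)
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.html"
        page.write_text(html, encoding="utf-8")
        # The backend runs before the temporary directory is deleted
        return html_backend(page, words_per_minute)


def fake_backend(path: Path, words_per_minute: int) -> int:
    # Stand-in for the HTML backend: counts raw words, ignoring markup
    word_count = len(path.read_text(encoding="utf-8").split())
    return round(word_count / words_per_minute * 60)


# Offline usage: a fake fetcher replaces the real download
result = url_consumption_time("https://example.com/post", fake_backend, fetch=lambda u: "word " * 265)
print(result)  # 60
```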
calculate_viewing_time(image_count)
Calculate the time for viewing images based on count.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_count | NonNegativeInt | The number of images. | required |

Returns:

| Type | Description |
|---|---|
| int | The time in seconds to view all the images. |

Source code in src/consumo/lib/file/image.py
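Medium's published read-time rule, the same source as the 265 and 500 figures used by calculate_reading_time, counts 12 seconds for the first image, 11 for the second, one second less for each image through the tenth, and 3 seconds for every image after that. Assuming calculate_viewing_time follows this rule (this page does not say so), it could be sketched as follows; `viewing_time` is a hypothetical name.

```python
def viewing_time(image_count: int) -> int:
    """Medium-style image time (assumed rule): 12 s for the first image,
    one second less for each following image, never less than 3 s."""
    # index 0 -> 12 s, index 9 (tenth image) -> 3 s, every later image -> 3 s
    return sum(max(12 - index, 3) for index in range(image_count))


print([viewing_time(n) for n in (0, 1, 2, 10, 11)])  # [0, 12, 23, 75, 78]
```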
extract_mass_media_text(container)
Extract text from a text container file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| container | FilePath | Path to a file primarily meant for text. Supported types are EPUB, MOBI and PDF. | required |

Returns:

| Type | Description |
|---|---|
| str | All the text content in the container. |

Source code in src/consumo/lib/file/mass_media.py
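An EPUB file is a ZIP archive of XHTML documents, so its text can be pulled out with the standard library alone. This sketch covers only EPUB (not MOBI or PDF), takes the archive as bytes so it can be tested in memory, and reads chapters in file-name order rather than the reading order declared in the package's OPF spine. `extract_epub_text` is a hypothetical name, not consumo's code.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_epub_text(epub_bytes: bytes) -> str:
    """Simplified: read every (X)HTML member in name order, ignoring the OPF spine."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith((".xhtml", ".html", ".htm")):
                parser = TextCollector()
                parser.feed(archive.read(name).decode("utf-8"))  # assumption: UTF-8 chapters
                parser.close()
                texts.append(" ".join(parser.parts))
    return "\n".join(texts)


# Usage: build a tiny EPUB-like archive in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/ch1.xhtml", "<html><body><p>Hello <b>reader</b></p></body></html>")
print(extract_epub_text(buffer.getvalue()))  # Hello reader
```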
extract_multimedias(soup)
Get all the multimedia sources from an HTML file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| soup | BeautifulSoup | The HTML file as parsed by BeautifulSoup. | required |

Returns:

| Type | Description |
|---|---|
| list[str] | A list of all the multimedia sources. |

Source code in src/consumo/lib/file/html.py
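The documented function takes a BeautifulSoup object; to stay dependency-free, this sketch collects the same kind of sources with the standard library's `html.parser`. Which tags count as multimedia is an assumption (video, audio, source and iframe here), and `extract_multimedia_sources` is a hypothetical name.

```python
from html.parser import HTMLParser

# Assumption: the tags treated as "multimedia"; consumo's own list may differ.
# Note: a <source> inside <picture> is an image and would need extra filtering.
MULTIMEDIA_TAGS = {"video", "audio", "source", "iframe"}


class MultimediaCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in MULTIMEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:  # <audio>/<video> without src rely on nested <source> tags instead
                self.sources.append(src)


def extract_multimedia_sources(html: str) -> list[str]:
    collector = MultimediaCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


page = """
<video src="clip.mp4"></video>
<audio controls><source src="episode.mp3" type="audio/mpeg"></audio>
<iframe src="https://www.youtube.com/embed/abc123"></iframe>
<img src="photo.png">
"""
print(extract_multimedia_sources(page))
# ['clip.mp4', 'episode.mp3', 'https://www.youtube.com/embed/abc123']
```

With BeautifulSoup, a comparable query would be `soup.find_all(["video", "audio", "source", "iframe"], src=True)`, reading each tag's `src` attribute afterwards.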
format_time(total_seconds)
Format the duration/consumption time given in seconds in a *h *m *s format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| total_seconds | int | The duration/consumption time in seconds of the content. | required |

Returns:

| Type | Description |
|---|---|
| str | The duration/consumption time in a *h *m *s format. |

Source code in src/consumo/lib/formatting.py
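A sketch of a `*h *m *s` formatter built on `divmod`. Whether consumo drops zero-valued units (for example printing 59s instead of 0h 0m 59s) is not stated here; this version always prints all three. `format_hms` is a hypothetical name.

```python
def format_hms(total_seconds: int) -> str:
    """Sketch of an 'Xh Ym Zs' formatter; always printing all three units is an assumption."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


print(format_hms(3725))  # 1h 2m 5s
print(format_hms(59))    # 0h 0m 59s
```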
get_custom_player_duration(html)
Parse the JSON data that an HTML file provides for SEO to get a video's duration.
Designed with videos using custom players like the BBC's smp-toucan-player in mind.
The supported format for duration is ISO 8601.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file whose content will be parsed for JSON data containing duration information. | required |

Returns:

| Type | Description |
|---|---|
| int | The duration reported by the JSON data, as an integer number of seconds. |

Source code in src/consumo/lib/file/html.py
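Pages that embed video through custom players often describe it with a schema.org VideoObject inside a `<script type="application/ld+json">` block, whose duration field is an ISO 8601 string such as PT1M30S. Assuming that is the SEO data meant here, a standard-library sketch could look like this. It takes the HTML as a string rather than a file path, handles only the day/hour/minute/second part of ISO 8601, and every name in it is hypothetical.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: durations use the ISO 8601 "PnDTnHnMnS" subset (no years, months or weeks)
ISO_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def iso8601_to_seconds(value: str) -> int:
    match = ISO_DURATION.fullmatch(value)
    if match is None or value in ("P", "PT"):
        raise ValueError(f"Unsupported ISO 8601 duration: {value!r}")
    parts = {name: float(number or 0) for name, number in match.groupdict().items()}
    return round(parts["days"] * 86400 + parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"])


class JsonLdCollector(HTMLParser):
    """Collects the raw contents of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        self._inside = tag == "script" and dict(attrs).get("type") == "application/ld+json"
        if self._inside:
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def find_duration(node):
    """Depth-first search for the first string-valued "duration" key."""
    if isinstance(node, dict):
        if isinstance(node.get("duration"), str):
            return node["duration"]
        node = list(node.values())
    if isinstance(node, list):
        for item in node:
            found = find_duration(item)
            if found:
                return found
    return None


def custom_player_duration(html: str) -> int:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        duration = find_duration(json.loads(block))
        if duration:
            return iso8601_to_seconds(duration)
    raise ValueError("No duration found in the page's JSON-LD data")


page = '<script type="application/ld+json">{"@type": "VideoObject", "duration": "PT1M30S"}</script>'
print(custom_player_duration(page))  # 90
```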
get_html_multimedia_duration(html, src)
Get the duration of a multimedia file in an HTML file.
First tries to treat the multimedia file as if it were hosted online; if that fails, tries to resolve its path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| html | FilePath | Path to the HTML file where the multimedia file was found. | required |
| src | str | Path given in the multimedia element's "src" attribute. | required |

Returns:

| Type | Description |
|---|---|
| int | The duration of the content in seconds. |

Source code in src/consumo/lib/file/html.py
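A sketch of the fallback order described above, assuming a relative src is resolved against the directory containing the HTML file and that a failed online lookup raises ValueError. `html_multimedia_duration`, `online_resolver` and `local_resolver` are hypothetical names; the two fake resolvers only make the example runnable.

```python
from collections.abc import Callable
from pathlib import Path


def html_multimedia_duration(
    html: Path,
    src: str,
    online_resolver: Callable[[str], int],
    local_resolver: Callable[[Path], int],
) -> int:
    """Try src as an online URL first, then as a file path relative to the HTML file."""
    try:
        return online_resolver(src)
    except ValueError:
        # Assumption: a relative src is resolved against the HTML file's directory
        return local_resolver((html.parent / src).resolve())


def fake_online(url: str) -> int:
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"{url!r} is not an online URL")
    return 300


def fake_local(path: Path) -> int:
    return 42 if path.name == "clip.mp4" else 0


page = Path("/site/post/index.html")
print(html_multimedia_duration(page, "https://cdn.example.com/talk.mp4", fake_online, fake_local))  # 300
print(html_multimedia_duration(page, "media/clip.mp4", fake_online, fake_local))  # 42
```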
get_multimedia_duration(url)
Get the duration of a multimedia container hosted online.
First tries to treat the URL as if it belonged to a hosting platform; if that fails, tries to get the duration from the container as if the URL pointed directly to it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | URL pointing to where the multimedia container is hosted. | required |

Returns:

| Type | Description |
|---|---|
| int | The duration of the content in seconds. |

Source code in src/consumo/lib/file/multimedia.py
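The "try one strategy, then the next" behaviour can be written as a small fallback chain. In this sketch each strategy signals failure with MissingMetadataError, defined locally as a stand-in for consumo's exception of the same name; `first_successful_duration`, `platform_lookup` and `direct_container` are hypothetical.

```python
from collections.abc import Callable, Sequence


class MissingMetadataError(Exception):
    """Local stand-in for consumo's exception of the same name."""


def first_successful_duration(url: str, strategies: Sequence[Callable[[str], int]]) -> int:
    """Try each strategy in order (e.g. hosting platform first, direct container second)."""
    errors = []
    for strategy in strategies:
        try:
            return strategy(url)
        except MissingMetadataError as exc:
            errors.append(exc)  # remember why this strategy failed, then try the next one
    raise MissingMetadataError(f"No strategy could get the duration of {url}: {errors}")


def platform_lookup(url: str) -> int:
    raise MissingMetadataError("not a known hosting platform")


def direct_container(url: str) -> int:
    return 125  # pretend the container's metadata reports 125 seconds


print(first_successful_duration("https://example.com/talk.mp4", [platform_lookup, direct_container]))  # 125
```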
get_url_multimedia_duration(url, src)
Get the duration of a multimedia file hosted online.
First tries to treat the file as if it had an absolute path; if that fails, tries to resolve its path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| url | HttpUrl | URL of the page where the multimedia file was found, used for path resolution. | required |
| src | str | Path given in the multimedia element's "src" attribute. | required |

Returns:

| Type | Description |
|---|---|
| int | The duration of the content in seconds. |

Source code in src/consumo/lib/url.py
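Resolving a src against the page it came from is what `urllib.parse.urljoin` does. This sketch checks the scheme up front instead of catching a failed first attempt, a simplification of the documented try-then-fallback order; `url_multimedia_duration` and the `resolver` callable are hypothetical.

```python
from collections.abc import Callable
from urllib.parse import urljoin, urlparse


def url_multimedia_duration(page_url: str, src: str, resolver: Callable[[str], int]) -> int:
    """Use src as-is when it is an absolute http(s) URL, else resolve it against the page URL."""
    if urlparse(src).scheme in ("http", "https"):
        return resolver(src)
    # urljoin handles relative ("clip.mp4"), root-relative ("/clip.mp4")
    # and scheme-relative ("//cdn.example.com/clip.mp4") references
    return resolver(urljoin(page_url, src))


seen: list[str] = []


def record(url: str) -> int:
    seen.append(url)  # stand-in resolver: remember the URL, report 0 seconds
    return 0


page = "https://example.com/articles/post.html"
for src in ("media/clip.mp4", "/media/clip.mp4", "//cdn.example.com/clip.mp4", "https://other.example/clip.mp4"):
    url_multimedia_duration(page, src, record)
print(seen)
```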
get_word_count(text)
Get the number of words from text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text whose words will be counted. | required |

Returns:

| Type | Description |
|---|---|
| tuple[int, int] | The number of words and the number of CJK characters in the text. |
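Because get_word_count returns two integers and calculate_reading_time takes a word count and a CJK character count, the tuple is plausibly (words, CJK characters). Under that assumption, a sketch could count CJK characters by Unicode block and words in the remaining text. The block list and `count_words_and_cjk` are assumptions, not consumo's code.

```python
import re

# Assumption: these Unicode blocks approximate "CJK characters"
CJK_PATTERN = re.compile(
    "["
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\uac00-\ud7af"  # Hangul Syllables
    "]"
)


def count_words_and_cjk(text: str) -> tuple[int, int]:
    """Return (word count, CJK character count); CJK characters are not counted as words."""
    cjk_count = len(CJK_PATTERN.findall(text))
    # Replace CJK characters with spaces so they split off adjacent Latin words
    remaining = CJK_PATTERN.sub(" ", text)
    return len(remaining.split()), cjk_count


print(count_words_and_cjk("Hello world"))  # (2, 0)
print(count_words_and_cjk("Hello 世界"))    # (1, 2)
print(count_words_and_cjk("日本語です"))     # (0, 5)
```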