HTML Extract
Legacy API Notice
This API is now deprecated. You should try the new and improved Browser Bot API instead.
We will continue to operate this API for pre-existing users, but no further feature updates will be applied.
Extract specific HTML tag contents or attributes from complex HTML or XHTML content.
This is a flexible API which allows you to parse and extract any data from HTML documents.
You can search for data using a CSS/jQuery style selector.
Tag selector examples:
- ".super": find all elements with the class "super"
- "img.avatar": find all "img" tags with the class "avatar"
- "img[width=32]": find all "img" tags with the attribute "width" equaling "32"
- "img[src*=cool]": find all "img" tags with the attribute "src" containing the string "cool"
- "a[href]": find all "a" tags which have the "href" attribute set
- "#special-id": find all elements with the id "special-id"
You can also combine selectors, for example:
- "div a": find all "a" tags which are contained anywhere within (descendants of) "div" tags
- "div > a": find all "a" tags which are direct children of "div" tags
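The difference between the two combinators can be illustrated locally. The following is a minimal sketch, not a call to this API: it uses the limited XPath support in Python's standard `xml.etree.ElementTree` as a stand-in, where `.//div//a` plays the role of "div a" and `.//div/a` plays the role of "div > a". The sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML).
doc = ET.fromstring(
    "<body>"
    "<div><a href='/direct'>direct</a><p><a href='/nested'>nested</a></p></div>"
    "<a href='/outside'>outside</a>"
    "</body>"
)

# Analogue of "div a": every <a> at any depth inside a <div>.
descendants = [a.get("href") for a in doc.findall(".//div//a")]

# Analogue of "div > a": only <a> elements whose parent is a <div>.
children = [a.get("href") for a in doc.findall(".//div/a")]

print(descendants)
print(children)
```

The link nested inside the paragraph matches only the descendant form, and the link outside the "div" matches neither.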
End Point
https://neutrinoapi.net/html-extract-tags
Request Parameters
Parameter | Required | Type | Default | Description |
---|---|---|---|---|
content | yes | string | | The HTML content. This can be a URL to load from, a file upload (multipart/form-data) or an HTML content string |
tag | yes | string | | The HTML tag(s) to extract data from. This can be a simple tag name like 'img' or a CSS/jQuery style selector |
attribute | no | string | | If set, extract data from the specified tag attribute. If not set, data will be extracted from the tag's inner content |
base-url | no | string | | The base URL to substitute into relative links |
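As a rough sketch of how these parameters fit together, the snippet below builds (but does not send) a POST request using only the Python standard library. Several things here are assumptions rather than documented behavior: that the end point accepts a URL-encoded form body, the helper name `build_extract_request`, and the `extra_fields` argument, which is a hypothetical place for whatever authentication fields your account requires.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

ENDPOINT = "https://neutrinoapi.net/html-extract-tags"

def build_extract_request(content, tag, attribute=None, base_url=None, extra_fields=None):
    """Build a POST request for the end point (hypothetical helper)."""
    # 'content' and 'tag' are the two required parameters.
    params = {"content": content, "tag": tag}
    # Optional parameters are only sent when explicitly provided.
    if attribute is not None:
        params["attribute"] = attribute
    if base_url is not None:
        params["base-url"] = base_url
    # Assumption: any account credentials travel as additional form fields.
    if extra_fields:
        params.update(extra_fields)
    # Assumption: a URL-encoded form body is accepted.
    body = urlencode(params).encode("utf-8")
    return Request(ENDPOINT, data=body, method="POST")

req = build_extract_request(
    "<img class='avatar' src='/me.png'>",
    "img.avatar",
    attribute="src",
    base_url="https://example.com/",
)
# Decode the body again to inspect exactly which fields would be sent.
fields = parse_qs(req.data.decode("utf-8"))
print(fields)
```

Passing the result to `urllib.request.urlopen(req)` would send the request. That step is left out so the sketch runs without network access or credentials.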
Response Data
Parameter | Type | Description |
---|---|---|
total | integer | The total number of values extracted |
values | array | Array of extracted values |
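A minimal sketch of reading these two fields follows, assuming the response body is a JSON object (the format is not stated in this section). The helper name `parse_extract_response` and the sample payload are made up for illustration.

```python
import json

def parse_extract_response(raw):
    """Return (total, values) from a response body (hypothetical helper)."""
    data = json.loads(raw)
    # 'values' is the array of extracted strings; default to an empty list.
    values = list(data.get("values", []))
    # 'total' is the number of extracted values; fall back to the list length.
    total = int(data.get("total", len(values)))
    return total, values

# Hypothetical payload, e.g. for tag="img" and attribute="src".
sample = '{"total": 2, "values": ["/a.png", "/b.png"]}'
total, values = parse_extract_response(sample)
print(total, values)
```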
API Performance
Characteristic | Value | Description |
---|---|---|
Avg Latency | 20ms | Average RTT for requests within the same data center/region |
Max Rate | 2/second | Maximum inbound request rate. Exceeding this will result in request throttling |
Max Concurrency | 250 | Maximum concurrent/simultaneous requests. Exceeding this will result in error code 06 [TOO MANY CONNECTIONS] |
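One way a client might stay under the 2/second rate is to space out its own requests. The class below is a simple sketch of that idea, not part of the API: the name `MinIntervalThrottle` is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time

class MinIntervalThrottle:
    """Enforce a minimum gap between calls (hypothetical client-side helper)."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        # 2 requests/second means at least 0.5 seconds between requests.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval

# Typical use: call throttle.wait() immediately before sending each request.
throttle = MinIntervalThrottle(max_per_second=2.0)
```

This only limits the request rate of a single thread. Staying under the concurrency limit would need a separate mechanism, such as a bounded worker pool.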