HTML Extract

Legacy API Notice

This API is now deprecated; you should try the new and improved Browser Bot API instead. We will continue to operate this API into the future for pre-existing users; however, no further feature updates will be applied.

Extract specific HTML tag contents or attributes from complex HTML or XHTML content.
This is a flexible API which allows you to parse and extract any data from HTML documents.
You can search for data using a CSS/jQuery style selector.

Tag selector examples:

".super": find all elements with the class "super"
"img.avatar": find all "img" tags with the class "avatar"
"img[width=32]": find all "img" tags with the attribute "width" equaling "32"
"img[src*=cool]": find all "img" tags with the attribute "src" containing the string "cool"
"a[href]": find all "a" tags which have the "href" attribute set
"#special-id": find all elements with the id "special-id"

You can also combine selectors, for example:

"div a": find all "a" tags nested anywhere inside "div" tags (descendants at any depth)
"div > a": find all "a" tags which are direct children of "div" tags
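
The difference between the two combined selectors above can be illustrated locally. The following is a minimal sketch using only Python's standard `html.parser`; it does not call the API or reproduce its selector engine. The class name `AnchorScopeParser`, the helper `classify`, and the `SAMPLE` markup are all hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class AnchorScopeParser(HTMLParser):
    """Classifies each <a> tag by how it relates to enclosing <div> tags."""

    def __init__(self):
        super().__init__()
        self.stack = []         # names of currently open tags, outermost first
        self.descendant = []    # hrefs that "div a" would match
        self.direct_child = []  # hrefs that "div > a" would match

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if "div" in self.stack:
                self.descendant.append(href)
            if self.stack and self.stack[-1] == "div":
                self.direct_child.append(href)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag. The sample avoids void
        # elements such as <img>, which this simple stack does not handle.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


# Hypothetical sample markup: one direct child, one nested link, one outside.
SAMPLE = (
    '<div><a href="/one">1</a><p><a href="/two">2</a></p></div>'
    '<a href="/three">3</a>'
)


def classify(html):
    """Return (hrefs matched by "div a", hrefs matched by "div > a")."""
    parser = AnchorScopeParser()
    parser.feed(html)
    parser.close()
    return parser.descendant, parser.direct_child
```

In this sample, "/one" is a direct child of the "div", "/two" is nested one level deeper inside a "p", and "/three" sits outside any "div", so only the first two are candidates for "div a" and only the first for "div > a".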

End Point

https://neutrinoapi.net/html-extract-tags

API Request

Parameter	Required	Type	Description
content	yes	string	The HTML content. This can be a URL to load from, a file upload (multipart/form-data), or an HTML content string
tag	yes	string	The HTML tag(s) to extract data from. This can be a simple tag name like 'img' or a CSS/jQuery style selector
attribute	no	string	If set, extract data from the specified tag attribute. If not set, data will be extracted from the tag's inner content
base-url	no	string	The base URL used to resolve relative links
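
As a rough illustration of how the request parameters fit together, the sketch below builds (but does not send) a request using Python's standard library. It assumes the parameters are accepted as a form-encoded POST body, which this page does not state explicitly, and it omits account credentials because authentication is not described in this section. The function name `build_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://neutrinoapi.net/html-extract-tags"


def build_request(content, tag, attribute=None, base_url=None):
    """Build a request object for the endpoint without sending it.

    Assumption: parameters are sent as a form-encoded POST body.
    Credentials are intentionally left out of this sketch.
    """
    params = {"content": content, "tag": tag}
    # Optional parameters are only included when the caller supplies them.
    if attribute is not None:
        params["attribute"] = attribute
    if base_url is not None:
        params["base-url"] = base_url
    body = urlencode(params).encode("utf-8")
    return Request(ENDPOINT, data=body, method="POST")
```

For example, `build_request("<a href='/x'>x</a>", "a", attribute="href")` would prepare a request asking for the "href" attribute of every "a" tag in the supplied HTML string. Passing the result to `urllib.request.urlopen` would send it, once whatever credentials your account requires have been added.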

API Response

Parameter	Type	Description
total	integer	The total number of values extracted
values	array	Array of extracted values
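
The two response fields can be read with a few lines of code. This is a minimal sketch that assumes the response body is JSON with `total` and `values` at the top level; the page lists the fields but not the encoding, so treat that as an assumption. The function name `parse_extract_response` and the sample payload are hypothetical.

```python
import json


def parse_extract_response(text):
    """Return (total, values) from a response body.

    Assumption: the body is a JSON object with "total" and "values" keys.
    """
    data = json.loads(text)
    values = data.get("values", [])
    if not isinstance(values, list):
        raise ValueError("expected 'values' to be an array")
    # Fall back to the list length if "total" is missing.
    total = data.get("total", len(values))
    return total, values


# Hypothetical payload, not captured from the live API.
sample = '{"total": 2, "values": ["/one", "/two"]}'
total, values = parse_extract_response(sample)
```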

API Performance

Characteristic	Value	Description
Avg Latency	20ms	Average RTT for requests within the same data center/region
Max Rate	2/second	Maximum inbound request rate. Exceeding this will result in request throttling
Max Concurrency	250	Maximum concurrent/simultaneous requests. Exceeding this will result in error code 06 [TOO MANY CONNECTIONS]
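
To stay under the stated maximum rate of 2 requests per second, a client can space out its calls. The sketch below is one simple way to do that on the client side, not something the API requires or provides; the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Spaces calls so that at most max_per_second are started per second."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return delay
```

Calling `throttle.wait()` immediately before each request keeps successive requests at least half a second apart with the default setting. This addresses only the request rate; staying under the concurrency limit is a separate concern.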