public interface LinkExtractor
This interface extracts links from downloaded markup during a crawl session. Users can supply their own implementation of the LinkExtractor interface to control how links are extracted from a downloaded resource. In the Site Capture context, each downloaded resource is represented as a WebResource.
The out-of-the-box (OOTB) implementation, PatternLinkExtractor, uses a regular expression to extract links from the downloaded markup.
Refer to the developer guide for details on PatternLinkExtractor and its usage.
Method Summary | |
---|---|
java.util.List<ResourceURL> | extract(WebResource resource): Parses the WebResource and returns the list of links found by the implementation's extraction algorithm. |
Method Detail |
---|
java.util.List<ResourceURL> extract(WebResource resource)
Parameters: resource - A WebResource object containing information about the resource downloaded during the crawl session.
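As an illustration of a custom implementation, the following is a minimal sketch rather than the product's own code. The real WebResource and ResourceURL classes are not documented on this page, so the sketch declares hypothetical stand-ins for them, along with a local copy of the LinkExtractor interface, so that it compiles on its own. The getContent() accessor, the ResourceURL constructor, getUrl(), and the HrefLinkExtractor class name are all assumptions made for the example; check the actual Site Capture API before adapting it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in: the real WebResource API may expose the markup differently.
interface WebResource {
    String getContent(); // assumed accessor for the downloaded markup
}

// Hypothetical stand-in: the real ResourceURL class may be constructed differently.
final class ResourceURL {
    private final String url;

    ResourceURL(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

// Local copy of the interface signature documented above.
interface LinkExtractor {
    List<ResourceURL> extract(WebResource resource);
}

// Example custom extractor: collects the value of every quoted href attribute.
class HrefLinkExtractor implements LinkExtractor {
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    @Override
    public List<ResourceURL> extract(WebResource resource) {
        List<ResourceURL> links = new ArrayList<>();
        String markup = resource.getContent();
        if (markup == null) {
            return links; // nothing downloaded, so no links to report
        }
        Matcher matcher = HREF.matcher(markup);
        while (matcher.find()) {
            links.add(new ResourceURL(matcher.group(1)));
        }
        return links;
    }
}
```

With these stand-ins, a WebResource can be supplied as a lambda for a quick check, for example `new HrefLinkExtractor().extract(() -> "<a href=\"/a.html\">A</a>")`, which yields a single ResourceURL for `/a.html`. A regular expression is used here only because it mirrors the approach the page attributes to PatternLinkExtractor; an HTML parser would be a more robust choice for real markup.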