com.fatwire.crawler
Interface LinkExtractor


public interface LinkExtractor

This interface is used to extract links from downloaded markup during a crawl session. Users can supply their own implementation of LinkExtractor to control how links are extracted from a downloaded resource. In the Site Capture context, each downloaded resource is represented as a WebResource.

An out-of-the-box implementation, PatternLinkExtractor, uses a regular expression to extract links from the downloaded markup.

Refer to the developer guide for details and usage of PatternLinkExtractor.
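
The following is a minimal sketch of the regular-expression approach described above. It is not the PatternLinkExtractor source, and it does not implement LinkExtractor, because the accessors of WebResource and the constructors of ResourceURL are not described on this page. The class name HrefPatternDemo, the method extractHrefs, and the href pattern are all hypothetical; the sketch works on a plain String of markup and returns the link targets as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of com.fatwire.crawler.
class HrefPatternDemo {

    // Assumed pattern: matches href="..." or href='...' and captures
    // the quoted value in group 2. Group 1 remembers the opening quote
    // so that the same quote character must close the value.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the markup, in document order.
    static List<String> extractHrefs(String markup) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(markup);
        while (matcher.find()) {
            links.add(matcher.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String markup = "<a href=\"/products/index.html\">Products</a>"
                + "<a href='http://example.com/about'>About</a>";
        for (String link : extractHrefs(markup)) {
            System.out.println(link);
        }
    }
}
```

A custom LinkExtractor would presumably apply similar logic inside extract(WebResource), reading the markup from the WebResource and wrapping each match in a ResourceURL; consult the developer guide for the actual accessors and constructors.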


Method Summary
 java.util.List<ResourceURL> extract(WebResource resource)
          Parses the WebResource and returns the links found by the implementation's extraction algorithm.
 

Method Detail

extract

java.util.List<ResourceURL> extract(WebResource resource)
Parses the WebResource and returns the links found by the implementation's extraction algorithm.

Parameters:
resource - A WebResource object containing information about a resource downloaded during the crawl session.
Returns:
A list of the ResourceURL objects found in the downloaded HTML by the implementation's extraction algorithm.


Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved