com.fatwire.crawler
Class WebResource

java.lang.Object
  extended by com.fatwire.crawler.WebResource

public class WebResource
extends java.lang.Object

Represents a resource that is downloaded as part of a crawl session. A resource can be any file that is part of the web site, such as a JavaScript file, an HTML file, or an image.


Constructor Summary
WebResource(ResourceURL url)
          Creates a WebResource for the given ResourceURL.
 
Method Summary
 void addURL(ResourceURL link)
          Adds a URL as found on this resource.
 void addURLs(java.util.List<ResourceURL> links)
          Adds a list of URLs found in this WebResource.
 byte[] getBinaryData()
          Returns the binary data of a resource downloaded as part of the crawl session.
 java.lang.String getContentEncoding()
          Returns the content encoding of the downloaded WebResource.
 java.lang.String getContentType()
          Returns the content type of the downloaded WebResource.
 long getDownloadTime()
          Returns the time it took to download this resource in milliseconds.
 org.apache.http.Header[] getHeaders()
          Returns all the headers of the HTTP response for a WebResource.
 long getResourceSize()
          Returns the content size of the downloaded resource.
 int getStatusCode()
          Returns the status code of the response received while accessing the WebResource.
 java.lang.String getText()
          Returns the HTTP body as a string if text conversion is possible.
 java.net.URI getURI()
          Returns the URI for the downloaded resource.
 ResourceURL getURL()
          Returns the URL which was used to download this resource.
 java.util.Set<ResourceURL> getURLs()
          Returns the set of URLs found inside the web resource.
 boolean load(org.apache.http.HttpResponse response)
          Reads the HttpResponse and sets various HTTP parameters on this resource.
 void setContentEncoding(java.lang.String contentEncoding)
          Sets the content encoding.
 void setContentType(java.lang.String value)
          Sets the content type.
 void setDownloadTime(long elapsed)
          Sets the time taken to download the WebResource in milliseconds.
 void setStatusCode(int statusCode)
          Sets the HTTP response status code.
 

Constructor Detail

WebResource

public WebResource(ResourceURL url)
Creates a WebResource for the given ResourceURL.

Parameters:
url - the ResourceURL used to download this resource.

Method Detail

getText

public java.lang.String getText()
Returns the HTTP body as a string if text conversion is possible.

Returns:
the HTTP body as a String, if text conversion is possible.
Throws:
java.io.UnsupportedEncodingException
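
The JDK raises this same checked exception when it is asked to decode bytes with a charset name it does not support. The sketch below shows that behaviour using JDK classes only. It is not the actual implementation of getText: the class BodyTextSketch, its decode method, and the UTF-8 fallback are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class BodyTextSketch {

    // Hypothetical helper, not part of WebResource: decodes an HTTP body
    // with the named charset, assuming UTF-8 when no name is supplied.
    public static String decode(byte[] body, String charsetName)
            throws UnsupportedEncodingException {
        if (body == null) {
            return null; // nothing to convert
        }
        String name = (charsetName == null || charsetName.isEmpty()) ? "UTF-8" : charsetName;
        // String(byte[], String) throws UnsupportedEncodingException
        // when the JVM does not support the named charset.
        return new String(body, name);
    }

    public static void main(String[] args) {
        byte[] body = "<html>ok</html>".getBytes(StandardCharsets.UTF_8);
        try {
            System.out.println(decode(body, "UTF-8"));
            decode(body, "no-such-charset");
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported charset: " + e.getMessage());
        }
    }
}
```

A caller of getText would handle the exception in the same way, for example by falling back to getBinaryData() when the body cannot be converted to text.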

getURLs

public java.util.Set<ResourceURL> getURLs()
Returns the set of URLs found inside the web resource. The URLs must first be populated by the link extractor for this web resource.

Returns:
the links as they are found on this resource.
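
Note that addURLs accepts a List while getURLs returns a Set. The sketch below shows what that implies if the set follows the usual java.util.Set contract: links that compare equal are kept only once. How two ResourceURL objects compare is not documented here, so plain strings stand in for them, and the LinkStoreSketch class is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class LinkStoreSketch {

    // Stand-in for the link collection of a resource. String replaces
    // ResourceURL because its equals/hashCode behaviour is not known here.
    private final Set<String> urls = new LinkedHashSet<String>();

    public void addURL(String link) {
        urls.add(link);
    }

    public void addURLs(List<String> links) {
        urls.addAll(links);
    }

    public Set<String> getURLs() {
        return urls;
    }

    public static void main(String[] args) {
        LinkStoreSketch store = new LinkStoreSketch();
        store.addURLs(Arrays.asList("/a.html", "/b.js", "/a.html"));
        store.addURL("/b.js");
        System.out.println(store.getURLs().size()); // prints 2
    }
}
```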

getURL

public ResourceURL getURL()
Returns the URL which was used to download this resource.

Returns:
ResourceURL that was used to download this resource.

getURI

public java.net.URI getURI()
Returns the URI for the downloaded resource.

Returns:
the URI of this resource.

getBinaryData

public byte[] getBinaryData()
Returns the binary data of a resource downloaded as part of the crawl session.

Returns:
the HTTP body as a byte array.

load

public boolean load(org.apache.http.HttpResponse response)
Reads the HttpResponse and sets various HTTP parameters on this resource.

Parameters:
response - HttpResponse for the web resource.
Returns:
true if there was an HTTP body.
Throws:
java.lang.IllegalStateException
java.io.IOException

addURLs

public void addURLs(java.util.List<ResourceURL> links)
Adds a list of URLs found in this WebResource.

Parameters:
links - the list of ResourceURL objects to add.

addURL

public void addURL(ResourceURL link)
Adds a URL as found on this resource.

Parameters:
link - the ResourceURL to add.

getHeaders

public org.apache.http.Header[] getHeaders()
Returns all the headers of the HTTP response for a WebResource.

Returns:
HTTP response headers.

setStatusCode

public void setStatusCode(int statusCode)
Sets the HTTP response status code.

Parameters:
statusCode - Status code for response.

getStatusCode

public int getStatusCode()
Returns the status code of the response received while accessing the WebResource.

Returns:
the HTTP status code, or -1 if not set.
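
Because -1 signals that no status code has been set, callers may want to separate that case from real HTTP outcomes. A minimal sketch follows. StatusSketch and describe are hypothetical names, and the grouping follows the standard HTTP status code ranges rather than anything this API prescribes.

```java
final class StatusSketch {

    // Hypothetical helper: maps the int returned by a getStatusCode-style
    // accessor to a coarse label, treating -1 as "not set".
    public static String describe(int statusCode) {
        if (statusCode == -1) {
            return "not set";
        }
        if (statusCode >= 200 && statusCode < 300) {
            return "success";
        }
        if (statusCode >= 300 && statusCode < 400) {
            return "redirect";
        }
        if (statusCode >= 400 && statusCode < 600) {
            return "error";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(-1));  // not set
        System.out.println(describe(200)); // success
        System.out.println(describe(404)); // error
    }
}
```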

setContentType

public void setContentType(java.lang.String value)
Sets the content type.

Parameters:
value - the content type to set.

getContentEncoding

public java.lang.String getContentEncoding()
Returns the content encoding of the downloaded WebResource.

Returns:
the content encoding.

setContentEncoding

public void setContentEncoding(java.lang.String contentEncoding)
Sets the content encoding.

Parameters:
contentEncoding - the content encoding to set.

getContentType

public java.lang.String getContentType()
Returns the content type of the downloaded WebResource.

Returns:
the value of the Content-Type HTTP header.

setDownloadTime

public void setDownloadTime(long elapsed)
Sets the time taken to download the WebResource in milliseconds.

Parameters:
elapsed - the time taken to download the WebResource, in milliseconds.

getDownloadTime

public long getDownloadTime()
Returns the time it took to download this resource in milliseconds.

Returns:
the download time in milliseconds, or -1 if not set.

getResourceSize

public long getResourceSize()
Returns the content size of the downloaded resource.

Returns:
the size of the HTTP body in bytes.
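
The resource size and the download time can be combined into a rough throughput figure. The sketch below is only an illustration: ThroughputSketch and bytesPerSecond are hypothetical names, and the guard treats a non-positive download time (including the -1 "not set" value documented for getDownloadTime) as unknown.

```java
final class ThroughputSketch {

    // Hypothetical helper: sizeBytes as returned by getResourceSize(),
    // elapsedMillis as returned by getDownloadTime().
    // Returns -1.0 when the throughput cannot be computed.
    public static double bytesPerSecond(long sizeBytes, long elapsedMillis) {
        if (sizeBytes < 0 || elapsedMillis <= 0) {
            return -1.0;
        }
        return sizeBytes * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond(2048, 500)); // prints 4096.0
        System.out.println(bytesPerSecond(2048, -1));  // prints -1.0
    }
}
```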


Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved