Problem with cookie in HttpUrlConnection

Posted By:   Sven_Reiss
Posted On:   Tuesday, March 25, 2003 12:58 AM

Hello,


I crawl a few websites and get the following error:


			
java.lang.StackOverflowError
at java.util.Properties.getProperty(Properties.java:480)
at java.lang.System.getProperty(System.java:574)
at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:66)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.BufferedWriter.<init>(BufferedWriter.java:91)
at java.io.BufferedWriter.<init>(BufferedWriter.java:70)
at java.io.PrintStream.init(PrintStream.java:77)
at java.io.PrintStream.<init>(PrintStream.java:117)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:602)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:303)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:264)
at sun.net.www.http.HttpClient.New(HttpClient.java:336)
at sun.net.www.http.HttpClient.New(HttpClient.java:317)
at sun.net.www.http.HttpClient.New(HttpClient.java:312)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:481)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:472)
at net.action.misc.HttpClient.getConnection(HttpClient.java:44)
at net.action.misc.HttpClient.getConnectionWithCookie(HttpClient.java:56)
at net.action.misc.HttpClient.getPage(HttpClient.java:86)
at net.action.downloadpages.DownloadLoop.loadPages(DownloadLoop.java:281)
at net.action.downloadpages.DownloadLoop.loadPages(DownloadLoop.java:304)


The last line is repeated 1009 times. The relevant line in my code is where I set the cookie on my HttpURLConnection:

httpCon.setRequestProperty(CONST.COOKIESTRING, cookie);

When I get this error, my alllinks counter is at around 30,000 unique links. My cookie counter is at around 28,000 cookies with the same string, and there are 2,000 hrefs left, all from the same top-level domain.


My stack size is 256 MB, and the problem is the same whether I do it as a loop or as recursion.


The recursion (or loop) runs in this way:


1) I load the index page, parse it for hrefs and save them in a vector called links. I only visit hrefs from this TLD.

2) Then I check these links against a vector called alllinks. In this vector all links are unique.

3) I start the download with the first URL in the links vector.

4) Every 200 downloads I write the Lucene index back, close my HttpClient and set it to null. Then I call System.gc(); System.runFinalization();.


5) I start with the next TLD.


When I look at the runtime stack with OptimizeIt, I see that everything is cleaned up except the system properties for the cookies.


Can anybody help me?


			
public final class HttpClient {

    private HttpURLConnection httpCon = null;
    private URL url = null;
    private int response_code = 999;
    private String cookie = CONST.EMPTYSTRING;
    private StringBuffer line = null;
    private String page = null;

    static Logger logger = Logger.getLogger(HttpClient.class.getName());

    public HttpClient() {
        System.getProperties().setProperty("sun.net.client.defaultConnectTimeout", "60000");
        System.getProperties().setProperty("sun.net.client.defaultReadTimeout", "60000");
    }

    public int checkHref(String href) throws Exception {
        makeUrl(href);
        getConnection();
        return response_code;
    }

    private void getConnection() throws Exception {
        try {
            httpCon = (HttpURLConnection)url.openConnection();
            if(!cookie.equals(CONST.EMPTYSTRING)) {
                logger.debug("set cookie: " + cookie);
                httpCon.setRequestProperty(CONST.COOKIESTRING, cookie);
            }
            httpCon.connect();
            responseCode();
        }
        catch(IOException ioe) {
            logger.error("Fehler beim Connect: " + ioe.getMessage());
            httpCon = null;
            throw ioe;
        }
    }

    private void getConnectionWithCookie() throws Exception {
        String cookie_header = null;
        try {
            getConnection();
            if(cookie.equals(CONST.EMPTYSTRING)) {
                cookie_header = httpCon.getHeaderField(CONST.COOKIESTRING);
                if(cookie_header == null) cookie_header = httpCon.getHeaderField(CONST.SETCOOKIESTRING);
                if(cookie_header != null)
                    if(cookie_header.indexOf(CONST.SEMIKOLON) != -1)
                        cookie_header = cookie_header.substring(0, cookie_header.indexOf(CONST.SEMIKOLON));
                if(cookie_header != null)
                    cookie = cookie_header;
                else {
                    cookie = null;
                    cookie = CONST.EMPTYSTRING;
                }
                cookie_header = null;
            }
        }
        catch(IOException ioe) {
            logger.error("Fehler beim ConnectWithCookie: " + ioe.getMessage());
            httpCon = null;
            cookie_header = null;
            throw ioe;
        }
    }

    public String getPage(String serverUrl) throws Exception {
        page = null;
        page = CONST.EMPTYSTRING;
        try {
            logger.info("Laden von :" + serverUrl + ":");
            makeUrl(serverUrl);
            if(url != null) {
                getConnectionWithCookie();
                if(httpCon != null)
                    if(response_code == 200) page = readPage();
            }
            if(!page.equals(CONST.EMPTYSTRING))
                logger.info(page.length() + " Bytes geladen Response Code: " + response_code);
        }
        catch(Exception e) {
            logger.debug("Fehler getPage Exception: " + e.toString());
            shutConnection();
            throw e;
        }
        shutConnection();
        return page;
    }

    public int getResponseCode() { return response_code; }

    private void makeUrl(String serverUrl) throws Exception {
        try { url = new URL(serverUrl); }
        catch(MalformedURLException me) {
            logger.error("Fehler beim Url erstellen: " + me.getMessage());
            url = null;
        }
    }

    private String readPage() throws Exception {
        line = new StringBuffer();
        InputStreamReader isr = null;
        BufferedReader in = null;
        String temp = null;
        try {
            isr = new InputStreamReader(httpCon.getInputStream());
            in = new BufferedReader(isr);
            temp = CONST.EMPTYSTRING;
            while((temp = in.readLine()) != null) {
                line.append(CONST.BLANKSTRING);
                temp = temp.replace(' ', ' ');
                line.append(temp.trim());
                line.append(CONST.BLANKSTRING);
            }
        }
        catch(IOException ioe) {
            logger.error("Fehler beim Laden der Seite: " + ioe.getMessage());
            throw ioe;
        }
        finally {
            temp = null;
            if(in != null) in.close();
            in = null;
            if(isr != null) isr.close();
            isr = null;
        }
        return line.toString();
    }

    private void responseCode() throws Exception {
        response_code = 999;
        try { response_code = httpCon.getResponseCode(); }
        catch (IOException ioe) {
            logger.error("Fehler beim Response: " + ioe.getMessage());
            throw ioe;
        }
    }

    public String getCookie() { return cookie; }

    public void setCookie(String value) {
        if(value != null) cookie = value;
    }

    private void shutConnection() throws Exception {
        if(httpCon != null) httpCon.disconnect();
        httpCon = null;
        url = null;
        line = null;
    }
}

Re: Problem with cookie in HttpUrlConnection

Posted By:   Stephen_Ostermiller  
Posted On:   Tuesday, March 25, 2003 03:46 PM

The cookies are not the problem. The fact that you are getting that last line a thousand times is the problem.


A stack overflow occurs when you have too much recursion. Each time you recurse, you add some information to a data structure called the stack. The stack has a limited size (usually fixed by the runtime environment).
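To make the point about stack depth concrete, here is a minimal sketch. The class name, method names and the depth of 50,000,000 are made up for illustration, and it assumes a JVM running with a default-sized thread stack. The recursive method adds one frame per call and runs out of stack, while the loop computes the same sum at constant stack depth.

```java
class StackDepthDemo {

    // Recursive sum of 1..n: every call adds one frame to the thread's stack.
    static long sumRecursive(long n) {
        if (n == 0) {
            return 0;
        }
        return n + sumRecursive(n - 1);
    }

    // The same computation as a loop: stack depth stays constant regardless of n.
    static long sumIterative(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Returns true if the recursive version exhausts the stack for this n.
    static boolean recursionOverflows(long n) {
        try {
            sumRecursive(n);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // hypothetical depth, assumed far beyond a default-sized stack
        System.out.println("recursive overflowed: " + recursionOverflows(n));
        System.out.println("iterative sum: " + sumIterative(n));
    }
}
```

The repeated loadPages line in the stack trace above is the same pattern: one extra frame per page visited.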


The solution is to not recurse.


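Your problem can be solved by maintaining a collection of URLs still to fetch and having a loop work through it while it has something in it. Pseudocode:


LinkedList<String> listOfURLs = new LinkedList<>();   // URLs still to fetch
HashSet<String> seenURLs = new HashSet<>();           // every URL ever queued

public void spider() {
    listOfURLs.add("http://initialurl.com/");
    seenURLs.add("http://initialurl.com/");
    while (listOfURLs.size() > 0) {
        // remove the URL before loading it, otherwise the loop never ends
        load(listOfURLs.removeFirst());
    }
}

public void load(String url) {
    // load the url; for each link on the page, if seenURLs.add(link)
    // returns true (not queued before), append it to listOfURLs
}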
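
The loop above can be turned into something runnable. This is only a sketch under stated assumptions: the class name IterativeSpider, the crawl method and the fetchLinks parameter are invented for illustration, and fetchLinks stands in for "download the page and parse its hrefs" so that the example runs without a network. It uses a queue of pending URLs plus a set of URLs already seen, so the call stack stays flat however many pages are visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class IterativeSpider {

    // Visits every URL reachable from start exactly once, without recursion.
    // fetchLinks is a hypothetical stand-in for downloading a page and parsing its links.
    static List<String> crawl(String start, Function<String, List<String>> fetchLinks) {
        Deque<String> pending = new ArrayDeque<>(); // URLs still to fetch
        Set<String> seen = new HashSet<>();         // every URL ever queued
        List<String> visited = new ArrayList<>();   // URLs in the order they were fetched

        pending.addLast(start);
        seen.add(start);
        while (!pending.isEmpty()) {
            String url = pending.removeFirst();
            visited.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Set.add returns false for duplicates, so each URL is queued once.
                if (seen.add(link)) {
                    pending.addLast(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up link graph standing in for a real site.
        Map<String, List<String>> site = Map.of(
            "/index", List.of("/a", "/b"),
            "/a", List.of("/b", "/index"),
            "/b", List.of("/c"),
            "/c", List.of());
        System.out.println(crawl("/index", url -> site.getOrDefault(url, List.of())));
    }
}
```

With the made-up graph in main, this prints [/index, /a, /b, /c]. In a real crawler the lambda would be replaced by something like the getPage call from the question followed by link extraction.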