Web crawling administrator

We are a company that downloads open, public data from many different websites. These websites need to be supervised by a person with knowledge and experience of HTML.

We are looking for a source administrator to look after our sources, a list of sites to be crawled regularly by our system.
Existing sources need to be repaired when the sites change structure, and sometimes we want to add new sources.
A source consists of one or more hubs: sections or starting URLs. For example, a city could have two hubs: one for published meeting minutes and one for public announcements. The job of the source admin is to identify the different hubs and configure our system to crawl them correctly.
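To make the source and hub relationship concrete, here is a minimal sketch of how the two-hub city example could be modelled. It is only an illustration: the class names, field names, URLs and link patterns are all hypothetical, and it does not show the configuration format our system actually uses.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hub:
    """One section of a site: a starting URL plus a rule for which links to follow."""
    name: str
    start_url: str
    link_pattern: str  # hypothetical: a basic regular expression for relevant links

    def matches(self, url: str) -> bool:
        # True if the URL looks like a document belonging to this hub.
        return re.search(self.link_pattern, url) is not None


@dataclass
class Source:
    """A site to be crawled regularly, made up of one or more hubs."""
    name: str
    hubs: List[Hub] = field(default_factory=list)


# Hypothetical city with two hubs, as in the example above.
city = Source(
    name="Example City",
    hubs=[
        Hub("meeting-minutes", "https://city.example/minutes/", r"/minutes/\d{4}/"),
        Hub("announcements", "https://city.example/announcements/", r"/announcements/[\w-]+$"),
    ],
)

for hub in city.hubs:
    print(hub.name, "->", hub.start_url)
```

When a site changes structure, repairing the source would in this sketch mean updating a hub's start URL or link pattern.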

The job of the source admin is more about understanding the structure of websites than about programming. Our source admin therefore probably has a light background in web development rather than systems development.

Skill set: HTML (mandatory), CSS, JavaScript, AJAX, JSON, basic regular expressions.

Our basic need is a technical person who knows HTML, CSS and basic web technologies, and who can look at the HTML source code of a site and decide how best to express a suitable crawling configuration. Some sites are regular HTML sites, while others load their content with JavaScript, using AJAX requests. A source administrator must understand how this works as well. Using this knowledge, and aided by our simple interface, the person will fix broken configurations and add new sites from a list provided by us. It is a plus if you are curious and driven.
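The difference between a regular HTML site and one that loads its content through AJAX can be illustrated with a small sketch. It uses only the Python standard library and made-up sample data; the function names, the JSON keys ("items", "url") and the sample pages are assumptions for illustration, not part of our system.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_from_html(html):
    """Return the links present directly in the HTML source."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_from_json(body, items_key="items", url_key="url"):
    """Return the links found in a JSON response (hypothetical key names)."""
    data = json.loads(body)
    return [item[url_key] for item in data.get(items_key, []) if url_key in item]


# Regular HTML site: the links are in the page source.
STATIC_PAGE = '<ul><li><a href="/minutes/2017/01.pdf">January</a></li></ul>'

# AJAX site: the page source is an empty shell, and the data arrives
# separately as JSON from a request made by the page's JavaScript.
AJAX_SHELL = '<div id="list"></div><script src="/app.js"></script>'
AJAX_RESPONSE = '{"items": [{"title": "January", "url": "/minutes/2017/01.pdf"}]}'

print(links_from_html(STATIC_PAGE))
print(links_from_html(AJAX_SHELL))
print(links_from_json(AJAX_RESPONSE))
```

In the AJAX case the HTML source alone contains no links, so the source admin would need to find the underlying request (for example in the browser's network tools) and point the crawl configuration at it instead.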

There may also be system administration duties needed to keep the crawlers running (we run Ubuntu Linux servers on Google Compute Engine). Skills in these adjacent areas are not required, and such duties will not be part of the job for the first few months, but they are possible ways for the role to grow.

This is a job ad titled "Web crawling administrator Göteborg" from the company Crawlica AB, published on webbjobb.io on 22 March 2017 at 14:04.

How to apply

Apply by e-mail to [email protected].

