Introducing Go-Ski! 🎉

Go-Ski Logo

Why Go-Ski? 🤔

Go-Ski is an open-source library designed to make web scraping incredibly fast and straightforward. Built on top of the robust chromedp package, it leverages the Chrome DevTools Protocol to offer a simplified and organized way to traverse and scrape the web.

The Inspiration Behind Go-Ski 🌱

While working on various projects involving large language models, I realized the immense potential that could be unlocked by enabling these models to interact with the web. However, navigating the web programmatically in a human-like manner presented its challenges.

Through my research and experience in creating bots, I noticed that many human interactions with websites follow a similar pattern: go to a specific site, authenticate if necessary, navigate to the target page, and consume some type of data.

The Problem of Redundancy 🔄

This workflow was precisely how my bots operated. However, as I started to create multiple workflows, I found myself repeating code and making minor tweaks to targets. This redundancy led me to seek a more efficient approach, and that's where the concept of "procedures" in Go-Ski was born.

Introducing Procedures in Go-Ski 🛠️

In Go-Ski, you define a procedure like this:

login_example.go
import (
	"context"
	"time"

	"github.com/chromedp/cdproto/target"
	// Also import Go-Ski's core package here; its module path is not shown in this post.
)

func LoginExample() {
	proc := core.NewProcedures(true)

	// Define the actions to run, in order.
	proc.Actions = []core.Action{
		{
			Type: core.Navigate,
			URL:  "https://example.com/login",
		},
		{
			Type:  core.Sleep,
			Delay: 2 * time.Second,
		},
		{
			Type: core.FormSubmit,
			FormDetails: &core.FormDetails{
				Fields: []core.FormField{
					{XPath: "/html/form/p[1]/input", Value: "example_user"},
					{XPath: "/html/form/p[2]/input", Value: "example_password"},
				},
				Submit: "/html/body/main/div[2]/fieldset/form/input",
				Delay:  1 * time.Second,
			},
		},
		{
			Type:  core.Click,
			XPath: "/html/button",
		},
		{
			Type:  core.Sleep,
			Delay: 5 * time.Second,
		},
		{
			Type:        core.SwitchToIframe,
			IframeXPath: "/html/body/iframe",
		},
		{
			Type:  core.Sleep,
			Delay: 5 * time.Second,
		},
		{
			Type:  core.Scrape,
			XPath: "/html/body/h1",
		},
	}

	// Create a context
	ctx := context.Background()

	var targetInfo []*target.Info

	// Execute the actions and stop on the first error.
	if err := proc.Execute(ctx, targetInfo); err != nil {
		panic(err)
	}
}

The goal is to let you describe a useful scraper as structured data, so that workflows stay easy to read, reuse, and scale.

Breaking Down Actions 📚

Each procedure in Go-Ski is made up of actions. These actions can be of various types:

  • Navigate: Loads the given URL.
  • Click: Clicks the element targeted by an XPath.
  • Form Submit: Injects data into a form and clicks a defined submit button.
  • Scrape: Returns the targeted content from an element on the page.
  • Sleep: Pauses for a defined delay (a time.Duration, such as 2 * time.Second).
  • Iframe Switch: Switches context to the iframe targeted by an XPath.

The beauty of Go-Ski's procedures is that they are extremely fast to develop and eliminate a lot of redundancy when traversing the web programmatically.