Overview
These pipelines were designed to pull structured information from websites: product data, pricing, and related metadata. The focus was on consistency, resilience, and producing outputs clean enough to feed directly into decision-making or further analysis.
Core Capabilities
- Automated extraction from target pages using predictable selectors or patterns.
- Transformation into normalized structures (e.g., JSON, CSV).
- Handling of pagination, multiple categories, and edge cases.
- Basic resilience to minor site layout changes.
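The normalization step in the second bullet can be sketched as follows. This is a minimal, illustrative example using only the standard library; the field names (`name`, `price`, `category`) and cleanup rules are assumptions, not the exact schema the pipelines used.

```python
import csv
import io
import json

def normalize_item(raw: dict) -> dict:
    """Turn one raw scraped record into a typed, trimmed dict.
    Field names here are illustrative placeholders."""
    return {
        "name": raw.get("name", "").strip(),
        # Strip currency symbols and thousands separators so price
        # becomes a number rather than display text.
        "price": float(raw.get("price", "0").replace("$", "").replace(",", "").strip() or 0),
        "category": raw.get("category", "").strip().lower(),
    }

def to_csv(records: list[dict]) -> str:
    """Serialize normalized records to CSV, using the first record's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

raw = {"name": "  Widget  ", "price": "$1,299.00", "category": "Tools "}
record = normalize_item(raw)
print(json.dumps(record))   # JSON output
print(to_csv([record]))     # CSV output
```

Doing this cleanup in one place, rather than at every call site, is what keeps the JSON and CSV outputs consistent with each other.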
Example Flow
```python
for category in categories:
    page = 1
    while True:
        html = fetch_page(category, page)
        items = parse_items(html)
        if not items:  # an empty page marks the end of pagination
            break
        for item in items:
            record = normalize_item(item)
            save_record(record)
        page += 1
```
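The helper names (`fetch_page`, `parse_items`, `normalize_item`, `save_record`) come from the loop above; everything else below is a stand-in so the flow can run end to end. In the real pipelines, `fetch_page` would issue an HTTP request and `parse_items` would parse HTML, but the termination logic is the same: stop when a page comes back empty.

```python
# Canned "site": category -> page number -> items. Page 3 does not
# exist, so fetching it returns an empty list and ends the loop.
FAKE_SITE = {
    "tools": {1: ["hammer", "saw"], 2: ["drill"]},
}

saved = []

def fetch_page(category, page):
    # Real code would do an HTTP GET here.
    return FAKE_SITE.get(category, {}).get(page, [])

def parse_items(html):
    # Real code would parse HTML; the stand-in is already a list.
    return html

def normalize_item(item):
    return {"name": item}

def save_record(record):
    saved.append(record)

for category in ["tools"]:
    page = 1
    while True:
        items = parse_items(fetch_page(category, page))
        if not items:  # empty page: pagination is exhausted
            break
        for item in items:
            save_record(normalize_item(item))
        page += 1

print(saved)  # three normalized records across two pages
```

Stubbing the helpers like this also doubles as a test harness: the loop's pagination and termination behavior can be verified without touching a live site.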
Outcomes
- Built reusable patterns for future scraping and data collection tasks.
- Improved understanding of site structures, rate limiting, and basic robustness.
- Demonstrated the ability to go from raw HTML to useful, structured data.
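The rate-limiting lesson in the second outcome is usually handled with delays and retries. Below is one common pattern, exponential backoff, as a hedged sketch: the function names and defaults are illustrative, not the pipelines' actual implementation.

```python
import time

def backoff_delays(base=1.0, factor=2.0, retries=4, cap=30.0):
    """Delay schedule: base, base*factor, base*factor**2, ... capped at `cap` seconds."""
    return [min(base * factor ** i, cap) for i in range(retries)]

def fetch_with_retry(fetch, url, retries=4, base=1.0):
    """Call fetch(url); on transient failure, sleep per the schedule and retry."""
    for attempt, delay in enumerate(backoff_delays(base=base, retries=retries)):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(delay)
```

Backing off exponentially, rather than retrying at a fixed interval, gives a struggling or rate-limiting server progressively more breathing room and keeps the scraper polite.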