In this tutorial, we’ll build a desktop app that:
✅ Extracts links from files (.txt, .pdf, .html)
✅ Filters links (include/exclude keywords)
✅ Checks if links are broken
✅ Displays results with colors (🟢 working / 🔴 broken)
✅ Uses a modern GUI with PySide6
📦 Step 1: Install Dependencies
First, install the required packages:
pip install PySide6 requests PyPDF2
🧠 Step 2: Import Required Libraries
We start by importing everything we need:
import os
import sys
import re
import requests
import time
import platform
import subprocess
from PySide6.QtWidgets import *
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QColor, QIcon, QGuiApplication
import PyPDF2
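💡 Explanation:
os, re → file handling and regex
requests → check links
PySide6 → GUI framework
PyPDF2 → extract text from PDFs
sys → pass command-line arguments to QApplication
platform, subprocess → open links with the system's default handler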
🧵 Step 3: Create a Background Worker (QThread)
We use a thread so the UI doesn’t freeze while scanning.
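class LinkWorker(QThread):
    found = Signal(str, bool)   # (url, is_broken)
    progress = Signal(int)      # 0-100, for the progress bar
    # QThread already emits a built-in finished signal when run() returns,
    # so declaring finished = Signal() here is unnecessary and would shadow it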
💡 Why?
GUI apps must stay responsive, so heavy work runs in a thread.
🔍 Step 3.1: Initialize Worker
def __init__(self, folder, file_types, check_broken, include_words=None, exclude_words=None):
    super().__init__()
    self.folder = folder
    self.file_types = file_types
    self.check_broken = check_broken
    self.include_words = include_words or []
    self.exclude_words = exclude_words or []
    self.seen_links = set()
    self._running = True
💡 Features:
Avoid duplicate links (seen_links)
Support include/exclude keyword filters
Allow stopping the scan (_running flag)
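Using a set for `seen_links` keeps duplicate suppression cheap, because a membership test on a set takes constant time on average. Below is a minimal sketch of the same idea as a standalone function. The name `unique_in_order` is hypothetical and is not part of the app.

```python
from typing import Iterable, List


def unique_in_order(urls: Iterable[str]) -> List[str]:
    """Return urls with duplicates removed, keeping first-seen order."""
    seen = set()  # plays the role of LinkWorker.seen_links
    result = []
    for url in urls:
        if url in seen:
            continue  # already reported once; skip the duplicate
        seen.add(url)
        result.append(url)
    return result
```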
📂 Step 3.2: Scan Files
def run(self):
    all_files = []
    for root, _, files in os.walk(self.folder):
        for f in files:
            ext = os.path.splitext(f)[1].lower()
            if (ext == '.txt' and self.file_types['txt']) or \
               (ext == '.pdf' and self.file_types['pdf']) or \
               (ext in ['.html', '.htm'] and self.file_types['html']):
                all_files.append(os.path.join(root, f))
💡 What happens:
Recursively scans folders
Filters only selected file types
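The chained `or` condition above can also be written as a lookup table. That makes adding a new file type a one-line change. The following is a sketch under that assumption. `collect_files` and `EXTENSIONS` are hypothetical names, and the checkbox keys match the `file_types` dict used by the worker.

```python
import os
from typing import Dict, List

# Extensions each checkbox enables; .htm is grouped with .html as in run()
EXTENSIONS = {
    "txt": (".txt",),
    "pdf": (".pdf",),
    "html": (".html", ".htm"),
}


def collect_files(folder: str, file_types: Dict[str, bool]) -> List[str]:
    """Walk folder recursively and return paths whose extension is enabled."""
    wanted = {
        ext
        for key, enabled in file_types.items()
        if enabled
        for ext in EXTENSIONS.get(key, ())
    }
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # lower() so that PAGE.HTML is treated like page.html
            if os.path.splitext(name)[1].lower() in wanted:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```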
🔗 Step 3.3: Extract Links
urls = re.findall(r'https?://[^\s"\'>]+', text)  # text holds one file's (or one PDF page's) contents
💡 Regex explained:
Matches http:// or https://
Continues until whitespace, a quote (" or '), or > (the end of an HTML tag)
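This pattern has one quirk: sentence punctuation that directly follows a URL, such as a full stop or a closing parenthesis, becomes part of the match. The sketch below reuses the tutorial's regex and trims such characters afterwards. The trimming step and the `TRAILING` character set are assumptions, not part of the original app.

```python
import re
from typing import List

# Same pattern as the tutorial: http(s):// followed by anything that is not
# whitespace, a double quote, a single quote or '>'
URL_RE = re.compile(r'https?://[^\s"\'>]+')

# Assumption: punctuation at the very end of a match is usually sentence
# punctuation rather than part of the URL, so it is trimmed
TRAILING = '.,;:!?)]}'


def extract_links(text: str) -> List[str]:
    """Return every http(s) URL in text, minus trailing punctuation."""
    return [m.rstrip(TRAILING) for m in URL_RE.findall(text)]
```

The trade-off is that a URL which legitimately ends in `)`, as some wiki URLs do, loses that character.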
📄 Handle PDF Files
reader = PyPDF2.PdfReader(f)  # f is the path to the PDF
for page in reader.pages:
    text = page.extract_text() or ""  # guard against pages with no extractable text
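The tutorial does not show how .txt and .html files are read. The sketch below assumes they are read as UTF-8 text, with undecodable bytes replaced, and that HTML is scanned raw so that URLs inside `href` attributes are found too. The helper name `read_text` is hypothetical. PyPDF2 is imported only when a PDF is actually opened.

```python
import os


def read_text(path: str) -> str:
    """Return the text content of a .txt, .html/.htm or .pdf file."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        import PyPDF2  # imported lazily so text-only scans don't need it

        reader = PyPDF2.PdfReader(path)
        # extract_text() can come back empty (e.g. scanned, image-only pages)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    # .txt and .html are scanned as raw text; the regex then finds URLs in
    # href attributes as well as in visible text
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```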
🎯 Step 3.4: Apply Filters
if self.include_words and not any(w in url for w in self.include_words):
    continue
if self.exclude_words and any(w in url for w in self.exclude_words):
    continue
💡 Example:
Include: google → keep only links containing "google"
Exclude: facebook → drop any link containing "facebook"
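The filter words come from comma-separated text boxes, and that creates a pitfall. `"".split(",")` returns `[""]`, and the empty string is a substring of every URL. An empty Exclude box parsed this way would therefore exclude every link. Below is a sketch of the parsing and the Step 3.4 checks as plain functions. The names `parse_keywords` and `passes_filters` are hypothetical.

```python
from typing import List, Sequence


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated field, dropping blanks and stray spaces."""
    return [w.strip() for w in raw.split(",") if w.strip()]


def passes_filters(url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Same include/exclude rules as Step 3.4, returning True to keep url."""
    if include and not any(w in url for w in include):
        return False  # none of the required words appear
    if exclude and any(w in url for w in exclude):
        return False  # at least one banned word appears
    return True
```

Usage: `passes_filters("https://google.com/maps", parse_keywords("google"), parse_keywords("facebook"))` keeps the link, while the same call with an unparsed empty Exclude field would drop it.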
🌐 Step 3.5: Check Broken Links
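def check_link(self, url):
    try:
        res = requests.get(url, timeout=10)
        return not (200 <= res.status_code < 400)  # True means broken
    except requests.RequestException:
        # timeouts, DNS failures, refused connections, invalid URLs
        return True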
💡 Logic:
200–399 → OK
400+ → broken
Timeout or connection error → broken
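`requests.get` downloads the whole response body just to read a status code. A common refinement is to send a HEAD request first and fall back to GET only when the server rejects HEAD (405 or 501). The sketch below swaps in the standard library's `urllib.request` so it runs without extra packages. It keeps the tutorial's 200–399 rule. The `User-Agent` value is a hypothetical placeholder, since some servers refuse urllib's default one.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def is_broken_status(code: int) -> bool:
    """Same rule as the tutorial: 200-399 is OK, anything else is broken."""
    return not (200 <= code < 400)


def check_link(url: str, timeout: float = 10.0) -> bool:
    """Return True if url looks broken. HEAD first, GET as a fallback."""
    headers = {"User-Agent": "LinkGuardian/0.1"}  # hypothetical UA string
    for method in ("HEAD", "GET"):
        try:
            req = Request(url, method=method, headers=headers)
            with urlopen(req, timeout=timeout) as resp:  # follows redirects
                return is_broken_status(resp.status)
        except HTTPError as err:  # urlopen raises this for 4xx/5xx responses
            if method == "HEAD" and err.code in (405, 501):
                continue  # server does not support HEAD; retry with GET
            return is_broken_status(err.code)
        except (OSError, ValueError):
            # URLError and timeouts are OSError subclasses; ValueError
            # covers malformed URLs such as "not a url"
            return True
    return True
```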
🖥️ Step 4: Build the GUI
Create the main window:
class LinkApp(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("LinkGuardian")
        self.setMinimumSize(1000, 600)
📁 Step 4.1: Folder Selection
# In LinkApp.__init__:
self.path_input = QLineEdit()
self.path_input.setReadOnly(True)
browse_btn = QPushButton("Browse")
browse_btn.clicked.connect(self.browse_folder)

def browse_folder(self):
    folder = QFileDialog.getExistingDirectory(self)
    if folder:
        self.path_input.setText(folder)
        self.folder = folder
⚙️ Step 4.2: Options (Checkboxes)
self.txt_checkbox = QCheckBox(".txt")
self.pdf_checkbox = QCheckBox(".pdf")
self.html_checkbox = QCheckBox(".html")
self.check_broken_checkbox = QCheckBox("Check Broken Links")
🔍 Step 4.3: Filters
self.include_input = QLineEdit()
self.include_input.setPlaceholderText("Include words (comma-separated)")
self.exclude_input = QLineEdit()
self.exclude_input.setPlaceholderText("Exclude words (comma-separated)")
▶️ Step 4.4: Start Scan
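def start_scan(self):
    folder = getattr(self, "folder", None)
    if not folder:
        return  # no folder chosen yet; browse_folder() sets self.folder

    def keywords(field):
        # "".split(",") gives [""], and "" matches every URL, so drop blanks
        return [w.strip() for w in field.text().split(",") if w.strip()]

    self.worker = LinkWorker(
        folder,
        {
            'txt': self.txt_checkbox.isChecked(),
            'pdf': self.pdf_checkbox.isChecked(),
            'html': self.html_checkbox.isChecked()
        },
        self.check_broken_checkbox.isChecked(),
        keywords(self.include_input),
        keywords(self.exclude_input)
    )
    self.worker.found.connect(self.add_link)
    self.worker.start()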
🎨 Step 5: Display Results
def add_link(self, link, is_broken):
    item = QListWidgetItem(link)
    color = QColor("red") if is_broken else QColor("green")
    item.setForeground(color)
    self.results_list.addItem(item)  # results_list is a QListWidget created in __init__
💡 Result:
🟢 Green → Working link
🔴 Red → Broken link
📊 Step 6: Progress Bar
self.progress_bar = QProgressBar()
self.progress_bar.setMaximum(100)
Connect it to the worker's progress signal (in start_scan, before self.worker.start()):
self.worker.progress.connect(self.progress_bar.setValue)
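Because the bar's maximum is 100, the worker has to emit a percentage rather than a raw file count. The tutorial does not show that calculation. Below is a minimal sketch with a hypothetical helper, `percent_done`, that also guards against an empty folder.

```python
def percent_done(done: int, total: int) -> int:
    """Convert 'done of total' into the 0-100 value the progress bar expects."""
    if total <= 0:
        return 100  # nothing to scan counts as finished
    return min(100, done * 100 // total)  # integer percent, capped at 100
```

Assuming `run()` iterates with `for i, path in enumerate(all_files):`, it could call `self.progress.emit(percent_done(i + 1, len(all_files)))` after each file.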
📋 Step 7: Copy All Links
def copy_all_links(self):
    links = "\n".join(
        self.results_list.item(i).text()
        for i in range(self.results_list.count())
    )
    QGuiApplication.clipboard().setText(links)
🌍 Step 8: Open Links on Double Click
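# Connected in __init__: self.results_list.itemDoubleClicked.connect(self.open_item)
def open_item(self, item):
    url = item.text()
    system = platform.system()
    if system == "Windows":
        os.startfile(url)
    elif system == "Darwin":  # macOS
        subprocess.Popen(["open", url])
    else:  # Linux and other desktops with xdg-utils
        subprocess.Popen(["xdg-open", url])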
🚀 Step 9: Run the App
if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkApp()
    window.show()
    sys.exit(app.exec())
🎉 Final Result
You now have a professional desktop tool that:
✔ Extracts links from files
✔ Filters intelligently
✔ Detects broken links
✔ Displays results beautifully
✔ Runs smoothly with threads
💡 Bonus Ideas
Want to upgrade it further?
Export results to CSV
Add domain grouping
Add link preview
Add multi-threaded link checking (faster 🚀)
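For the multi-threaded idea, most of a link check is spent waiting on the network, so a thread pool can overlap many checks at once. Below is a sketch using `concurrent.futures.ThreadPoolExecutor`. `check_many` is a hypothetical name, `check` is any function that returns True for a broken URL (for example `self.check_link`), and `max_workers=8` is an arbitrary assumption to tune against the servers you are hitting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def check_many(
    urls: Iterable[str],
    check: Callable[[str], bool],
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Run check(url) -> is_broken for many URLs in parallel."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(check, unique)  # results come back in input order
        return dict(zip(unique, results))
```

Inside `LinkWorker.run()`, the returned dict could then be walked and `self.found.emit(url, is_broken)` called for each entry. That keeps all signal emission on the worker thread.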