Available for opportunities · Veszprém, Hungary

Abonyi János
Software Developer & AI Engineer

Junior software developer focused on backend engineering and applied AI — RAG systems, LLM orchestration, and polyglot data architectures.

About

Backend engineering meets applied AI.

I'm a junior software developer from Veszprém, Hungary, currently pursuing an MSc in Computer Science Engineering at the University of Pannonia, after completing my BSc at Eötvös Loránd University (ELTE) with a Software Design specialization.

I enjoy learning how applied-AI systems work: retrieval-augmented generation, multi-database orchestration with LLM tool calling, and ML pipelines.

I care about systems that are correct under failure: idempotent handlers, transactional outboxes, guardrails before data crosses a trust boundary. I also enjoy turning messy real-world data into something searchable and useful.

Selected work

Projects

RAG · AI Safety · Guardrails

Autonomous Compliance Auditor

Corrective-RAG service that audits responses for legal & privacy compliance.

A LangGraph-orchestrated Corrective-RAG (CRAG) pipeline that audits knowledge-base and bot responses for adherence to legal requirements, GDPR/HIPAA privacy rules, and company policy. It screens input for prompt injection and jailbreaks, redacts PII before anything reaches the LLM, grades retrieved documents, and falls back to web search when local context is insufficient.

  • Fail-safe ordering: input guardrails → PII redaction, both completed before text reaches embeddings or the LLM.
  • Two-stage input screening — cheap regex first, LLM classifier second.
  • CRAG correction: documents are graded; weak local context triggers a Tavily web search before generation.
  • Grounded generation as a structured ComplianceAudit, re-checked for grounding/toxicity with one bounded self-correction.
LangGraph · OpenAI · Pinecone · Presidio · Tavily · FastAPI · Pydantic
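The screening and redaction order described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the patterns, the `<EMAIL>` placeholder, the function names, and the `llm_classifier` callable are all hypothetical, and a single email regex stands in for a real PII recognizer such as Presidio. Whether the second-stage classifier sees raw or redacted text is not stated above; the sketch assumes it screens the raw text.

```python
import re
from typing import Callable

# Hypothetical first-stage patterns; a real list would be larger and tested.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]

# Simplified stand-in for a PII recognizer: matches email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_screen(text: str) -> bool:
    """Stage 1: return True if any cheap pattern flags the text."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def guard_input(text: str, llm_classifier: Callable[[str], bool]) -> str:
    """Run regex screen, then classifier, then redaction.

    Raises ValueError if either screening stage flags the text. The
    classifier is only called when the regex stage passes, so obvious
    attacks never cost a model call.
    """
    if regex_screen(text):
        raise ValueError("blocked by regex screen")
    if llm_classifier(text):
        raise ValueError("blocked by classifier")
    # Only the redacted text is returned for embedding or generation.
    return redact_pii(text)
```

Passing the classifier in as a callable keeps the sketch testable without a model: a stub that always returns `False` is enough to exercise the ordering.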
GraphRAG · Distributed Systems · Polyglot Persistence

Digital HR Architect

Polyglot-persistence HR brain with multi-hop GraphRAG reasoning.

An AI-powered HR platform that unifies five database paradigms — Relational, Graph, Document, Vector, and Big Data — behind one orchestration layer. An LLM uses function calling to route natural-language questions across PostgreSQL, Neo4j, Firestore, Pinecone, and BigQuery ML, chaining results to answer questions no single store can.

  • Multi-hop GraphRAG: e.g. 'find a great listener mentored by someone who knows Python' chains Pinecone → Neo4j → PostgreSQL → BigQuery ML.
  • Transactional outbox pattern keeps five stores consistent without a distributed commit — business row + event commit atomically in one PG transaction.
  • At-least-once delivery with idempotent handlers (MERGE / upsert), dead-letter queue, and a drift detector across all stores.
  • BigQuery ML logistic-regression model predicts employee flight risk directly in SQL.
PostgreSQL · Neo4j · Firestore · Pinecone · BigQuery ML · OpenAI · Cloud Pub/Sub · Streamlit
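The outbox and idempotent-handler ideas above can be shown with a small, self-contained sketch. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, a local `graph_nodes` table stands in for a downstream store, and every table and function name is hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE outbox (
    event_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    payload   TEXT NOT NULL,
    delivered INTEGER NOT NULL DEFAULT 0
);
-- Stand-in for a downstream store that consumes the events.
CREATE TABLE graph_nodes (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
""")


def create_employee(emp_id: int, name: str) -> None:
    """Write the business row and its event in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (emp_id, name))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"employee_id": emp_id, "name": name}),),
        )


def apply_event(payload: dict) -> None:
    """Idempotent handler: an upsert, so redelivery leaves the same state."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO graph_nodes (employee_id, name) VALUES (?, ?)",
            (payload["employee_id"], payload["name"]),
        )


def relay_once(mark_delivered: bool = True) -> int:
    """Deliver pending events in order; return how many were handled.

    mark_delivered=False simulates a crash after handling but before the
    acknowledgement, which is what forces at-least-once redelivery.
    """
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE delivered = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, payload in rows:
        apply_event(json.loads(payload))
        if mark_delivered:
            with conn:
                conn.execute("UPDATE outbox SET delivered = 1 WHERE event_id = ?", (event_id,))
    return len(rows)
```

Calling `relay_once(mark_delivered=False)` and then `relay_once()` delivers the same event twice, yet `graph_nodes` still holds one row per employee, which is the property the idempotent handler is there to provide.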
RAG · Full-stack · Reranking

Quiz Solver

RAG study assistant that answers exam questions from your own documents.

Upload PDF / DOCX / TXT study materials — including scanned PDFs — and get instant, source-cited answers to any question, including A/B/C/D multiple choice. Each subject lives in its own isolated knowledge base with full Q&A history.

  • Retrieve top-20 from Pinecone, rerank to top-5 with Cohere, then answer with GPT-4o at low temperature for grounded responses.
  • 3-stage PDF extraction with OCR fallback (pdfjs-dist → pdf-parse → Mistral OCR) for fully scanned documents.
  • Per-knowledge-base Pinecone namespaces for full document isolation.
  • Answers rendered as Markdown + LaTeX (KaTeX), always in the question's language.
Next.js · React · TypeScript · OpenAI · Pinecone · Cohere · Supabase · Mistral OCR
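The retrieve-then-rerank funnel above follows a general two-stage shape: a cheap scorer narrows the corpus, then a more expensive scorer orders the survivors. The sketch below shows only that shape. It is written in Python for illustration (the project itself lists TypeScript), the two scoring functions are toy stand-ins for vector search and a reranking model, and all names are hypothetical.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    retrieve_k: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Keep the retrieve_k best documents by cheap_score, then return the
    final_k best of those by rerank_score."""
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:retrieve_k]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:final_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage score: number of distinct shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    """Toy second-stage score: fraction of the document's words found in the query."""
    words = doc.lower().split()
    if not words:
        return 0.0
    query_words = set(query.lower().split())
    return sum(w in query_words for w in words) / len(words)
```

The second stage only ever sees `retrieve_k` documents, so its per-document cost can be much higher than the first stage's without dominating latency.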
Cloud · Computer Vision · Backend

PhotoVault

Upload photos and search them by what's actually in them.

A cloud-native photo app: every uploaded image is auto-analyzed by Google Cloud Vision, which returns content labels, so you can search 'dog', 'mountain', or 'receipt' with no manual tagging. Built to compose three Google Cloud services behind one Flask app.

  • Upload → Cloud Storage, analyze → Vision API labels, store metadata → Firestore, all in one pipeline.
  • Case-insensitive label search via a lowercased labelsLower array and Firestore array_contains.
  • Containerized with Docker and Gunicorn, deployed to Google Cloud Run with identity-based auth.
Python · Flask · Google Cloud Storage · Cloud Vision API · Firestore · Docker · Cloud Run
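The case-insensitive search above relies on storing a lowercased copy of the labels at write time, so that an exact-match array query can be used at read time. The sketch below emulates that idea with plain dictionaries instead of a Firestore client. Only the `labelsLower` field name comes from the description; the helper names and document shape are assumptions.

```python
from typing import Iterable


def build_photo_doc(photo_id: str, labels: Iterable[str]) -> dict:
    """Build the metadata document, keeping original labels for display
    and a lowercased copy for matching."""
    labels = list(labels)
    return {
        "id": photo_id,
        "labels": labels,
        "labelsLower": [label.lower() for label in labels],
    }


def search_by_label(docs: list[dict], term: str) -> list[str]:
    """In-memory stand-in for an array_contains query on labelsLower.

    The search term is normalized the same way as the stored labels, so
    'Dog', 'dog', and ' DOG ' all match a photo labeled 'Dog'.
    """
    needle = term.strip().lower()
    return [doc["id"] for doc in docs if needle in doc["labelsLower"]]
```

Normalizing on both the write and the read path is what makes the exact-match query behave as case-insensitive; the original-case `labels` field stays available for display.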
Machine Learning · Data Science · Time Series

F1 Driver Telemetry Classifier

Identifying F1 drivers from their telemetry-derived driving style.

An ML project that analyzes Formula 1 telemetry (speed, throttle, brake, RPM, gear) to distinguish driving styles between drivers and build a predictive model that identifies a driver from a lap. Data is pulled per-lap with FastF1, distance-resampled into uniform time series, and turned into per-lap feature matrices.

  • Telemetry ingestion via FastF1 with local caching across Grand Prix sessions.
  • Distance-based resampling to align laps into comparable, uniform feature vectors.
  • Exploratory analysis (box/violin/Q-Q plots) comparing throttle, brake and speed distributions per driver.
  • Supervised classification model to attribute laps to drivers (VER, RUS, NOR, PIA).
Python · FastF1 · pandas · scikit-learn · matplotlib · seaborn
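Distance-based resampling, as mentioned above, maps each lap onto the same evenly spaced distance grid so that laps with different sample counts become vectors of equal length. The sketch below is one plausible way to do it with linear interpolation in pure Python. The function name and signature are hypothetical, and it assumes the distance values are strictly increasing.

```python
from bisect import bisect_right
from typing import Sequence


def resample_by_distance(
    distance: Sequence[float], values: Sequence[float], n_points: int
) -> list[float]:
    """Linearly interpolate `values` onto n_points evenly spaced distances
    between the first and last sample. Assumes `distance` is strictly
    increasing."""
    if len(distance) != len(values) or len(distance) < 2:
        raise ValueError("need at least two matching samples")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    start, end = distance[0], distance[-1]
    step = (end - start) / (n_points - 1)
    resampled = []
    for i in range(n_points):
        d = start + i * step
        # Index of the segment [distance[j], distance[j + 1]] that contains d,
        # clamped so the last grid point uses the final segment.
        j = min(max(bisect_right(distance, d) - 1, 0), len(distance) - 2)
        t = (d - distance[j]) / (distance[j + 1] - distance[j])
        resampled.append(values[j] + t * (values[j + 1] - values[j]))
    return resampled
```

Applying the same `n_points` to every channel of every lap (speed, throttle, brake, and so on) gives rows of identical width, which is the form a per-lap feature matrix needs.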

Journey

Experience & Education

  1. MSc, Computer Science

    2025 – present

    University of Pannonia

    Computer Science MSc.

  2. Software Developer Trainee

    Jul 2025 – Apr 2026

    One Identity

    Backend development on Safeguard for Privileged Sessions (SPS): implementing features, fixing bugs, extending test coverage, and participating in code reviews (Gerrit).

  3. LLM-based machine-learning decision-support system

    Jul – Sep 2024

    University of Pannonia

    Processing dummy university admissions data, data visualization, LLM-based report generation, prompt engineering, and applied machine-learning algorithms.

  4. BSc, Computer Science (Software Design specialization)

    2022 – 2025

    Eötvös Loránd University (ELTE)

    Computer Science BSc, Software Design specialization. Thesis: an LLM-based machine-learning decision-support system.

  5. Bilingual Secondary School

    2017 – 2022

    Balatonalmádi Bilingual Gymnasium

    English bilingual graduation and C1 advanced English certificate.

Toolbox

Skills & Languages

Languages

Python · C# · Java · SQL · Cypher

Databases

PostgreSQL · MSSQL · Neo4j · Pinecone · Firestore · BigQuery · Supabase

AI / ML

RAG · LLM APIs · LangGraph/LangChain · Supervised ML · Prompt Engineering · Embeddings

Cloud / DevOps

Google Cloud · Cloud Run · Docker · Git · CI/CD · GitLab · Jenkins · Azure DevOps

AI Agents & Tooling

Claude Code · Antigravity · Codex · n8n · Openclaw · Hermes agent

Spoken languages

  • Hungarian: Native
  • English: C1 (complex)
  • German: B2 (complex)

Interactive

Ask me anything

A retrieval-augmented chatbot grounded in my real CV and projects. It embeds your question, searches a Pinecone vector index, and answers only from what it finds — the same RAG stack I build with.

RAG over my CV & projects — Pinecone + OpenAI

I'm an AI assistant grounded in János's real CV and projects. Try a question: