WHO WE ARE:
Talend, a leader in data integration and data governance, is changing the way the world makes decisions. In order to compete and win, IT and business leaders need data that they can trust and understand instantly. Talend Data Fabric is the only platform that seamlessly combines an extensive range of data integration and governance capabilities to actively manage the health of corporate information. This unified approach is unique and essential to delivering complete, clean, and uncompromised data in real time to all employees. It has made it possible to create innovations like the Talend Trust Score™, an industry-first assessment that instantly quantifies the reliability of any data set.
Over 7,250 customers have chosen Talend to run their businesses on healthy data. Talend is recognized as a leader in its field by leading analyst firms and industry media.
We pride ourselves on our values of Passion, Agility, Team Spirit, and Integrity. Every one of our 1,400 employees brings a certain je ne sais quoi that makes Talend special.
Internship topic
Language models: application to data cleaning
Context:
Talend positions itself as a global leader in data quality [1]. It develops cloud tools for manipulating and preparing data, such as Talend Data Preparation, Talend Data Stewardship and Talend Pipeline Designer. These tools let users efficiently manipulate and clean multiple data sources, transforming raw data and making it available to business users and to the various applications of Talend's customers. In this context, guaranteeing optimal data quality throughout the processing pipelines is a key challenge. Talend provides its customers with various means to assess the quality of their data, detect potential anomalies or inconsistencies, and process them in order to correct them.
Internship objective:
Automatically cleaning a large dataset remains a technical challenge to this day. The development of machine learning methods has enabled major advances in processing homogeneous data such as images or text. Although their performance remains limited on tabular data [2], recent studies have demonstrated their potential, notably thanks to the proliferation of language models [3]. The goal of the internship is to explore the different ways language models can be used for data cleaning within Talend applications. The intern will join the Lab team within the R&D department.
Candidate profile:
· Final-year student in an engineering school or a university Master 2 program (BAC+5), specializing in AI/data science, computer science and/or applied mathematics.
· Practical experience in machine learning (preferably in NLP) is required.
· Fluency in English is required.
· Desired skills: Python, machine learning, NLP
· Duration: 5-6 months (starting March 2023)
AND NOW, A LITTLE ABOUT US:
Talend has received some pretty impressive accolades along the way:
- 7,250+ global customers rely on Talend for their data health
- Named a Leader for Data Integration Tools by Gartner (for the 7th year in a row)
- Named a Leader for Data Quality Solutions by Gartner (for the 4th year in a row)
- Named a Leader in The Forrester Wave™: Enterprise Data Fabric (for the 2nd time in a row)
- Ranked in the DBTA “100 Companies that Matter Most in Data”
We are passionate about helping companies become more data-driven and, to be honest, we are all geeks at heart who take pride in the vibrant company culture we have built.
As a global employer, Talend believes our success depends on diversity, inclusion and mutual respect among our team members. We want to look like our customers, and we recruit, develop and retain the most hardworking people from a diverse candidate pool. We are committed to making all employment decisions on the basis of business need, merit, capability and equality of opportunity. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us via email to request an accommodation at accommodation@talend.com. We appreciate your interest in Talend.
Notice to Recruiters and Staffing Agencies
Talend SAS and its affiliates ("Talend") have an internal recruiting, or talent acquisition, department ("TA"). Talend may supplement this internal capability from time to time with assistance from temporary staffing agencies, placement services, and professional recruiters ("Agency"). Agencies are hereby specifically directed NOT to contact Talend employees directly in an attempt to present candidates. To protect the interests of all parties, Talend will not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to Talend, including unsolicited resumes sent to a Talend mailing address, fax machine, email address or any other means, directly to Talend employees, or to Talend's resume database will be considered Talend property, and Talend will therefore NOT be liable for any placement resulting from the receipt of an unsolicited resume.
Agency agreements will only be valid if in writing and signed by an officer of Talend or his or her designee. No other Talend employee is authorized to bind Talend to any agreement regarding the placement of candidates by Agencies. Talend hereby specifically rejects, and denies any liability under, any agreement purporting to be accepted based on negative consent, negotiation with a candidate, performance, or any means other than the signature of a Talend officer.