## Legacy Data Migration Guide

### Overview

The legacy `master_employers` collection combines employer authentication and company data in a single document. This migration splits the data into the new normalized schema: `users` and `companies`.

**Migration Scope**: 
- Only migrate records where `posted_type = 1` (direct employers)
- Only migrate non-deleted records (`is_deleted != 1`)
- Records with `posted_type != 1` (ATS, Data entry) are excluded

**ID Strategy**:
- `companies.id` → Use existing `master_employers.id` (preserve for backward compatibility)
- `users.id` → Generate new auto-increment ID (separate sequence)
- After migration, update `companies` sequence to max(id) + 1 for future inserts

**Migration Strategy**: 
1. Keep `master_employers` collection intact during migration
2. Run migration script to populate the new `users` and `companies` collections
3. Validate data integrity
4. Deploy application with new schema
5. Dispose legacy collection after successful validation (30 days)

**Prerequisites**:
- New collections (`users`, `companies`) already exist with proper indexes
- Application deployed with new schema
- Backup of `master_employers` collection completed

---

### Field Mapping: `master_employers` → New Collections

#### → `companies` Collection

| Legacy Field | New Field | Transformation |
|--------------|-----------|----------------|
| `id` | `id` | Direct copy |
| `company_name` | `name` | Direct copy |
| - | `name_ar` | Set to `null` (not in legacy) |
| `company_name` | `slug` | Generate: `slugify(company_name + "-" + id)` for uniqueness |
| `logo_pic` | `logo_url` | Prepend S3 base URL if path only |
| `company_logo` | `cover_image_url` | Prepend S3 base URL if path only |
| `website` | `website` | Direct copy |
| `industry_id` | `industry` | Map ID to industry name |
| `no_of_employee` | `employee_count` | Parse to integer (fallback: `number_of_emp`) |
| `founded_year` | `founded_year` | Parse to integer |
| `description` | `description` | Direct copy |
| - | `description_ar` | Set to `null` (not in legacy) |
| `country_id` | `country_id` | Direct copy |
| `city_id` | `city_id` | Direct copy |
| `reffered_by` | `referral_source` | Direct copy (fallback: `referrer`) |
| `utm_source` | `referred_by` | Direct copy |
| `status` | `status` | Map: `1` → `"active"`, `0` → `"inactive"` |
| `emailverified` | `is_verified` | Map: `1` → `true`, `0` → `false` |
| `registered_date` | `verified_at` | Copy if `emailverified = 1`, else `null` (Note: using registration date as proxy) |
| `created_at` | `created_at` | Direct copy |
| `updated_at` | `updated_at` | Direct copy |

#### → `users` Collection

| Legacy Field | New Field | Transformation |
|--------------|-----------|----------------|
| - | `id` | Generate new auto-increment ID |
| `id` | `company_id` | Use legacy `id` as company reference |
| - | `role_id` | Set to `1` (Owner role) |
| - | `invited_by_id` | Set to `null` (owner, not invited) |
| `email` | `email` | Direct copy |
| `password` | `password_hash` | Direct copy (already hashed) |
| `mobile_number` | `phone` | Direct copy |
| `country_code` | `phone_country_code` | Map numeric code to string (e.g., `1` → `"+1"`, `971` → `"+971"`) |
| `employer_name` | `first_name` | First word of `employer_name` |
| `employer_name` | `last_name` | Remaining words of `employer_name` |
| - | `avatar_url` | Set to `null` (not in legacy) |
| `employer_designation` | `designation` | Direct copy |
| - | `employee_code` | Set to `null` (not in legacy) |
| - | `is_owner` | Set to `true` |
| - | `is_deleted` | Set to `false` (only non-deleted records migrated) |
| `status` | `status` | Map: `1` → `"active"`, `0` → `"inactive"` |
| `emailverified` | `is_email_verified` | Map: `1` → `true`, `0` → `false` (fallback: `verify_email`) |
| - | `has_2fa_enabled` | Set to `false` (not in legacy) |
| - | `two_fa_secret` | Set to `null` (not in legacy) |
| `registered_date` | `joined_at` | Direct copy |
| `last_login_date` | `last_login_at` | Parse to ISODate |
| `created_at` | `created_at` | Direct copy |
| `updated_at` | `updated_at` | Direct copy |

---

### Unused/Deprecated Fields

These fields from `master_employers` are **not migrated** (mark for review):

| Field | Reason |
|-------|--------|
| `approved_by_id` | Admin workflow - not needed in new schema |
| `approved_date` | Admin workflow - not needed |
| `case_sensitive` | Unknown purpose |
| `channel_id` | Legacy integration |
| `classification` | Always null |
| `com_class_type_id` | Legacy classification |
| `com_type_id` | Legacy company type |
| `company_no` | Duplicate of `mobile_number` |
| `contact_name` | Always null |
| `designation` | Always null (use `employer_designation`) |
| `dev_password` | Development only - security risk |
| `domain` | Can derive from email |
| `email_copy` | Duplicate/backup email |
| `emailverified_through` | Verification method tracking |
| `establishment_date` | Always null |
| `feedback_flag` | Legacy feature |
| `functional_id` | Legacy field |
| `identifier` | Legacy field |
| `is_updated` | Internal flag |
| `job_post_limitation` | Move to subscription features |
| `job_source` | Legacy tracking |
| `landline_number` | Rarely used |
| `location` | Redundant with city_id |
| `mailLanguage` | Move to user preferences |
| `notify_count` | Computed field |
| `old_password` | Security risk - do not migrate |
| `posted_type` | Legacy field |
| `profile_complete` | Computed field |
| `profile_completion` | Computed field |
| `profile_type` | Legacy field |
| `profile_updated_date` | Use `updated_at` |
| `redirect_preference` | Legacy UI preference |
| `remember_token` | Session token - regenerate |
| `sector_id` | Legacy - use `industry_id` |
| `set_password` | Internal flag |
| `source` | Legacy tracking |
| `state_id` | Always null |
| `total_app_cand` | Computed field |
| `total_followers` | Computed field |
| `total_invitations` | Computed field |
| `total_pos_jobs` | Computed field |
| `type` | Always "organisation" |
| `utm_campaign` | Move to analytics |
| `utm_medium` | Move to analytics |
| `valid_domain` | Internal validation flag |

---

### Migration Script (Pseudo-code)

```javascript
// Migration: master_employers → companies + users
// Only migrate direct employers (posted_type = 1) that are not deleted

var userIdCounter = 1; // Start user ID sequence
var slugSet = new Set(); // Track used slugs for uniqueness
var emailSet = new Set(); // Track used emails for uniqueness
var skippedDuplicateEmails = [];

db.master_employers.find({ 
  is_deleted: { $ne: 1 },
  posted_type: 1  // Direct employers only
}).forEach(function(emp) {
  
  // Skip duplicate emails (unique constraint)
  if (emp.email && emailSet.has(emp.email.toLowerCase())) {
    skippedDuplicateEmails.push({ id: emp.id, email: emp.email });
    return; // Skip this record
  }
  if (emp.email) {
    emailSet.add(emp.email.toLowerCase());
  }
  
  // Generate unique slug
  var baseSlug = emp.company_name ? slugify(emp.company_name + "-" + emp.id) : "company-" + emp.id;
  var slug = baseSlug;
  var slugCounter = 1;
  while (slugSet.has(slug)) {
    slug = baseSlug + "-" + slugCounter++;
  }
  slugSet.add(slug);
  
  // 1. Insert into companies collection (use existing id)
  var company = {
    _id: new ObjectId(),
    id: emp.id,                       // Keep existing ID for company
    name: emp.company_name || null,
    name_ar: null,                    // Not in legacy
    slug: slug,
    logo_url: formatImageUrl(emp.logo_pic),
    cover_image_url: formatImageUrl(emp.company_logo),
    website: emp.website || null,
    industry: emp.industry_id ? mapIndustryId(emp.industry_id) : null,
    employee_count: parseInt(emp.no_of_employee || emp.number_of_emp) || null,
    founded_year: parseInt(emp.founded_year) || null,
    description: emp.description || null,
    description_ar: null,             // Not in legacy
    country_id: emp.country_id || null,
    city_id: emp.city_id || null,
    referral_source: emp.reffered_by || emp.referrer || null,
    referred_by: emp.utm_source || null,
    status: emp.status === 1 ? "active" : "inactive",
    is_verified: emp.emailverified === 1,
    verified_at: emp.emailverified === 1 ? parseDate(emp.registered_date) : null,
    created_at: parseDate(emp.created_at) || new Date(),
    updated_at: parseDate(emp.updated_at) || new Date()
  };
  
  db.companies.insertOne(company);
  
  // 2. Insert into users collection (generate new id)
  var nameParts = (emp.employer_name || "").trim().split(" ");
  var user = {
    _id: new ObjectId(),
    id: userIdCounter++,              // New auto-increment ID for users
    company_id: emp.id,               // Reference to company (existing ID)
    role_id: 1,                       // Owner role (fixed)
    invited_by_id: null,              // Not in legacy (owner not invited)
    email: emp.email || null,
    password_hash: emp.password || null,
    phone: emp.mobile_number || null,
    phone_country_code: mapCountryCode(emp.country_code),
    first_name: nameParts[0] || null,
    last_name: nameParts.length > 1 ? nameParts.slice(1).join(" ") : null,
    avatar_url: null,                 // Not in legacy
    designation: emp.employer_designation || null,
    employee_code: null,              // Not in legacy
    is_owner: true,                   // Fixed for migration
    is_deleted: false,                // Only non-deleted records migrated
    status: emp.status === 1 ? "active" : "inactive",
    is_email_verified: (emp.emailverified === 1) || (emp.verify_email === 1),
    has_2fa_enabled: false,           // Not in legacy
    two_fa_secret: null,              // Not in legacy
    joined_at: parseDate(emp.registered_date),
    last_login_at: parseDate(emp.last_login_date),
    created_at: parseDate(emp.created_at) || new Date(),
    updated_at: parseDate(emp.updated_at) || new Date()
  };
  
  db.users.insertOne(user);
});

// Update the sequence counters for future inserts
db.sequences.updateOne(
  { _id: "users" },
  { $set: { seq: userIdCounter } },
  { upsert: true }
);

// Update companies sequence to max(id) + 1
var maxCompanyId = db.companies.find().sort({ id: -1 }).limit(1).toArray()[0];
if (maxCompanyId) {
  db.sequences.updateOne(
    { _id: "companies" },
    { $set: { seq: maxCompanyId.id + 1 } },
    { upsert: true }
  );
}

// Log skipped records
if (skippedDuplicateEmails.length > 0) {
  print("WARNING: Skipped " + skippedDuplicateEmails.length + " records due to duplicate emails:");
  skippedDuplicateEmails.forEach(function(r) {
    print("  - ID: " + r.id + ", Email: " + r.email);
  });
}

// Helper functions
function slugify(text) {
  if (!text) return null;
  return text.toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '');
}

function mapCountryCode(code) {
  if (!code) return null;
  // If already a string with +, return as-is
  if (typeof code === 'string' && code.startsWith('+')) return code;
  // Map numeric codes
  return "+" + String(code);
}

function mapIndustryId(id) {
  // Map legacy industry IDs to industry names
  var industries = { 
    14: "Automotive",
    // Add all industry mappings here
  };
  return industries[id] || null;
}

function parseDate(dateStr) {
  if (!dateStr) return null;
  var d = new Date(dateStr);
  return isNaN(d.getTime()) ? null : d;
}

function formatImageUrl(path) {
  if (!path) return null;
  // If already a full URL, return as-is
  if (path.startsWith('http://') || path.startsWith('https://')) return path;
  // Otherwise prepend S3 base URL
  return S3_BASE_URL + path;
}
```

---

### Migration Checklist

- [ ] Backup `master_employers` collection
- [ ] Verify new collections exist with proper indexes
- [ ] Check for duplicate emails in legacy data (run validation query)
- [ ] Define S3_BASE_URL constant for image URLs
- [ ] Complete industry ID to name mapping
- [ ] Run migration script in staging environment
- [ ] Review skipped records (duplicate emails) and handle manually if needed
- [ ] Validate record counts match (accounting for skipped duplicates)
- [ ] Validate data integrity (spot check 100 records)
- [ ] Verify all dates are ISODate format
- [ ] Verify user sequence is set correctly
- [ ] Test application with new schema
- [ ] Run migration in production
- [ ] Monitor for errors (24-48 hours)
- [ ] Archive `master_employers` collection
- [ ] Remove legacy collection after 30 days

---

### Post-Migration Validation Queries

```javascript
// Verify record counts (only direct employers, excluding deleted)
var legacyCount = db.master_employers.countDocuments({ 
  is_deleted: { $ne: 1 },
  posted_type: 1
});

// Count migrated companies (those with IDs from legacy)
var companyCount = db.companies.countDocuments({
  id: { $in: db.master_employers.distinct("id", { is_deleted: { $ne: 1 }, posted_type: 1 }) }
});

// Count migrated users (owners only, linked to migrated companies)
var userCount = db.users.countDocuments({ 
  is_owner: true,
  company_id: { $in: db.master_employers.distinct("id", { is_deleted: { $ne: 1 }, posted_type: 1 }) }
});

print("Legacy (direct employers, non-deleted): " + legacyCount);
print("Migrated Companies: " + companyCount);
print("Migrated Users (owners): " + userCount);
print("Companies Match: " + (legacyCount === companyCount));
print("Users Match: " + (legacyCount === userCount));

// Check for duplicate email skips
var duplicateEmails = db.master_employers.aggregate([
  { $match: { is_deleted: { $ne: 1 }, posted_type: 1, email: { $ne: null } } },
  { $group: { _id: { $toLower: "$email" }, count: { $sum: 1 }, ids: { $push: "$id" } } },
  { $match: { count: { $gt: 1 } } }
]).toArray();

if (duplicateEmails.length > 0) {
  print("\nWARNING: Found " + duplicateEmails.length + " duplicate emails in legacy data:");
  duplicateEmails.forEach(function(d) {
    print("  - Email: " + d._id + ", Count: " + d.count + ", IDs: " + d.ids.join(", "));
  });
}

// Verify data integrity (sample)
db.master_employers.aggregate([
  { $match: { is_deleted: { $ne: 1 }, posted_type: 1 } },
  { $sample: { size: 10 } },
  { $lookup: { from: "companies", localField: "id", foreignField: "id", as: "company" } },
  { $lookup: { from: "users", localField: "id", foreignField: "company_id", as: "user" } },
  { $project: {
    legacy_id: "$id",
    legacy_email: "$email",
    new_user_email: { $arrayElemAt: ["$user.email", 0] },
    new_user_id: { $arrayElemAt: ["$user.id", 0] },
    legacy_company: "$company_name",
    new_company: { $arrayElemAt: ["$company.name", 0] },
    email_match: { $eq: ["$email", { $arrayElemAt: ["$user.email", 0] }] },
    company_match: { $eq: ["$company_name", { $arrayElemAt: ["$company.name", 0] }] }
  }}
]);

// Verify user sequence is set correctly
var maxUserId = db.users.find().sort({ id: -1 }).limit(1).toArray()[0];
var userSequence = db.sequences.findOne({ _id: "users" });
print("\nUser Sequence Validation:");
print("  Max User ID: " + (maxUserId ? maxUserId.id : "N/A"));
print("  Sequence Value: " + (userSequence ? userSequence.seq : "N/A"));
print("  Valid: " + (userSequence && maxUserId && userSequence.seq > maxUserId.id));

// Verify company sequence is set correctly
var maxCompanyId = db.companies.find().sort({ id: -1 }).limit(1).toArray()[0];
var companySequence = db.sequences.findOne({ _id: "companies" });
print("\nCompany Sequence Validation:");
print("  Max Company ID: " + (maxCompanyId ? maxCompanyId.id : "N/A"));
print("  Sequence Value: " + (companySequence ? companySequence.seq : "N/A"));
print("  Valid: " + (companySequence && maxCompanyId && companySequence.seq > maxCompanyId.id));

// Verify all dates are ISODate
var invalidDates = db.users.find({
  $or: [
    { created_at: { $type: "string" } },
    { updated_at: { $type: "string" } },
    { joined_at: { $type: "string" } },
    { last_login_at: { $type: "string" } }
  ]
}).count();
print("\nDate Validation:");
print("  Users with string dates: " + invalidDates);
```

---
